Design a Distributed Message Queue: System Design Mock Interview

Exponent Team

In this example, Neeraj Gupta, an eBay engineering manager, answers the system design interview question, "Design a distributed message queue."

What are Message Queues?

Message queues are a common and widely used component in distributed systems. They help different parts of a system talk to one another, even if those parts don't operate at the same time.

Messages are stored in a queue until the part of the system that needs them reads them. This helps make sure that information isn't lost when the receiver is slow or temporarily offline.
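The buffering behavior described above can be sketched with a toy, single-process queue. This is only an illustration: the class name `InMemoryQueue` and its methods are hypothetical, and a real distributed queue would persist messages on brokers rather than in memory.

```python
from collections import deque


class InMemoryQueue:
    """Toy single-process queue: holds messages until a consumer takes them."""

    def __init__(self):
        self._messages = deque()

    def produce(self, message):
        # The producer returns immediately; no consumer needs to be running.
        self._messages.append(message)

    def consume(self):
        # Return the oldest unread message, or None if the queue is empty.
        return self._messages.popleft() if self._messages else None


queue = InMemoryQueue()
queue.produce("order-1 created")
queue.produce("order-2 created")

# The consumer starts later and still receives both messages, in order.
first = queue.consume()
second = queue.consume()
```

The producer and consumer never call each other directly, which is what lets the two sides run at different times.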

In other words, a message queue acts as a temporary storage room for information passing between different parts of a system. RabbitMQ is one popular message queue used for these exchanges.

A distributed message queue runs on many servers, which are called brokers. The brokers form a cluster that works to keep the whole system reliable.

Message queues can help different parts of a system work more independently and efficiently.

Message queues can be used in many situations, such as:

  • processing orders for a store,
  • handling financial transactions,
  • and keeping information secure.

Some popular message queues include RabbitMQ, Apache Kafka, Apache ActiveMQ, Google Cloud Pub/Sub, Amazon SNS/SQS, and Azure Queue Storage.

Framework for this Question

Learn more about using a system design interview framework to answer questions like these.

  • Step 1: Understand the problem. The goal is to design a distributed message queue that can handle a high volume of messages while keeping the failure rate of each operation low.
  • Step 2: Design the system. The system should support topic-based queues, a pull model for consumers, and a message structure that includes a topic, a payload, and a key used for partitioning. The queue should be highly scalable and able to absorb abrupt spikes in traffic. Storage options include SQL, NoSQL, or a write-ahead log (WAL). Acknowledgments and replication to followers provide fault tolerance and confirm that messages are written successfully.
  • Step 3: Explore the design. Discuss interesting components, such as metadata storage and state storage, in more detail.
  • Step 4: Improve the design. A single log file that grows too large can be split into multiple segments and partitioned by a key such as buyer ID. Back-end jobs can also be created to improve the system's efficiency.
  • Step 5: Wrap up. Check that the design meets all the requirements, then suggest improvements. Best practices to mention include a leader-follower approach, metadata and state storage, and a storage layer designed for a read-and-write-heavy system.

Understanding Queue Functionality

A queue's primary function is to insert and remove messages, also known as producing and consuming messages.

However, it's essential to consider the system's scalability and other non-functional requirements.

Some types of queues include:

  • topic-based queues,
  • fan-out-based broadcasting,
  • and direct (point-to-point) delivery.
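The three delivery styles in this list can be contrasted with a small in-memory sketch. The `Router` class and its method names are hypothetical, and the semantics are an assumption loosely modeled on how brokers such as RabbitMQ distinguish topic, fanout, and direct routing.

```python
class Router:
    """Toy router showing topic-based, fan-out, and direct delivery."""

    def __init__(self):
        self.queues = {}          # queue name -> list of delivered messages
        self.topic_bindings = {}  # topic -> set of queue names subscribed to it

    def bind(self, topic, queue_name):
        # Subscribe a queue to a topic, creating the queue if needed.
        self.queues.setdefault(queue_name, [])
        self.topic_bindings.setdefault(topic, set()).add(queue_name)

    def publish_topic(self, topic, message):
        # Topic-based: only queues bound to this topic receive the message.
        for name in self.topic_bindings.get(topic, ()):
            self.queues[name].append(message)

    def publish_fanout(self, message):
        # Fan-out: every known queue receives a copy.
        for delivered in self.queues.values():
            delivered.append(message)

    def publish_direct(self, queue_name, message):
        # Direct: exactly one named queue receives the message.
        self.queues.setdefault(queue_name, []).append(message)


router = Router()
router.bind("orders", "billing")
router.bind("orders", "shipping")
router.bind("payments", "billing")

router.publish_topic("payments", "p1")   # reaches billing only
router.publish_fanout("maintenance")     # reaches billing and shipping
router.publish_direct("shipping", "s1")  # reaches shipping only
```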

For scalability, it's best to use a pull model for consumers, in which consumers pull messages from the queue at their own pace instead of having the queue push messages to them.
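A minimal sketch of the pull model, assuming messages are addressed by a numeric offset: the consumer asks for the next few messages whenever it is ready. `TopicLog` and `poll` are hypothetical names chosen for this example.

```python
class TopicLog:
    """Toy ordered log for one topic; consumers read it by offset."""

    def __init__(self):
        self._messages = []

    def append(self, payload):
        self._messages.append(payload)
        return len(self._messages) - 1  # offset of the message just written

    def poll(self, offset, max_messages):
        # Return up to max_messages starting at offset; empty list if caught up.
        return self._messages[offset:offset + max_messages]


log = TopicLog()
for i in range(5):
    log.append(f"event-{i}")

# The consumer controls the pace: it pulls small batches until it is caught up.
offset = 0
batches = []
while True:
    batch = log.poll(offset, max_messages=2)
    if not batch:
        break
    batches.append(batch)
    offset += len(batch)
```

Because the consumer chooses when to call `poll` and how many messages to request, a slow consumer simply falls behind instead of being overwhelmed.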

Adding more servers and batching operations can improve scalability further.
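Batching can be sketched as a producer that buffers messages and sends them in groups, so the queue handles fewer, larger requests. `BatchingProducer` and the `send_batch` callback are hypothetical; the callback stands in for a network call to a broker.

```python
class BatchingProducer:
    """Buffers messages and hands them to send_batch in groups."""

    def __init__(self, send_batch, batch_size):
        self._send_batch = send_batch  # stand-in for a network call to a broker
        self._batch_size = batch_size
        self._buffer = []

    def send(self, message):
        self._buffer.append(message)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        # Send whatever is buffered, even if the batch is not full.
        if self._buffer:
            self._send_batch(list(self._buffer))
            self._buffer.clear()


requests = []  # each element represents one request made to the broker
producer = BatchingProducer(send_batch=requests.append, batch_size=3)
for i in range(7):
    producer.send(i)
producer.flush()  # push out the final partial batch
```

Seven messages result in three requests instead of seven. A real producer would usually also flush on a timer so that a partial batch does not wait indefinitely.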

The message structure should include a topic, payload, and key for partitioning purposes.
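As a rough sketch, the message structure could look like the dataclass below, with the key hashed to choose a partition. The field types, the `partition_for` helper, and the choice of SHA-256 are assumptions made for illustration; the interview does not specify a hashing scheme.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    topic: str      # which topic the message belongs to
    key: str        # partitioning key, for example a buyer ID
    payload: bytes  # the message body


def partition_for(key: str, num_partitions: int) -> int:
    # Use a stable hash so the same key always maps to the same partition.
    # Python's built-in hash() is salted per process, so it is not suitable.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


message = Message(topic="orders", key="buyer-42", payload=b'{"item": "book"}')
partition = partition_for(message.key, num_partitions=8)
```

Messages that share a key land in the same partition, which keeps them in order relative to one another.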

The message queue itself should be highly scalable and able to handle abrupt spikes in traffic.

The design assumes up to 10,000 topics and an estimated 10 million messages per day. That requires about 800 GB of storage per day, or roughly 24 TB retained at any time with a 30-day retention period.
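The figures above can be checked with some back-of-the-envelope arithmetic. This sketch assumes the 10 million messages are the daily total across all topics and uses decimal units (1 GB = 1,000,000 KB); the implied message size and per-second rate are derived values, not numbers stated in the interview.

```python
MESSAGES_PER_DAY = 10_000_000
STORAGE_PER_DAY_GB = 800
RETENTION_DAYS = 30
SECONDS_PER_DAY = 24 * 60 * 60

# Average message size implied by the daily volume and daily storage.
avg_message_kb = STORAGE_PER_DAY_GB * 1_000_000 / MESSAGES_PER_DAY

# Total data kept on disk once the retention window is full (before replication).
retained_tb = STORAGE_PER_DAY_GB * RETENTION_DAYS / 1000

# Average write rate; peak traffic during spikes will be higher.
avg_messages_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY
```

Under these assumptions, each message averages 80 KB, the cluster retains about 24 TB, and the average write rate is a little over 100 messages per second.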

Storage Solutions: SQL, NoSQL, or Write Ahead Log (WAL)

Storage is a crucial component in designing a message queue system. You can use SQL, NoSQL, or Write Ahead Log (WAL) when dealing with a large volume of messages and a read-and-write-heavy system.

The write-ahead log approach is advised because it is append-only: each message is added to the end of a file, so writes are sequential.

However, appending to a single file can cause that file to become too large.

To handle this, divide the file into multiple segments and partition the log by a key such as buyer ID.

Each segment can be stored on different servers to support scalability.
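The segmenting idea can be sketched as an append-only log that rolls over to a new segment once the active one is full. In this toy version, in-memory lists stand in for segment files and the size limit is counted in messages rather than bytes; `SegmentedLog` is a hypothetical name.

```python
class SegmentedLog:
    """Append-only log split into fixed-size segments."""

    def __init__(self, max_messages_per_segment):
        self._max = max_messages_per_segment
        self._segments = [[]]  # each inner list stands in for one segment file

    def append(self, message):
        if len(self._segments[-1]) == self._max:
            self._segments.append([])  # active segment is full: roll a new one
        active = self._segments[-1]
        active.append(message)
        # Global offset = messages in earlier segments + position in this one.
        return (len(self._segments) - 1) * self._max + len(active) - 1

    def read(self, offset):
        # Every segment except the last is full, so divmod locates the message.
        segment_index, position = divmod(offset, self._max)
        return self._segments[segment_index][position]

    @property
    def segment_count(self):
        return len(self._segments)


log = SegmentedLog(max_messages_per_segment=3)
offsets = [log.append(f"m{i}") for i in range(7)]
```

Older segments are never modified after they fill up, which makes them straightforward to move to other servers or delete once the retention period passes.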

Metadata storage and state storage are also necessary.

Metadata storage contains configuration information, while state storage records the position each consumer last read, so a consumer can resume from where it left off.
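State storage can be pictured as a small table of read positions. The sketch below assumes positions are numeric offsets keyed by consumer and topic; `OffsetStore`, `commit`, and `fetch` are hypothetical names, and a real system would persist this table durably.

```python
class OffsetStore:
    """Toy state store: remembers the next offset each consumer should read."""

    def __init__(self):
        self._offsets = {}  # (consumer_id, topic) -> next offset to read

    def commit(self, consumer_id, topic, next_offset):
        self._offsets[(consumer_id, topic)] = next_offset

    def fetch(self, consumer_id, topic):
        # A consumer with no committed position starts from the beginning.
        return self._offsets.get((consumer_id, topic), 0)


messages = ["m0", "m1", "m2", "m3", "m4"]
store = OffsetStore()

# First run: read three messages, then record how far we got.
start = store.fetch("billing-consumer", "orders")
first_run = messages[start:start + 3]
store.commit("billing-consumer", "orders", start + len(first_run))

# After a restart, the consumer resumes from the committed position.
resume_at = store.fetch("billing-consumer", "orders")
second_run = messages[resume_at:]
```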

Fault Tolerance and Successful Writing of Messages

Fault tolerance is the ability of a system to continue operating even when some of its components fail.

A leader-follower approach is suggested, with a coordination service such as Apache ZooKeeper storing which brokers are leaders and followers and coordinating between them.

There are also different ways to confirm that a message was written successfully, such as sending acknowledgments to the producer and replicating each message to followers.

Implementing these approaches helps ensure both fault tolerance and the successful writing of messages.
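One way to combine acknowledgments with replication is for the leader to report success only after enough followers have stored the message. The sketch below is an assumption about how that could work, not the exact scheme from the interview; `Replica`, `Leader`, and `required_acks` are hypothetical names.

```python
class Replica:
    """Toy follower that stores messages unless it is marked unhealthy."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.log = []

    def append(self, message):
        if not self.healthy:
            return False  # simulate a follower that is down
        self.log.append(message)
        return True


class Leader:
    """Writes locally, replicates to followers, and acks based on a threshold."""

    def __init__(self, followers, required_acks):
        self.log = []
        self.followers = followers
        self.required_acks = required_acks

    def write(self, message):
        self.log.append(message)
        acks = sum(1 for follower in self.followers if follower.append(message))
        # The producer is told the write succeeded only if enough followers
        # confirmed it. A real system would retry or roll back on failure.
        return acks >= self.required_acks


healthy = Replica("follower-a")
down = Replica("follower-b", healthy=False)

lenient = Leader([healthy, down], required_acks=1)
strict = Leader([healthy, down], required_acks=2)

lenient_ok = lenient.write("m1")  # one follower acknowledged: success
strict_ok = strict.write("m2")    # two acks required, only one arrived: failure
```

Requiring more acknowledgments makes writes more durable but slower and more likely to fail when a follower is down, which is the trade-off to discuss in the interview.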
