Event-Driven Architecture - Broker Topology

Familiarizing with common event-driven patterns

Published on 2024/07/16

I got distracted by so many book clubs that I lost track of my review of software architecture. It's no big surprise that event-driven systems are omnipresent at scale. The need for highly scalable, high-performance applications grows as your product grows. In Atlas App Services we've seen that with Triggers and their growth, and we're now seeing it with Atlas Streams as well, with just as much (if not more) interest.

An event-based model is reactive. Your system will behave differently based on the event it needs to react to. This is different from the more common request/response model where the behavior is deterministic. The event-driven model is fairly intuitive, and I can use Atlas Triggers as an example. You can decide to react to specific MongoDB events (e.g. insert, update) and run a function based on that. The sky is the limit at that point, as your function logic can go from sending a text notification for every new user sign-up to writing some metadata to a different collection. Given the use case, I'm more curious to explore (lightly) how the broker topology can help design reactive systems. This is something I haven't worked with before and thought I could use a refresher.
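To make the "react to an event, run a function" idea concrete, here is a minimal sketch in plain Python. It is not the Atlas Triggers API: the `on` decorator, the `react` function, and the event shape (`operation`, `document`) are all names I made up for illustration.

```python
from typing import Callable

# Hypothetical registry: maps an operation type to the functions that react to it.
handlers: dict[str, list[Callable[[dict], None]]] = {}

# Stand-ins for side effects (a text notification, a metadata collection).
notifications: list[str] = []
metadata: list[dict] = []


def on(operation: str):
    """Register the decorated function as a reaction to `operation`."""
    def register(fn: Callable[[dict], None]) -> Callable[[dict], None]:
        handlers.setdefault(operation, []).append(fn)
        return fn
    return register


@on("insert")
def notify_signup(event: dict) -> None:
    # Pretend to send a text notification for every new user.
    notifications.append(f"Welcome, {event['document']['name']}!")


@on("insert")
def record_metadata(event: dict) -> None:
    # Pretend to write some metadata to a different collection.
    metadata.append({"user": event["document"]["name"], "source": "signup"})


def react(event: dict) -> None:
    """Run every function registered for this event's operation type."""
    for fn in handlers.get(event["operation"], []):
        fn(event)


react({"operation": "insert", "document": {"name": "Ada"}})
react({"operation": "delete", "document": {"name": "Ada"}})  # nothing registered, nothing runs
```

The system does nothing until an event arrives, and what it does depends entirely on which functions were registered for that kind of event.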

The broker model is highly scalable and responsive, but that comes at the price of error handling. You would use a broker topology when you don't need event orchestration (orchestration is what a mediator topology provides). The mediator helps "move things along" by being in charge of how processing events are moved through the topology. For example, if you have steps 1 to 4, the mediator might expect an acknowledgment of successful processing before generating a processing event for the next step.
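The "steps 1 to 4" example can be sketched as a tiny mediator that only advances when the previous step acknowledges success. This is a simplified, synchronous illustration: `run_workflow` and the four step functions are hypothetical names, and a real mediator would typically talk to processors through queues rather than direct calls.

```python
from typing import Callable

Step = Callable[[dict], bool]  # a step returns True to acknowledge success


def run_workflow(steps: list[Step], event: dict) -> list[str]:
    """Hypothetical mediator: run steps in order, advancing only on acknowledgment."""
    log: list[str] = []
    for step in steps:
        acknowledged = step(event)
        log.append(f"{step.__name__}: {'ack' if acknowledged else 'failed'}")
        if not acknowledged:
            break  # the mediator knows exactly where the workflow stopped
    return log


def create_account(event: dict) -> bool:
    return True


def send_welcome_email(event: dict) -> bool:
    return "email" in event  # fails when there is no address to send to


def provision_workspace(event: dict) -> bool:
    return True


def record_metrics(event: dict) -> bool:
    return True


steps = [create_account, send_welcome_email, provision_workspace, record_metrics]

ok_log = run_workflow(steps, {"user": "ada", "email": "ada@example.com"})
failed_log = run_workflow(steps, {"user": "bob"})
```

In the second run the workflow stops at step 2 and steps 3 and 4 never start, which is the kind of control a broker topology gives up.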

In a broker model, we have an initiating event that starts the workflow (e.g. a user signing up). The event broker manages the different event channels (e.g. a user sign-up channel). An initiating event is sent to an event channel for processing. Event processors that are subscribed to this channel then complete a task (processing) for any event sent to that channel (e.g. every time a user signs up). Once they're done, they send a processing event to the broker so that any other event processors interested in that event can execute their tasks. This can repeat multiple times until there's no event processor subscribed to a processing event. If you find this confusing, that's ok. A visual would help with this (sorry!). Think of it as a group of task executors that only start working when an event they're interested in comes in. Each of these executors announces when it is done with its processing, which allows others to pick up that event. This design allows for simple extensibility, as any other processor can be added to listen to any channel.
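In place of a visual, the flow above can be sketched as a small in-memory broker. Everything here is a toy under my own assumptions: the `Broker` class, the channel names, and the three processors are invented for illustration, and events are processed in a single thread rather than concurrently.

```python
from collections import defaultdict, deque
from typing import Callable


class Broker:
    """Toy in-memory broker: channels, subscribers, and a queue of pending events."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable]] = defaultdict(list)
        self.queue: deque = deque()

    def subscribe(self, channel: str, processor: Callable) -> None:
        self.subscribers[channel].append(processor)

    def publish(self, channel: str, payload: str) -> None:
        self.queue.append((channel, payload))

    def run(self) -> None:
        # Keep delivering until no pending event remains.
        while self.queue:
            channel, payload = self.queue.popleft()
            for processor in self.subscribers.get(channel, []):
                processor(self, payload)


trace: list[str] = []


def create_profile(broker: Broker, user: str) -> None:
    trace.append(f"profile created for {user}")
    broker.publish("profile-created", user)  # processing event


def send_welcome(broker: Broker, user: str) -> None:
    trace.append(f"welcome sent to {user}")
    broker.publish("welcome-sent", user)  # nobody subscribes: this branch ends here


def warm_recommendations(broker: Broker, user: str) -> None:
    trace.append(f"recommendations ready for {user}")


broker = Broker()
broker.subscribe("user-signed-up", create_profile)
broker.subscribe("user-signed-up", send_welcome)
broker.subscribe("profile-created", warm_recommendations)

broker.publish("user-signed-up", "ada")  # initiating event
broker.run()
```

Adding a new behavior is just another `subscribe` call on an existing channel. None of the existing processors need to change, and none of them know about each other.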

The problem? Error handling. If any of these event processors fails, there's no replay and no other processor is made aware of it. The mediator topology helps address this issue but sacrifices some of the scalability and responsiveness we mentioned earlier. There are ways to address error handling in broker topologies, but they add a layer of complexity.
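As one possible illustration of that extra complexity, the sketch below has the delivery loop catch failures and re-publish them on a separate channel that a dedicated processor can subscribe to. This is only one approach among several, and every name in it (the `processing-failed` channel, `record_failure`, `dead_letters`) is an assumption of mine, not something a specific broker provides.

```python
from collections import defaultdict, deque
from typing import Callable

subscribers: dict[str, list[Callable]] = defaultdict(list)
queue: deque = deque()
dead_letters: list[dict] = []

FAILURE_CHANNEL = "processing-failed"  # hypothetical channel name


def publish(channel: str, payload) -> None:
    queue.append((channel, payload))


def run() -> None:
    while queue:
        channel, payload = queue.popleft()
        for processor in subscribers.get(channel, []):
            try:
                processor(payload)
            except Exception as exc:
                # Without this block the failure would go unnoticed by everyone else.
                # Guard: do not re-publish failures of the failure handlers themselves.
                if channel != FAILURE_CHANNEL:
                    publish(FAILURE_CHANNEL, {
                        "channel": channel,
                        "payload": payload,
                        "error": str(exc),
                    })


def send_welcome(user: str) -> None:
    raise RuntimeError("email provider unavailable")


def record_failure(failure: dict) -> None:
    # A real system might retry or alert; here we only keep the failed event.
    dead_letters.append(failure)


subscribers["user-signed-up"].append(send_welcome)
subscribers[FAILURE_CHANNEL].append(record_failure)

publish("user-signed-up", "ada")
run()
```

Even this small version raises new questions (when to retry, how many times, what happens if the failure handler itself fails), which is where the added complexity comes from.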

Thought

This can be confusing without visual cues, but hopefully this is enough to get you interested in exploring more about event-driven systems. I'll follow up with other insights and limitations. I also feel like it can be a good exercise to build a small broker-like event-driven system on your own. Don't worry about scalability, throughput and all that jazz. A simple implementation can help clarify how it all comes together. I've mostly worked with systems that need reliable processing of events, where failures need to be addressed and events retried. As usual, your choice of a broker topology over a mediator one depends on the system requirements.

Final disclaimer: I only gave a superficial overview of the broker topology. There are several other challenges (e.g. bottlenecks, single points of failure) that need addressing to design a robust system. I might touch on some of those in the future!
