This is a follow-up to Hrishi’s “Eventually things will be consistent”. I worked with Hrishi for close to six years during his tenure at Yieldstreet, so it shouldn’t be a surprise that I really like his point that software should be designed for the people who will ultimately benefit from it. If you haven’t read that post yet, go read it now.
Hrishi writes:
Software is a tool, a means to provide a service to some user or stakeholder. It is not the end.
Yes, and distributed microservices come at a cost. You need to understand what the costs are and, most importantly, how they affect the user experience. Your job as a system designer is figuring out the balance: what you need, what’s nice to have, and what you’re willing to sacrifice for it. There’s always a trade-off.
Every modern system is distributed; there’s no such thing as a self-contained system anymore. Even a trivial service exposed to the web today will connect to a remote database, run behind a reverse proxy, etc. And yet, there are degrees of distribution. Dealing with a web server and a database is one thing; dealing with dozens of microservices, each with its own event store and relational projections, exchanging messages through log-replicated distributed brokers, is completely different. The latter sounds cool, but the former is much simpler to reason about, especially when things don’t work as expected.
Be humble, and keep it simple
Be humble. You’re not Google, you’re not Amazon. Don’t make your system any more complex than it absolutely needs to be. In fact, be proud of solving a problem in the simplest way possible.
Simple isn’t the same as easy, though — most of the time, it’s not. Finding the simplest solution to a problem is hard work, and don’t expect to get it right all the time. Do try to keep it as a goal though, and keep looking for ways to simplify your system.
One Database To Rule Them All
Don’t reach for tools you don’t need yet, and if anything, err on the side of waiting too long. In particular, really, really try to keep a single data store, preferably a relational database, preferably PostgreSQL or one of its modern children like Neon or Timescale, for as long as you can.
If you’re having trouble scaling read operations, reach for replicas and caches. It’s only when write operations are your bottleneck that you might need to look into a different data store, and even then, try to postpone that as much as possible. PostgreSQL can scale a lot when used properly. If you think you’re reaching the limits of how far you can scale, I guarantee you that you can grow at least another order of magnitude by optimizing how you use it. The time you spend on these optimizations is going to be much, much less than the time you’d spend in the long run dealing with consistency issues introduced by having multiple canonical data stores.
Why do it? Why should you go to great lengths to keep all the data in one place, when using different data stores could make things less coupled and more elegant? Because of the end user. The moment you introduce multiple data stores you pretty much give up maintaining a perfectly consistent view of your data. You’ll have to compromise, and that’s where things like eventual consistency come in. You know what’s the best way to deal with eventual consistency? Not having to deal with it.
Using different stores for different purposes is fine, of course. You can have your main relational database and use CDC or event streaming to get that data over to a warehouse or analytics database, or to power third-party integrations. This is perfectly okay, as long as you keep your end user experience consistent: if they submit a purchase request, they should see that purchase in their request history immediately, even if the application that ultimately processes purchases isn’t the same one that displays the order history. Don’t gaslight your users.
But Marcus, you’ll say, it only takes a few milliseconds for events to propagate, the user won’t notice the delay. Maybe, if things are working perfectly. Until they aren’t: until you have to deal with heavy processor lag, or network outages, or bugs that make the lag between posting the event and the record appearing in the order history go from milliseconds to minutes, if not hours.
Separate intent from outcome
Mixing intent and outcome is frequently what creates the bottleneck: the user submits a purchase request and you need to record the order details, reduce inventory, dispatch payments, send emails to the warehouse, etc. If you do all that in a single request, of course that operation won't scale. That's where the usual wisdom is to reach for solutions like a message queue and eventual consistency: submitting a purchase request now just posts a message to the queue, and the message will be processed eventually. What we need to realize is that queues are one specific technical solution to a scalability problem, and they're only a small part of the overall solution. To keep the user experience consistent and scalable, make sure you separate intent from outcome: when the user submits a purchase request, capture and persist the intent consistently, so you can report that intent back to the user. Then you can worry about processing that intent and producing an outcome, and that's where message queues come in handy; that outcome can happen eventually.
Note that this isn't just a technical solution, but part of product design. Capturing intent before outcome implies delaying checks that can't be efficiently performed, and affects user experience: for example, a purchase request might have to be canceled because you received two concurrent purchases for your last inventory item. Decisions like this have to be discussed with product managers, designers, and other relevant stakeholders.
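To make the split concrete, here is a minimal sketch, not a production design. It assumes an in-memory SQLite database standing in for the main relational store and Python's `queue.Queue` standing in for a real message broker; the table names and the `submit_purchase`, `request_history`, and `process_next` functions are all hypothetical. The intent is committed before anything is queued, and the inventory check is delayed until processing, so a request can end up canceled.

```python
import queue
import sqlite3
import uuid

# Hypothetical schema: one table for captured intents, one for inventory.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE purchase_requests (
        id     TEXT PRIMARY KEY,
        item   TEXT NOT NULL,
        status TEXT NOT NULL   -- 'pending', 'completed' or 'canceled'
    );
    CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER NOT NULL);
    INSERT INTO inventory VALUES ('foo', 1);
""")

# Stand-in for a real message broker.
work_queue = queue.Queue()


def submit_purchase(item: str) -> str:
    """Capture the intent: persist it first, then enqueue it for processing."""
    request_id = str(uuid.uuid4())
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO purchase_requests (id, item, status) "
            "VALUES (?, ?, 'pending')",
            (request_id, item),
        )
    work_queue.put(request_id)
    return request_id


def request_history() -> list:
    """What the user sees: every captured intent, whatever its outcome so far."""
    return db.execute(
        "SELECT item, status FROM purchase_requests ORDER BY rowid"
    ).fetchall()


def process_next() -> None:
    """Produce the outcome for one queued request (this runs 'eventually')."""
    request_id = work_queue.get()
    with db:
        (item,) = db.execute(
            "SELECT item FROM purchase_requests WHERE id = ?", (request_id,)
        ).fetchone()
        # The delayed check: only decrement if stock is still available.
        updated = db.execute(
            "UPDATE inventory SET quantity = quantity - 1 "
            "WHERE item = ? AND quantity > 0",
            (item,),
        ).rowcount
        status = "completed" if updated else "canceled"
        db.execute(
            "UPDATE purchase_requests SET status = ? WHERE id = ?",
            (status, request_id),
        )
```

With one `foo` in stock, two calls to `submit_purchase("foo")` both show up in `request_history()` as pending right away; after two calls to `process_next()`, the first is completed and the second is canceled. One gap this sketch leaves open: if the process dies between the commit and the `put`, the row stays pending, so a real system would need some way to re-enqueue pending rows.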
Clients are part of the system
You can’t design a distributed, eventually consistent system without making the edge-most application a part of the whole. The application that drives the user experience (the front-end web UI or the mobile application) must be an integral, active part of the solution, with its own local state, and not just a dumb client.
Using the Buy Foo scenario as an example, a flow that would preserve a good user experience would look like this:
- User clicks the “make purchase” button;
- A purchase request containing all the required data is generated and saved to the client local storage. The request must have a unique id generated by the client;
- After the request is saved to persistent local storage, submit it to the server API;
- From here on there are multiple options: the submit API itself might be a long poll that completes when the purchase request is successfully processed by the server, or it might return a simple acknowledgement, with a different API or WebSocket used for status updates.
- APIs must be idempotent, so if the client re-submits a request with the same id, the server should not create a second purchase.
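The idempotency requirement in the last step can be sketched on the server side as follows. This is one possible approach, assuming a relational store (SQLite here, for a self-contained example) where the client-generated id is the primary key; the `purchases` table and the `handle_submit` function are hypothetical names.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE purchases (request_id TEXT PRIMARY KEY, item TEXT NOT NULL)"
)


def handle_submit(request_id: str, item: str) -> dict:
    """Hypothetical API handler: safe to call repeatedly with the same id."""
    with db:
        # The primary key on the client-generated id turns a retry into a no-op.
        cursor = db.execute(
            "INSERT OR IGNORE INTO purchases (request_id, item) VALUES (?, ?)",
            (request_id, item),
        )
    # 'created' is False when the id was already known, i.e. a resubmission.
    return {"request_id": request_id, "created": cursor.rowcount == 1}
```

Calling `handle_submit("req-1", "foo")` twice leaves a single row in `purchases`; the second call simply reports that nothing new was created, so the client can retry freely.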
The most important property is this: the intent to submit a request is captured and persisted to local storage before any remote calls happen; the client knows for sure that the user submitted the purchase request. If the app crashes, it can load that pending request from storage; if the API call fails, it can resubmit the purchase request, and so on. The user will never be left to guess what happened to their purchase request, and the client application can always show some information about it, even if it’s “pending”.
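A rough sketch of that property, under loud assumptions: a JSON file stands in for the client's persistent local storage, `send` is a stand-in for the HTTP call to the server and is assumed to raise `OSError` (for example `ConnectionError`) on failure, and the `PurchaseClient` class with its `pending` and `submitted` statuses is hypothetical. A real web or mobile client would use its platform's storage and networking APIs instead.

```python
import json
import uuid
from pathlib import Path
from typing import Callable


class PurchaseClient:
    """Hypothetical client: a JSON file stands in for persistent local storage."""

    def __init__(self, storage_path: Path, send: Callable[[dict], None]):
        self.storage_path = storage_path
        self.send = send  # stand-in for the remote call; raises OSError on failure

    def _load(self) -> dict:
        if self.storage_path.exists():
            return json.loads(self.storage_path.read_text())
        return {}

    def _save(self, requests: dict) -> None:
        self.storage_path.write_text(json.dumps(requests))

    def make_purchase(self, item: str) -> str:
        request = {"id": str(uuid.uuid4()), "item": item, "status": "pending"}
        requests = self._load()
        requests[request["id"]] = request
        self._save(requests)       # 1. persist the intent locally first
        self._try_submit(request)  # 2. only then call the server
        return request["id"]

    def resubmit_pending(self) -> None:
        """Call on startup, or when connectivity returns."""
        for request in self._load().values():
            if request["status"] == "pending":
                self._try_submit(request)

    def _try_submit(self, request: dict) -> None:
        try:
            self.send(request)
        except OSError:
            return  # still 'pending' in storage; a later retry picks it up
        requests = self._load()
        requests[request["id"]]["status"] = "submitted"
        self._save(requests)

    def status(self, request_id: str) -> str:
        """Always answerable from local state, even while offline."""
        return self._load()[request_id]["status"]
```

If `send` fails on the first attempt, `status` still reports the request as pending, and a fresh `PurchaseClient` pointed at the same file can pick it up with `resubmit_pending()`. Resubmitting the same id is only safe because the server API is idempotent.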
Modern frontends are complex applications
One important lesson to take from the pattern above is that modern frontend applications aren’t just about rendering a user interface. They’re full-blown applications, participants in a distributed system, and each of them needs its own domain model, database (what’s the Web Storage API if not a key-value database?), etc. The idea that your frontend is only the View in an MVC architecture is long, long dead.
I’m at the point where I’m starting to loathe the term “frontend”, or at least the charged meaning that frontend development is easier or less important than developing backend applications. If anything, designing and developing the edge-most part of your system, the part that interacts with the most valuable people (your customers), is more important than anything else.
I wanted to write more about this backend and frontend separation, and about applications versus systems, but I have feelings about this topic, and I’d easily write a full length post about it. So, until then.