Durability and Journaling

A nuance on durability and how journaling helps
Published on 2024/02/17

Yesterday I briefly shared some thoughts on How I Use ChatGPT and wanted to bring up a recent example. There was an interesting article about Database Fundamentals that I found to be a good write-up. I liked the approach, inspired by "Designing Data-Intensive Applications", of starting with a naive implementation and showing how its limitations lead to the next iteration.

At MongoDB I don't work directly with the storage engine or any of the database internals, but I was still curious to understand in more depth why the author did not consider it 100% durable. I was thrown off a bit initially because I thought journaling was really fast, and a fast journaled write should guarantee that in the event of an incident your data is not lost and operations can resume without issues. That is a desirable quality in system design for building resilient products, especially since incidents are most commonly just crashes or even maintenance restarts.

I tried to validate my mental model, making sure I wasn't biased by some naive assumption. With the support of ChatGPT I was able to have a conversation-like investigation that felt like a natural way to explore the idea. I confirmed that journaling is in fact quite fast: the journal is an append-only log in a binary format, and sequential writes make it performant (how performant exactly I don't know, that could be interesting to explore). Here's the catch though. Once your data is written to the journal, its binary nature means it isn't human readable (but we knew that), and more importantly the journal is only read back during recovery; the data in it hasn't been applied to the data files yet. In other words, data that exists only in the journal is not yet consumable.
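
To make that concrete for myself, here is a minimal sketch of the idea in Python. It's a toy append-only journal, not how WiredTiger actually lays out its log: records are appended sequentially in a binary format and fsync'd, and a separate replay step is needed before anything in the journal becomes usable again.

    import os
    import struct

    # Toy write-ahead journal: each record is a little-endian length prefix
    # followed by the raw payload, appended sequentially and fsync'd so it
    # survives a crash. Purely illustrative, not WiredTiger's actual format.
    class Journal:
        def __init__(self, path="journal.bin"):
            self.file = open(path, "ab")

        def append(self, payload: bytes) -> None:
            record = struct.pack("<I", len(payload)) + payload  # binary, not human readable
            self.file.write(record)                             # sequential append
            self.file.flush()
            os.fsync(self.file.fileno())                        # on disk once fsync returns

    def replay(path="journal.bin"):
        # Data in the journal only becomes usable after it is read back and
        # applied to the data files; here we just yield the payloads.
        with open(path, "rb") as f:
            while True:
                header = f.read(4)
                if len(header) < 4:
                    break
                (length,) = struct.unpack("<I", header)
                yield f.read(length)

    if __name__ == "__main__":
        journal = Journal()
        journal.append(b'{"op": "insert", "doc": {"x": 1}}')
        print(list(replay()))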

That gap between what has been journaled and what has been committed to the data files is at the core of why MongoDB isn't 100% durable by default. It's a balance between performance and durability, and MongoDB actually offers configuration to change the default journal commit interval of 100ms (but, again, your performance-to-durability balance would be affected). Operations that made it into the on-disk journal are safe: after a crash the storage engine replays the journal on top of the last checkpoint to recover whatever hadn't been committed to the data files. The writes at risk are the ones acknowledged within that 100ms window but not yet flushed to the journal, because there is nothing on disk to recover them from. That window is why the 100% durability guarantee is not met with the default config.
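
On the configuration side, the server-side knob I believe that 100ms refers to is storage.journal.commitIntervalMs, and a client can also sidestep the window for an individual write by asking for journal acknowledgment. Here's a small sketch with the pymongo driver (assuming a mongod running locally; the database, collection, and document names are made up for illustration):

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    # Assumes a mongod running locally on the default port; "demo" and
    # "events" are placeholder names for this sketch.
    client = MongoClient("mongodb://localhost:27017")
    db = client["demo"]

    # With a plain acknowledged write the server may reply before the journal
    # is committed to disk, so the write can sit in the window discussed above.
    db.events.insert_one({"kind": "acknowledged-only"})

    # j=True asks the server to acknowledge only after the write is in the
    # on-disk journal, trading some latency for durability of this write.
    journaled_events = db.get_collection(
        "events", write_concern=WriteConcern(w=1, j=True)
    )
    journaled_events.insert_one({"kind": "journaled-before-ack"})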

Thoughts

I had fun digging deeper into things I find interesting, and with the habit of double-checking anything ChatGPT says, it was one of many great learning experiences. In this specific case I also had internal resources to confirm or refute my intuition. The exercise helped me arrive at a more accurate definition of durability, one that accounts for when data is actually committed to the data files and available for consumption.
