Boosting Efficiency: Terrarium Team Slashes Disk and RAM Usage by 20%

I was a fresh-faced newcomer to the tech world when I joined Synerise 1.5 years ago, with no commercial experience and just a few small projects I had done on my own. I vividly remember the immense stress I felt before the interview, particularly given that it was my very first IT job interview ever :) Luckily for me, my team leader saw potential in me and gave me a chance. It turned out to be the perfect launch for my professional career as a software developer. To my surprise, within just one month my first production code was approved and merged into the master branch. Synerise thrives on speed, but it also offers tremendous learning opportunities for those who are willing to step up and give their best.

Was it all tough at the beginning?

Of course. Did the thousands of lines of code push me to the brink of questioning my career choice? Absolutely, especially during those initial days. But thanks to the support I got from the team, I was able to develop fast and learn new things every day, which eventually brought me to the point where I could make a significant improvement to our database engine - the migration process that I want to explain a bit.

At Synerise, we collect 2 billion events every day. Some of them are stored longer than others, but the overall number of stored events keeps growing. While TerrariumDB excels at handling and processing such a massive volume, it's also important to acknowledge that no database possesses infinite storage capacity. For this reason, we constantly monitor disk usage (storing events) and RAM usage (processing events).

As we observed higher RAM usage alongside an increase in stored events, our entire team brainstormed ideas and convened multiple times to find a solution to minimise this effect. Finally, an idea took root: revamp the storage system. But translating this idea into concrete action required the migration of all our data.

Over the past two months, I've had the opportunity to create the Terrarium Migrator app and to contribute directly to the success of the Terrarium team. Our collective mission? To tackle the challenge of optimising disk and RAM usage. We took a series of actions that included:

  • Expanding the number of types of our own data storage units, which we call chunks - more about them later
  • Changing part of our primary key to a data type that is stored more effectively by the new chunk types
  • Creating the Terrarium Migrator app, which processed all existing data and applied the changes we made in the database engine. This app will also come in handy in the future, as we adapt and modify it to suit our needs across different storage versions

With the concept of chunks..

..we aim to save as much information as possible using minimal disk space. We can think of them as metadata or as simplified compression algorithms. These chunks are an essential part of our unique file format, and they are responsible for mapping values from dictionaries to specific positions in columns. This approach allows us to store the same data with a significantly reduced space requirement. To illustrate: instead of storing 10 strings with only two unique values, we can store just the 2 unique strings accompanied by small integers indicating their positions in the column. By doing so, we save both disk and RAM space.
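To make this concrete, here is a minimal sketch of that kind of dictionary encoding in C++. The names (`EncodedColumn`, `encode_column`) are purely illustrative and not part of Terrarium's actual file format; the point is only that 10 strings with two unique values collapse into a 2-entry dictionary plus 10 small integers.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative sketch of dictionary encoding, not Terrarium's real format.
struct EncodedColumn {
    std::vector<std::string> dictionary;  // unique values only
    std::vector<uint32_t>    positions;   // index into the dictionary, one per row
};

EncodedColumn encode_column(const std::vector<std::string>& column) {
    EncodedColumn out;
    std::unordered_map<std::string, uint32_t> seen;
    for (const auto& value : column) {
        auto [it, inserted] = seen.emplace(value, out.dictionary.size());
        if (inserted) out.dictionary.push_back(value);  // first time we see this value
        out.positions.push_back(it->second);            // always record its position
    }
    return out;
}

int main() {
    // 10 strings, but only two unique values.
    std::vector<std::string> column = {"on", "off", "on", "on", "off",
                                       "off", "on", "off", "on", "on"};
    EncodedColumn encoded = encode_column(column);
    std::cout << "dictionary entries: " << encoded.dictionary.size() << '\n'   // 2
              << "positions stored:   " << encoded.positions.size()  << '\n';  // 10
}
```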

Let's take a look at two examples that demonstrate how we can use chunks effectively. In the first diagram, we have a column with more elements than our dictionary (the set of unique values) contains. With the help of chunks, we can map these dictionary values to dChunks and then construct a single scf column using vChunks:

First diagram - the more common situation, in which the dictionary size is smaller than the actual count of values

In the second diagram, we have an example of the new chunk type. In this situation, we can take advantage of the fact that the values are already sorted and store only the min and max values from the vChunks and dChunks:

Second diagram - the size of the dictionary is the same as the count of actual values; here, the new chunk type comes in handy
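Below is a rough sketch of what this new chunk type exploits, under my simplifying assumption that the sorted values form one consecutive run (the `RangeChunk` name is invented for illustration). With that assumption, two numbers are enough to recover any position, so nothing else needs to be stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>

// Invented sketch of a "min/max" chunk: when a chunk's values are sorted and
// form a consecutive run, storing the endpoints is enough to reconstruct any
// position without materialising the whole mapping.
struct RangeChunk {
    uint64_t min;
    uint64_t max;

    std::size_t size() const { return static_cast<std::size_t>(max - min) + 1; }

    uint64_t value_at(std::size_t i) const {
        assert(min + i <= max);                  // position must fall inside the run
        return min + static_cast<uint64_t>(i);
    }
};

int main() {
    // Instead of storing {1000, 1001, ..., 1999} explicitly (1000 entries),
    // we keep just two numbers.
    RangeChunk chunk{1000, 1999};
    std::cout << "values represented:     " << chunk.size()          << '\n'   // 1000
              << "value at position 999:  " << chunk.value_at(999)   << '\n';  // 1999
}
```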

Migrating data can be quite an endeavour

It's a time-consuming process that requires significant resources. On average, it takes between 7 and 10 hours and can consume 400-500 GB of RAM. For each production cluster, we need to handle 18 migrations simultaneously, as there are 18 nodes per cluster. So, in addition to figuring out how to build our migration app, we also had to carefully plan and execute the migration process itself. It was a big challenge that tested both our technical skills and our ability to plan.

The migration process touches every piece of data in our storage system. It traverses each table across all databases, updates their metadata and saves the changes to disk. The process continues until the last table is updated, and only then do we replace the old files with the new ones.
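At a high level, the flow looks roughly like the sketch below. The types and helpers (`Database`, `Table`, `migrate_table`) are placeholders I made up for illustration; the real engine is of course far more involved, but the two-phase idea - rewrite everything first, swap files only at the very end - is the part worth showing.

```cpp
#include <filesystem>
#include <fstream>
#include <iostream>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Illustrative outline only; these structures stand in for the real engine's.
struct Table { fs::path file; };
struct Database { std::vector<Table> tables; };

// Placeholder: in reality this rewrites chunk metadata into the new format.
fs::path migrate_table(const Table& table) {
    fs::path migrated = table.file;
    migrated += ".migrated";
    std::ofstream(migrated) << "new-format contents\n";
    return migrated;
}

void migrate_all(std::vector<Database>& databases) {
    std::vector<std::pair<fs::path, fs::path>> ready;  // (old file, new file)

    // Phase 1: produce migrated copies next to the originals.
    for (auto& db : databases)
        for (auto& table : db.tables)
            ready.emplace_back(table.file, migrate_table(table));

    // Phase 2: only after the last table succeeded, replace old files with new ones.
    for (auto& [old_file, new_file] : ready)
        fs::rename(new_file, old_file);
}

int main() {
    // Tiny demo with a single throwaway file standing in for a table.
    fs::path demo = fs::temp_directory_path() / "terrarium_demo_table";
    std::ofstream(demo) << "old-format contents\n";
    std::vector<Database> databases = {Database{{Table{demo}}}};
    migrate_all(databases);
    std::cout << "migrated: " << demo << '\n';
}
```

Deferring the file swap until every table has been rewritten means that a failed migration leaves the old files untouched.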

The initial tests in our testing environment showed promising results, but we needed to verify them. To do this, we used our Terrarium Comparator tool, which compares two sets of data: one with the migrated data and the other with the old data. We tested the engine thoroughly by running analytical and user-facing queries and by inserting new data into both instances of the engine. The tests showed that all data remained exactly as it was, and that query performance improved, with the total time spent on queries reduced by around 26 percent.
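Conceptually, that comparison boils down to something like the sketch below. The `Engine` type and the mock engines are made up for illustration; the real Terrarium Comparator talks to two live TerrariumDB instances, but the idea is the same: run an identical set of queries against the migrated and the old data and flag the first divergence.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical sketch: an "engine" is anything that answers a query with a
// serialized result. The real tool queries two database instances instead.
using Engine = std::function<std::string(const std::string&)>;

bool results_match(const Engine& old_engine, const Engine& new_engine,
                   const std::vector<std::string>& queries) {
    for (const auto& query : queries) {
        if (old_engine(query) != new_engine(query)) {
            std::cerr << "mismatch for query: " << query << '\n';
            return false;
        }
    }
    return true;
}

int main() {
    // Mock engines that agree on every query, just to show the flow.
    Engine old_engine = [](const std::string& q) { return "result(" + q + ")"; };
    Engine new_engine = old_engine;
    std::vector<std::string> queries = {"analytical_query_1", "user_facing_query_2"};
    std::cout << (results_match(old_engine, new_engine, queries) ? "data intact"
                                                                 : "divergence found")
              << '\n';
}
```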

My responsibilities in the migration process were the creation of the Terrarium Migrator app and the implementation of the new chunk type. At the time this seemed like a big challenge, but it turned out to be a very interesting task and, considering all my previous assignments, the most educational and rewarding one. What you must realise when writing this kind of application is that it can have a huge impact on real datasets whose consistency must remain untouched - and that is the biggest challenge.

I have learned to write applications from scratch that operate on huge amounts of data, handle all kinds of possible errors and, on top of that, use multithreading. To be honest, it's pretty rare to find a place where you can grow in so many areas in such a short time.

I can't help but feel a sense of pride when I see the impact my work has had on a big distributed system so early on in my professional journey. It's an amazing feeling, and it's what drives me to keep pushing myself and expanding my skills.

Author: Wiktor Kubski | C++ Developer at Synerise