Benchmarking LevelDB vs. RocksDB vs. HyperLevelDB vs. LMDB Performance for InfluxDB

Warning! Please note that this blog post is over one year old; please review the latest on InfluxDB and product comparisons.

For quite some time we've wanted to test the performance of different storage engines for our use case with InfluxDB. We started off using LevelDB because it's what we had used on previous projects, and RocksDB wasn't around yet. We've finally gotten around to running some basic tests against a few different engines. Going forward it looks like RocksDB might be the best option for us.

However, we haven't had the time to tune any settings or refactor things to take advantage of specific storage engine characteristics. We're open to suggestions, so read on for more detail.

Before we get to results, let's look at the test setup. We used a DigitalOcean droplet with 4GB RAM, 2 cores, and 60GB of SSD storage.

The next release of InfluxDB has a clearly defined interface for adding different storage engines. You'll be able to choose LevelDB, RocksDB, HyperLevelDB, or LMDB. Which one you use is set through the configuration file.

Under the covers, LevelDB is a Log Structured Merge Tree, while LMDB is a memory-mapped copy-on-write B+Tree. RocksDB and HyperLevelDB are forks of the LevelDB project with different optimizations and enhancements.

Our tests used a benchmark tool that isolated the storage engines for testing. The test does the following (a minimal sketch of this sequence follows the list):

  • Write N values where the key is 24 bytes (3 ints)
  • Query N values (range scans through the key space in ascending order, comparing keys to see if it should stop)
  • Delete N/2 values
  • Run compaction
  • Query N/2 values
  • Write N/2 values

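Here's a minimal sketch of that sequence in Go against a single engine, using the goleveldb library (an assumption for illustration; the actual benchmark tool isn't shown here and drove each engine through its own bindings). Keys and values are simplified placeholders:

```go
package main

import (
	"encoding/binary"
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/util"
)

func main() {
	db, err := leveldb.OpenFile("/tmp/bench-db", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const n = 1000000 // number of values for this run

	// Placeholder 8-byte key; the real benchmark used the 24-byte
	// layout described below.
	key := func(i uint64) []byte {
		b := make([]byte, 8)
		binary.BigEndian.PutUint64(b, i)
		return b
	}

	// 1. Write N values.
	for i := uint64(0); i < n; i++ {
		if err := db.Put(key(i), []byte("v"), nil); err != nil {
			log.Fatal(err)
		}
	}

	// 2. Query N values: ascending range scan through the key space,
	// stopping once the iterator passes the end key.
	iter := db.NewIterator(&util.Range{Start: key(0), Limit: key(n)}, nil)
	for iter.Next() {
		_ = iter.Value()
	}
	iter.Release()

	// 3. Delete N/2 values.
	for i := uint64(0); i < n/2; i++ {
		if err := db.Delete(key(i), nil); err != nil {
			log.Fatal(err)
		}
	}

	// 4. Compact; a zero Range means the whole key space.
	if err := db.CompactRange(util.Range{}); err != nil {
		log.Fatal(err)
	}

	// 5 & 6. The benchmark then queries the remaining N/2 values and
	// writes N/2 new ones, using the same patterns as above.
}
```
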
At various steps we checked the on-disk size of the database. We went through multiple runs, writing anywhere from 1 million to 100 million values. Which implementation came out on top differed depending on how many values were in the database.

For our use case we want to test on databases that have more values rather than fewer, so we'll focus on the results for the biggest run. We're also not benchmarking put operations on keys that already exist. It's either inserts or deletes, which is almost always the use case with time series data.

The keys consist of three unsigned integers that are converted into big endian bytes. The first is an id that would normally represent a time series column id, the second is a time stamp, and the third is a sequence number. The benchmark simulates values written into a number of different ids (the first 8 bytes) with increasing time stamps and sequence numbers. This is a common load pattern for InfluxDB: single points written to many series or columns at a time.
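
A sketch of that key encoding in Go (the helper name is ours for illustration). Big-endian byte order makes lexicographic key comparison match numeric order, so ascending range scans walk ids, then time stamps, then sequence numbers:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// makeKey packs (column id, time stamp, sequence number) into the
// 24-byte big-endian key layout described above.
func makeKey(id, ts, seq uint64) []byte {
	key := make([]byte, 24)
	binary.BigEndian.PutUint64(key[0:8], id)    // first 8 bytes: column id
	binary.BigEndian.PutUint64(key[8:16], ts)   // next 8 bytes: time stamp
	binary.BigEndian.PutUint64(key[16:24], seq) // last 8 bytes: sequence number
	return key
}

func main() {
	fmt.Printf("%x\n", makeKey(42, 1400000000, 1))
}
```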

Writes during the test happen in batches of 1,000 key/value pairs. Each key/value pair is a different series column id, up to the number of series to write in the test. The value is a serialized protobuf object. Specifically, it's a FieldValue with an int64 set.
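
And a sketch of that batched write pattern, again with goleveldb as a stand-in for the engine bindings; the two-byte value below is just a placeholder for the serialized FieldValue protobuf:

```go
package main

import (
	"encoding/binary"
	"log"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	db, err := leveldb.OpenFile("/tmp/batch-db", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const seriesCount = 500000  // 500k columns, as in the big run
	value := []byte{0x08, 0x01} // placeholder for a serialized FieldValue

	// One batch of 1,000 key/value pairs, each on a different series
	// column id, committed atomically with a single Write call.
	batch := new(leveldb.Batch)
	for i := uint64(0); i < 1000; i++ {
		key := make([]byte, 24)
		binary.BigEndian.PutUint64(key[0:8], i%seriesCount) // column id
		binary.BigEndian.PutUint64(key[8:16], 1400000000)   // time stamp
		binary.BigEndian.PutUint64(key[16:24], i)           // sequence number
		batch.Put(key, value)
	}
	if err := db.Write(batch, nil); err != nil {
		log.Fatal(err)
	}
}
```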

Here are the results of a run on 100 million values spread out over 500k columns:

[Results table not preserved in this copy; it compared write, query, delete, and compaction times along with on-disk sizes across the four engines.] Compaction was run by calling the compaction method of each of the storage engines.

A few interesting things come out of these results. LevelDB is the winner on disk space utilization, RocksDB is the winner on reads and deletes, and HyperLevelDB is the winner on writes. On smaller runs (30M or less), LMDB came out on top on most of the metrics except for disk size. This is really what we'd expect for B-trees: they're faster the fewer keys you have in them.

I've marked the LMDB compaction time as a loser in red because it's a no-op and deletes don't actually reclaim disk space. On a normal database where you're continually writing data, this is ok because the old pages get reused. However, it means that the DB will ONLY increase in size. For InfluxDB this is a problem because we create a separate database per time range, which we call a shard. This means that after a time range has passed, it probably won't be getting any more writes. If we do a delete, we need some form of compaction to reclaim the disk space.

On disk space utilization, it's no surprise that the Level variants came out on top. They compress the data in blocks, while LMDB doesn't use compression.
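
Block compression is a table-level option in the LevelDB family (Snappy by default). Here's a minimal goleveldb sketch of turning it off, which would bring the on-disk footprint closer to what LMDB shows:

```go
package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	// Snappy block compression is the default; NoCompression disables
	// it, trading disk space for a little CPU on reads and writes.
	db, err := leveldb.OpenFile("/tmp/uncompressed-db", &opt.Options{
		Compression: opt.NoCompression,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```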

Overall it looks like RocksDB might be the best choice for our use case. However, there are lies, damn lies, and benchmarks. Things can change drastically based on hardware configuration and settings on the storage engines.

We tested on SSD because that's where things are going (if not already there). RocksDB won't perform as well on spinning disks, but that's not the primary target hardware for us. You could also potentially create a configuration with smaller shards and use LMDB for screaming fast performance.

Here's a gist of more of the results from different benchmark runs.

We're open to updating settings, benchmarks, or adding new storage engines. In the meantime we'll keep iterating and try to get to the best possible performance for the use case of time series data.

Next Steps:

  • Download InfluxDB
  • Read more about the TICK stack


Source: https://www.influxdata.com/blog/benchmarking-leveldb-vs-rocksdb-vs-hyperleveldb-vs-lmdb-performance-for-influxdb/
