Understanding challenges and best practices when benchmarking cloud native storage solutions.
It’s natural to want to perform benchmarking when evaluating different cloud native, software-defined storage options. It’s also natural to ask a vendor about performance and to hear “it depends.” It can be incredibly challenging to get meaningful information from any benchmarking process, due to the sheer complexity and variability of the systems whose performance we are trying to measure.
There are two main types of benchmarking – synthetic benchmarks and application benchmarks.
Synthetic benchmarking involves using purpose-built tools such as fio to generate IO load and stress the system in a predictable, measurable way. While not as representative as application benchmarks, synthetic benchmarks let us stress specific parts of a system repeatably and adjust variables in a more controlled fashion than application benchmarks allow.
Application benchmarks come in two forms. The first uses a benchmark suite built for a specific application, which measures the time taken to perform certain operations. A good example is pgbench for PostgreSQL. Such tools exercise a system holistically, showing us how an application performs with our storage system.
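As a rough illustration of the first form, the sketch below wraps pgbench from Python. It assumes pgbench is on the PATH, that the target database has already been initialised with `pgbench -i`, and that the output contains a line of the form `tps = <number> ...`. The database name and the default client, thread and duration values are placeholders, not recommendations.

```python
import re
import subprocess


def pgbench_command(dbname, clients=8, threads=2, duration_s=60):
    """Build a pgbench invocation; the default values are arbitrary placeholders."""
    return [
        "pgbench",
        "-c", str(clients),     # number of concurrent client sessions
        "-j", str(threads),     # number of worker threads
        "-T", str(duration_s),  # run for a fixed time, in seconds
        dbname,
    ]


def parse_tps(pgbench_output):
    """Return the first reported transactions-per-second figure, or None."""
    match = re.search(r"^tps = ([0-9.]+)", pgbench_output, re.MULTILINE)
    return float(match.group(1)) if match else None


def run_pgbench(dbname, **kwargs):
    """Run pgbench against an already initialised database and return its tps."""
    result = subprocess.run(
        pgbench_command(dbname, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return parse_tps(result.stdout)
```

Calling `run_pgbench("bench", clients=16)` several times against each storage option, with "bench" standing in for your own database name, gives figures that can be compared like for like.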
The second form, which gives more representative results, is to run an application under conditions as close to your production workload as possible, for example using production datasets and client access applications.
Running your own application as a benchmark can present some challenges:
- Running a production workload or dataset may require writing tooling to drive the application, or simulate other parts of a production environment
- Sometimes, those responsible for benchmarking do not have the skills or experience to deploy a specific application
- Production data may be sensitive and not accessible for testing
- For some applications, licensing may be a concern
Both synthetic and application benchmarks have value, and the ideal approach is a combination of both. In the end, however, as useful as synthetic benchmarks are, the final question to be answered is "how does my application perform with this storage system?"
Controlling the variables
Benchmarking requires keeping everything as consistent as possible. The more you can isolate individual variables, the more useful your benchmarking results. Here are some example variables:
- Block size. The amount of data in each IO request varies by application.
- Data compressibility. Systems which compress data will perform differently depending on the compression ratios that are achieved with the data.
- Percentage of read vs write. Different applications have very different mixes of reading and writing data.
- Portion of hot data in the cache. Some applications work primarily with a small piece of a larger dataset, and as such respond well to caching. Others require access to an entire dataset which will render most caching strategies ineffective.
- Number of parallel readers/writers of data.
- Number of nodes in a distributed, multi-node system such as Elasticsearch.
All of these factors influence how a storage solution performs. For example, one solution might be the fastest for read-intensive MongoDB workloads, but not perform so well for a write-intensive application such as Kafka. This is why it’s important to run workloads as similar to your actual application as possible in your benchmarking tests.
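Several of the variables above map directly onto fio options. The sketch below is one possible way to express a workload as data and generate an fio command line from it, then vary a single variable while the rest stay at a baseline. The `Workload` class, its default values and the `/mnt/test/fio.dat` path are illustrative assumptions. The fio options used (`--bs`, `--rwmixread`, `--buffer_compress_percentage`, `--numjobs` and so on) are standard, but check them against your installed fio version.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Workload:
    """One benchmark configuration; the defaults are arbitrary placeholders."""
    block_size: str = "4k"     # amount of data per IO request
    read_pct: int = 70         # percentage of reads in a mixed workload
    compress_pct: int = 0      # target compressibility of written buffers
    jobs: int = 4              # number of parallel readers/writers
    size: str = "1g"           # amount of data each job works on


def fio_args(w, filename, runtime_s=60):
    """Translate a Workload into an fio command line (as an argv list)."""
    return [
        "fio",
        "--name=bench",
        f"--filename={filename}",
        "--rw=randrw",              # mixed random reads and writes
        f"--rwmixread={w.read_pct}",
        f"--bs={w.block_size}",
        f"--buffer_compress_percentage={w.compress_pct}",
        f"--numjobs={w.jobs}",
        f"--size={w.size}",
        "--direct=1",               # bypass the page cache
        "--time_based",
        f"--runtime={runtime_s}",
        "--group_reporting",
        "--output-format=json",
    ]


def sweep(baseline, field, values):
    """Vary one field while holding every other variable at its baseline value."""
    return [replace(baseline, **{field: value}) for value in values]


# Print one fio command per block size; everything else stays at the baseline.
baseline = Workload()
for workload in sweep(baseline, "block_size", ["4k", "64k", "1m"]):
    print(" ".join(fio_args(workload, "/mnt/test/fio.dat")))
```

Keeping the workload definition as data makes it easy to record exactly which configuration produced which result.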
7 Benchmarking Best Practices
Here are some things to keep in mind when running benchmarking tests.
- Incorporate application benchmarking. If you want to run Kafka with Elasticsearch, the only way to get accurate information about how those two applications will work together is to run them together and measure the result.
- Change only one variable at a time, and keep every other variable constant.
- Don’t look at averages, look at percentiles. You need to know not just how fast something is at the 50th percentile (the median), but also at the 99th percentile: the time that 99 out of every 100 operations complete within. If an operation is too slow one time out of a hundred, that may not be acceptable for a production scenario.
- A picture is worth a thousand words. Where possible, graph time series using tools like Prometheus and Grafana and look for patterns. Are there any repeating highs or lows in latency or throughput? If benchmarking Ondat, consider using our Grafana dashboard to easily visualize data on IOPS and bandwidth used.
- Automate the entire process to ensure repeatable tests.
- Benchmark on volumes that are a significant multiple (2-3x) of available RAM to lower the impact of cache at all levels in the system.
- When running benchmarks in the cloud, run them multiple times, destroying and recreating nodes between runs so that the underlying machine changes. This reduces the impact that noisy neighbors might have on benchmark results.
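To make the point about percentiles concrete, here is a minimal sketch that summarises a list of latency samples. It uses the nearest-rank definition of a percentile, which is one of several common definitions, so tools such as fio or Prometheus may report slightly different values for the same data. The function names are illustrative.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Report the mean alongside the percentiles that matter in production."""
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }
```

With made-up input of 98 operations at 1 ms and 2 operations at 100 ms, the mean is about 3 ms and the median is 1 ms, while the 99th percentile is 100 ms. The average alone would hide the slow operations entirely.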
While effective benchmarking is challenging, it is still a key tool, both for comparing performance between products and for understanding how individual variables (like block size or compressibility) influence performance.
To help those who want to check out Ondat, we’ve put together a self-evaluation guide that includes advice on benchmarking. It contains recipes for both synthetic benchmarking using fio and a sample application benchmark using Postgres and pgbench.
Check out the self-evaluation guide to read more.