
Choosing the right database — Reflect Blog
October 11, 2022 – 11:36 am

Analytics 101: Choosing the right database

When you’re new to the analytics ecosystem, it’s easy to get overwhelmed by all of your options for getting up and running. This is especially true when it comes to picking the right foundational technology: the database.

Choosing properly: four rules of the road

When it comes to finding the right database for the job, there are a few good rules of thumb.

Start with the end in mind

Unpredictable queries are almost always the root cause of database performance problems. If no one queried your database, its performance metrics would look great. To make query load predictable, take as much on-the-fly computation out of the query path as you can.

To do that, you’ve got to anticipate what your users are going to ask for and have the answers ready (or as close to ready as possible) ahead of time. If you can consistently have those answers ready, then choose a database whose semantics match the way those answers will be read.
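As a minimal sketch of "having the answer ready", assume a hypothetical stream of (day, amount) events and a dashboard that asks for daily totals. `total_on_the_fly` scans every event at query time, while `DailyRollup` (a name invented for this illustration, not a library class) does the aggregation at write time so each query becomes a single lookup.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw events: (day, amount). In a real system these would live in a table.
events = [
    (date(2022, 10, 1), 120),
    (date(2022, 10, 1), 80),
    (date(2022, 10, 2), 50),
]

def total_on_the_fly(events, day):
    """Scan every event at query time: cost grows with the size of the raw data."""
    return sum(amount for d, amount in events if d == day)

class DailyRollup:
    """Hypothetical rollup that maintains per-day totals as events arrive."""

    def __init__(self):
        self.totals = defaultdict(int)

    def ingest(self, day, amount):
        # The aggregation work happens here, at write time, not at query time.
        self.totals[day] += amount

    def total(self, day):
        # Query time is a dictionary lookup, independent of how many events exist.
        # .get() avoids inserting empty entries for days with no events.
        return self.totals.get(day, 0)

rollup = DailyRollup()
for d, amount in events:
    rollup.ingest(d, amount)

print(total_on_the_fly(events, date(2022, 10, 1)))  # 200
print(rollup.total(date(2022, 10, 1)))              # 200
```

The trade-off is that you have to know in advance which question (here, totals per day) will be asked; a question the rollup did not anticipate still falls back to a full scan.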

Choose the right data model

Each database is built for its own unique type of workload. Its authors have made intentional trade-offs to make their database good at some things while sacrificing flexibility or performance in other categories.

For example, Riak was built to get large chunks of data in and out very quickly. It isn’t necessary for Riak to understand the semantics of the data in order to do that, so it isn’t optimized to do so. That’s why Riak isn’t a good choice if you need to do aggregations like SUM, MIN, MAX, and so on at query time.
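To make the Riak point concrete, here is a toy in-memory stand-in for a key-value store that treats values as opaque bytes. It is an assumption-laden illustration, not Riak's client API: `OpaqueKVStore` and the `order:*` keys are hypothetical. Because the store never interprets the values, a SUM, MIN, or MAX forces the client to fetch every value, decode it, and aggregate locally.

```python
import json

class OpaqueKVStore:
    """Toy key-value store: values are opaque bytes.

    This is NOT Riak's API; it only models a store that moves blobs
    in and out quickly without understanding what is inside them.
    """

    def __init__(self):
        self._data = {}  # key (str) -> value (bytes)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]

    def keys(self):
        return list(self._data)

store = OpaqueKVStore()
for order_id, amount in [("order:1", 30), ("order:2", 45), ("order:3", 25)]:
    # The client serializes; the store never looks inside the value.
    store.put(order_id, json.dumps({"amount": amount}).encode("utf-8"))

# The store cannot aggregate on its own, so the client pulls back every
# value, decodes it, and computes SUM/MIN/MAX itself.
amounts = [json.loads(store.get(k))["amount"] for k in store.keys()]
print(sum(amounts), min(amounts), max(amounts))  # 100 25 45
```

With three keys this is trivial, but the work scales with the number of keys the client has to fetch, which is why a store built around opaque values is a poor fit for query-time aggregation.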

When you start with the end in mind and have a good notion of how your data is going to be accessed, you should be able to pick a database that best matches that pattern.

Remember that disks are fast—and memory is faster

Networks are getting faster every day. Even SAN-style, network-attached disks like Amazon EBS are far less of a bottleneck than they used to be. That said, the key to building performant services that depend on disk access is to read from the disk in a predictable way.
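One way to see why predictable reads matter is to count how many storage pages a query has to touch. The sketch below assumes a fixed, hypothetical 100 rows per page (Postgres stores tables in 8 kB pages by default, but how many rows fit depends on row width) and compares fetching 300 rows that sit next to each other with fetching 300 rows scattered across a 100,000-row table.

```python
import random

ROWS_PER_PAGE = 100  # assumption: ~100 small rows fit in one 8 kB page
TABLE_ROWS = 100_000
WANTED = 300

def pages_touched(row_positions, rows_per_page=ROWS_PER_PAGE):
    """Number of distinct pages that must be read to fetch these row positions."""
    return len({pos // rows_per_page for pos in row_positions})

# Hypothetical: rows for one customer stored contiguously (e.g. physically ordered by customer).
clustered = range(5_000, 5_000 + WANTED)

# The same number of rows scattered across the table (e.g. inserted over time).
rng = random.Random(42)
scattered = rng.sample(range(TABLE_ROWS), WANTED)

print("clustered:", pages_touched(clustered))  # 3
# Many more pages; the exact count depends on the random sample.
print("scattered:", pages_touched(scattered))
```

The clustered case reads a handful of adjacent pages, while the scattered case reads roughly one page per row. In Postgres, the `CLUSTER` command can physically reorder a table by an index to get the first layout, though it is a one-time operation that later inserts do not maintain.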

Consider, for example, that you’re using Postgres. If your query pattern calls for aggregating a few hundred rows that are neatly organized

