I recently looked into current TSDB solutions for keeping historical status and performance data. InfluxDB looked promising initially, but I ended up using TimescaleDB, though I’m still not certain it’s the right solution for me.
In my current project, I have a traditional SQL database storing an inventory of numerous related objects. It’s behind a REST API implemented using the Symfony PHP framework, which works great with the Doctrine ORM.
My initial approach to storing the historical state and performance metrics for the items in inventory was to add a couple of new entities to represent them. As you’d expect, there were no problems in dev and test, but as things scaled up and those tables got larger, performance suffered. I knew it would happen, but I’m in the “don’t optimize early” camp and let it go initially. Well, it’s no longer early.
InfluxDB
As I mentioned, the first TSDB I looked at was InfluxDB. I rewrote the API services for status and measurements to keep that data in Influx, separate from the SQL database. Right from the start, the obvious lack of integrity constraints between the two databases bothered me: keeping things aligned without transactions spanning both was going to be an issue. That said, inserts were blazingly fast.
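The write path was something along these lines (a simplified sketch using the influxdata/influxdb-php client, not the real service code; the measurement, tag, and field names are made up for illustration):

```php
<?php

use InfluxDB\Client;
use InfluxDB\Database;
use InfluxDB\Point;

// Connect and select the target database.
$client   = new Client('influxdb.example.local', 8086);
$database = $client->selectDB('inventory_metrics');

// Example values standing in for a real status sample.
$itemId    = '3f8e1c2a-0000-0000-0000-000000000000';
$latencyMs = 42.0;
$status    = 'ok';

// One point per sample: tags identify the inventory item, fields carry the data.
$point = new Point(
    'status_sample',            // measurement name
    $latencyMs,                 // primary "value" field
    ['item_id' => $itemId],     // tags (indexed)
    ['status' => $status],      // additional fields
    time()                      // timestamp, in seconds
);

// The precision here has to match the timestamp granularity used above.
$database->writePoints([$point], Database::PRECISION_SECONDS);
```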
The show-stopper with using InfluxDB for my system came when I needed the API to expose the combined status of a parent inventory item that contains multiple descendant items. Handling these requests required getting the item lists from the SQL database, using InfluxDB’s equivalent of WHERE IN (...), and performing the aggregation logic in memory to produce the response. It was all very slow and brittle, so I started looking around for another solution.
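Each request ended up doing something like this (a simplified sketch, not the production code: findDescendantIds() is a made-up repository method, $database is the influxdb-php handle from the snippet above, and there’s not even any escaping of the IDs here, which is part of what I mean by brittle):

```php
// Step 1: get the descendant item IDs from the SQL database.
$childIds = $itemRepository->findDescendantIds($parentId);

// Step 2: InfluxQL has no WHERE IN (...), so its "equivalent" is a chain of
// OR'd tag comparisons assembled by hand.
$predicate = implode(' OR ', array_map(
    static fn (string $id): string => sprintf("\"item_id\" = '%s'", $id),
    $childIds
));

$result = $database->query(
    "SELECT LAST(\"status\") FROM \"status_sample\" WHERE {$predicate} GROUP BY \"item_id\""
);

// Step 3: the actual aggregation happens in PHP, not in either database.
$combined = 'ok';
foreach ($result->getSeries() as $series) {
    $status = $series['values'][0][1] ?? 'unknown';
    if ($status === 'critical') {
        $combined = 'critical';
        break;
    }
    if ($status !== 'ok') {
        $combined = 'warning';
    }
}
```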
TimescaleDB
Since I’m already using PostgreSQL for the database, it made sense to look at TimescaleDB, which works as an extension rather than a standalone instance. I was able to get things working and make the queries efficient without much trouble. I manually tweaked the Doctrine migration to add the create_hypertable() calls so the schema is automatically built correctly.
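The hand-edited migration ends up looking roughly like this (a sketch: the table and column names are placeholders, the CREATE TABLE is whatever Doctrine generated for the entity, and the timescaledb extension is assumed to already be installed on the database):

```php
<?php

declare(strict_types=1);

namespace DoctrineMigrations;

use Doctrine\DBAL\Schema\Schema;
use Doctrine\Migrations\AbstractMigration;

final class Version20210101000000 extends AbstractMigration
{
    public function up(Schema $schema): void
    {
        // Generated by Doctrine from the entity mapping.
        $this->addSql('CREATE TABLE status_sample (id UUID NOT NULL, recorded_at TIMESTAMP(0) WITH TIME ZONE NOT NULL, status VARCHAR(16) NOT NULL, PRIMARY KEY (id, recorded_at))');

        // Added by hand: convert the plain table into a TimescaleDB hypertable
        // partitioned on the time column.
        $this->addSql("SELECT create_hypertable('status_sample', 'recorded_at')");
    }

    public function down(Schema $schema): void
    {
        $this->addSql('DROP TABLE status_sample');
    }
}
```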
There were a few gotchas along the way. First, the timestamp property has to be included in any unique keys on the table. This should not have been a problem, except that my $id property had @ORM\Column(type="guid", unique=true); the unique=true was left over from back when we used integers as primary keys and kept separate UUIDs. It resulted in a separate unique key that was redundant with the primary key. Oops… I just removed the unique=true bit, added @ORM\Id to the timestamp property, and we’re good.
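After that change, the relevant mapping looks roughly like this (entity and property names are illustrative, not my actual schema):

```php
<?php

namespace App\Entity;

use DateTimeImmutable;
use Doctrine\ORM\Mapping as ORM;

/**
 * @ORM\Entity
 * @ORM\Table(name="status_sample")
 */
class StatusSample
{
    /**
     * No more unique=true; the GUID is now just one half of the composite key,
     * assigned in the constructor since composite keys can't use a generator.
     *
     * @ORM\Id
     * @ORM\Column(type="guid")
     */
    private $id;

    /**
     * The time column has to be part of every unique constraint on a
     * hypertable, so it joins the primary key.
     *
     * @ORM\Id
     * @ORM\Column(type="datetimetz_immutable")
     */
    private $recordedAt;

    /**
     * @ORM\Column(type="string", length=16)
     */
    private $status;

    public function __construct(string $id, DateTimeImmutable $recordedAt, string $status)
    {
        $this->id = $id;
        $this->recordedAt = $recordedAt;
        $this->status = $status;
    }
}
```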
Second, TimescaleDB doesn’t support foreign key constraints into hypertables. My data model keeps many-to-many associations between some inventory items and the status samples that are combined to come up with an aggregate status, so I can’t let the database enforce the integrity of these associations on its own. I ended up with entity listeners that handle logic that would normally be done for me with cascade={"persist","remove"} settings. The result is still safe since it’s all done within a transaction, but it took more effort to code and test than I’d expected.
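As a rough sketch of the idea (class, entity, and accessor names are all illustrative, and the listener would be registered on the inventory item entity with @ORM\EntityListeners), the remove side looks something like this:

```php
<?php

namespace App\EntityListener;

use App\Entity\InventoryItem;
use Doctrine\ORM\Event\LifecycleEventArgs;
use Doctrine\ORM\Mapping as ORM;

class InventoryItemListener
{
    /**
     * Stand-in for the cascade={"remove"} behavior the database can't back
     * with a foreign key into the hypertable.
     *
     * @ORM\PreRemove
     */
    public function preRemove(InventoryItem $item, LifecycleEventArgs $args): void
    {
        $em = $args->getEntityManager();

        // Queue the associated status samples for removal. This happens inside
        // the same flush/transaction as the parent removal, so the association
        // stays consistent even without a database-level constraint.
        foreach ($item->getStatusSamples() as $sample) {
            $em->remove($sample);
        }
    }
}
```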
My last concern is with production deployments of TimescaleDB. Local staging and testing environments can use a Docker image to fire up the database without much trouble at all. However, we won’t be able to take advantage of AWS’s fully managed Aurora databases, which are simpler and cheaper than running our own instances. I guess we’ll need to review TimescaleDB’s Cloud and Forge offerings when we’re ready to go live.