A nuanced way to model time in relational databases

Discover how to model temporal data the right way, using postgres.

Naveen Negi
11 min readJan 30, 2024

This comment from Hackernews sums it up well: Working with temporal data is a double-edged sword. It opens up incredible opportunities but can easily lead to complexity if not handled carefully

As we are working on complex trading systems it is frequently necessary to correct past record and then be able to make consistent reports based on “as of” date. This means answering questions like “I want to get the state of this population of trades as it was yesterday at 5 pm but based on our knowledge yesterday at 10 pm.”

Valid time and transaction time

Valid-time refers to the period during which a piece of information is true in the real world. It denotes the actual time span that an event occurred or a fact was in effect.For example, a parking tariff might be set at $2 per hour from January 1, 2024, to March 31, 2024.

Transaction-time, on the other hand, captures when the data was stored in the database. It indicates the time period during which a database transaction has recorded a particular state of information. Extending the above example, Suppose parking tariff was set to $2 per hour and recorded in the database on December 20, 2023. This date marks the beginning of the transaction-time for this tariff, indicating when the database first logged this rate. Later, on February 15, 2024, the database was updated with a new tariff of $3 per hour, set to start from April 1, 2024. Consequently, the transaction-time “to” date for the $2 tariff is February 15, 2024, which shows when this rate was last recognized in the database before the update.

Valid-Time for $2 Tariff:
January 1, 2024 --------------------------------- March 31, 2024
|<------------------ $2 per hour ---------------->|


Transaction-Time for $2 Tariff Record:
December 20, 2023 ------------------------------- February 15, 2024
|<-- Recorded in DB ------------------ Updated in DB -->|

If you have ever wondered the difference between temporal tables and event sourcing, valid time and transaction time is the key difference. Event sourcing is primary focused on Transaction time (among other things, like capturing intend via domain events), however temporal data modelling is more focused on Valid time.

Valid time can be updated by transaction time cannot. Valid time reflect our belief about the history, and those beliefs can change; transaction times, by contract, reflect history as such, and history can not change.
–Temporal data and the relational model

In this post, we will only focus on valid-time.

Why “To” and “From” columns are wrong way to model temporal Validity

A better way to model temporal data is use interval/Period type. Here are the reasons why:

  1. Atomicity: A interval/range type stores the start and end of the range as a single atomic value. This ensures that the range is always consistent, preventing scenarios where only one part of the range (start or end) is updated, leading to potential data inconsistencies. If you are into functional programming, Interval are actually Monoids.
  2. Without the concept of intervals, accurately defining “closed” and “open” intervals becomes challenging, significantly limiting data modeling capabilities. Additionally, it raises questions about whether the “to” and “from” columns include or exclude the specified times.
  3. Determining whether the “to” or “from” column should be part of the primary key is another consideration. Utilizing a daterange type circumvents the issue of having to arbitrarily decide which of two potential keys should be considered primary.
  4. Imagine if relations r1 and r2 both included separate FROM and TO attributes, each with different names, instead of a unified DURING attribute. Joining r1 and r2 to create r3 would result in r3 having two pairs of FROM-TO attributes. It would then fall on the user, not the system, to correctly align the FROMs and TOs.
  5. Integrity Constraints: PostgreSQL’s range types support constraints like exclusivity, allowing you to easily prevent overlapping ranges without complex checks. This is particularly useful in scenarios like booking systems, where overlapping bookings are not allowed.
  6. Simplified Queries: Queries involving range operations (e.g., checking for overlap, containment, or adjacency) are more straightforward and concise with range types. This can lead to increased readability and maintainability of your SQL code.
  7. Indexing: PostgreSQL offers specialized GiST (Generalized Search Tree) indexes for range types. These indexes can efficiently handle queries that involve range operations, potentially leading to better query performance compared to using separate columns and standard B-tree indexes.
  8. Built-in Functions and Operators: PostgreSQL provides a rich set of functions and operators for working with range types, making it easier to perform complex temporal queries, such as finding gaps, overlaps, or adjusting ranges. This reduces the need for custom logic and complex SQL.
  9. Type Safety: Using a date range type enforces that the data always represents a valid range, with the start being before the end. This is enforced by the type system itself, reducing the risk of invalid data.

In summary, intervals are the fundamental abstraction we need for dealing with temporal data in a error free manner. And interval concept has much more applicability.

Problem 1: Validity of what ?

Let look at the below table. Can you find what is wrong with it ?

┌────┬────────────┬────────────┬──────────────┬────────────┬────────────┐
│ Id │ LocationId │ TariffType │ PricePerHour │ From │ To │
├────┼────────────┼────────────┼──────────────┼────────────┼────────────┤
│ 1 │ 101 │ Standard │ 5.00 │ 2024-01-01 │ 2024-06-30 │
├────┼────────────┼────────────┼──────────────┼────────────┼────────────┤
│ 2 │ 101 │ Standard │ 6.00 │ 2024-04-01 │ 2024-09-30 │
└────┴────────────┴────────────┴──────────────┴────────────┴────────────┘

Well, it is not clear what is “to” and “from” column referring to, it is the valid time for price, for location or for whole entity.

Problem 2: Lost Information

Let’s suppose that “From” and “To” column refers to validity of PricePerHour and you had below data.

┌────┬────────────┬────────────┬──────────────┬────────────┬────────────┐
│ Id │ LocationId │ TariffType │ PricePerHour │ From │ To │
├────┼────────────┼────────────┼──────────────┼────────────┼────────────┤
│ 1 │ 101 │ Standard │ 5.00 │ 2024-01-01 │ 2024-06-30 │
└────┴────────────┴────────────┴──────────────┴────────────┴────────────┘

Also, Suppose that today is 2024–01–30 and that, effective from today, Location of above tariff was changed from 101 to 999 and so we update the row, shown by this one.

┌────┬────────────┬────────────┬──────────────┬────────────┬────────────┐
│ Id │ LocationId │ TariffType │ PricePerHour │ From │ To │
├────┼────────────┼────────────┼──────────────┼────────────┼────────────┤
│ 1 │ 999 │ Standard │ 5.00 │ 2024-01-31 │ 2024-06-30 │
└────┴────────────┴────────────┴──────────────┴────────────┴────────────┘

Now we have lost (among other things) the information that Tariff 1 have been active on Location 101 since 2024-01-01 .

Replacing “To” and “From” column with Interval type

To avoid all of the above issues, we will use PriceSince column instead of To and From columns. It serves two purpose

  1. It uses daterange type from Postgres, which give us a lot of functionality out of the box. It also avoids above mentioned issues with two different columns.
  2. The name PriceSince explicit conveys that the validity refers to validity of PricePerHour.

So, after few more changes, our table schema is as shown below:

  • Id: A unique identifier for each record.
  • LocationId: Identifies the location to which the tariff applies.
  • TariffType: The type of tariff, e.g., standard, off-peak.
  • PricePerHour: The price applied for this tariff type and location.
  • PriceSince: A date range (daterange type in PostgreSQL) indicating the period during which the specified price is valid.

Preventing Redundancy in data

A general approach for avoiding duplicate is to create an index involving multiple fields. In case of temporal data, we might be tempted to create an index involving the daterange column. For example, an index as an combination of TariffType and PriceSince .

+-----+------------+------------+--------------+---------------------------+
| Id | LocationId | TariffType | PricePerHour | PriceSince |
+-----+------------+------------+--------------+---------------------------+
| 001 | L001 | Standard | $10.00 | [2023-01-01, 2023-06-30] |
+-----+------------+------------+--------------+---------------------------+

So, if we try to insert another row with same value as TariffType and PriceSince , it will fail.

However, below row will not fail (since interval are only parially overlapping).

+-----+------------+------------+--------------+---------------------------+
| Id | LocationId | TariffType | PricePerHour | PriceSince |
+-----+------------+------------+--------------+---------------------------+
| 002 | L001 | Standard | $10.00 | [2023-05-01, 2023-12-31] |
+-----+------------+------------+--------------+---------------------------+

As you can see, these two rows display a certain redundancy, because same tariff is stated/recorded twice (for overlapped range).

Timeline: 2023-01-01 ------------------------------------------------------------------------------------------------- 2023-12-31

Row 1: 2023-01-01 ------------------------------- 2023-06-30
Row 2: 2023-05-01 ---------------------------------------- 2023-12-31

Overlap: 2023-05-01 --------------- 2023-06-30

The Adjacency problem

This is a similar scenario as above, but slightly different. Here ranges are not overlapping but they are adjacent to each other.

For example below two rows are same, expect second row start the day after first first row’s validity ends.

+-----+------------+------------+--------------+---------------------------+
| Id | LocationId | TariffType | PricePerHour | PriceSince |
+-----+------------+------------+--------------+---------------------------+
| 001 | L001 | Standard | $10.00 | [2023-01-01, 2023-06-30] |
+-----+------------+------------+--------------+---------------------------+
+-----+------------+------------+--------------+---------------------------+
| Id | LocationId | TariffType | PricePerHour | PriceSince |
+-----+------------+------------+--------------+---------------------------+
| 002 | L001 | Standard | $10.00 | [2023-07-01, 2023-12-31] |
+-----+------------+------------+--------------+---------------------------+

Here these two ranges are not overlapping but are adjacent to each other:

Timeline: 2023-01-01 -------------------------------------------------------------------------------- 2023-12-31

Row 1: 2023-01-01 --------------------------------- 2023-06-30
Row 2: 2023-07-01 ----------------------------------------- 2023-12-31

Solution: PACKING Intervals

Both of the above issues can be solved by merging overlapping intervals in the PriceSince column.

Interestingly postgres provides functions which lets you do this every easily.

The way you can achieve this in postgres is by using a trigger. A database trigger is a set of instructions that automatically runs when data in a table is changed, like adding, updating, or deleting records.

Before we look into the implementation of trigger, let look at two important Operators in Postgres for dealing with Range/Interval type.

  1. && operator is used to check overlapping ranges, for example in below code. Below code is part of a trigger (which I’ll explain soon).
Range1 && ""Range2

2. and -|- is how you can detect if ranges are adjacent to each other

Range1 -|- NEW.Range2

Also, it is important to know that packing a column is allowed only when duration are overlapping and rest of the row is same. Otherwise we don’t know how to merge other columns which doesn’t have same values.

Trigger for Packing the daterange

The goal of PACKING is to ensure that for any given LocationId, TariffType, and PricePerHour, there are no overlapping PriceSince intervals. If new data introduces an overlap, the intervals should be merged to maintain a single, continuous range. This trigger will execute on every INSERT and UPDATE in CurrentTariffs table.


CREATE OR REPLACE FUNCTION pack_price_since()
RETURNS TRIGGER AS $$
Declare rec RECORD;
BEGIN
-- Loop through existing records to find and merge overlapping ranges
FOR rec IN
SELECT * FROM ""CurrentTariffs""
WHERE ""LocationId"" = NEW.""LocationId""
AND (""PriceSince"" && NEW.""PriceSince"" OR ""PriceSince"" -|- NEW.""PriceSince"")
AND ""TariffType"" = NEW.""TariffType""
AND ""PricePerHour"" = NEW.""PricePerHour""
AND ""Id"" != NEW.""Id""
LOOP
-- Merge ranges and delete the old record
NEW.""PriceSince"" := daterange(
LEAST(lower(NEW.""PriceSince""), lower(rec.""PriceSince"")),
GREATEST(upper(NEW.""PriceSince""), upper(rec.""PriceSince""))
);
DELETE FROM ""CurrentTariffs"" WHERE ""Id"" = rec.""Id"";
END LOOP;

-- Insert or update the record with the merged range
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

The FOR loop iterates over existing records in the CurrentTariffs table where the LocationId, TariffType, and PricePerHour match those of the new or updated record (NEW), but the Id is different, and where the PriceSince intervals overlap.

  • Overlapping is checked using && for strict overlap and -|- for adjacency (meaning the intervals touch but do not overlap).
  • For each overlapping record, the NEW.PriceSince interval is expanded to encompass both the new record's interval and the existing record's interval. This is done by setting NEW.PriceSince to the least lower bound and the greatest upper bound of the two intervals.
  • The existing record with the now redundant interval (rec.Id) is then deleted from the table.

Final Step: The function concludes by returning NEW, which represents the modified new or updated record, now with a potentially expanded PriceSince interval that merges any overlaps found.

This approach ensures that the CurrentTariffs table maintains non-overlapping, continuous intervals for pricing, simplifying queries and analysis related to pricing over time.

And this is how you can setup the trigger

DROP TRIGGER IF EXISTS pack_price_since_trigger ON ""CurrentTariffs"";
CREATE TRIGGER pack_price_since_trigger
BEFORE INSERT OR UPDATE ON ""CurrentTariffs""
FOR EACH ROW EXECUTE FUNCTION pack_price_since();

GIST Index: Making it faster

GiST (Generalized Search Tree) indexes significantly improve PostgreSQL’s handling of temporal data, especially for range overlap and adjacency searches. Traditional B-tree indexes, while effective for exact matches, fall short with complex queries like determining time intervals that intersect or are adjacent to a given range.

In practice, a GiST index allows for efficient pruning of the search space in range overlap queries. It quickly identifies relevant time periods, avoiding a full table scan. For adjacency searches, it speeds up the process of finding consecutive events, enhancing query performance.

The strength of GiST indexes lies in their tree structure, which organizes data to optimize query execution. This structure minimizes I/O and CPU demands, particularly beneficial for databases with extensive temporal data, ensuring scalability without sacrificing speed.

So, we need to enable GiST Index on the PriceSince column.

CREATE INDEX ""PriceSince"" ON ""CurrentTariffs"" USING GIST (""PriceSince"")

Preventing Conflict data

This could arise when you have two different values for partially overlapping ranges.

┌────┬────────────┬────────────┬──────────────┬──────────────────────────┐
│ Id │ LocationId │ TariffType │ PricePerHour │ PriceSince │
├────┼────────────┼────────────┼──────────────┼──────────────────────────┤
│ 1 │ 101 │ Standard │ 5.00 │ ['2024-01-01' '2024-06-30'] │
└────┴────────────┴────────────┴──────────────┴──────────────────────────┘
┌────┬────────────┬────────────┬──────────────┬──────────────────────────┐
│ Id │ LocationId │ TariffType │ PricePerHour │ PriceSince │
├────┼────────────┼────────────┼──────────────┼──────────────────────────┤
│ 2 │ 101 │ Standard │ 6.00 │ ['2024-04-01' '2024-09-30'] │
└────┴────────────┴────────────┴──────────────┴──────────────────────────┘
  • The first row (Id 1) specifies a PricePerHour of 5.00 valid from January 1, 2024, to June 30, 2024.
  • The second row (Id 2) specifies a PricePerHour of 6.00 valid from April 1, 2024, to September 30, 2024.

The PriceSince date ranges for these rows overlap between April 1, 2024, and June 30, 2024. During this overlapping period, there's ambiguity about which price applies.

This can be avoided in postgres with help of Exclusion Constraint

Exclusion Constraint

An exclusion constraint in PostgreSQL ensures that if any two rows are compared on the specified columns or expressions using the specified operators, at least one of these comparisons will return false or null.

In this context, for any pair of rows where the LocationId and TariffType are the same, and their PriceSince ranges overlap, the PricePerHour must not be different.

This means that if there are two rows with the same LocationId and TariffType, and their PriceSince ranges overlap, they must have the same PricePerHour. If an attempt is made to insert or update a row such that it would create an overlap with a different PricePerHour, the database would reject the operation due to the violation of the NO_OVERLAPPING_PRIORITIES constraint.

ALTER TABLE "CurrentTariffs" ADD CONSTRAINT "NO_OVERLAPPING_PRIORITIES"
EXCLUDE USING GIST (
"LocationId" WITH =,
"TariffType" WITH =,
"PriceSince" WITH &&,
(CASE WHEN "PricePerHour" IS NOT NULL THEN "PricePerHour" END) WITH <>
);
  • "LocationId" WITH = ensures that the constraint considers rows with the same LocationId.
  • "TariffType" WITH = adds the condition that rows must also have the same TariffType.
  • "PriceSince" WITH && ensures that the constraint checks for overlapping date ranges in the PriceSince column.
  • (CASE WHEN "PricePerHour" IS NOT NULL THEN "PricePerHour" END) WITH <> ensures that the rows being compared must have same PricePerHour value if PriceSince date-range is overlapping (partially or full).

But before we add above constraint, we need to enable btree_gist in PostgresSQL f it's not already enabled.

CREATE EXTENSION IF NOT EXISTS btree_gist;

This extension is necessary because PostgreSQL’s default B-Tree indexes do not support the exclusion constraints on range types or with custom operators. The btree_gist extension provides GiST (Generalized Search Tree) indexing support for B-Tree equivalent behavior, which is required for implementing exclusion constraints with operators like && (overlaps), = (equal), and <> (not equal).

Further:

There is still lot more to do in temopral modelling. For example, we didn’t discuss how to deal with historical data, or data integrity when temporal data lives in multiple tables. Hopefully, we’ll discuss this in future posts.

CODE:

You can find the relevant code here. It contains full setup in a docker compose file (and c# project). For instructions, read the Readme.

--

--