Building a Date Dimension Table in Redshift

4 min readJun 28, 2017

What Is It?

It is common practice in data warehousing and reporting to use date and time dimension tables.

A date dimension table assigns an index to each day from an arbitrary starting point. Where that starting point is depends on how far back you will need to go. Usually picking a date at or slightly before the earliest records in your database is a good choice.

A time dimension table is similar in that it will assign an index to individual seconds for one day (but have no date component). Used in combination the date dimension table these two integer indexes represent an exact second for any date.

Each will have their own separate table that contains the date/time ID and any information about that particular day or second.

Why Is It Needed?

Let’s say we have a table that contains sales information; lot’s of it. How would you go about handling reports that needed to filter or summarise (group) on?

Only business days (Mon-Fri).
All days except the last day of each month.
Specific week numbers of the year.
A fiscal quarter.

Trying to do these date calculations in your queries raises a few problems:

Date calculations are complicated in the easiest cases and making sure you handle all the edge cases is very complicated and error prone.
Once you start performing date calculations it’s very unlikely the database will be able to make optimisations or smart decisions that allow it to use an index. If a full table scan is required and each record needs to be calculated for the date function it can make queries extremely expensive.

I See. How Do I Set This Up?

Although Redshift is advertised as having almost the same features as PostgreSQL there is one big missing feature that make it very difficult to generate a date dimension table:

Only the leader node can perform any useful date calculations. This returns a lot of errors when we do any calculation involving a current timestamp.

Despite the obstacles we can still do it, here is the table definition:

CREATE TABLE date_dimension (
  "date_id"               INTEGER                     NOT NULL PRIMARY KEY,  -- DATE
  "full_date"             DATE                        NOT NULL,
  "au_format_date"        CHAR(10)                    NOT NULL,
  "us_format_date"        CHAR(10)                    NOT NULL,  -- YEAR
  "year_number"           SMALLINT                    NOT NULL,
  "year_week_number"      SMALLINT                    NOT NULL,
  "year_day_number"       SMALLINT                    NOT NULL,
  "au_fiscal_year_number" SMALLINT                    NOT NULL,
  "us_fiscal_year_number" SMALLINT                    NOT NULL,  -- QUARTER
  "qtr_number"            SMALLINT                    NOT NULL,
  "au_fiscal_qtr_number"  SMALLINT                    NOT NULL,
  "us_fiscal_qtr_number"  SMALLINT                    NOT NULL,  -- MONTH
  "month_number"          SMALLINT                    NOT NULL,
  "month_name"            CHAR(9)                     NOT NULL,
  "month_day_number"      SMALLINT                    NOT NULL,  -- WEEK
  "week_day_number"       SMALLINT                    NOT NULL,  -- DAY
  "day_name"              CHAR(9)                     NOT NULL,
  "day_is_weekday"        SMALLINT                    NOT NULL,
  "day_is_last_of_month"  SMALLINT                    NOT NULL
) DISTSTYLE ALL SORTKEY (date_id);

And we populate it:

INSERT INTO date_dimension
  SELECT
    cast(seq + 1 AS INTEGER)                                      AS date_id,-- DATE
    datum                                                         AS full_date,
    TO_CHAR(datum, 'DD/MM/YYYY') :: CHAR(10)                      AS au_format_date,
    TO_CHAR(datum, 'MM/DD/YYYY') :: CHAR(10)                      AS us_format_date,-- YEAR
    cast(extract(YEAR FROM datum) AS SMALLINT)                    AS year_number,
    cast(extract(WEEK FROM datum) AS SMALLINT)                    AS year_week_number,
    cast(extract(DOY FROM datum) AS SMALLINT)                     AS year_day_number,
    cast(to_char(datum + INTERVAL '6' MONTH, 'yyyy') AS SMALLINT) AS au_fiscal_year_number,
    cast(to_char(datum + INTERVAL '3' MONTH, 'yyyy') AS SMALLINT) AS us_fiscal_year_number,-- QUARTER
    cast(to_char(datum, 'Q') AS SMALLINT)                         AS qtr_number,
    cast(to_char(datum + INTERVAL '6' MONTH, 'Q') AS SMALLINT)    AS au_fiscal_qtr_number,
    cast(to_char(datum + INTERVAL '3' MONTH, 'Q') AS SMALLINT)    AS us_fiscal_qtr_number,-- MONTH
    cast(extract(MONTH FROM datum) AS SMALLINT)                   AS month_number,
    to_char(datum, 'Month')                                       AS month_name,
    cast(extract(DAY FROM datum) AS SMALLINT)                     AS month_day_number,-- WEEK
    cast(to_char(datum, 'D') AS SMALLINT)                         AS week_day_number,-- DAY
    to_char(datum, 'Day')                                         AS day_name,
    CASE WHEN to_char(datum, 'D') IN ('1', '7')
      THEN 0
    ELSE 1 END                                                    AS day_is_weekday,
    CASE WHEN
      extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER +
                        INTERVAL '1' MONTH) :: DATE -
                       INTERVAL '1' DAY) = extract(DAY FROM datum)
      THEN 1
    ELSE 0 END                                                    AS day_is_last_of_month
  FROM
    -- Generate days for the next ~20 years starting from 2011.
    (
      SELECT
        '2011-01-01' :: DATE + generate_series AS datum,
        generate_series                        AS seq
      FROM generate_series(0, 20 * 365, 1)
    ) DQ
  ORDER BY 1;

Now we can easily answer the previous examples:

Only business days (Mon-Fri): day_is_weekday = 1
All days except the first day of each month: day_is_last_of_month <> 1
Specific week numbers of the year: year_week_number BETWEEN 32 AND 36
A fiscal quarter: fiscal_year_number = 2017 AND fiscal_qtr_number = 3

Here is a complete example:

SELECT SUM(amount)
FROM sales
JOIN date_dimension ON dateid = saledateid
WHERE fulldate BETWEEN '2017-02-01' AND '2017-02-31'
  AND day_is_weekday = 1;

You should use saledateid as a part of your SORTKEY. If you are not using Redshift there should be an index on saledateid.

Originally published at http://elliot.land on June 28, 2017.

Building a Date Dimension Table in Redshift

What Is It?

Why Is It Needed?

I See. How Do I Set This Up?

Written by Elliot Chance