TPC-DS (2012)

Like the Star Schema Benchmark (SSB), TPC-DS is based on TPC-H, but it takes the opposite route. Where SSB reduces the number of joins by denormalizing the data into a star schema, TPC-DS increases the number of joins by storing the data in a complex snowflake schema (24 tables instead of 8). The data distribution is skewed (e.g. normal and Poisson distributions). The workload comprises 99 reporting and ad-hoc queries with random parameter substitutions.


First, check out the TPC-DS repository and compile the data generator.
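The clone-and-build step might look like the following Python sketch. It assumes the community-maintained kit at https://github.com/gregrahn/tpcds-kit.git (the text above does not name a repository), that the generator sources and makefile live in the kit's tools directory, and that a plain `make` builds them. The function names are hypothetical, and the kit's README may ask for extra make variables on some platforms, which is why `build_command` accepts them.

```python
import subprocess
from pathlib import Path

# Assumption: a community-maintained TPC-DS kit; the source names no repository.
TPCDS_KIT_URL = "https://github.com/gregrahn/tpcds-kit.git"


def clone_command(dest: str = "tpcds-kit", url: str = TPCDS_KIT_URL) -> list[str]:
    """Return the argv for a shallow clone of the TPC-DS kit into dest."""
    return ["git", "clone", "--depth", "1", url, dest]


def build_command(extra_make_args: tuple[str, ...] = ()) -> list[str]:
    """Return the argv that compiles dsdgen and dsqgen (run inside <kit>/tools)."""
    return ["make", *extra_make_args]


def clone_and_build(dest: str = "tpcds-kit",
                    extra_make_args: tuple[str, ...] = ()) -> Path:
    """Clone the kit, compile the generators, and return the tools directory."""
    subprocess.run(clone_command(dest), check=True)
    tools_dir = Path(dest) / "tools"  # assumption: sources and makefile live here
    subprocess.run(build_command(extra_make_args), cwd=tools_dir, check=True)
    return tools_dir
```

Building the argument lists separately from running them keeps the commands easy to inspect or log before anything touches the network or the compiler.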

Next, generate the data. The -scale parameter specifies the scale factor, which roughly corresponds to the size of the raw data in GB.
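A sketch of the data-generation step, driving dsdgen from Python. The -scale flag comes from the text above; the -dir flag for the output directory, and the need to run dsdgen inside the tools directory (where it is assumed to look for its tpcds.idx distribution file), are assumptions to check against the tool's usage output. `dsdgen_command` and `generate_data` are hypothetical helpers.

```python
import subprocess
from pathlib import Path


def dsdgen_command(scale: int, out_dir: str) -> list[str]:
    """Return the argv for dsdgen; -scale is the scale factor (roughly GB of raw data)."""
    if scale <= 0:
        raise ValueError("scale factor must be positive")
    # Assumption: -dir selects the directory that receives the generated files.
    return ["./dsdgen", "-scale", str(scale), "-dir", out_dir]


def generate_data(tools_dir: str, scale: int, out_dir: str) -> Path:
    """Run dsdgen inside tools_dir and return the absolute output directory."""
    # Resolve first so a relative out_dir is not reinterpreted relative to tools_dir.
    out = Path(out_dir).resolve()
    out.mkdir(parents=True, exist_ok=True)
    # Assumption: dsdgen reads tpcds.idx from its working directory.
    subprocess.run(dsdgen_command(scale, str(out)), cwd=tools_dir, check=True)
    return out
```

For example, `generate_data("tpcds-kit/tools", 1, "data")` would write the scale-factor-1 files into ./data.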

Then generate the queries, using the same scale factor as for the data.
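Query generation can be sketched the same way. The -DIRECTORY, -INPUT and -SCALE options and the templates.lst file are assumptions based on the layout of common TPC-DS kit checkouts, as is the optional -DIALECT option, which some kit versions may require. `dsqgen_command` and `generate_queries` are hypothetical helpers.

```python
import subprocess
from typing import Optional


def dsqgen_command(scale: int, templates_dir: str = "../query_templates",
                   dialect: Optional[str] = None) -> list[str]:
    """Return the argv for dsqgen; pass the same scale factor used for dsdgen."""
    cmd = [
        "./dsqgen",
        "-DIRECTORY", templates_dir,
        "-INPUT", f"{templates_dir}/templates.lst",  # assumed list of query templates
        "-SCALE", str(scale),
    ]
    if dialect is not None:
        # Assumption: some kit versions need an explicit dialect template.
        cmd += ["-DIALECT", dialect]
    return cmd


def generate_queries(tools_dir: str, scale: int,
                     dialect: Optional[str] = None) -> None:
    """Run dsqgen inside the kit's tools directory."""
    subprocess.run(dsqgen_command(scale, dialect=dialect), cwd=tools_dir, check=True)
```

Feeding one `scale` variable to both the data and the query generator is a simple way to keep the substituted query parameters consistent with the generated data.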

Now create the tables in ClickHouse. You can use either the original table definitions in tools/tpcds.sql or "tuned" table definitions that declare proper primary key indexes and use LowCardinality column types where it makes sense.
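To illustrate what a "tuned" definition means, the hypothetical helper below builds a MergeTree CREATE TABLE statement whose ORDER BY key also serves as the primary key index and whose selected String columns are wrapped in LowCardinality. Any column list passed to it is illustrative only; the real schemas are in tools/tpcds.sql. The sketch also builds a command that executes a whole DDL file, assuming clickhouse-client's --queries-file option.

```python
def tuned_ddl(table: str, columns: list[tuple[str, str]], order_by: list[str],
              low_cardinality: frozenset[str] = frozenset()) -> str:
    """Build a MergeTree CREATE TABLE; listed columns become LowCardinality."""
    unknown = set(order_by) - {name for name, _ in columns}
    if unknown:
        raise ValueError(f"ORDER BY uses unknown columns: {sorted(unknown)}")
    lines = []
    for name, ch_type in columns:
        # LowCardinality dictionary-encodes values; it pays off for few distinct values.
        if name in low_cardinality:
            ch_type = f"LowCardinality({ch_type})"
        lines.append(f"    {name} {ch_type}")
    cols = ",\n".join(lines)
    key = ", ".join(order_by)
    # Without an explicit PRIMARY KEY, MergeTree uses the ORDER BY key as primary key.
    return f"CREATE TABLE {table}\n(\n{cols}\n)\nENGINE = MergeTree\nORDER BY ({key})"


def create_tables_command(ddl_file: str, database: str = "default") -> list[str]:
    """argv that executes every statement in a DDL file such as tools/tpcds.sql."""
    return ["clickhouse-client", "--database", database, "--queries-file", ddl_file]
```

For example, `tuned_ddl("ship_mode", [("sm_ship_mode_sk", "Int64"), ("sm_carrier", "String")], ["sm_ship_mode_sk"], frozenset({"sm_carrier"}))` declares sm_carrier as LowCardinality(String) and sorts by the surrogate key. High-cardinality columns such as keys are better left as plain types.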

After that, import the generated data into the tables.
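One way to script the import is to pipe each generated file into clickhouse-client using the CSV format with '|' as the field delimiter. The sketch assumes dsdgen wrote one `<table>.dat` file per table (no parallel chunks), so the file name doubles as the table name; `insert_command`, `table_for` and `import_all` are hypothetical helpers.

```python
import subprocess
from pathlib import Path


def insert_command(table: str) -> list[str]:
    """argv for clickhouse-client reading pipe-delimited rows from stdin."""
    return [
        "clickhouse-client",
        "--format_csv_delimiter", "|",  # dsdgen separates fields with '|'
        "--query", f"INSERT INTO {table} FORMAT CSV",
    ]


def table_for(data_file: Path) -> str:
    """Assumption: each data file is named <table>.dat."""
    return data_file.stem


def import_all(data_dir: str) -> list[str]:
    """Load every .dat file in data_dir and return the tables that were loaded."""
    loaded = []
    for data_file in sorted(Path(data_dir).glob("*.dat")):
        table = table_for(data_file)
        # Stream the file as raw bytes so no re-encoding happens in Python.
        with data_file.open("rb") as rows:
            subprocess.run(insert_command(table), stdin=rows, check=True)
        loaded.append(table)
    return loaded
```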

Finally, run the generated queries.
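The sketch below assumes dsqgen wrote all statements into a single file (for example query_0.sql). It splits that file on semicolons, runs each statement separately, and records failures instead of stopping at the first one, which matters because, as the warning at the end of this section explains, many queries are expected to fail. The splitting rule and the helper names are assumptions; statements are numbered by their position in the file, which need not match the TPC-DS template numbers.

```python
import subprocess
from pathlib import Path


def split_statements(sql: str) -> list[str]:
    """Split generated SQL on ';' and drop chunks holding only '--' comments."""
    statements = []
    for chunk in sql.split(";"):
        # Keep the chunk only if at least one line is real SQL, not a comment.
        code = [ln for ln in chunk.splitlines()
                if ln.strip() and not ln.strip().startswith("--")]
        if code:
            statements.append(chunk.strip())
    return statements


def run_queries(query_file: str, database: str = "default") -> dict[int, str]:
    """Run each statement; map its 1-based position to 'ok' or the error text."""
    results = {}
    sql = Path(query_file).read_text()
    for i, stmt in enumerate(split_statements(sql), start=1):
        proc = subprocess.run(
            ["clickhouse-client", "--database", database, "--query", stmt],
            capture_output=True, text=True,
        )
        # Continue after failures so one run reports every failing statement.
        results[i] = "ok" if proc.returncode == 0 else proc.stderr.strip()
    return results
```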

Danger

TPC-DS makes heavy use of correlated subqueries, which ClickHouse does not support at the time of writing (September 2024, see issue #6697). As a result, many of the benchmark queries above will fail with errors.