TPC-DS (2012)
Like the Star Schema Benchmark (SSB), TPC-DS is based on TPC-H, but it takes the opposite route: it increases the number of joins required by storing the data in a complex snowflake schema (24 tables instead of 8). The data distribution is skewed (e.g. normal and Poisson distributions). It includes 99 reporting and ad-hoc queries with random substitutions.
References
- The Making of TPC-DS (Nambiar), 2006
First, check out the TPC-DS repository and compile the data generator:
Then, generate the data. The -scale parameter specifies the scale factor.
Then, generate the queries (use the same scale factor):
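As a minimal sketch of keeping the scale factor consistent between both generators, the snippet below stores it in one shell variable and builds both invocations from it. It only prints the commands rather than running them. The helper names `tpcds_datagen_cmd` and `tpcds_querygen_cmd` are hypothetical, and the `../query_templates` paths and the `-DIRECTORY`/`-INPUT` arguments are assumptions based on the layout of the commonly used tpcds-kit distribution; adjust them to your checkout.

```shell
#!/bin/sh
# Single source of truth for the scale factor; override with SCALE=10 ./script.sh
SCALE="${SCALE:-1}"

# Print the data generator invocation for a given scale factor.
# Assumption: the command is run from the directory that contains dsdgen.
tpcds_datagen_cmd() {
    printf './dsdgen -scale %s\n' "$1"
}

# Print the query generator invocation for the same scale factor.
# Assumption: query templates live in ../query_templates (tpcds-kit layout).
tpcds_querygen_cmd() {
    printf './dsqgen -DIRECTORY ../query_templates -INPUT ../query_templates/templates.lst -SCALE %s\n' "$1"
}

# Dry run: show both commands so they can be reviewed before executing them.
tpcds_datagen_cmd "$SCALE"
tpcds_querygen_cmd "$SCALE"
```

Passing the same variable to both helpers avoids generating queries whose substitution parameters were chosen for a different data size than the one loaded.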
Now create tables in ClickHouse. You can either use the original table definitions in tools/tpcds.sql or "tuned" table definitions with properly defined primary key indexes and LowCardinality-type column types where it makes sense.
The data can be imported as follows:
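One possible way to script the import is sketched below. It assumes that the generated files are pipe-delimited, that each file is named after its target table (for example `call_center.dat` for table `call_center`), and that the tables already exist. The helper name `tpcds_import_cmd` is hypothetical. The script prints the `clickhouse-client` commands instead of running them, so they can be checked first.

```shell
#!/bin/sh
# Print the import command for one generated data file.
# The table name is the file name without directory and extension
# (assumption: files are named after their target tables).
tpcds_import_cmd() {
    file="$1"
    table="$(basename "$file")"
    table="${table%.*}"
    # format_csv_delimiter tells ClickHouse's CSV parser to split on '|'.
    printf "clickhouse-client --format_csv_delimiter '|' --query \"INSERT INTO %s FORMAT CSV\" < %s\n" "$table" "$file"
}

# Dry run: print one import command per file passed on the command line.
for f in "$@"; do
    tpcds_import_cmd "$f"
done
```

For example, invoking the script with the generated data files as arguments prints one command per table; piping that output to `sh` would execute the imports.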
Then run the generated queries.
TPC-DS makes heavy use of correlated subqueries, which at the time of writing (September 2024) are not supported by ClickHouse (issue #6697). As a result, many of the above benchmark queries will fail with errors.