Can you PIVOT in ClickHouse?
Introduction
ClickHouse has no pivot operator, but we can achieve similar behavior using aggregate function combinators and, in particular, the ones with the -Map
suffix.
In this article, we're going to learn how to do that. There is also a video covering the same material, which you can see below:
Understanding aggregate function combinators
Let's start with a simple example. We're going to use clickhouse-local, which you can launch by running the following:
The following query calls the sumMap
function, which takes in a map and sums the values of each key:
This isn't a particularly interesting example as it returns the same map that we passed in.
Let's now call sumMap
over multiple rows of maps;
The key ClickHouse
appeared on both rows and its values have been summed. The key ClickBench
was only present on one line, so it sums a single value, which returns that value!
We can also use maxMap
to find the maximum values per key:
Or we can use avgMap
to find the average value per key:
Hopefully that's given you an idea of how these function combinators work.
Real-World Application: UK housing prices dataset
Now we're going to use them on a bigger dataset in the ClickHouse SQL playground.
We can connect to the playground using clickhouse-client:
We're going to query the uk_price_paid
table, so let's explore the data in that table:
We can see above that the table contains various fields related to property sales in the UK.
Grouping and aggregating by decade
Let's work out the median prices grouped by county for each decade in the dataset:
Filtering results
We can filter the results to only include data from 2010 and on:
Combining multiple aggregations
And if we want to find the maximum price per decade we can do that using the maxMap
function that we saw earlier:
Applying functions to map values
Alternatively, we can compute the average price using avgMap
.
Those values have a lot of decimal points, which we can clean up by using the mapApply
function to call the floor
function on each value in the map:
Flexible grouping: counties, districts, and postcodes
Let's try to group by some different fields. This time we're going to compute the median price per decade grouped by county and district:
We could also choose to group by year and then concatenate postcode1
and postcode2
in the map: