When Supermetrics transfers data into Google BigQuery, it creates sharded tables (one table for each day in your dataset). These tables help to increase query speed, prevent data duplication, and offer more flexible options for schema management.
- Sharding prevents a day’s data from being duplicated, because each day in your dataset has its own table.
- The tables in a set don’t all need to have the same schema, so you can add and remove columns going forward while the tables that already exist keep their original structure.
Note that you can’t query sharded tables using a wildcard if you have first removed a column and then added a different column to replace it.
- Looker Studio (formerly Data Studio) connects to sharded tables in BigQuery natively. This reduces your BigQuery costs and speeds up the queries that Looker Studio makes.
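To make the wildcard query mentioned in the note above more concrete, here is a minimal Python sketch that builds one. It assumes your shards follow BigQuery's usual date-sharded naming (a shared prefix followed by `YYYYMMDD`); the project, dataset, and prefix names are hypothetical placeholders, and the helper `build_wildcard_query` is ours, not part of any Supermetrics or Google library.

```python
from datetime import date


def build_wildcard_query(project: str, dataset: str, prefix: str,
                         start: date, end: date) -> str:
    """Return a BigQuery Standard SQL query over date-sharded tables.

    The trailing `*` matches every table whose name starts with `prefix`;
    `_TABLE_SUFFIX` then restricts the query to shards in the date range.
    """
    if start > end:
        raise ValueError("start must not be after end")
    return (
        f"SELECT *\n"
        f"FROM `{project}.{dataset}.{prefix}*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'"
    )


# Hypothetical names, for illustration only.
query = build_wildcard_query(
    "my-project", "supermetrics_data", "ADS_", date(2024, 1, 1), date(2024, 1, 7)
)
print(query)
```

Filtering on `_TABLE_SUFFIX` means only the shards in the chosen range are read, which is where the cost and speed benefits of sharding come from.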
Find your tables
A set of sharded tables will look like this in your dataset:
Each set of sharded tables contains one table per day of data, so a full year of data is stored as 365 separate tables. Listed out individually, they look like this:
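As a rough illustration of the naming pattern, the Python sketch below generates the table names for one year of shards. It assumes the common BigQuery convention of a shared prefix followed by the date as `YYYYMMDD`; the prefix `ADS_` is a hypothetical placeholder, and your own tables will use the prefix shown in your dataset.

```python
from datetime import date, timedelta


def shard_names(prefix: str, start: date, end: date) -> list[str]:
    """List one table name per day, using the <prefix>YYYYMMDD convention."""
    days = (end - start).days + 1
    return [f"{prefix}{start + timedelta(days=i):%Y%m%d}" for i in range(days)]


# "ADS_" is a hypothetical prefix; your tables will use their own.
names = shard_names("ADS_", date(2023, 1, 1), date(2023, 12, 31))
print(len(names))   # 365
print(names[0])     # ADS_20230101
print(names[-1])    # ADS_20231231
```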
Sharded tables vs. partitioned tables
Partitioned tables in BigQuery accomplish similar goals to sharded tables. They ensure that your data is stored in a way that allows it to be queried as efficiently as possible.
For most datasets, sharded and partitioned tables perform equally well. For very large datasets, however, partitioned tables offer these advantages:
- Unlike sharded tables, where each daily table has its own metadata, a partitioned table has only one set of metadata. This makes partitioned tables smaller and more efficient than sharded tables.
- Partitioned tables support clustering, which allows you to designate specific fields that BigQuery’s query optimizer can use to reduce the amount of data scanned by each query.
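For comparison, this is roughly what defining a partitioned, clustered table looks like in BigQuery. The Python sketch below only builds the DDL statement as a string; the table and column names are hypothetical, and the helper `build_partitioned_table_ddl` is our own illustration rather than something Supermetrics or BigQuery provides.

```python
def build_partitioned_table_ddl(table: str, columns: dict[str, str],
                                partition_column: str,
                                cluster_columns: list[str]) -> str:
    """Return a CREATE TABLE statement for a date-partitioned, clustered table."""
    if partition_column not in columns:
        raise ValueError("partition column must be one of the table columns")
    if not 1 <= len(cluster_columns) <= 4:
        # BigQuery accepts at most four clustering columns.
        raise ValueError("specify between one and four clustering columns")
    column_sql = ",\n".join(f"  {name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE `{table}` (\n{column_sql}\n)\n"
        f"PARTITION BY {partition_column}\n"
        f"CLUSTER BY {', '.join(cluster_columns)}"
    )


# Hypothetical table and columns, for illustration only.
ddl = build_partitioned_table_ddl(
    "my-project.supermetrics_data.ads",
    {"report_date": "DATE", "campaign_id": "STRING",
     "country": "STRING", "clicks": "INT64"},
    partition_column="report_date",
    cluster_columns=["campaign_id", "country"],
)
print(ddl)
```

Here a single table holds every day of data: `PARTITION BY` splits it by day much as sharding does, and `CLUSTER BY` names the fields the optimizer can use to skip data that a query's filters rule out.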