As the Google Analytics (Universal Analytics) API is sunsetting on July 1, 2024, you should backfill your data if you want to keep using it. This article shares best practices for backfilling your data into your data warehouse destination, along with some limitations to keep in mind when working with historical data.
Running a backfill for extensive historical data in Google Analytics can be challenging, especially when dealing with multiple accounts and segments over a long period of time. High transfer speeds can quickly hit API limits, which will slow down the process or cause errors. When running a backfill, Supermetrics balances transfer speed against the data source's API quotas.
To ensure a successful and efficient backfill, split the data into smaller chunks and pull only the data you need.
Instructions
1. Deselect all segments
Unless you have a specific segment you want to transfer, deselect all segments to ensure all data is transferred in the most efficient way.
2. Split the backfill process
Fetching data for an extended period, such as 10 years, for multiple accounts and segments can overwhelm the data source's API. Instead, break down the backfill process into manageable chunks.
Fetch data in short increments
Instead of attempting to retrieve a decade's worth of data in one go, request data year by year or even month by month, depending on how many views you're pulling data from.
If you run into issues with the volume of data, start by fetching a single month, beginning with the oldest month you want to back up. If that works, fetch the next two months in one backfill. After each successful backfill, double the date range until you reach a range that works reliably for your data. This process helps you find the sweet spot for your data (a sketch of this chunking approach follows the example below).
For example:
- Run a backfill for the year 2020.
- Once completed, run a backfill for the year 2021.
- Continue this process until all desired years are covered.
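As an illustration, the chunk boundaries can be generated programmatically. The sketch below shows both approaches in Python: fixed year-by-year ranges and ranges that double in length after each successful run. The helper names and years are illustrative only and are not part of Supermetrics.

```python
from datetime import date, timedelta

def year_ranges(first_year, last_year):
    """Yield one (start, end) pair per calendar year."""
    for year in range(first_year, last_year + 1):
        yield date(year, 1, 1), date(year, 12, 31)

def doubling_ranges(start, end, initial_days=30):
    """Yield (start, end) chunks, doubling the chunk length after each
    chunk until the full period is covered."""
    days = initial_days
    current = start
    while current <= end:
        chunk_end = min(current + timedelta(days=days - 1), end)
        yield current, chunk_end
        current = chunk_end + timedelta(days=1)
        days *= 2

# Year-by-year schedule for 2014-2023:
for start, end in year_ranges(2014, 2023):
    print(f"Backfill {start} .. {end}")

# Start with one month and double after each successful run:
for start, end in doubling_ranges(date(2014, 1, 1), date(2023, 12, 31)):
    print(f"Backfill {start} .. {end}")
```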
Split Google Analytics views or segments into separate pulls
If you're fetching data from multiple Google Analytics views or segments (such as New users or Returning users), process each view or segment separately. This approach reduces the complexity of each request and helps avoid API limitations.
For example, first run a backfill for Account A and then run a backfill for Account B.
Note that you should push the data to separate tables in your data warehouse to avoid overwriting the data.
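For illustration, here is a minimal sketch of looping over views so that each one is backfilled into its own destination table. The view IDs, table names, and the `run_backfill` stub are placeholders, not Supermetrics functions.

```python
# Placeholder mapping of Google Analytics views to destination tables.
BACKFILLS = [
    {"view_id": "11111111", "table": "ga_account_a_daily"},
    {"view_id": "22222222", "table": "ga_account_b_daily"},
]

def run_backfill(view_id, destination_table):
    """Stub for whatever triggers the transfer in your setup, e.g. a
    separate scheduled transfer per view."""
    print(f"Backfilling view {view_id} into {destination_table}")

# One pull per view, each writing to its own table so nothing is overwritten.
for job in BACKFILLS:
    run_backfill(job["view_id"], job["table"])
```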
3. Reduce data dimensionality
Minimizing the amount of data you request can significantly improve the backfill process. You can select which data to include by using Custom table groups.
Drop unnecessary user segments
Simplify your data request by excluding segments that are not crucial for your analysis.
For example, if you're mostly interested in overall traffic, splitting the data by user segments like New or Returning users unnecessarily doubles the number of requests Supermetrics makes to Google Analytics.
Drop unnecessary dimensions
Evaluate which dimensions are essential for your analysis. Removing non-critical dimensions can streamline the data retrieval process.
For example, if you're not going to use operating system information, excluding that dimension from your table can significantly reduce the volume of data transferred.
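As a rough illustration of the difference, here are two Universal Analytics Reporting API v4 request bodies (the view ID and field choices are placeholders): the lean request keeps only the fields needed for overall traffic, while the heavier one splits the same traffic across extra dimensions, multiplying the rows that have to be fetched.

```python
# Lean request: just the metrics and the date dimension.
lean_request = {
    "viewId": "12345678",  # placeholder view ID
    "dateRanges": [{"startDate": "2020-01-01", "endDate": "2020-12-31"}],
    "metrics": [{"expression": "ga:sessions"}, {"expression": "ga:users"}],
    "dimensions": [{"name": "ga:date"}],
}

# Heavier request: every extra dimension multiplies the number of rows.
heavy_request = {
    "viewId": "12345678",
    "dateRanges": [{"startDate": "2020-01-01", "endDate": "2020-12-31"}],
    "metrics": [{"expression": "ga:sessions"}, {"expression": "ga:users"}],
    "dimensions": [
        {"name": "ga:date"},
        {"name": "ga:userType"},         # New vs. Returning users
        {"name": "ga:operatingSystem"},  # drop if you don't analyze OS data
        {"name": "ga:country"},
    ],
}
```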
Following these guidelines will help you successfully complete your backfill and obtain the historical data you need for your analysis. If you encounter any issues or need further assistance, please reach out to our support team.
Limitations
- Google Analytics limits and quotas on API requests: The Google Analytics API enforces several layers of quotas, including hourly and daily limits (see the sketch after this list).
- Google data retention policy: Data older than the retention period configured in your Google Analytics property may not be fully available for backfill.
- Field limit on dimensions: In data warehouse transfers, a maximum of 10 dimensions can be fetched together when the Date field is included and a single date range is requested.
- Certain combinations of fields can't be backfilled beyond the data retention period; to pull high-level historical data from older periods, you need to remove those fields.
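If you also query the Universal Analytics API directly (outside Supermetrics, which handles quota pacing for you), a retry wrapper with exponential backoff is a common way to cope with hourly and daily quota errors. This is a minimal sketch; `QuotaError` is a placeholder for whatever exception your client raises on a rate-limit response.

```python
import random
import time

class QuotaError(Exception):
    """Placeholder for the exception your client raises on a 429 /
    rateLimitExceeded response from the Google Analytics API."""

def fetch_with_backoff(fetch, max_attempts=5):
    """Call `fetch` and retry with exponential backoff plus jitter when
    the API signals that a quota has been exhausted."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except QuotaError:
            time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, 8s, ...
    raise RuntimeError("API quota still exhausted after retries")
```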