Data Discrepancies in Google Analytics and Data Studio Extract - This is a problem!
COMMENT: June 18 2020.
The technical reason for the problems described in this blog can be explained by the following:
The various Google stack connectors in Data Studio such as Google Analytics and Search Console act as interfaces to the said underlying system. Hence they can be expected to return the same figures as that system. Those systems perform their own agggregation, cleaning, grouping etc etc before returning results.
Using a data extract means you miss out on all of this "underlying system functionality".
Now - does this mean the Extract Data connector is broken? No. Its not.
Does this mean you can rebuild your very slow Google Analytics report in Data Studio using Extract Data? Unless your report has the simplest layout, ie one scorecard with one number in it eg "Users", which of course excludes every practical dashboard in reality - then no you cant. 😀🤨
This tweet ended up in a bit of a firestorm. Scroll through it to see where I and others got involved. On the back of it I decided to really prove the data frustrations I was getting with Data Studio and "Data Extract".
Ralph Spandl has designed some awesome custom add ons for data studio including "Data Studio Chart Library" just released (which is very good but pricey) (See here for more)
Data Studio - Data Extract - How its supposed to work (The Promise)
Anybody who has used Data Studio will have encountered its most crushing problem... SLLLOOOWWWNNNEEESSSS...
Its not a stretch to say that Data Studio's main purpose is to provide a convenient link to report on Google Ads and Analytics Data. Or anything in the Google Advertising stack (Youtube etc).
The problem is every chart you link to these data sources (or any data source) is actually an API call to the said data source. Whats more - each and every chart/table makes its own API call... so if you have a dashboard with 5 scorecards, a sole table and a sole line graph - this is still 7 seperate calls to the data source. This very often leads to excrutiating wait times - esepcially if you have a big Google Analytics or Google Ads account.
We are living in 2020!!!! People want to see stuff quickly. Latency is the biggest turn off going round. Especially if you are tying to prove a point to someone via a dashboard!
Enter Data Extract to the rescue!!! 😃🚑🚨 This Data Studio connector allows you to connect to any data source and choose a subset of it to be "stored" and refreshed on a daily basis <- allowing you to pull super quick data to your reports.
Wow. amazing. All my problems are solved! Instead of connecting to the APIs above, Ill create a Data Extract for my reports and use that..
Data Studio - Data Extract - How its really working (Reality)
Help documentation is scant on the ground for this. There is no real explanation on how Data Extract actually works. What we do know is that extracts must be under 100MB in size... ok then.
The huge, massive, GINOMORMOUS problem with Data Extract - is that it DOESNT WORK PROPERLY. Or maybe thats harsh - it doesnt work as it SHOULD. The exact same extract of data will be completely different to the underlying data source in some cases.
Not great eh.
This means the nifty new report you just built with extracts... yeah well thats just completely wrong and garbage and you may as well throw in the bin.
Check out the Data Studio dashboard we created that highlights the problems here. For a fairly small and reasonablish/common extract of data we got horrific diffferences between GA and the extract.
MAKE SURE YOU OPEN THE REPORT IN DATA STUDIO ITS REALLY GOOD :-)))
See below for our methodology:
1) We made an extract from GA for the Demo account. It is fixed length: Mar 1, 2020 - Jun 15, 2020. (Note: the final day is in the past - ie not "today" to avoid issues where the extract is out of date)
2) Our testing is for May 20 - Jun15, but any dates up to Jun 15 will highlight the same problem.
3) The data extract is:
Dimensions - Default Channel Grouping, Device Category, Country, Page, Gender, City
Metrics - Pageviews, Sessions, Users, Avg. Session Duration, Bounce Rate, New Users
4) Things that make no difference
- Making the extract smaller <- we tried everything all the way back to a 1 day extract. All yielded broken results. So its not extract size (unless this one day subsample of the demo store is > 100MB???)
- We tried to change the extract in increments Problems occur when only using Default Channel Grouping, Device Category, Country and the same metrics - see bottom extra table
Conclusion - Data Extract for GA is Broken. Badly. And only useable (maybe) if you use very small extracts with small number of dimensions and metrics. UNFORTUNATELY for any big report, this would mean tonnes and tonnes of extracts making it unusable practically...
So whats left to do???
…or move everything to BigQuery instead.