I follow EU Eurostat on twitter (link – https://twitter.com/EU_Eurostat ) and often see a lot of interesting facts and infographics like this one.
And I have for a long time wanted to see if I could use the webservices that Eurostat also provides (link – https://ec.europa.eu/eurostat/data/web-services) to import the data directly into Power BI.
So here is a guide on how you can do it – and the example will try to extract the data for the Orange production Infographic.
There is a LOT of different datasets in Eurostat and this guide should work on most of them – you just need to find the dataset (https://ec.europa.eu/eurostat/data/database) in the catalogue.
Construct the Query
The REST request we need to construct is defined like this
So, we need to find the datasetCode and specify the filters.
You can find the dataset code by browsing the data catalogue – and the dataset code is stated at the end.
If you need specific items the data explorer you need to specify the code of the items in the request and the Dataexplorer is a good way to find these.
Using the codes we have found we can now use the Query builder to construct our query (click on the picture to try it out)
So after entering the dataset code we get the option to specify the filter and select measures under strucpro
Notice that I have selected Yes to Exclude EU aggregates
The query option will vary from dataset to dataset but the principles are the same.
Clicking the “Generate query filter” will give you
And you can copy the dataset code to clipboard
Now we have the filter part and this can of course be parametrized in your Power Query.
And we must add
before so the full web query is
In order to get the data into Power BI we choose get data and select the Web connector
And press OK
This will return the following to the query editor
Convert the JSON response to tables in Power Query
We get a JSON response returned from the Web query and this has to be transformed into a table – and in order to get that we need to understand what and how the data is returned.
When we use the query in the browser you can see that its not a normal structured JSON document.
And what we want is a table with the following fields – Country, Year, Area, Production
I start by renaming the query to “QueryResult” and disable the load – and the following query will use this as a reference
So lets create the Geo dimension
By clicking the Record on the dimension field
Drill down to geo
Down to category
And to label
Turn it into a table
And then add an index column starting from 0 and rename the columns
I then do the same for the other dimensions and end up with the following queries
Getting the values
Now in order to get the values we need to do a little more
The web query returns a one long list of values when converted into a table.
Values contains all the “values” in the grid when
Each cell in the table is referred by a running index starting from 0 to Facts multiplied by Geo by Time.
So when we have 2 facts, 38 countries and 4 years will give us 2 * 39 * 4 = 304 – this should be the number of rows in our Values table.
But when we look at the Values before we do anything we have only 282 rows.
The missing Values in because cells with missing values (represented by a : or :z ) is located in the Status field in the QueryResult.
So we have to add the missing rows from this Status – this gives us the missing 22 cells (304-282)
And we then convert these Values to null
In our values Query we want to append these rows to our table – and we can do this by modifying the step – by modifying the expression to
= Record.ToTable(value) & MissingRows
And we rename and change the Name column to Index and change the data type to an integer.
The index column is our number reference to each cell in the table/matrix from EuroStat.
Next step is to calculate the keys for time, geo and Facts.
To calculate the TimeKey we add a column
And divide it by the number to time periods
This gives a step with the following expression
= Table.AddColumn(#”Added Index”, “TimeKey”, each Number.Mod([Index], 4), type number)
And after click OK we can make it more dynamic by modifying the 4 to the number of rows in
= Table.AddColumn(#”Added Index”, “TimeKey”, each Number.Mod([Index], Table.RowCount(Time)), type number)
And now we have the TimeKey for each row.
To add the FactKey we add another calculated column
= Table.AddColumn(#”Inserted Modulo”, “FactKey”, each Number.IntegerDivide([Index], Table.RowCount(Geo)*Table.RowCount(Time)), Int64.Type)
This will give us the factkey and we can see it shifts when we reach row 152 – time count = 4 and geo count = 38 = 152
Now we need the final key column is the geoKey
= Table.AddColumn(#”Inserted Integer-Division”, “GeoKey”, each Number.IntegerDivide([Index], Table.RowCount(Time) ) – [FactKey]*Table.RowCount(Geo), Int64.Type)
And we are now ready to load the data into our data model.
The data is ready to model
After closing the query editor we get the tables returned to Power BI Desktop and we can finalize the datamodel
And create the relationships between our tables.
And create some measures
And start to visualize the Oranges in Europe
One interesting fact is that the Orange trees are highly productive in Albania.
And the cool part is
When 2018 figures is available in EuroStat – we just modify our query to
and refresh our model.
What do you think?
You can download a copy of the pbix file – here.
Did you like it then give the post a like or a comment – let me know what you think – hope you find it useful.
And at least take a spin around Eurostat to what interesting facts that you can find in the many datasets that are provided to you for free.