Guide – How to import data from Eurostat directly into #PowerBI

I follow EU Eurostat on twitter (link – https://twitter.com/EU_Eurostat ) and often see a lot of interesting facts and infographics like this one.

And I have for a long time wanted to see if I could use the webservices that Eurostat also provides (link – https://ec.europa.eu/eurostat/data/web-services) to import the data directly into Power BI.

So here is a guide on how you can do it – and the example will try to extract the data for the Orange production Infographic.

There is a LOT of different datasets in Eurostat and this guide should work on most of them – you just need to find the dataset (https://ec.europa.eu/eurostat/data/database) in the catalogue.

Construct the Query

The REST request we need to construct is defined like this

So, we need to find the datasetCode and specify the filters.

You can find the dataset code by browsing the data catalogue – and the dataset code is stated at the end.

If you need specific items the data explorer you need to specify the code of the items in the request and the Dataexplorer is a good way to find these.

Using the codes we have found we can now use the Query builder to construct our query (click on the picture to try it out)

So after entering the dataset code we get the option to specify the filter and select measures under strucpro

Notice that I have selected Yes to Exclude EU aggregates

The query option will vary from dataset to dataset but the principles are the same.

Clicking the “Generate query filter” will give you

And you can copy the dataset code to clipboard

apro_cpsh1?filterNonGeo=1&precision=1&crops=T1000&strucpro=AR&strucpro=PR_HU_EU&time=2014&time=2015&time=2016&time=2017

Now we have the filter part and this can of course be parametrized in your Power Query.

And we must add

http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/

before so the full web query is

http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/apro_cpsh1?filterNonGeo=1&precision=1&crops=T1000&strucpro=AR&strucpro=PR_HU_EU&time=2014&time=2015&time=2016&time=2017

In order to get the data into Power BI we choose get data and select the Web connector

And press OK

This will return the following to the query editor

Convert the JSON response to tables in Power Query

We get a JSON response returned from the Web query and this has to be transformed into a table – and in order to get that we need to understand what and how the data is returned.

When we use the query in the browser you can see that its not a normal structured JSON document.

And what we want is a table with the following fields – Country, Year, Area, Production

I start by renaming the query to “QueryResult” and disable the load – and the following query will use this as a reference

So lets create the Geo dimension

By clicking the Record on the dimension field

Drill down to geo

Down to category

And to label

Turn it into a table

And then add an index column starting from 0 and rename the columns

I then do the same for the other dimensions and end up with the following queries

Getting the values

Now in order to get the values we need to do a little more

The web query returns a one long list of values when converted into a table.

Values contains all the “values” in the grid when

Each cell in the table is referred by a running index starting from 0 to Facts multiplied by Geo by Time.

So when we have 2 facts, 38 countries and 4 years will give us 2 * 39 * 4 = 304 – this should be the number of rows in our Values table.

But when we look at the Values before we do anything we have only 282 rows.

The missing Values in because cells with missing values (represented by a : or :z ) is located in the Status field in the QueryResult.

So we have to add the missing rows from this Status – this gives us the missing 22 cells (304-282)

And we then convert these Values to null

In our values Query we want to append these rows to our table – and we can do this by modifying the step – by modifying the expression to

= Record.ToTable(value) & MissingRows

And we rename and change the Name column to Index and change the data type to an integer.

The index column is our number reference to each cell in the table/matrix from EuroStat.

Next step is to calculate the keys for time, geo and Facts.

To calculate the TimeKey we add a column

And divide it by the number to time periods

This gives a step with the following expression

= Table.AddColumn(#”Added Index”, “TimeKey”, each Number.Mod([Index], 4), type number)

And after click OK we can make it more dynamic by modifying the 4 to the number of rows in

= Table.AddColumn(#”Added Index”, “TimeKey”, each Number.Mod([Index], Table.RowCount(Time)), type number)

And now we have the TimeKey for each row.

To add the FactKey we add another calculated column

= Table.AddColumn(#”Inserted Modulo”, “FactKey”, each Number.IntegerDivide([Index], Table.RowCount(Geo)*Table.RowCount(Time)), Int64.Type)

This will give us the factkey and we can see it shifts when we reach row 152 – time count = 4 and geo count = 38 = 152

Now we need the final key column is the geoKey

= Table.AddColumn(#”Inserted Integer-Division”, “GeoKey”, each Number.IntegerDivide([Index], Table.RowCount(Time) ) – [FactKey]*Table.RowCount(Geo), Int64.Type)

And we are now ready to load the data into our data model.

The data is ready to model

After closing the query editor we get the tables returned to Power BI Desktop and we can finalize the datamodel

And create the relationships between our tables.

And create some measures

And start to visualize the Oranges in Europe

One interesting fact is that the Orange trees are highly productive in Albania.

And the cool part is

When 2018 figures is available in EuroStat – we just modify our query to

http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/apro_cpsh1?filterNonGeo=1&precision=1&crops=T1000&strucpro=AR&strucpro=PR_HU_EU&time=2014&time=2015&time=2016&time=2017&time=2018

and refresh our model.

What do you think?

You can download a copy of the pbix file – here.

Did you like it then give the post a like or a comment – let me know what you think – hope you find it useful.

And at least take a spin around Eurostat to what interesting facts that you can find in the many datasets that are provided to you for free.

13 thoughts on “Guide – How to import data from Eurostat directly into #PowerBI

      1. I tried making selections, which gave me this code:
        ext_lt_intertrd?filterNonGeo=1&precision=1&geo=EU28&sitc06=SITC2_4&partner=EXT_EU28&time=2017

        But adding it in your line after the JSON-part just gave me an error. When I tried to use your full-line it worked without problems.
        What I’m after is monthly trade of a group of commodities to and from all EU-member states.

  1. Hello Erik,
    First of all, many thanks for sharing that. I think this could be really powerful and easy way to incorporate public data into the daily basis of a company. But there is one thing that I am not getting.

    I´ve created my Query, see below:
    http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tran_r_vehst?precision=1&vehicle=CAR&vehicle=LOR&vehicle=MOTO&vehicle=SPE&vehicle=TRC&vehicle=TRL_STRL&unit=NR

    I don´t understand how you define the order for indexing the values, I mean, which should be the first to apply the module? and the second? I can´t see this query in a table as you are showing in the photo below this text “Values contains all the “values” in the grid when”. What am I missing? Many thanks!!

    1. Hello Pablo,
      Glad to hear you find it usefull
      When your query returns values – the dimension is called vehicle and not strucpro as in my example – so you have to modify the Facts Query in order to get what you want – and then also modify the measues as well 🙂

      let
      Source = QueryResult,
      dimension = Source[dimension],
      strucpro = dimension[vehicle],
      category = strucpro[category],
      label = category[label],
      #”Converted to Table” = Record.ToTable(label),
      #”Added Index” = Table.AddIndexColumn(#”Converted to Table”, “Index”, 0, 1),
      #”Renamed Columns” = Table.RenameColumns(#”Added Index”,{{“Index”, “FactKey”}, {“Value”, “Fact”}})
      in
      #”Renamed Columns”

      and this will give you the facts with index’s

      Let me know if you want a pbix file

      /Erik

  2. Great article!
    What I was wondering is: why do you need to read the status property to discover the missing values? Status should only be necessary if you want to know the reason why some values are missing.

    1. Thx…. its needed to fill out the missing rows in the values table as they are “removed” by PowerBI when importing the data – so its filling out the “cells” in the matrix with blank values where there are no valid values

  3. Hi,
    Can you elaborate on the the calculation of the time/fact/geo Key? I’m trying to understand the calculation to adopt the code for another project but I do not understand it :). For the first key you are using MOD and for the second and third a different way of calculation. Is there a generic formula? Sorry, but this site seems to be the only source of information for this kind of problem.

    Thanks

  4. Hi Erik,
    As a non Power-Bi expert I could follow your learning quite well. I took the producer prices of industry to build a PBI-report. I get stuck with the creation of the time key, geo key, industry key in the Value table. I have the same difficulty as the earlier response of Heifi.
    Could you please explain a little bit more about the programming?
    If required I can also share my progress in my PBIX-file.
    Thanks in advance,
    Jan Roerink

Leave a comment