Using Power BI to enrich your data using public datasets – part 2

As a follow up on my previous post link I will walk through how we can access and integrate data from Statistics Denmark directly into our data models in Excel.

The data from Statistics Denmark contains data divided into different subjects

  • Population and elections
  • Living conditions
  • Education and knowledge
  • Culture and National Church
  • Labour, earnings and income
  • Prices and consumption
  • National accounts and government finances
  • Money and credit market
  • External economy
  • Business sector in general
  • Business sectors
  • Geography, environment and energy
  • Other

Each subject is then divided into different area and under each you have different tables with data

In this case – Under Population in Denmark you can see the different tables.

 

Each table has different parameters so you can select what data you want to extract.

So a table for the population divided in municipalities for a specific period gives you this result

And then you could choose to download this in different file formats.

This is ok – but I will require us to download a new file manually every time you want fresh data and if you add more parameters the format of the file will require some manual formatting and cut/pasting.

So let’s see how we can improve this with Power Query and public data exposed via a service

 

Statistics Denmark has released a API (link) that enables us to import the directly into our model. There is a description on the page – but the best help you can get is by using the Console (link) – it’s only available in Danish but not a lot of text so google/bing translate should be able to help you.

In this case I have chosen to GET the list of all available tables in XML format.

The call is – http://api.statbank.dk/v1/tables?format=XML and returns this

!!! And important thing to notice is the available variables and their names – you can use these later when we want to select the data we want to import.

If you use the same http://api.statbank.dk/v1/tables?format=XML in Power Query – you can also get the list of tables in Excel 🙂

 

But let’s try and analyze the population in Denmark

 

We can get help in constructing the URL to the data using the Console.

 

 

The variable-id refers to the filters/parameters and fields you want to include in the result set. In this case I have chosen

Tid (Time)

Køn (Sex)

Område (Provinces/Municipalities)

Alder (Age)

As I in this case I want all possible values for the fields and I can use * to select All.

The URL to use in Power Query is then

http://api.statbank.dk/v1/data/Folk1/CSV?valuePresentation=Value&Tid=*&K%C3%B8n=*&Omr%C3%A5de=*&alder=*

So in Excel – I go to the Power Query Tab and choose

 

This will open up the Power Query window and we can start to work with the data

 

First let’s fix the wrong Danish characters – click the on the first step and change file origin to –None–

Then let’s split the values – right click Column1 and choose Split Column and By Delimiter

 

Choose the Semicolon as separator

 

Make it a habit to check Advanced options and set the number of columns you expect.

The result is now

Now I will do some column renaming and removing år from the age column and remove some of the extra regions (Område) that’s returned in the CSV file.

let

Source = Csv.Document(Web.Contents(“http://api.statbank.dk/v1/data/Folk1/CSV?valuePresentation=Value&Tid=*&K%C3%B8n=*&Omr%C3%A5de=*&alder=*”)),

ChangedType = Table.TransformColumnTypes(Source,{{“Column1”, type text}}),

SplitColumnDelimiter = Table.SplitColumn(ChangedType,”Column1″,Splitter.SplitTextByDelimiter(“;”),{“Column1.1”, “Column1.2”, “Column1.3”, “Column1.4”, “Column1.5”}),

ChangedType1 = Table.TransformColumnTypes(SplitColumnDelimiter,{{“Column1.1”, type text}, {“Column1.2”, type text}}),

RenamedColumns = Table.RenameColumns(ChangedType1,{{“Column1.2”, “Sex”}, {“Column1.3”, “Municipality”}, {“Column1.4”, “Age”}, {“Column1.1”, “Time”}, {“Column1.5”, “Population”}}),

ReplacedValue = Table.ReplaceValue(RenamedColumns,” år”,””,Replacer.ReplaceText,{“Age”}),

ChangedType2 = Table.TransformColumnTypes(ReplacedValue,{{“Age”, type number}, {“Population”, type number}}),

InsertedCustom = Table.AddColumn(ChangedType2, “Custom”, each Text.Range([Time],0,4)),

InsertedCustom1 = Table.AddColumn(InsertedCustom, “Custom.1”, each Text.Range([Time],5,1)),

RenamedColumns1 = Table.RenameColumns(InsertedCustom1,{{“Custom”, “Year”}, {“Custom.1”, “Quarter”}}),

ChangedType3 = Table.TransformColumnTypes(RenamedColumns1,{{“Quarter”, type number}, {“Year”, type number}}),

InsertedCustom2 = Table.AddColumn(ChangedType3, “Custom”, each Date.AddQuarters(Date.AddDays(Date.FromText(“01-01-“&Text.Range([Time],0,4)),-1), [Quarter])),

RenamedColumns2 = Table.RenameColumns(InsertedCustom2,{{“Custom”, “QuarterDate”}}),

FilteredRows = Table.SelectRows(RenamedColumns2, each not Text.Contains([Municipality], “Region”) or not Text.Contains([Municipality], “Hele “))

in

FilteredRows

After all these steps we end up with a table that looks like this

 

This data is then loaded to the Data model.

In this example I have chosen to load Values from the API – in production you could choose to load the codes and then create lookup tables with links between the ID’s for example municipalities.

Updating/Importing of the 598.752 rows takes about 2-3 minutes depending on your internet connection.

 

Now switch to Power Pivot window. The QuarterDate column is not handled correctly show I change the datatype to Date via the formatting group.

And now we can start adding measures to our datamodel integrate with other datasources etc. – in this example I have created 4 measures

Pop:=SUM([Population])

Women:=CALCULATE([Pop];FILTER(Population;[Sex]=”Kvinder”))

Share of Women:=[Women]/CALCULATE([Pop];ALL(Population[Sex]))

Share of Men:=1-[Share of Women]

And now we can start using all the cool visualization tools in Excel – Pivot Charts, Power View and off course Power Map or even the Apps for Office – Bing Map


 

Another way could be in a Power View Report in Excel


 

Or the latest tool in our toolbox – Power Map

First quess –


But by changing the Municipality to State/Province – the match is much better


Selecting Next, adding Average age to the Value – we actually get the result visualized by Danish Regions – not by municipality – so go back


And change it to County – and now we have the Danish municipalities


 

And final result showing the 2 municipalities and that the share of women differs with 2 percentage points.

This is just one of the tables in Statistics Denmark – So if you want to analyze Denmark – Use Power Query and the API – very powerfull.

Hope you like it.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

13 thoughts on “Using Power BI to enrich your data using public datasets – part 2

  1. Pingback: Self-Service BI ressourcer | justB

  2. Pingback: Reading data from api.statbank.dk into SAS | Ryslander.com

  3. Pingback: Specifying JSON Query in Power Query – Example Statistics Sweden | Erik Svensen

      1. It’s great, that you write these posts. It might even be better, than if we wrote it ourselves, since we can use them as inspiration for our work with the StatbankAPI. We will probably compile a list of posts about using the StatbankAPI and include it in our online documentation. /Pelle, Statistics Denmark

      2. I think its a great thing that the data is so accessible.

        You are more than welcome to include the posts … And if you at some point need a live demo or at seminars etc I would gladly come as well

        Br
        Erik

  4. Pingback: Self-Service BI ressourcer – justB smart

  5. Pingback: #PowerQuery – Get the population (or any other) data from Statistics Denmark into Power BI – Erik Svensen – Blog about Power BI, Power Apps, Power Query

Leave a comment