As a follow up on my previous post link I will walk through how we can access and integrate data from Statistics Denmark directly into our data models in Excel.
The data from Statistics Denmark contains data divided into different subjects
- Population and elections
- Living conditions
- Education and knowledge
- Culture and National Church
- Labour, earnings and income
- Prices and consumption
- National accounts and government finances
- Money and credit market
- External economy
- Business sector in general
- Business sectors
- Geography, environment and energy
- Other
Each subject is then divided into different area and under each you have different tables with data
In this case – Under Population in Denmark you can see the different tables.
Each table has different parameters so you can select what data you want to extract.
So a table for the population divided in municipalities for a specific period gives you this result
And then you could choose to download this in different file formats.
This is ok – but I will require us to download a new file manually every time you want fresh data and if you add more parameters the format of the file will require some manual formatting and cut/pasting.
So let’s see how we can improve this with Power Query and public data exposed via a service
Statistics Denmark has released a API (link) that enables us to import the directly into our model. There is a description on the page – but the best help you can get is by using the Console (link) – it’s only available in Danish but not a lot of text so google/bing translate should be able to help you.
In this case I have chosen to GET the list of all available tables in XML format.
The call is – http://api.statbank.dk/v1/tables?format=XML and returns this
!!! And important thing to notice is the available variables and their names – you can use these later when we want to select the data we want to import.
If you use the same http://api.statbank.dk/v1/tables?format=XML in Power Query – you can also get the list of tables in Excel 🙂
But let’s try and analyze the population in Denmark
We can get help in constructing the URL to the data using the Console.
The variable-id refers to the filters/parameters and fields you want to include in the result set. In this case I have chosen
Tid (Time)
Køn (Sex)
Område (Provinces/Municipalities)
Alder (Age)
As I in this case I want all possible values for the fields and I can use * to select All.
The URL to use in Power Query is then
So in Excel – I go to the Power Query Tab and choose
This will open up the Power Query window and we can start to work with the data
First let’s fix the wrong Danish characters – click the on the first step and change file origin to –None–
Then let’s split the values – right click Column1 and choose Split Column and By Delimiter
Choose the Semicolon as separator
Make it a habit to check Advanced options and set the number of columns you expect.
The result is now
Now I will do some column renaming and removing år from the age column and remove some of the extra regions (Område) that’s returned in the CSV file.
let
Source = Csv.Document(Web.Contents(“http://api.statbank.dk/v1/data/Folk1/CSV?valuePresentation=Value&Tid=*&K%C3%B8n=*&Omr%C3%A5de=*&alder=*”)),
ChangedType = Table.TransformColumnTypes(Source,{{“Column1”, type text}}),
SplitColumnDelimiter = Table.SplitColumn(ChangedType,”Column1″,Splitter.SplitTextByDelimiter(“;”),{“Column1.1”, “Column1.2”, “Column1.3”, “Column1.4”, “Column1.5”}),
ChangedType1 = Table.TransformColumnTypes(SplitColumnDelimiter,{{“Column1.1”, type text}, {“Column1.2”, type text}}),
RenamedColumns = Table.RenameColumns(ChangedType1,{{“Column1.2”, “Sex”}, {“Column1.3”, “Municipality”}, {“Column1.4”, “Age”}, {“Column1.1”, “Time”}, {“Column1.5”, “Population”}}),
ReplacedValue = Table.ReplaceValue(RenamedColumns,” år”,””,Replacer.ReplaceText,{“Age”}),
ChangedType2 = Table.TransformColumnTypes(ReplacedValue,{{“Age”, type number}, {“Population”, type number}}),
InsertedCustom = Table.AddColumn(ChangedType2, “Custom”, each Text.Range([Time],0,4)),
InsertedCustom1 = Table.AddColumn(InsertedCustom, “Custom.1”, each Text.Range([Time],5,1)),
RenamedColumns1 = Table.RenameColumns(InsertedCustom1,{{“Custom”, “Year”}, {“Custom.1”, “Quarter”}}),
ChangedType3 = Table.TransformColumnTypes(RenamedColumns1,{{“Quarter”, type number}, {“Year”, type number}}),
InsertedCustom2 = Table.AddColumn(ChangedType3, “Custom”, each Date.AddQuarters(Date.AddDays(Date.FromText(“01-01-“&Text.Range([Time],0,4)),-1), [Quarter])),
RenamedColumns2 = Table.RenameColumns(InsertedCustom2,{{“Custom”, “QuarterDate”}}),
FilteredRows = Table.SelectRows(RenamedColumns2, each not Text.Contains([Municipality], “Region”) or not Text.Contains([Municipality], “Hele “))
in
FilteredRows
After all these steps we end up with a table that looks like this
This data is then loaded to the Data model.
In this example I have chosen to load Values from the API – in production you could choose to load the codes and then create lookup tables with links between the ID’s for example municipalities.
Updating/Importing of the 598.752 rows takes about 2-3 minutes depending on your internet connection.
Now switch to Power Pivot window. The QuarterDate column is not handled correctly show I change the datatype to Date via the formatting group.
And now we can start adding measures to our datamodel integrate with other datasources etc. – in this example I have created 4 measures
Pop:=SUM([Population])
Women:=CALCULATE([Pop];FILTER(Population;[Sex]=”Kvinder”))
Share of Women:=[Women]/CALCULATE([Pop];ALL(Population[Sex]))
Share of Men:=1-[Share of Women]
And now we can start using all the cool visualization tools in Excel – Pivot Charts, Power View and off course Power Map or even the Apps for Office – Bing Map
Another way could be in a Power View Report in Excel
Or the latest tool in our toolbox – Power Map
First quess –
But by changing the Municipality to State/Province – the match is much better
Selecting Next, adding Average age to the Value – we actually get the result visualized by Danish Regions – not by municipality – so go back
And change it to County – and now we have the Danish municipalities
And final result showing the 2 municipalities and that the share of women differs with 2 percentage points.
This is just one of the tables in Statistics Denmark – So if you want to analyze Denmark – Use Power Query and the API – very powerfull.
Hope you like it.
Thank you for taking time to provide this to the community!! Spledid Work 🙂
Thank you – glad you like it 🙂
Pingback: Self-Service BI ressourcer | justB
Hej Just
tak for linket 🙂
Erik
Pingback: Reading data from api.statbank.dk into SAS | Ryslander.com
Pingback: Specifying JSON Query in Power Query – Example Statistics Sweden | Erik Svensen
Thank you for the interest in our StatbankAPI. Just want to comment, that the console is in English, if your browser is not set to prefer Danish.
Hi Pelle, Thank you for the info – I am working on a post where I use the API in the Power Bi Designer as well
BR
Erik
It’s great, that you write these posts. It might even be better, than if we wrote it ourselves, since we can use them as inspiration for our work with the StatbankAPI. We will probably compile a list of posts about using the StatbankAPI and include it in our online documentation. /Pelle, Statistics Denmark
I think its a great thing that the data is so accessible.
You are more than welcome to include the posts … And if you at some point need a live demo or at seminars etc I would gladly come as well
Br
Erik
Thank you very much for your offer. We might very well contact you about that at some point. /Pelle, DST
Pingback: Self-Service BI ressourcer – justB smart