Not all data API provides you with the same help as Statistics Denmark (see a previous blog post link) where you get help to construct the URL to retrieve the data via a GET Request (http://api.statbank.dk/console#data)
In another project, I was interested in retrieving data about Sweden β and quickly found Statistics Sweden – http://www.scb.se/en_/. The site has a lot of interesting data and facts about Sweden and they also provide access the the data via and API (http://www.scb.se/en_/About-us/Open-data-API/)
Unfortunately the Swedes doesn’t provide the same help as their Danish colleagues, so we have to use a little Power Query magic to retrieve the data – J
Here is how its done.
Β
If you click the example link to retrieve the population for 2011 and 2012 – http://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy – you will get a JSON response but this will only contain the variables that can be used to query the result and no data.
So I started to search the documentation and found that in order to get the data you have to create a GET request to the API with a JSON Query to get data.
But how can we do this in Power Query.
The Help on Web.Contents() in Power Query doesn’t provide much help besides the remark about POST/GET under the content option field (http://office.microsoft.com/en-us/excel-help/web-contents-HA104112310.aspx?CTT=5&origin=HA104122363 – but I found the inspiration/solution via Chris Webb’s blogpost http://cwebbbi.wordpress.com/2014/04/19/web-services-and-post-requests-in-power-query/ – if I could specify the JSON Query as the Content it should be possible.
So lets try
First on the PowerQuery tab select “From Web” and insert the address – http://api.scb.se/OV0104/v1/doris/en/ssd
Β
This will give you a list of all the different variables you can set via the API, but not the data
In order to specify the JSON Query we have to switch to the advanced editor and modify the PowerQuery.
First we need a variable that holds the JSON Query β but remember to replace the ” with “” as the text in the query has to be stated properly.
Then we can use the content variable in the call to the Web.Contents βby adding “, [Content=Text.ToBinary(content)]))” to call.
When we click ok β we get the data J
Then we navigate the JSON result by expanding the list, convert to a table and expand the different columns in the query.
Β
The result is now a bunch of keys for the years and regions in Sweden. We could then create other queries to the API in order to get these key values but there is actually an easier way. We can ask the API to return the data returned by JSON query as CSV instead of JSON, and remember to modify the Source to only web.contents and not a Json.Document.
Example to retrieve the population for Sweden in 2013 split by region and ages
Β
let
PostContents= “{
“”query””: [
{
“”code””: “”Kon””,
“”selection””: {
“”filter””: “”item””,
“”values””: [
“”1″”,
“”2″”
]
}
},
{
“”code””: “”Alder””,
“”selection””: {
“”filter””: “”all””,
“”values””: [
“”*””
]
}
},
{
“”code””: “”ContentsCode””,
“”selection””: {
“”filter””: “”item””,
“”values””: [
“”BE0101N1″”
]
}
},
{
“”code””: “”Tid””,
“”selection””: {
“”filter””: “”item””,
“”values””: [
“”2013″”
]
}
}
],
“”response””: {
“”format””: “”csv””
}
}
“,
Source = Web.Contents(“http://api.scb.se/OV0104/v1/doris/sv/ssd/START/BE/BE0101/BE0101A/BefolkningNy”,[Content=Text.ToBinary(PostContents)]),
#”Imported CSV” = Csv.Document(Source,null,”,”,null,1252),
#”Removed Top Rows” = Table.Skip(#”Imported CSV”,1),
#”Renamed Columns” = Table.RenameColumns(#”Removed Top Rows”,{{“Column1”, “RegionKey”}, {“Column2”, “Age”}, {“Column3”, “Gender”}, {“Column4”, “Population”}}),
#”Split Column by Delimiter” = Table.SplitColumn(#”Renamed Columns”,”RegionKey”,Splitter.SplitTextByEachDelimiter({” “}, null, false),{“RegionKey.1”, “RegionKey.2”}),
#”Inserted Text Length” = Table.AddColumn(#”Split Column by Delimiter”, “Length”, each Text.Length([RegionKey.1]), type number),
#”Filtered Rows” = Table.SelectRows(#”Inserted Text Length”, each ([Length] = 4)),
#”Renamed Columns1″ = Table.RenameColumns(#”Filtered Rows”,{{“RegionKey.1”, “RegionKey”}, {“RegionKey.2”, “RegionName”}}),
#”Replaced Value” = Table.ReplaceValue(#”Renamed Columns1″,” Γ₯r”,””,Replacer.ReplaceText,{“Age”}),
#”Replaced Value1″ = Table.ReplaceValue(#”Replaced Value”,”+”,””,Replacer.ReplaceText,{“Age”}),
#”Filtered Rows1″ = Table.SelectRows(#”Replaced Value1″, each ([Age] <> “totalt Γ₯lder”)),
#”Replaced Value2″ = Table.ReplaceValue(#”Filtered Rows1″,” “,””,Replacer.ReplaceText,{“Population”}),
#”Changed Type” = Table.TransformColumnTypes(#”Replaced Value2″,{{“Population”, Int64.Type}, {“Age”, Int64.Type}}),
#”Inserted Custom” = Table.AddColumn(#”Changed Type”, “EndOfYear”, each Date.From(“2013/12/31”)),
#”Removed Columns” = Table.RemoveColumns(#”Inserted Custom”,{“Length”}),
#”Filtered Rows2″ = Table.SelectRows(#”Removed Columns”, each ([Population] <> 0)),
#”Changed Type1″ = Table.TransformColumnTypes(#”Filtered Rows2″,{{“EndOfYear”, type date}})
in
#”Changed Type1″
Β
This result can be returned to the datamodel
And we can start using PowerView, PowerMap to visualize the population.
OBS β As the API returns data with Swedish number format β you have to change the Power Query local to Swedish (Sweden) in order for the Query not to fail.
Β
And then we can start analyzing the data in a pivottable
Β
Or in PowerView in Excel
Β
Here is the scary scenario for men when turning 70 β¦. Looks very dangerous. And Power Map lets us plot by county easily.
Β
Or use the new PowerMap filter function to create a heatmap of the regions with 0-3 year old
Β
If you sell diapers or sell children’s shoes in Sweden, you can see where to focus.
You can download a copy of the workbook β from here, and let me know if you have questions or like it.
Β
Β
Β