Posted on March 6, 2018 (updated April 5, 2018) in Power Query, Uncategorized

Extraction of number (or text) from a column with both text and number – #PowerQuery #PowerBI

When you are working with data in Excel or Power BI, the data often contains columns that are a combination of text and numbers.

One example could be like this

If you have this challenge, you shouldn’t use Split Column or Text.Range; check out Text.Select instead.

See the Text.Select documentation; Chris Webb also has a good example of using it for text.

My example demonstrates how to work with text, but it also works with numbers, capital letters, symbols, etc.

Here is how we can extract the house number and zip code. Use Custom Column on the Add Column tab in the Query Editor window:

= Table.AddColumn(Source, "Housenumber", each Text.Select([Street], {"0".."9"}))

= Table.AddColumn(#"Added Custom", "Zip Code", each Text.Select([Zip], {"0".."9"}))

And now we have

Another benefit is that the function doesn’t return an error when there is no number in the string.
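
As a self-contained illustration of the two steps above, here is a minimal sketch that can be pasted into a blank query in the Advanced Editor. The sample rows are made up for the illustration; only the column names Street and Zip follow the post.

```powerquery
let
    // Hypothetical sample data; only the column names come from the post
    Source = #table(
        type table [Street = text, Zip = text],
        {
            {"Main Street 12", "DK-2100"},
            {"Harbour Road", "8000 Aarhus"}
        }
    ),
    // Keep only the characters "0" to "9" from each value
    AddedHouseNumber = Table.AddColumn(Source, "Housenumber", each Text.Select([Street], {"0".."9"}), type text),
    AddedZipCode = Table.AddColumn(AddedHouseNumber, "Zip Code", each Text.Select([Zip], {"0".."9"}), type text)
in
    AddedZipCode
```

The second row has no digits in Street, so its Housenumber is an empty text value rather than an error.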

Here is an example file

Hope you find this useful

28 thoughts on “Extraction of number (or text) from a column with both text and number – #PowerQuery #PowerBI”

Kasper says:

November 7, 2018 at 2:22 pm

Really nice and simple solution to a common challenge. To you other noobs: remember it is Text.Select – not text.select – caps matter:-)
Rafael Moreno says:

January 23, 2019 at 12:49 pm

Is there a sister function for this, that would do the same but extracting only the numbers instead of the text?
1. Erik Svensen says:
  
  January 23, 2019 at 10:34 pm
  
  Hi Rafael – can you give me an example on what you want to accomplish ?
  1. Rafael Moreno says:
    
    January 24, 2019 at 9:55 am
    
    Hi Erik!
    What I am looking for is a way to get, for instance, invoice numbers out of accounting history in a ledger. The history will have, for instance: 321,75 USD related to invoice 45234. Important: the history not always is standardized, so I cannot use the location of the word “invoice” or the position of the number in the string to retrieve the information.
Lotte says:

March 5, 2019 at 1:22 pm

Hi Erik.
I remember you showing this at the last User Group meeting. Nice one! I need something very similar, and was hoping you could help.
I have addresses like “Hornemanns Vænge 1 -33”, “Ellebjergvej 50 – 56 m.fl.”, “Teglholmsgade 35”, “Enghavevej 200 mfl” etc..
I need to split up the full address into street, first number, last number and supplement (e.g. mfl., if any).

Any idea?

Lotte
Erik Svensen says:

March 5, 2019 at 8:25 pm

Hi Lotte,

If the pattern is the same every time with

ROADNAME – VALUE – potential text

Then these steps can do the trick.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("JcvBCYUwEAXAVh45fyXfEDsQLEC8hBwiriuyWUHBkmzExkQ9DxOCaddNKSfVHf11KhP+KJwz8RdMI0LDQhsftMBbFPA1cjlJ+XJHLPMqeec0Epz/jvKcDnpGZS3yJDAx3g==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Address = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Custom", each Text.ToList([Address])),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each List.PositionOfAny([Custom], {"0".."9"})),
    #"Added Custom3" = Table.AddColumn(#"Added Custom1", "CustomReverse", each Text.ToList(Text.Reverse([Address]))),
    #"Added Custom2" = Table.AddColumn(#"Added Custom3", "Custom.2", each List.PositionOfAny([CustomReverse], {"0".."9"})),
    #"Added Custom4" = Table.AddColumn(#"Added Custom2", "Custom.3", each Text.Length([Address])),
    #"Added Custom5" = Table.AddColumn(#"Added Custom4", "Custom.4", each [Custom.3]-[Custom.2]),
    #"Added Custom6" = Table.AddColumn(#"Added Custom5", "Part 1", each Text.Range([Address],0,[Custom.1])),
    #"Added Custom7" = Table.AddColumn(#"Added Custom6", "PartNumber", each Text.Range([Address],[Custom.1],[Custom.4]-[Custom.1])),
    #"Added Custom8" = Table.AddColumn(#"Added Custom7", "Part 3", each Text.RemoveRange([Address], 0, [Custom.4])),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom8",{"Custom", "Custom.1", "CustomReverse", "Custom.2", "Custom.3", "Custom.4"})
in
    #"Removed Columns"

This should be wrapped in a function that you could invoke for each row.

Let me know if this solves your issue

/Erik
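
One way the function mentioned above might look is sketched here. The name SplitAddress and the output field names are hypothetical, not from the reply, and the sketch assumes the same ROADNAME, numbers, optional text pattern. It locates the first and last digit with List.PositionOfAny and cuts the address at those positions.

```powerquery
let
    // Hypothetical helper, not from the original reply: splits an address at its
    // first and last digit. Assumes the pattern ROADNAME, numbers, optional text.
    SplitAddress = (address as text) as record =>
        let
            Chars = Text.ToList(address),
            Digits = {"0".."9"},
            FirstDigit = List.PositionOfAny(Chars, Digits),
            LastDigit = List.PositionOfAny(Chars, Digits, Occurrence.Last),
            Result =
                if FirstDigit < 0 then
                    // No digit at all: treat the whole value as the street name
                    [Street = Text.Trim(address), Numbers = "", Supplement = ""]
                else
                    [
                        Street = Text.Trim(Text.Start(address, FirstDigit)),
                        Numbers = Text.Middle(address, FirstDigit, LastDigit - FirstDigit + 1),
                        Supplement = Text.Trim(Text.End(address, Text.Length(address) - LastDigit - 1))
                    ]
        in
            Result,
    // Sample addresses taken from the question above
    Source = #table(
        type table [Address = text],
        {{"Hornemanns Vænge 1 -33"}, {"Ellebjergvej 50 – 56 m.fl."}, {"Teglholmsgade 35"}, {"Enghavevej 200 mfl"}}
    ),
    // Invoke the function once per row and expand the record into columns
    AddedParts = Table.AddColumn(Source, "Parts", each SplitAddress([Address])),
    Expanded = Table.ExpandRecordColumn(AddedParts, "Parts", {"Street", "Numbers", "Supplement"})
in
    Expanded
```

The Numbers column still holds the full range (for example "1 -33"); splitting it into first and last number would need a further step and depends on how consistent the separators are.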
Lotte Christoffersen says:

March 6, 2019 at 9:01 am

Thank you Erik. It turned out there were some inconsistencies in the pattern. I need to investigate the data further. However the PositionOfAny was very useful!!
See you at the next UG. I might have some Problems/challenges to share…
karlos O'Neill says:

April 10, 2019 at 1:25 pm

This method returns an error for me on the lines that only have numbers in for example

Date 123 = 123
16 xyz = 16
50 = error
12/1/2019 = 1212019
etc…

Any suggestions on how to fix that
1. Erik Svensen says:
  
  April 10, 2019 at 1:28 pm
  
  Hi Karlos,
  
  An easy way to fix this is to use try ... otherwise:
  
  = Table.AddColumn(#"Added Custom", "Zip Code", each try Text.Select([Zip], {"0".."9"}) otherwise [Zip])
  
  Erik
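
An alternative sketch, assuming the error comes from cells that hold numeric values rather than text (the original data is not shown, so this is a guess): convert each value with Text.From before calling Text.Select. The sample table below is hypothetical and mirrors the values listed in the question.

```powerquery
let
    // Hypothetical data: an untyped column mixing text values and a number
    Source = #table({"Zip"}, {{"Date 123"}, {"16 xyz"}, {50}, {"12/1/2019"}}),
    // Text.From turns the number 50 into the text "50" so Text.Select can process it
    Digits = Table.AddColumn(Source, "Zip Code", each Text.Select(Text.From([Zip]), {"0".."9"}), type text)
in
    Digits
```

Unlike the try ... otherwise version, this keeps every value in the new column as text.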
Daniel says:

December 13, 2019 at 4:15 am

Is there a parameter I can add to Text.Select to only extract 6 digit numbers? The field I am extracting from may have other numbers of 1 or 2 digits.
1. Erik Svensen says:
  
  December 17, 2019 at 9:07 am
  
  Hi Daniel,
  
  Do you have an example of the data you work with and the result you want ?
  
  Erik
Conner McRae says:

January 30, 2020 at 5:13 pm

Same question as Daniel above.
An example would be a column of comments. So something like:
WO# 1234567. Call Tony @ 623-623-6236 30 minutes prior to arrival

I need to only extract a 7 digit number. in this case, the WO#
1. Erik Svensen says:
  
  February 4, 2020 at 2:19 pm
  
  Hi Conner – Will the comment always contain the WO# followed by the number ? /Erik
  1. Conner McRae says:
    
    February 4, 2020 at 3:46 pm
    
    No not necessarily. Its a free form column so anything and everything could potentially be in there
2. Emma Stanger says:
  
  February 3, 2021 at 11:10 am
  
  Hi, I also need a function like this, any ideas?
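
Text.Select has no length parameter, so one possible approach, sketched here with a hypothetical helper named ExtractDigitRuns, is to replace every non-digit character with a space, split on spaces, and keep only the pieces that have the wanted number of digits. The sample text is the comment from the question above.

```powerquery
let
    // Hypothetical helper: returns all runs of exactly digitCount consecutive digits
    ExtractDigitRuns = (input as text, digitCount as number) as list =>
        let
            Digits = {"0".."9"},
            // Replace every non-digit character with a space so digit runs stay together
            Spaced = Text.Combine(
                List.Transform(Text.ToList(input), each if List.Contains(Digits, _) then _ else " ")
            ),
            // Split on spaces and keep only the runs with the requested length
            Runs = List.Select(Text.Split(Spaced, " "), each Text.Length(_) = digitCount)
        in
            Runs,
    Example = ExtractDigitRuns("WO# 1234567. Call Tony @ 623-623-6236 30 minutes prior to arrival", 7)
in
    Example
```

For the sample text this returns a list with the single item "1234567"; the phone number is skipped because its digit runs are only 3 and 4 characters long. In a table, the helper could be called from a custom column and combined with List.First to pick the first match.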
Federico Lozano says:

March 6, 2020 at 3:19 pm

Hello Erik. Great tutorial, thank you so much. I was wondering if there was a way to extract only the numbers before the first letter, or something similar. The reason for this is I am trying to sum up total capacity for tanks in a list of thousands of them, each ranging in capacity. The list is made in such a way that it shows “M3” (cubic meters), so I can’t sum them up due to the “M”. If I use this formula, it extracts all numbers (including the 3 at the end), if I exclude all 3s however, it won’t extract the 3 from 30M3 tanks :S

Do you know any way around this issue? I’m quite a noob with PowerBI.
An example of the list is something like this:

5 M3
20 M3
10 M3
50 M3
30 M3
7 M3
etc etc
1. Erik Svensen says:
  
  March 9, 2020 at 9:12 am
  
  Hi Federico,
  
  You can do that by using the Split Column function and use the M as the delimiter.
  
  let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlXwNVaK1YlWMjKAsQzhLFM4yxjOMgczYgE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Data = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Data", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Data", Splitter.SplitTextByDelimiter("M", QuoteStyle.Csv), {"Data.1"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Data.1", Int64.Type}})
  in
    #"Changed Type1"
  
  I have modified the step #"Split Column by Delimiter" to only include the first item of the split
  
  /Erik
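
A variation on the same idea, sketched with hypothetical sample data shaped like the list in the question: Text.BeforeDelimiter keeps only the part before the first "M", which is then trimmed and converted to a number so the column can be summed.

```powerquery
let
    // Hypothetical sample data shaped like the list in the question
    Source = #table(
        type table [Data = text],
        {{"5 M3"}, {"20 M3"}, {"10 M3"}, {"50 M3"}, {"30 M3"}, {"7 M3"}}
    ),
    // Keep what precedes the first "M", trim the trailing space, convert to a number
    AddedCapacity = Table.AddColumn(
        Source,
        "Capacity",
        each Number.From(Text.Trim(Text.BeforeDelimiter([Data], "M"))),
        type number
    ),
    // Sum the extracted capacities
    TotalCapacity = List.Sum(AddedCapacity[Capacity])
in
    TotalCapacity
```

For these six rows the query returns 122. Returning AddedCapacity instead of TotalCapacity gives the table with the new numeric column.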
Andrey says:

June 26, 2020 at 6:01 pm

also https://docs.microsoft.com/en-us/powerquery-m/character-tonumber
Andrey says:

June 26, 2020 at 6:36 pm

Ugh. Ignore that. Looks like Character.ToNumber returns an ASCII code for a character. The M specs documentation is amazingly bad in spots.
Sam says:

October 8, 2020 at 10:48 am

Hi Erik, Thanks for the great content! Is there a way to extract the text and amount into two (2) separate columns? Some rows of my data have amounts only and some have currency and amounts.

USD 100,230
EUR 200, 300.25
JPY 100,000
230, 234.23
2301.23
1. Erik Svensen says:
  
  October 13, 2020 at 12:03 pm
  
  Hi Sam,
  
  Here is a solution
  
  let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcg0NUjAyMNBRMDYw0DMyVYrViVbyCohUMASKGRgYgPlGxkB5I2MTPSNjGN8Qxg4NdgGrBYopxcYCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [test = _t]),
    #"Inserted Kept Characters" = Table.AddColumn(Source, "Kept Characters", each Text.Select([test], {",", ".", "0".."9"}), type text)
  in
    #"Inserted Kept Characters"
  
  /Erik
2. Erik Svensen says:
  
  October 13, 2020 at 12:05 pm
  
  and to get the currency
  
  = Table.AddColumn(#"Inserted Kept Characters", "Currency", each Text.Select([test], {"A".."Z"}), type text)
Per Bandsholm says:

October 14, 2020 at 1:55 pm

How to strip the following to separate columns ? The Text are from fixed values and always equal.

1 Not Applicable
1 Pass
13 Fail
2 Fail-Corrected
3 Fail-Repeated

Where used: It is the Result from questions in an assessment checklist.
1. Erik Svensen says:
  
  October 14, 2020 at 1:58 pm
  
  Hi Per,
  
  You can use the Split Column function and use the first space as separator
  
  let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTwyy9RcCwoyMlMTkzKSVWK1QEJBiQWF0OYxgpuiZk5YLYRmKnrnF9UlJpckpoCFoTI6walFqQmgsViAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Column1.1", "Column1.2"})
  in
    #"Split Column by Delimiter"
  
  /Erik
Adam Baker says:

February 8, 2022 at 8:56 pm

HOW TO EXTRACT A 6 DIGIT ALPHANUMERIC FROM A STRING (TITLE)?
1. Erik Svensen says:
  
  February 9, 2022 at 8:36 am
  
  Hi Adam, Could you provide some sample data ?
  Erik