Extraction of number (or text) from a column with both text and number – #PowerQuery #PowerBI

When you are working with data in Excel or PowerBI the data often contains columns that is a combination of text and numbers.

One example could be like this

If you have this challenge you shouldn’t use Split Columns or Text.Range to do this but check out

Text.Select

Documentation here

And Chris Webb has good example using it for text – here.

My example demonstrates how to work with text but also works with numbers and capitals letters and symbols etc.

Here is how we can extract the House number and Zip Code – use the Custom Column from the Add Tab in the Query Editor window

= Table.AddColumn(Source, “Housenumber”, each Text.Select([Street], {“0”..”9″}))

= Table.AddColumn(#”Added Custom”, “Zip Code”, each Text.Select([Zip], {“0”..”9″}))

And now we have

And one other benefit is that the Function doesn’t return an error when there is no number in the string.

Here is an example file

Hope you find this useful

28 thoughts on “Extraction of number (or text) from a column with both text and number – #PowerQuery #PowerBI

  1. Really nice and simple solution to a common challenge. To you other noobs: remember it is Text.Select – not text.select – caps matter:-)

      1. Hi Erik!
        What I am looking for is a way to get, for instance, invoice numbers out of accounting history in a ledger. The history will have, for instance: 321,75 USD related to invoice 45234. Important: the history not always is standardized, so I cannot use the location of the word “invoice” or the position of the number in the string to retrieve the information.

  2. Hi Erik.
    I remember you showing this at the last User Group meeting. Nice one! I need something very similar, and was hoping you could help.
    I have addresses like “Hornemanns Vænge 1 -33”, “Ellebjergvej 50 – 56 m.fl.”, “Teglholmsgade 35”, “Enghavevej 200 mfl” etc..
    I need to split up the full address into the >streetfirst numberlast numbersupplement< (e.g. mfl. – if any).

    Any idea?

    Lotte

  3. Hi Lotte,

    If the pattern is the same every time with

    ROADNAME – VALUE – potential text

    Then these steps can do the tricks.

    let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText(“JcvBCYUwEAXAVh45fyXfEDsQLEC8hBwiriuyWUHBkmzExkQ9DxOCaddNKSfVHf11KhP+KJwz8RdMI0LDQhsftMBbFPA1cjlJ+XJHLPMqeec0Epz/jvKcDnpGZS3yJDAx3g==”, BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Address = _t]),
    #”Added Custom” = Table.AddColumn(Source, “Custom”, each Text.ToList([Address])),
    #”Added Custom1″ = Table.AddColumn(#”Added Custom”, “Custom.1”, each List.PositionOfAny([Custom], {“0”..”9″})),
    #”Added Custom3″ = Table.AddColumn(#”Added Custom1″, “CustomReverse”, each Text.ToList(Text.Reverse([Address]))),
    #”Added Custom2″ = Table.AddColumn(#”Added Custom3″, “Custom.2”, each List.PositionOfAny([CustomReverse], {“0”..”9″})),
    #”Added Custom4″ = Table.AddColumn(#”Added Custom2″, “Custom.3″, each Text.Length([Address])),
    #”Added Custom5″ = Table.AddColumn(#”Added Custom4”, “Custom.4″, each [Custom.3]-[Custom.2]),
    #”Added Custom6″ = Table.AddColumn(#”Added Custom5”, “Part 1″, each Text.Range([Address],0,[Custom.1])),
    #”Added Custom7″ = Table.AddColumn(#”Added Custom6”, “PartNumber”, each Text.Range([Address],[Custom.1],[Custom.4]-[Custom.1])),
    #”Added Custom8″ = Table.AddColumn(#”Added Custom7″, “Part 3″, each Text.RemoveRange([Address], 0, [Custom.4])),
    #”Removed Columns” = Table.RemoveColumns(#”Added Custom8″,{“Custom”, “Custom.1”, “CustomReverse”, “Custom.2”, “Custom.3”, “Custom.4″})
    in
    #”Removed Columns”

    This should be wrapped in a function that you could invoke for each row.

    Let me know if this solves your issue

    /Erik

  4. Thank you Erik. It turned out there were some inconsistencies in the pattern. I need to investigate the data further. However the PositionOfAny was very useful!!
    See you at the next UG. I might have some Problems/challenges to share…

  5. Pingback: Analyzing Office 365 OCR Data using Power BI - Tumble Road

  6. This method returns an error for me on the lines that only have numbers in for example

    Date 123 = 123
    16 xyz = 16
    50 = error
    12/1/2019 = 1212019
    etc…

    Any suggestions on how to fix that

    1. Hi Karlos,

      An easy way to fix this is using the try otherwise

      = Table.AddColumn(#”Added Custom”, “Zip Code”, each try Text.Select([Zip], {“0”..”9″}) otherwise [Zip])

      Erik

  7. Pingback: Analyzing Office 365 OCR Data using Power BI - Marquee Insights

  8. Is there a parameter I can add to Text.Select to only extract 6 digit numbers? The field I am extracting have may other numbers of 1 or 2 digits.

  9. Same question as Daniel above.
    An example would be a column of comments. So something like:
    WO# 1234567. Call Tony @ 623-623-6236 30 minutes prior to arrival

    I need to only extract a 7 digit number. in this case, the WO#

      1. No not necessarily. Its a free form column so anything and everything could potentially be in there

  10. Hello Erik. Great tutorial, thank you so much. I was wondering if there was a way to extract only the numbers before the first letter, or something similar. The reason for this is I am trying to sum up total capacity for tanks in a list of thousands of them, each ranging in capacity. The list is made in such a way that it shows ” M3″ (cubic meters), so I can´t sum them up due to the “M”. If I use this formula, it extracts all numbers (including the 3 at the end), if I exclude all 3s however, it won´t extract the 3 from 30M3 tanks :S

    Do you know any way around this issue? I´m quite a noob with PowerBI.
    An example of the list is something like this:

    5 M3
    20 M3
    10 M3
    50 M3
    30 M3
    7 M3
    etc etc

    1. Hi Federico,

      You can do that by using the Split Column function and use the M as the delimiter.

      let
      Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText(“i45WMlXwNVaK1YlWMjKAsQzhLFM4yxjOMgczYgE=”, BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Data = _t]),
      #”Changed Type” = Table.TransformColumnTypes(Source,{{“Data”, type text}}),
      #”Split Column by Delimiter” = Table.SplitColumn(#”Changed Type”, “Data”, Splitter.SplitTextByDelimiter(“M”, QuoteStyle.Csv), {“Data.1″}),
      #”Changed Type1″ = Table.TransformColumnTypes(#”Split Column by Delimiter”,{{“Data.1″, Int64.Type}})
      in
      #”Changed Type1″

      I have modified the step #”Split Column by Delimiter” to only include the first item of the split

      /Erik

  11. Ugh. Ignore that. Looks like Character.ToNumber returns an ASCII code for a character. The M specs documentation is amazingly bad in spots.

  12. Hi Erik, Thanks for the great content! Is there a way to extract the text and amount into two (2) separate columns? Some rows of my data have amounts only and some have currency and amounts.

    USD 100,230
    EUR 200, 300.25
    JPY 100,000
    230, 234.23
    2301.23

    1. Hi Sam,

      Here is a solution

      let
      Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText(“i45Wcg0NUjAyMNBRMDYw0DMyVYrViVbyCohUMASKGRgYgPlGxkB5I2MTPSNjGN8Qxg4NdgGrBYopxcYCAA==”, BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [test = _t]),
      #”Inserted Kept Characters” = Table.AddColumn(Source, “Kept Characters”, each Text.Select([test], {“,”, “.”, “0”..”9″}), type text)
      in
      #”Inserted Kept Characters”

      /Erik

  13. How to strip the following to separate columns ? The Text are from fixed values and always equal.

    1 Not Applicable
    1 Pass
    13 Fail
    2 Fail-Corrected
    3 Fail-Repeated

    Where used: It is the Result from questions in an assessment checklist.

    1. Hi Per,

      You can use the Split Column function and use the first space as separator

      let
      Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText(“i45WMlTwyy9RcCwoyMlMTkzKSVWK1QEJBiQWF0OYxgpuiZk5YLYRmKnrnF9UlJpckpoCFoTI6walFqQmgsViAQ==”, BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
      #”Changed Type” = Table.TransformColumnTypes(Source,{{“Column1″, type text}}),
      #”Split Column by Delimiter” = Table.SplitColumn(#”Changed Type”, “Column1″, Splitter.SplitTextByEachDelimiter({” “}, QuoteStyle.Csv, false), {“Column1.1”, “Column1.2″})
      in
      #”Split Column by Delimiter”

      /Erik

Leave a comment