Extraction of number (or text) from a column with both text and number – #PowerQuery #PowerBI

When you are working with data in Excel or PowerBI the data often contains columns that is a combination of text and numbers.

One example could be like this

If you have this challenge you shouldn’t use Split Columns or Text.Range to do this but check out

Text.Select

Documentation here

And Chris Webb has good example using it for text – here.

My example demonstrates how to work with text but also works with numbers and capitals letters and symbols etc.

Here is how we can extract the House number and Zip Code – use the Custom Column from the Add Tab in the Query Editor window

= Table.AddColumn(Source, “Housenumber”, each Text.Select([Street], {“0”..”9″}))

= Table.AddColumn(#”Added Custom”, “Zip Code”, each Text.Select([Zip], {“0”..”9″}))

And now we have

And one other benefit is that the Function doesn’t return an error when there is no number in the string.

Here is an example file

Hope you find this useful

10 thoughts on “Extraction of number (or text) from a column with both text and number – #PowerQuery #PowerBI

  1. Really nice and simple solution to a common challenge. To you other noobs: remember it is Text.Select – not text.select – caps matter:-)

      1. Hi Erik!
        What I am looking for is a way to get, for instance, invoice numbers out of accounting history in a ledger. The history will have, for instance: 321,75 USD related to invoice 45234. Important: the history not always is standardized, so I cannot use the location of the word “invoice” or the position of the number in the string to retrieve the information.

  2. Hi Erik.
    I remember you showing this at the last User Group meeting. Nice one! I need something very similar, and was hoping you could help.
    I have addresses like “Hornemanns Vænge 1 -33”, “Ellebjergvej 50 – 56 m.fl.”, “Teglholmsgade 35”, “Enghavevej 200 mfl” etc..
    I need to split up the full address into the >streetfirst numberlast numbersupplement< (e.g. mfl. – if any).

    Any idea?

    Lotte

  3. Hi Lotte,

    If the pattern is the same every time with

    ROADNAME – VALUE – potential text

    Then these steps can do the tricks.

    let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText(“JcvBCYUwEAXAVh45fyXfEDsQLEC8hBwiriuyWUHBkmzExkQ9DxOCaddNKSfVHf11KhP+KJwz8RdMI0LDQhsftMBbFPA1cjlJ+XJHLPMqeec0Epz/jvKcDnpGZS3yJDAx3g==”, BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Address = _t]),
    #”Added Custom” = Table.AddColumn(Source, “Custom”, each Text.ToList([Address])),
    #”Added Custom1″ = Table.AddColumn(#”Added Custom”, “Custom.1”, each List.PositionOfAny([Custom], {“0”..”9″})),
    #”Added Custom3″ = Table.AddColumn(#”Added Custom1″, “CustomReverse”, each Text.ToList(Text.Reverse([Address]))),
    #”Added Custom2″ = Table.AddColumn(#”Added Custom3″, “Custom.2”, each List.PositionOfAny([CustomReverse], {“0”..”9″})),
    #”Added Custom4″ = Table.AddColumn(#”Added Custom2″, “Custom.3″, each Text.Length([Address])),
    #”Added Custom5″ = Table.AddColumn(#”Added Custom4”, “Custom.4″, each [Custom.3]-[Custom.2]),
    #”Added Custom6″ = Table.AddColumn(#”Added Custom5”, “Part 1″, each Text.Range([Address],0,[Custom.1])),
    #”Added Custom7″ = Table.AddColumn(#”Added Custom6”, “PartNumber”, each Text.Range([Address],[Custom.1],[Custom.4]-[Custom.1])),
    #”Added Custom8″ = Table.AddColumn(#”Added Custom7″, “Part 3″, each Text.RemoveRange([Address], 0, [Custom.4])),
    #”Removed Columns” = Table.RemoveColumns(#”Added Custom8″,{“Custom”, “Custom.1”, “CustomReverse”, “Custom.2”, “Custom.3”, “Custom.4″})
    in
    #”Removed Columns”

    This should be wrapped in a function that you could invoke for each row.

    Let me know if this solves your issue

    /Erik

  4. Thank you Erik. It turned out there were some inconsistencies in the pattern. I need to investigate the data further. However the PositionOfAny was very useful!!
    See you at the next UG. I might have some Problems/challenges to share…

  5. Pingback: Analyzing Office 365 OCR Data using Power BI - Tumble Road

  6. This method returns an error for me on the lines that only have numbers in for example

    Date 123 = 123
    16 xyz = 16
    50 = error
    12/1/2019 = 1212019
    etc…

    Any suggestions on how to fix that

    1. Hi Karlos,

      An easy way to fix this is using the try otherwise

      = Table.AddColumn(#”Added Custom”, “Zip Code”, each try Text.Select([Zip], {“0”..”9″}) otherwise [Zip])

      Erik

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s