FTS parser - missing UUID token type

Przemysław Sztoch Wed, 14 Sep 2022 02:27:02 -0700

I miss a UUID token type. UUIDs are indexed very strangely, yet they are more and more popular and people want to search for them.


See: https://www.postgresql.org/docs/current/textsearch-parsers.html


A UUID is fairly easy to parse:

It is written as 32 hexadecimal digits split by four hyphens into groups of 8-4-4-4-12 characters: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX. The first digit of the fourth group, the N position, encodes the variant (the UUID layout) in one to three bits.
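To show how little is needed to recognise that format, here is a minimal Python sketch. It assumes only the canonical hyphenated form (no braces, no `urn:uuid:` prefix), and `is_uuid` and `variant_bits` are hypothetical helper names used for illustration. The variant bit patterns in the docstring follow RFC 4122.

```python
import re

# Canonical textual form: 8-4-4-4-12 hex digits separated by four hyphens (36 chars).
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_uuid(text: str) -> bool:
    """Return True if text is exactly one UUID in canonical hyphenated form."""
    return UUID_RE.fullmatch(text) is not None


def variant_bits(text: str) -> str:
    """Return the variant prefix stored in the N position (first digit of group 4).

    Per RFC 4122 the variant occupies the leading one to three bits of that digit:
    0xx = NCS backward compatible, 10x = RFC 4122, 110 = Microsoft, 111 = reserved.
    """
    if not is_uuid(text):
        raise ValueError(f"not a UUID: {text!r}")
    n = int(text[19], 16)  # index 19 = first hex digit after the third hyphen
    if not n & 0b1000:
        return "0"
    if not n & 0b0100:
        return "10"
    if not n & 0b0010:
        return "110"
    return "111"


print(variant_bits("00633f1d-1fff-409e-8294-40a21f565904"))  # prints 10
```

A single fixed-width regular expression is enough because every group has a fixed length; no backtracking or context is required.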

Currently, UUIDs are parsed differently depending on whether the individual groups begin with digits or letters:

  00633f1d-1fff-409e-8294-40a21f565904
    '-40':6 '00633f1d':2 '00633f1d-1fff-409e':1 '1fff':3 '409e':4 '8294':5 'a21f565904':7
  00856c28-2251-4aaf-82d3-e4962f5b732d
    '-2251':2 '-4':3 '00856c28':1 '82d3':6 'aaf':5 'aaf-82d3-e4962f5b732d':4 'e4962f5b732d':7
  00a1cc84-816a-490a-a99c-8a4c637380b0
    '00a1cc84':2 '00a1cc84-816a-490a-a99c-8a4c637380b0':1 '490a':4 '816a':3 '8a4c637380b0':6 'a99c':5


As a result, such identifiers cannot be found in the database later.
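For comparison, a toy tokenizer with a dedicated `uuid` token type could look like the Python sketch below. It is hypothetical and is neither PostgreSQL's default parser nor its parser API: it simply tries a whole-UUID match before falling back to splitting on non-alphanumeric characters, so each UUID above would come out as one lowercased token regardless of whether its groups start with digits or letters. The `tokenize` function and the `uuid`/`word` type names are assumptions for illustration.

```python
import re

# Hypothetical toy tokenizer, NOT PostgreSQL's default parser.
# The pattern contains no capturing groups, so m.lastgroup below is always
# either "uuid" or "word".
UUID_PATTERN = (
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

# Try a whole UUID first (only if not followed by a letter, digit or hyphen),
# otherwise take a run of letters and digits as an ordinary word.
TOKEN_RE = re.compile(
    rf"(?P<uuid>{UUID_PATTERN})(?![0-9A-Za-z-])|(?P<word>[0-9A-Za-z]+)"
)


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into (token_type, lowercased_token) pairs."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind).lower()))
    return tokens


print(tokenize("order 00856c28-2251-4aaf-82d3-e4962f5b732d shipped"))
```

The only design point that matters here is the matching order: the UUID alternative is tried before the generic word rule, so the hyphens inside a UUID are never treated as separators.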

What is your opinion on adding the missing token types to the FTS parser?

--
Przemysław Sztoch | Mobile +48 509 99 00 66
