On 9/24/20 12:09 PM, J. Landman Gay via use-livecode wrote:
My original goal was to get the canonical version directly from LC somehow.

Neville Smythe contacted me privately with this brilliant solution, posted here 
with his consent:

function stripAccents pInput
  local tDecomposed
  local tStripped

  replace "'" with space in pInput -- apostrophes are illegal in SQL queries (my requirement)

  -- Separate the accents from the base letters
  put normalizeText(pInput, "NFD") into tDecomposed
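  set the caseSensitive to true -- otherwise "æ" and "œ" also match the uppercase tests
  repeat for each codepoint c in tDecomposed
    -- Copy everything but the accent marks
    if c="Æ" then put "AE" after tStripped
    else if c="æ" then put "ae" after tStripped
    else if c="Œ" then put "OE" after tStripped
    else if c="œ" then put "oe" after tStripped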
    else if codepointProperty(c, "Diacritic") is false then
      put c after tStripped
    end if
  end repeat
  return tStripped
end stripAccents

This works great for my needs and is exactly what I was looking for. I had no idea we had a codepointProperty function, which makes this all possible.
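
To make the mechanism concrete: NFD splits "é" into a plain "e" followed by the combining acute accent (U+0301). codepointProperty reports that second codepoint as a diacritic, so only the "e" is copied. A quick check might look like the sketch below. The handler name and sample strings are illustrative and not from Neville's message, and it assumes stripAccents is in the same script.

```livecode
-- Illustrative test handler (hypothetical name); assumes stripAccents
-- from the message above is available in the same script
on testStripAccents
  -- With no destination, put writes to the message box
  put stripAccents("Crème brûlée") & return & stripAccents("l'Œuvre")
end testStripAccents
```

This should show "Creme brulee" on the first line and "l OEuvre" on the second. The apostrophe becomes a space and the ligature is expanded by the explicit test.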

This will work for most European Latin alphabets, with a few exceptions. Neville found that German, Polish and Dutch may not be completely compatible, and there may be others. A list of special characters that may need specific replacements is here:

<https://maximilian.schalch.de/2018/05/complete-list-of-european-special-characters/>
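
For anyone who does need those languages, here is a minimal sketch of how the specific replacements could be handled. Letters such as "ß", "ø", "ł" and "đ" have no canonical decomposition, so NFD leaves them untouched and they have to be mapped by hand. The function name replaceSpecials and the chosen substitutions are assumptions for illustration only. The right target spelling depends on the language and on what the database expects.

```livecode
-- Hypothetical helper, not part of Neville's function.
-- The substitutions are illustrative; adjust them per language.
function replaceSpecials pInput
  -- Match case exactly so "ø" and "Ø" get different replacements
  set the caseSensitive to true

  -- These letters have no canonical decomposition, so
  -- normalizeText(..., "NFD") cannot separate a base letter from them
  replace "ß" with "ss" in pInput
  replace "ø" with "o" in pInput
  replace "Ø" with "O" in pInput
  replace "ł" with "l" in pInput
  replace "Ł" with "L" in pInput
  replace "đ" with "d" in pInput
  replace "Đ" with "D" in pInput
  return pInput
end replaceSpecials
```

Used together with the function above, stripAccents(replaceSpecials("Łódź")) should give "Lodz". The helper handles the "Ł" and the decomposition step removes the accents from "ó" and "ź".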

For now I only need French, so I can probably omit the specific replacements. Maybe Neville will chime in if I've left anything out; he's done quite a bit of research into the problem.

--
Jacqueline Landman Gay         |     jac...@hyperactivesw.com
HyperActive Software           |     http://www.hyperactivesw.com

