Re: Unicode and languages

2020-06-07 Thread David V Glasgow via use-livecode
Ha! You beat me to it, Alex. The only extra is that Paul might be able to identify very common but distinct markers for each language, and build a simple algorithm around them. It made me wonder how Google Translate does it when it is set to 'detect language'. Cheers, David G > On 6 Jun 2020, at
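The "common but distinct markers" idea could be sketched roughly as follows, in Python rather than LiveCode. The marker word sets, the MARKERS table, and the guess_language name are illustrative assumptions, not anything from the thread, and this is not a description of how Google Translate detects languages.

```python
import re

# Hypothetical marker sets: a few very common words per language.
# A real tool would need larger, carefully chosen lists.
MARKERS = {
    "english": {"the", "and", "of", "is"},
    "french": {"le", "la", "et", "est", "les"},
    "german": {"der", "die", "und", "ist", "das"},
}

def guess_language(text, markers=MARKERS):
    """Return the language whose marker words occur most often, or None."""
    # [^\W\d_]+ matches runs of Unicode letters only.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for w in words if w in marker_words)
        for lang, marker_words in markers.items()
    }
    best = max(scores, key=scores.get)
    # No marker word seen at all: decline to guess.
    return best if scores[best] > 0 else None
```

Short samples, and languages that share marker words, will produce unreliable guesses, so a result like this is better treated as a hint for a warning than as a decision.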

Re: Unicode and languages

2020-06-06 Thread Jim Lambert via use-livecode
Not LC native, but take a look at the Google Translate API.

Re: Unicode and languages

2020-06-06 Thread Alex Tweedly via use-livecode
If you simply need to protect users in the scenario you describe, then you could try a simple heuristic:
 - extract the first 100 (200? 500?) characters (or the first 20 words)
 - spell check that
 - if there are more than 10 (20? 50?) spelling errors then flag it as a likely language mismatc
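A minimal sketch of this sample-and-spell-check heuristic, in Python rather than LiveCode. The known_words set stands in for a real spell checker's dictionary, and the function name, sample size, and error threshold are assumptions chosen for illustration, not values from the thread.

```python
import re

def likely_language_mismatch(text, known_words, sample_words=20, max_error_ratio=0.5):
    """Flag text whose first few words mostly fail a dictionary lookup.

    known_words is a hypothetical stand-in for the spell checker's
    dictionary in the language the user selected.
    """
    # Take only the first sample_words words, lowercased, letters only.
    words = re.findall(r"[^\W\d_]+", text.lower())[:sample_words]
    if not words:
        return False  # nothing to check, so nothing to flag
    errors = sum(1 for w in words if w not in known_words)
    return errors / len(words) > max_error_ratio
```

A ratio is used here instead of an absolute error count so that the same threshold behaves sensibly for very short transcripts. The right threshold would need tuning against real text.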

Re: Unicode and languages

2020-06-05 Thread Richmond via use-livecode
I doubt that. But if you can determine the Unicode range that is being used you can at least know which writing system is being used. You could then trap for individual glyphs (such as 'џ', which is only used in Macedonian) to narrow things down a spot. On 5.06.20 20:15, Paul Dupuis via use-li
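One way to approximate "which writing system is being used" is sketched below in Python rather than LiveCode. Python's standard library does not expose the Unicode Script property directly, so this sketch assumes the first word of each character's Unicode name (LATIN, CYRILLIC, GREEK, ...) is a good enough proxy. The function name is hypothetical.

```python
import unicodedata
from collections import Counter

def dominant_script(text):
    """Return the most common script-like name prefix among letters, or None."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # Character names begin with the script, e.g.
            # "CYRILLIC SMALL LETTER DZHE" for 'џ'.
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None
```

Once the script is known, a check such as `'џ' in text` can narrow things further, although a single glyph is rarely conclusive on its own.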

Re: Unicode and languages

2020-06-05 Thread Paul Dupuis via use-livecode
On 6/5/2020 1:46 PM, Mark Waddingham via use-livecode wrote:
> On 2020-06-05 18:15, Paul Dupuis via use-livecode wrote:
>> I don't even know if language exists in the IBM Unicode engine as some exportable property a future version of LiveCode could expose. Any clever ideas or thoughts on this proble

Re: Unicode and languages

2020-06-05 Thread Mark Waddingham via use-livecode
On 2020-06-05 18:15, Paul Dupuis via use-livecode wrote:
> I don't even know if language exists in the IBM Unicode engine as some exportable property a future version of LiveCode could expose. Any clever ideas or thoughts on this problem are welcome.
Unicode doesn't deal in languages but 'script

Unicode and languages

2020-06-05 Thread Paul Dupuis via use-livecode
In all the added stuff the LC7 and higher Unicode engine includes, is there any way to determine the LANGUAGE of a range of text? USE-CASE: We have a tool that helps researchers transcribe text from digital media. It is used internationally. We have added spell checking using lclSpell from Liv