Ha! You beat me to it, Alex. The only thing I'd add is that Paul might be able to 
pick out very common but distinctive markers for each language and build 
a simple algorithm around them.
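
To make the marker idea concrete, here is a rough sketch in Python rather than LiveCode, just to show the logic. The marker lists, the guess_language name, and the min_hits threshold are all my own assumptions for illustration, not a vetted word list or anything lclSpell provides. It counts how many words of the text are common function words in each language and picks the best score.

```python
import re

# Hypothetical marker lists: a few very common words per language.
# These are illustrative assumptions, not a vetted linguistic resource.
MARKERS = {
    "english": {"the", "and", "of", "to", "is", "that", "with"},
    "french": {"le", "la", "les", "et", "des", "est", "une"},
    "german": {"der", "die", "und", "das", "ist", "nicht", "ein"},
    "spanish": {"el", "los", "y", "que", "es", "una", "del"},
}


def guess_language(text, min_hits=3):
    """Return the language whose markers occur most often, or None."""
    # Split into runs of letters; digits and punctuation are ignored.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for w in words if w in markers)
        for lang, markers in MARKERS.items()
    }
    # On a tie, max() keeps the first language in MARKERS order.
    best = max(scores, key=scores.get)
    # Too few hits means we cannot tell, e.g. Chinese text or a short snippet.
    return best if scores[best] >= min_hits else None
```

Text with no marker hits at all (a Chinese passage, say) comes back as None, which is probably the case Paul cares about most: "not a language I have a dictionary for".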

Made me wonder how Google Translate does it when it is set to 'detect language'.

Cheers,

David G 

> On 6 Jun 2020, at 2:11 pm, Alex Tweedly via use-livecode 
> <use-livecode@lists.runrev.com> wrote:
> 
> If you simply need to protect users in the scenario you describe, then you 
> could try a simple heuristic
> 
>  - extract the first 100 (200? - 500?) characters (or first 20 words)
> 
>  - spell check that
> 
>  - if there are more than 10 (20? - 50??) spelling errors then flag it as a 
> likely language mismatch.
>  - and if not, proceed to do the spellcheck.
> 
> Adjust the numbers until it gives protection without too many false positives.
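
Alex's steps could be sketched roughly like this (Python again, only to show the logic). The is_known callable is a hypothetical stand-in for whatever single-word lookup the real spell checker offers, and I have used an error ratio instead of an absolute count so that very short texts still work; both are my assumptions, not part of Alex's suggestion.

```python
import re


def looks_like_wrong_language(text, is_known, sample_words=20, max_error_ratio=0.5):
    """Spell check only a small sample and flag a likely language mismatch.

    is_known is a hypothetical callable standing in for the real spell
    checker: it takes one lowercase word and returns True if the
    dictionary contains it.
    """
    # Take only the first few words so the check stays cheap.
    sample = re.findall(r"[^\W\d_]+", text)[:sample_words]
    if not sample:
        return False  # nothing to check, so nothing to flag
    errors = sum(1 for word in sample if not is_known(word.lower()))
    # Flag the text when most of the sampled words are unknown.
    return errors / len(sample) > max_error_ratio
```

With the defaults, 20 sampled words and more than 10 unknown ones triggers the flag, which matches the lower end of Alex's numbers. The full spell check would only run when this returns False.
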
> 
> Alex.
> 
> On 05/06/2020 18:15, Paul Dupuis via use-livecode wrote:
>> In all the added stuff the LC7 and higher Unicode engine includes, is there 
>> any way to determine the LANGUAGE of a range of text?
>> 
>> USE-CASE
>> 
>> We have a tool that helps researchers transcribe text from digital media. It 
>> is used internationally. We have added spell checking using lclSpell from 
>> Live Code Labs, a LiveCode store add-on.
>> 
>> For lclSpell, we only have Dictionaries for a small set of languages. You 
>> can build your own Dictionaries for lclSpell, but we'll still only have 
>> Dictionaries for a small subset of the languages people transcribe in. We 
>> also have people who do BOTH transcription AND translations.
>> 
>> For example, transcribing a Chinese language media recording, typing in the 
>> Simplified or Traditional Chinese characters AND then translating it to 
>> English, typing the English translation after the transcription.
>> 
>> With lclSpell (or I suspect ANY LiveCode compatible spell checker) if you 
>> try to spell check a reasonably large chunk of text that is NOT in the same 
>> language as your Dictionary, it ties up LiveCode forever, or at least for such a 
>> long time that most people would force-quit. It is, after all, marking every 
>> word as misspelled and trying to do whatever it does to determine that.
>> 
>> Now, you could respond that the researcher should just KNOW better than to 
>> spell check a text in a language that is not their loaded Dictionary! 
>> However, people are people, and will do such things and expect software to 
>> protect them from their own mistakes. Also, with mixed transcription and 
>> translation, you do want to spell check the English part and skip the 
>> Chinese (if you do not have a Chinese Dictionary).
>> 
>> So, we're looking for a way to detect the LANGUAGE of a range of text, in a 
>> LiveCode field, to be able to then determine whether it matches the current 
>> (or any available) dictionary or not and act accordingly.
>> 
>> There is a "fontLanguage" function in LC, but that seems to predate Unicode 
>> Everywhere and seems pretty useless now.
>> 
>> For example, in a new stack, with a single scrolling field, we paste in a 
>> Chinese text and then execute:
>> 
>> put the fontLanguage of (the effective textfont of char 1 to -1 of fld 1)
>> 
>> and get "ansi". Even if you set the range (char 2 to 3) that is 
>> specifically Chinese (no white space), it still returns "ansi". The textFont 
>> returns empty and the effective textFont returns "Segoe UI".
>> 
>> I don't even know if language exists in the IBM Unicode engine as some 
>> exportable property a future version of LiveCode could expose.
>> 
>> Any clever ideas or thoughts on this problem are welcome.
>> 
>> 
>> 
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode@lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription 
>> preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
> 


