Ha! You beat me to it, Alex. The only thing I'd add is that Paul might be able to pick out a few very common but distinctive markers for each language and build a simple algorithm around them.
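A minimal sketch of that marker idea, in Python purely for illustration (it would need porting to LiveCode). The marker lists, the `guess_language` name and the `min_hits` threshold are all assumptions made up for this example; real lists would need tuning against actual transcripts.

```python
import re

# Hypothetical marker lists: a few very common function words per language.
# These are illustrative only and are not taken from any real dictionary.
MARKERS = {
    "english": {"the", "and", "of", "to", "is", "that", "in", "it"},
    "french": {"le", "la", "les", "et", "est", "des", "que", "une"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
    "spanish": {"el", "los", "las", "y", "es", "que", "una", "por"},
}


def guess_language(text, min_hits=3):
    """Return the language whose markers occur most often, or None."""
    # Runs of letters only (no digits, underscores or punctuation).
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for word in words if word in marks)
        for lang, marks in MARKERS.items()
    }
    best = max(scores, key=scores.get)
    # Too few hits means we cannot tell, so report "unknown".
    return best if scores[best] >= min_hits else None
```

Text with no marker hits at all (for instance a run of Chinese characters) comes back as `None`, which the caller could treat as "no matching dictionary, skip the spell check".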
Made me wonder how Google Translate does it when it is set to 'detect language'.

Cheers,
David G

> On 6 Jun 2020, at 2:11 pm, Alex Tweedly via use-livecode
> <use-livecode@lists.runrev.com> wrote:
>
> If you simply need to protect users in the scenario you describe, then you
> could try a simple heuristic
>
> - extract the first 100 (200? - 500?) characters (or first 20 words)
>
> - spell check that
>
> - if there are more than 10 (20? - 50??) spelling errors then flag it as a
> likely language mismatch.
> - and if not, proceed to do the spellcheck.
>
> Adjust the numbers until it gives protection without too many false positives.
>
> Alex.
>
> On 05/06/2020 18:15, Paul Dupuis via use-livecode wrote:
>> In all the added stuff the LC7 and higher Unicode engine includes, is there
>> any way to determine the LANGUAGE of a range of text?
>>
>> USE-CASE
>>
>> We have a tool that helps researchers transcribe text from digital media. It
>> is used internationally. We have added spell checking using lclSpell from
>> Live Code Labs, a LiveCode store add-on.
>>
>> For lclSpell, we only have Dictionaries for a small set of languages. You
>> can build your own Dictionaries for lclSpell, but we'll still only have
>> Dictionaries for a small subset of the languages people transcribe in. We
>> also have people who do BOTH transcription AND translations.
>>
>> For example, transcribing a Chinese language media recording, typing in the
>> Simplified or Traditional Chinese characters AND then translate it to
>> English, typing the English translation after the transcription.
>>
>> With lclSpell (or I suspect ANY LiveCode compatible spell checker) if you
>> try to spell check a reasonably large chunk of text that is NOT in the same
>> language as your Dictionary, it ties up LiveCode forever, or at least such a
>> long time that most people would force-quit. It is after all marking every
>> word as misspelled and trying to do whatever it does to determine that.
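Alex's heuristic quoted above can be sketched roughly as follows, in Python purely for illustration. The `dictionary` set stands in for whatever word lookup the spell checker offers (lclSpell's real API is not shown in this thread, so nothing here is taken from it), and `full_check` is a hypothetical callback for the expensive full spell check. One deliberate variation: the sketch flags on the fraction of unknown words rather than Alex's absolute count, so that short samples and unsegmented text such as Chinese are still caught.

```python
import re


def looks_like_language_mismatch(text, dictionary,
                                 sample_words=20, max_error_ratio=0.5):
    """Spell check only a small sample; True means 'probably wrong language'."""
    # Take the first few runs of letters as the sample.
    sample = re.findall(r"[^\W\d_]+", text)[:sample_words]
    if not sample:
        return False  # nothing to check, so nothing to flag
    errors = sum(1 for word in sample if word.lower() not in dictionary)
    return errors / len(sample) > max_error_ratio


def safe_spell_check(text, dictionary, full_check):
    """Run the expensive full_check only when the sample looks plausible."""
    if looks_like_language_mismatch(text, dictionary):
        return None  # caller can warn the user instead of locking up
    return full_check(text)
```

As Alex says, `sample_words` and `max_error_ratio` are knobs to adjust until the check protects users without too many false positives.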
>>
>> Now, you can react that the researcher should just KNOW better than to
>> spell check a text in a language that is not their loaded Dictionary!
>> However, people are people, and will do such things and expect software to
>> protect them from their own mistakes. Also, with mixed transcription and
>> translation, you do want to spell check the English part and skip the
>> Chinese (if you do not have a Chinese Dictionary)
>>
>> So, we're looking for a way to detect the LANGUAGE of a range of text, in a
>> LiveCode field, to be able to then determine whether it matches the current
>> (or any available) dictionary or not and act accordingly.
>>
>> There is a "fontLanguage" function in LC, but that seems to predate Unicode
>> Everywhere and seems pretty useless now.
>>
>> For example, in a new stack, with a single scrolling field, we paste in a
>> Chinese text and then execute:
>>
>> put the fontLanguage of (the effective textfont of char 1 to -1 of fld 1)
>>
>> and get "ansi". Even if you set the range (char 2 to 3) that is
>> specifically Chinese (no white space), it still returns "ansi". The textFont
>> returns empty and the effective textFont returns "Segoe UI"
>>
>> I don't even know if language exists in the IBM Unicode engine as some
>> exportable property a future version of LiveCode could expose.
>>
>> Any clever ideas or thoughts on this problem are welcome.
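For the mixed transcription-plus-translation case Paul describes, one option is to look at the script of each character rather than the language. This is a minimal sketch in Python, purely for illustration; it is not a LiveCode feature, and the function names are made up for this example. It covers only a few common CJK code point blocks, so it detects the script and not the language (it would not tell Chinese from Japanese kanji, for instance).

```python
from itertools import groupby

# A few common CJK code point blocks; deliberately not exhaustive.
CJK_RANGES = (
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
)


def is_cjk(ch):
    """True if the character falls in one of the blocks listed above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def split_by_script(text):
    """Split text into consecutive ('cjk' | 'other', run) pairs."""
    return [
        ("cjk" if key else "other", "".join(group))
        for key, group in groupby(text, key=is_cjk)
    ]


def checkable_text(text):
    """Keep only the non-CJK runs, i.e. the part worth spell checking."""
    runs = (run.strip() for label, run in split_by_script(text)
            if label == "other")
    return " ".join(run for run in runs if run)
```

With this, a field holding Chinese followed by its English translation could have only the English runs passed to the spell checker, and the Chinese runs skipped when no Chinese Dictionary is loaded.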
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode