[libreoffice-l10n] Pootle doesn't recognize difference between c, z, s and Croatian diacritics č, ž, š
Hi, I'm facing a rather strange bug (?) in Pootle. If I put 'citat' in the search box, Pootle returns words like 'čitati', and a search for 'čitati' also returns 'citat'. It happens with 'š', 'ž', 'č' and 'ć'. For the non-existent word 'moze' it returns 'može', which is an actual word but not what I searched for. It seems to be converting the diacritics to 'c', 'z', 's' internally.
I hadn't noticed this before, but it might not be new. I usually search for two or three words, so the additional context kept me from noticing it. It's not a big deal, but instead of two or three results I'm getting fifty.
Does it happen with other languages? I guess it's not easy to make Pootle cope well with every existing language, as it might get resource-intensive?
Thanks, Kruno
-- To unsubscribe e-mail to: l10n+unsubscr...@global.libreoffice.org Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/ Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette List archive: http://listarchives.libreoffice.org/global/l10n/ All messages sent to this list will be publicly archived and cannot be deleted
Re: [libreoffice-l10n] Pootle doesn't recognize difference between c, z, s and Croatian diacritics č, ž, š
And it seems that 'đ' is recognized correctly. Maybe because that letter is used in other languages?
Kruno
On 08.04.2017 at 18:16, Krunose wrote: It happens with 'š', 'ž', 'č' and 'ć'. [...] It seems to be converting the diacritics to 'c', 'z', 's' internally.
Re: [libreoffice-l10n] Pootle doesn't recognize difference between c, z, s and Croatian diacritics č, ž, š
Michael Wolf wrote:
Krunose wrote: And it seems that 'đ' is recognized correctly. Maybe because that letter is used in other languages?
Yes, it's ASCII. It exists in Icelandic and Faroese. \u00D0 and \u00F0 (hexadecimal).
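The code points mentioned here can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch, not something posted in the thread, and `describe` is a hypothetical helper. It reports that U+00F0 and U+00D0 are the Icelandic/Faroese eth, that the Croatian 'đ'/'Đ' are separate code points (U+0111/U+0110), and that all four lie outside the ASCII range 0 to 127.

```python
import unicodedata


def describe(ch):
    """Return (code point, official Unicode name, is-ASCII flag) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ord(ch) < 128)


for ch in "ðÐđĐ":
    print(ch, *describe(ch))
# ð U+00F0 LATIN SMALL LETTER ETH False
# Ð U+00D0 LATIN CAPITAL LETTER ETH False
# đ U+0111 LATIN SMALL LETTER D WITH STROKE False
# Đ U+0110 LATIN CAPITAL LETTER D WITH STROKE False
```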
Re: [libreoffice-l10n] Pootle doesn't recognize difference between c, z, s and Croatian diacritics č, ž, š
Michael Wolf wrote:
Krunose wrote: It seems to be converting the diacritics to 'c', 'z', 's' internally.
Yes, it's true. I translate into Upper and Lower Sorbian; they are Slavic languages as well.
Re: [libreoffice-l10n] Pootle doesn't recognize difference between c, z, s and Croatian diacritics č, ž, š
On 08.04.2017 at 19:42, Michael Wolf wrote: Yes, it's true. I translate into Upper and Lower Sorbian; they are Slavic languages as well.
Thanks for confirming this. Those characters are passed to the URL as 'c', 's' and 'z'. Maybe it can be fixed with percent-encoding or something?
Kruno
Re: [libreoffice-l10n] Pootle doesn't recognize difference between c, z, s and Croatian diacritics č, ž, š
On 08.04.2017 at 19:42, Michael Wolf wrote: Yes, it's true. I translate into Upper and Lower Sorbian; they are Slavic languages as well.
Then it affects Serbian, Bosnian, Montenegrin and Slovenian, and possibly some other Slavic languages.
Kruno
Re: [libreoffice-l10n] Pootle doesn't recognize difference between c, z, s and Croatian diacritics č, ž, š
On 08.04.2017 at 19:56, Michael Wolf wrote: This letter works for me with the Alt+numeric 240 method (208 for upper case) on Windows 10 in three Pootle projects: Mozilla, LO and Pootle 2.8.0. I tested it with Icelandic.
I don't understand character encodings at all, but if you are referring to 'đ', it appears to be passed plainly as 'đ' to the URL from Pootle. I was wondering whether 'č', 'š', 'ć' and 'ž' could be passed percent-encoded to improve Pootle's search functionality. I hope someone will give us an answer.
Kruno
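Since both HTML entities and percent-encoding come up here, the following Python sketch shows how the two representations of 'đ' differ. It is illustrative only and says nothing about what Pootle itself does; `as_html_entity` and `as_percent_encoded` are hypothetical helper names, and the sketch uses only the standard library.

```python
from urllib.parse import quote


def as_html_entity(text):
    """Replace non-ASCII characters with numeric character references."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def as_percent_encoded(text):
    """Percent-encode the UTF-8 bytes of the text, as done in URLs."""
    return quote(text)  # quote() encodes as UTF-8 by default


print(as_html_entity("đ"))       # &#273;
print(as_percent_encoded("đ"))   # %C4%91
```

HTML entities belong in page markup, while percent-encoding is the form used inside a URL's query string.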
Re: [libreoffice-l10n] Pootle doesn't recognize difference between c, z, s and Croatian diacritics č, ž, š
Krunose wrote: No, HTML entities probably wouldn't work. The letter 'đ' is not passed like that. I don't know if they can fix that easily.
This letter works for me with the Alt+numeric 240 method (208 for upper case) on Windows 10 in three Pootle projects: Mozilla, LO and Pootle 2.8.0. I tested it with Icelandic.
Michael
On 08.04.2017 at 19:38, Krunose wrote: Does that mean Pootle can't be set up for this to work? I don't think Chinese characters are ASCII, but I guess they can use their script. I think it's related to what is passed to the URL. These characters should be passed to the URL as HTML entities, and I think that would fix it. That's what happens to 'đ' when you search for that letter. Can you confirm that to the mailing list? Do you think they can fix that? Kruno
On 08.04.2017 at 19:33, Michael Wolf wrote: Krunose wrote: And it seems that 'đ' is recognized correctly. Maybe because that letter is used in other languages? Yes, it's ASCII. It exists in Icelandic and Faroese. \u00D0 and \u00F0 (hexadecimal).
Re: [libreoffice-l10n] Pootle doesn't recognize difference between c, z, s and Croatian diacritics č, ž, š
Krunose wrote: Then it affects Serbian, Bosnian, Montenegrin and Slovenian, and possibly some other Slavic languages.
Yes, e.g. the Sorbian languages, Polish, Czech, Slovak.
Michael
Re: [libreoffice-l10n] Pootle doesn't recognize difference between c, z, s and Croatian diacritics č, ž, š
On 08.04.2017 at 20:11, Michael Wolf wrote: Yes, e.g. the Sorbian languages, Polish, Czech, Slovak.
Let's wait and see what happens :D
Kruno
Re: [libreoffice-l10n] Pootle doesn't recognize difference between c, z, s and Croatian diacritics č, ž, š
Krunose wrote: Let's wait and see what happens :D
I filed a bug: https://github.com/translate/pootle/issues/6238
Michael
Re: [libreoffice-l10n] Pootle doesn't recognize difference between c, z, s and Croatian diacritics č, ž, š
On 08.04.2017 at 21:21, Michael Wolf wrote: I filed a bug: https://github.com/translate/pootle/issues/6238
I'll probably leave a comment later to bring the heat. Now that I think about it, it's not just about strings being passed incorrectly to the URL from search; it's more complicated than that, so I'll stop playing Sherlock Holmes here. But I kind of doubt it's easy to fix. We'll see... And thanks for the quick reaction! :)
Kruno
Re: [libreoffice-l10n] Pootle doesn't recognize difference between c, z, s and Croatian diacritics č, ž, š
On 08.04.2017 at 21:54, Krunose wrote: Now that I think about it, it's not just about strings being passed incorrectly to the URL from search; it's more complicated than that.
As I suspected, it seems that the search query _is_ passed percent-encoded: for 'čitati', Firebug in Firefox shows ...search=%C4%8Ditati as what is passed to GET, and %C4%8D should be the percent-encoding of 'č', so something else is wrong. I'll definitely leave a comment on that bug report.
Kruno
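The observation about %C4%8D can be reproduced with Python's standard `urllib.parse` module. This is a small sketch for verification only; `search_term` is a hypothetical helper, and the query string is the fragment quoted above without its elided prefix. It shows that 'č' (U+010D) percent-encodes to %C4%8D as UTF-8 and that decoding the query returns 'čitati' intact.

```python
from urllib.parse import parse_qs, quote, unquote


def search_term(query_string):
    """Extract the decoded 'search' parameter from a URL query string."""
    return parse_qs(query_string)["search"][0]


print(quote("č"))                          # %C4%8D
print(quote("čitati"))                     # %C4%8Ditati
print(unquote("%C4%8Ditati"))              # čitati
print(search_term("search=%C4%8Ditati"))   # čitati
```

This round trip is consistent with the conclusion that the diacritic survives the URL and that any folding to 'c' would have to happen at a later stage.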