[fpc-pascal] Unicode chars losing information
I came across a bug that was caused by a Unicode character losing information, and narrowed it down to this. Why doesn't chars[1] print the same character as appears in the string?

    var
      chars: UnicodeString;
    begin
      chars := '⌘⌥⌫⇧^';
      writeln(chars);
      writeln(chars[1]);
    end.

Prints:

    ⌘⌥⌫⇧^
    ?

Regards,
Ryan Joseph
___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Unicode chars losing information
On 2021-03-07 at 17:21, Ryan Joseph via fpc-pascal wrote:
> I came across a bug which was caused but a unicode character losing information and narrowed it down to this. Why doesn't the chars[1] print the same character as appeared in the string?
>
>     var
>       chars: UnicodeString;
>     begin
>       chars := '⌘⌥⌫⇧^';
>       writeln(chars);
>       writeln(chars[1]);
>     end.
>
> Prints:
>
>     ⌘⌥⌫⇧^
>     ?

Probably it is not in the BMP (Basic Multilingual Plane) and thus needs more than one position.
Re: [fpc-pascal] Unicode chars losing information
> On Mar 7, 2021, at 9:31 AM, Marco van de Voort via fpc-pascal wrote:
>
> Probably it is not in the BMP and thus needs more position than one.

Isn't chars[1] a 2-byte wide char? I'm not sure I understand "more than one position", though.

Regards,
Ryan Joseph
Re: [fpc-pascal] Unicode chars losing information
On 2021-03-07 at 17:38, Ryan Joseph via fpc-pascal wrote:
>> On Mar 7, 2021, at 9:31 AM, Marco van de Voort via fpc-pascal wrote:
>> Probably it is not in the BMP and thus needs more position than one.
> Isn't char[1] a 2 byte wide char? Not sure I understand "more position than on" though.

Yes, it is. And there are 1,114,112 Unicode code points (U+0000 to U+10FFFF), which is 17 times what fits in a 2-byte wide char. In UTF-16, a code point above U+FFFF is stored as two wide chars (a surrogate pair), so it occupies two positions in a UnicodeString.

https://en.wikipedia.org/wiki/Code_point
https://en.wikipedia.org/wiki/UTF-16
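To make "more than one position" concrete, here is a minimal sketch (not from the thread) of how UTF-16 encodes a single code point. `CodePointToUTF16` is a hypothetical helper name, and U+1F600 is just an arbitrary example of a code point outside the BMP.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: encode one code point as UTF-16.
// Code points up to U+FFFF need one WideChar; higher ones need two
// (a high surrogate followed by a low surrogate).
function CodePointToUTF16(cp: Cardinal): UnicodeString;
var
  v: Cardinal;
begin
  if cp < $10000 then
  begin
    SetLength(Result, 1);
    Result[1] := WideChar(cp);
  end
  else
  begin
    v := cp - $10000;                             // 20 bits remain
    SetLength(Result, 2);
    Result[1] := WideChar($D800 + (v shr 10));    // high surrogate: top 10 bits
    Result[2] := WideChar($DC00 + (v and $3FF));  // low surrogate: bottom 10 bits
  end;
end;

var
  s: UnicodeString;
begin
  s := CodePointToUTF16($2318);   // U+2318 PLACE OF INTEREST SIGN, inside the BMP
  writeln(Length(s));             // 1
  s := CodePointToUTF16($1F600);  // outside the BMP
  writeln(Length(s));             // 2
  writeln(HexStr(Ord(s[1]), 4), ' ', HexStr(Ord(s[2]), 4));  // D83D DE00
end.
```

So indexing a UnicodeString with [] addresses UTF-16 code units, and a character outside the BMP spans two indices.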
Re: [fpc-pascal] Unicode chars losing information
> On Mar 7, 2021, at 10:11 AM, Marco van de Voort via fpc-pascal wrote:
>
> Yes it is. And there are about 1114000 unicode codepoints, or about 17 times
> what fits in a 2-byte wide char.
>
> https://en.wikipedia.org/wiki/Code_point
>
> https://en.wikipedia.org/wiki/UTF-16

I thought Unicode strings "just worked", but maybe that's UTF-8 and the character I want is UTF-16. What are you supposed to do then? UnicodeString knows how to print the full string, so all the data is there, but I can't index it to get characters unless I know their size.

Regards,
Ryan Joseph
Re: [fpc-pascal] Unicode chars losing information
> On Mar 7, 2021, at 10:21 AM, Ryan Joseph wrote:
>
> I thought unicode strings "just worked" but maybe that's UTF-8 and the
> character I want is maybe UTF-16. What are you supposed to do then?
> UnicodeString knows how to print the full string so all the data is there but
> I can't index to get characters unless I know their size.

Since this looks like it could be complicated, here is what I was actually trying to do with the FreeType library. It works for ASCII but broke down with those Unicode chars. I'm confused now because you say a character can be more than 2 bytes, so I don't know what the actual size of an element is.

    for glyph in '⌘⌥⌫⇧^' do
      FT_Load_Char(m_face, ord(glyph), FT_LOAD_RENDER);

Regards,
Ryan Joseph
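One way to loop over whole code points instead of UTF-16 code units is sketched below, using the RTL's `UnicodeStringToUCS4String`, which merges surrogate pairs and appends a terminating 0 element. `LoadGlyph` is a hypothetical stand-in for the `FT_Load_Char` call (no FreeType binding is assumed); it only prints the code point it would pass. The string is spelled with `#$` constants so the sketch does not depend on the source file's encoding.

```pascal
program IterateCodePoints;
{$mode objfpc}{$H+}

// Hypothetical stand-in for FT_Load_Char(m_face, cp, FT_LOAD_RENDER):
// it only prints the code point it would load.
procedure LoadGlyph(cp: Cardinal);
begin
  writeln('U+', HexStr(LongInt(cp), 4));
end;

var
  s: UnicodeString;
  cps: UCS4String;
  i: Integer;
begin
  s := #$2318#$2325#$232B#$21E7'^';  // the same five characters as '⌘⌥⌫⇧^'
  cps := UnicodeStringToUCS4String(s);
  // cps ends with a terminating 0 element, so stop one element early.
  for i := 0 to High(cps) - 1 do
    LoadGlyph(cps[i]);
end.
```

Because each element of a UCS4String is a full code point, the same loop keeps working if the string later contains characters outside the BMP.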
Re: [fpc-pascal] Unicode chars losing information
On 3/7/21 7:21 PM, Ryan Joseph via fpc-pascal wrote:
>> On Mar 7, 2021, at 10:11 AM, Marco van de Voort via fpc-pascal wrote:
>> Yes it is. And there are about 1114000 unicode codepoints, or about 17 times what fits in a 2-byte wide char.
>> https://en.wikipedia.org/wiki/Code_point
>> https://en.wikipedia.org/wiki/UTF-16
> I thought unicode strings "just worked" but maybe that's UTF-8 and the character I want is maybe UTF-16. What are you supposed to do then? UnicodeString knows how to print the full string so all the data is there but I can't index to get characters unless I know their size.

It depends on what you mean by "just working". UnicodeString is a UTF-16 encoded string, and a WideChar is just a UTF-16 code unit. Both UTF-8 and UTF-16 are variable-length encodings; UTF-16 is just simpler to decode.

Note also that, even though a single Unicode code point might need two UTF-16 code units (i.e. WideChars), that is still not enough to represent what users perceive as a character, because there are also plenty of Unicode combining characters. What most users perceive as a character is actually called an extended grapheme cluster, and it is a sequence of Unicode code points.

There's an algorithm (an enumerator) that splits a string into grapheme clusters, and it is implemented in FPC trunk in the GraphemeBreakProperty unit. It implements this algorithm:

http://www.unicode.org/reports/tr29/

I did this for the Unicode Free Vision port in the unicodekvm SVN branch, but it has already been committed to trunk (the rest of the Unicode Free Vision still isn't), because it's a new, relatively self-contained unit that provides new functionality (so it won't break existing code) that the RTL didn't provide before.

Note that most programs wouldn't normally need to split a string into grapheme clusters, unless they implement something like a UI toolkit or a text editor. That's why it was needed for the Unicode Free Vision.

Nikolay
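The three levels described above (code units, code points, user-perceived characters) can be seen with a small counting sketch. It does not use the GraphemeBreakProperty unit, whose API is not shown in this thread; `CodePointCount` is a hypothetical helper, and U+1F600 and "e" followed by U+0301 COMBINING ACUTE ACCENT are arbitrary examples.

```pascal
program UnitsVsCodePoints;
{$mode objfpc}{$H+}

// Hypothetical helper: number of code points in a UTF-16 string.
// UnicodeStringToUCS4String merges surrogate pairs and appends a 0 terminator.
function CodePointCount(const s: UnicodeString): SizeInt;
begin
  Result := Length(UnicodeStringToUCS4String(s)) - 1;
end;

var
  smiley, eAcute: UnicodeString;
begin
  // U+1F600 as a surrogate pair: 2 code units, 1 code point, 1 perceived character.
  SetLength(smiley, 2);
  smiley[1] := WideChar($D83D);
  smiley[2] := WideChar($DE00);
  // 'e' + combining acute: 2 code units, 2 code points, still 1 perceived character.
  eAcute := 'e'#$0301;
  writeln(Length(smiley), ' ', CodePointCount(smiley));  // 2 1
  writeln(Length(eAcute), ' ', CodePointCount(eAcute));  // 2 2
end.
```

Neither Length nor a code point count gives the "1" a user would see in the second case; that is the job of a grapheme cluster splitter such as the one implementing TR29.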
Re: [fpc-pascal] Unicode chars losing information
On Sun, Mar 7, 2021 at 5:31 PM Marco van de Voort via fpc-pascal wrote:
> Probably it is not in the BMP and thus needs more position than one.

Length(chars) is 5 according to FPC, and I see 5 "graphemes", which suggests that each of them fits into one WideChar?

--
Bart
Re: [fpc-pascal] Unicode chars losing information
On 2021-03-07 at 22:26, Bart via fpc-pascal wrote:
> On Sun, Mar 7, 2021 at 5:31 PM Marco van de Voort via fpc-pascal wrote:
>> Probably it is not in the BMP and thus needs more position than one.
> Length(Char) is 5 according to fpc, I see 5 "graphemes"

Indeed:

    .Ld1$strlab:
            .short 1200,2
            .long -1,5
    .Ld1:
            .short 8984,8997,9003,8679,94,0

That is code page 1200 (UTF-16), element size 2, reference count -1 (a constant) and length 5, followed by the five UTF-16 code units and a terminating 0.

On win32 a quick test is hard, since displaying Unicode in the terminal is hard. But a write for "widechar" is called:

    movl    U_$P$PROGRAM_$$_CHARS,%eax
    movw    (%eax),%cx
    movl    %ebx,%edx
    movl    $0,%eax
    call    fpc_write_text_widechar

so it should be OK then.
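A runtime counterpart to the assembler dump is to print the length and the code units of the original literal. This sketch assumes the source file is saved as UTF-8, which is what the `{$codepage utf8}` directive tells the compiler; on a correct build it should print the same five values as the `.short` line above. If it prints something else, the literal was not decoded as intended.

```pascal
program DumpCodeUnits;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this file is saved as UTF-8

var
  chars: UnicodeString;
  i: Integer;
begin
  chars := '⌘⌥⌫⇧^';
  writeln('Length = ', Length(chars));
  // Print every UTF-16 code unit as a decimal number, like the .short line above.
  for i := 1 to Length(chars) do
    write(Ord(chars[i]), ' ');
  writeln;
end.
```

Printing numbers rather than the characters themselves sidesteps the terminal display problem mentioned above.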
[fpc-pascal] Cannot write datetime field on sqlite3 database on ARM
Hi,

I am developing my app on Windows and building it for other platforms with a cross compiler. Now I have a problem that only occurs on Linux ARM: my app cannot write a datetime field to an SQLite3 database. It can read/write other fields like int, varchar or blob, but it always writes zero into a datetime (and maybe float as well) field.

Does anyone have an idea about this issue? I am not sure it is an FPC issue; should I report a bug?

My observations are as follows:

1. I work with the Lazarus 2.0.12/FPC 3.2.0 release version.
2. The target machine runs Raspberry Pi OS on a Raspberry Pi 3 Model B V1.2.
3. My app uses the sqlite3conn and sqldb units.
4. The problem occurs on Linux ARM. It does NOT occur on Windows i386/x86_64, Linux i386/x86_64 or Linux AArch64.
5. I installed "DB Browser for SQLite" on the Raspberry Pi as a reference. It can write the datetime field normally, and my app can read it.

Toru
Re: [fpc-pascal] Cannot write datetime field on sqlite3 database on ARM
On Mon, 8 Mar 2021, Toru Takubo via fpc-pascal wrote:
> Hi,
>
> I am developing my app on Windows and building apps for other platforms by using cross compiler. Now I have a problem only occurred on Linux ARM. The problem is that it cannot write datetime field on sqlite3 database. It can read/write other fields like int, varchar or blob, but always write zero in datetime (maybe float as well) field.
>
> Does anyone have an idea about this issue? I am not sure it is fpc issue, but better to report bug?

It sounds like a floating-point problem. As you probably know, TDateTime is actually a Double. Did you try with a float value?

The DB explorer tools probably just use strings to read/write the database, so they will not be bothered by such things, but FPC stores dataset values in 'native' formats in memory.

I don't know what to advise to investigate the issue further. One thing to try would be to test whether normal float arithmetic or date arithmetic works. If not, then the compiler people will need to give more advice.

Michael.
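The float and date arithmetic test suggested above could look like this minimal sketch, cross-compiled for the Pi with the same settings as the app. It uses only SysUtils and DateUtils; the date and values are arbitrary.

```pascal
program FloatDateCheck;
{$mode objfpc}{$H+}

uses
  SysUtils, DateUtils;

var
  d: Double;
  dt, later: TDateTime;
begin
  // Plain floating-point arithmetic.
  d := 1.5;
  d := d * 2.0 + 0.25;
  writeln('Float: ', d:0:2);

  // TDateTime is a Double: whole days since 1899-12-30, fraction = time of day.
  dt := EncodeDateTime(2021, 3, 8, 12, 0, 0, 0);
  later := IncDay(dt, 1);
  writeln('dt as Double: ', Double(dt):0:6);
  writeln('Next day: ', FormatDateTime('yyyy-mm-dd hh:nn:ss', later));
  writeln('Days between: ', DaysBetween(dt, later));
end.
```

On a working target this should print 3.25, 2021-03-09 12:00:00 and 1. Zeros or garbage here, while the same program is fine on x86_64 and AArch64, would support the suspicion of a floating-point problem rather than an sqldb one.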