[fpc-pascal] json parser line numbers
Hi,

the line numbering of the json parser has been changed recently. It used to say "Error at line 1"... when there was an error in the first line, but now it says "Error at line 0"...

Was that on purpose, or can someone change it back?

Benito
___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] json parser line numbers
On Tue, 29 Sep 2020, Benito van der Zander via fpc-pascal wrote:

> Hi, the line numbering of the json parser has been changed recently. It used to say "Error at line 1"... when there was an error in the first line, but now it says "Error at line 0"...
> Was that on purpose, or can someone change it back?

It was not on purpose. Please file a bugreport.

Michael.
Re: [fpc-pascal] json parser line numbers
here: https://bugs.freepascal.org/view.php?id=37836

On 29.09.20 10:47, Michael Van Canneyt via fpc-pascal wrote:

> It was not on purpose. Please file a bugreport.
[fpc-pascal] json numbers with leading dots
Hi,

there are also two lines in the json scanner where it tries to repair numbers with a leading dot, turning '.123' into '0.123':

    If (FCurTokenString[1]='.') then
      FCurTokenString:='0'+FCurTokenString;

They should probably be removed. Not only are those numbers invalid in json, it is also very slow to allocate a new string. And StrToFloat works with '.123', so removing them should not change anything.

Although removing them would break programs that cannot handle '.123', those programs are broken anyway, because the scanner does not add a zero to '-.123'.

Best,
Benito
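The asymmetry described above can be reproduced in isolation. The following is only a sketch: RepairLeadingDot is a hypothetical helper that copies the quoted check into a standalone program, and it is not the scanner's actual code. It shows that the check rewrites '.123' but leaves '-.123' untouched, because only the first character is inspected.

```pascal
program LeadingDotSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: the same check as the quoted scanner line,
// applied to a plain string instead of FCurTokenString.
function RepairLeadingDot(const Token: string): string;
begin
  Result := Token;
  // Only the first character is tested, as in the quoted line.
  if (Result <> '') and (Result[1] = '.') then
    Result := '0' + Result;
end;

begin
  // A leading dot gets a zero prepended.
  WriteLn(RepairLeadingDot('.123'));
  // The first character is '-', so the string is returned unchanged.
  WriteLn(RepairLeadingDot('-.123'));
end.
```

A program that relies on the repair therefore still sees '-.123' as it was written in the input.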
[fpc-pascal] json parsing: detecting invalid escape sequences
Hi,

I am supposed to find invalid escape sequences when parsing JSON and replace them with a user defined fallback. Invalid in the sense that the unicode codepoint is not defined or a surrogate is missing its partner, not syntactically invalid.

For example, any occurrence of \u and \uDEAD should be replaced by \u and \udead respectively. Or alternatively with depending on the settings.

I think I need to change the JSON scanner to be able to do that. I could add a callback function

    OnInvalidEscape: function (escapeStart: pchar): string of object;

Or perhaps

    OnInvalidEscape: function (unicodePoint, previousUnicodePointSurrogate: integer): string of object;
    {although that would be troublesome if \uDEAD and \udead are supposed to be replaced with a different fallback}

Or

    OnInvalidEscape: function (const escapedString: string[4]): string of object;

The function would return the unescaped value. Alternatively, the current string could be passed to it as a var parameter, and the function would append its unescaped value directly.

Or move all unescaping to a callback function, which could be called OnUnescape or OnDecodeEscape. Then the scanner does not need to decide which escapes are invalid, and

    if (joUTF8 in Options) or (DefaultSystemCodePage=CP_UTF8) then
      S:=Utf8Encode(WideString(WideChar(u1)+WideChar(u2))) // ToDo: use faster function
    else
      S:=String(WideChar(u1)+WideChar(u2)); // WideChar converts the encoding. Should it warn on loss?

could be replaced by one function call. And if the user does not set a callback function, the scanner would set its own callback function depending on the option.

Any interest in a patch that adds such a callback function? Or is there another way to do this?

Best,
Benito
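To make the proposal more concrete, here is a rough standalone sketch of the callback idea. Everything in it is an assumption for illustration: TInvalidEscapeEvent, TFallback, KeepEscaped and DecodeUnits are hypothetical names, none of them exist in fpjson, and the sketch only covers lone surrogates, not undefined codepoints. It works on code units that have already been parsed from \uXXXX escapes and calls the callback for every surrogate without a matching partner.

```pascal
program InvalidEscapeSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical callback type: receives the offending UTF-16 code unit
  // and returns the text to insert in its place.
  TInvalidEscapeEvent = function(CodeUnit: Integer): UnicodeString of object;

  // Hypothetical handler object supplying one possible fallback.
  TFallback = class
    function KeepEscaped(CodeUnit: Integer): UnicodeString;
  end;

function TFallback.KeepEscaped(CodeUnit: Integer): UnicodeString;
begin
  // Emit the escape literally, with lower-case hex digits.
  Result := UnicodeString('\u' + LowerCase(IntToHex(CodeUnit, 4)));
end;

function IsHighSurrogate(U: Integer): Boolean;
begin
  Result := (U >= $D800) and (U <= $DBFF);
end;

function IsLowSurrogate(U: Integer): Boolean;
begin
  Result := (U >= $DC00) and (U <= $DFFF);
end;

// Builds a UnicodeString from code units taken from \uXXXX escapes and
// calls OnInvalid for every surrogate that has no matching partner.
function DecodeUnits(const Units: array of Integer;
  OnInvalid: TInvalidEscapeEvent): UnicodeString;
var
  I: Integer;
begin
  Result := '';
  I := 0;
  while I <= High(Units) do
  begin
    if IsHighSurrogate(Units[I]) and (I < High(Units))
       and IsLowSurrogate(Units[I + 1]) then
    begin
      // Well-formed surrogate pair: copy both halves.
      Result := Result + WideChar(Units[I]) + WideChar(Units[I + 1]);
      Inc(I, 2);
    end
    else
    begin
      if IsHighSurrogate(Units[I]) or IsLowSurrogate(Units[I]) then
        Result := Result + OnInvalid(Units[I])  // lone surrogate
      else
        Result := Result + WideChar(Units[I]);  // ordinary code unit
      Inc(I);
    end;
  end;
end;

var
  F: TFallback;
begin
  F := TFallback.Create;
  try
    // 'A' followed by a lone low surrogate, as in the \uDEAD example.
    WriteLn(DecodeUnits([$0041, $DEAD], @F.KeepEscaped));
  finally
    F.Free;
  end;
end.
```

Passing the code unit rather than a pchar keeps the sketch short, but it loses the original spelling of the escape, which is the drawback already noted for the second variant.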
Re: [fpc-pascal] json parsing: detecting invalid escape sequences
On Tue, 29 Sep 2020, Benito van der Zander via fpc-pascal wrote:

> [...]
> Or move all unescaping to a callback function, could be called OnUnescape or OnDecodeEscape. So the scanner does not need to decide which escapes are invalid.
> [...]
> And if the user does not set a callback function, the scanner would set its own callback function depending on the option.

Such a function existed some iterations back (although not for the same purpose). You will see that this drastically reduces the speed of the scanner because of the extra exception handling frames. I think even the checking of 'valid' escape sequences will already reduce speed significantly.
While I am interested in improving the scanner, I am not interested in what is essentially an error-correcting mechanism for faulty JSON. I am strengthened in my opinion by this part of the various RFCs:

"However, the ABNF in this specification allows member names and string values to contain bit sequences that cannot encode Unicode characters;"

So I see little point in trying to correct that.

Michael.