Marko Kreen wrote:
On 9/25/09, to...@tuxteam.de wrote:
On Thu, Sep 24, 2009 at 09:42:32PM +0300, Peter Eisentraut wrote:
> Good idea. This could also check for other invalid things like
> byte-order marks in UTF-8.
But watch out. Microsoft apps do like to insert a BOM at the beginning
On 9/25/09, to...@tuxteam.de wrote:
> On Thu, Sep 24, 2009 at 09:42:32PM +0300, Peter Eisentraut wrote:
> > Good idea. This could also check for other invalid things like
> > byte-order marks in UTF-8.
>
> But watch out. Microsoft apps do like to insert a BOM at the beginning
> of the text. N
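A minimal sketch of what a byte-order-mark check on UTF-8 input could look like. The UTF-8 encoding of U+FEFF is the three bytes EF BB BF; the function names `has_utf8_bom` and `utf8_bom_length` are hypothetical and do not come from the PostgreSQL sources. Whether a leading BOM should be rejected or silently skipped is exactly the open question in this thread, so the sketch only detects it.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: report whether the buffer starts with the
 * UTF-8 encoded byte-order mark (U+FEFF, bytes EF BB BF).
 */
bool has_utf8_bom(const unsigned char *s, size_t len)
{
    return len >= 3 && s[0] == 0xEF && s[1] == 0xBB && s[2] == 0xBF;
}

/*
 * Hypothetical helper: number of leading bytes a caller would skip
 * if it chose to tolerate a BOM at the start of the text.
 */
size_t utf8_bom_length(const unsigned char *s, size_t len)
{
    return has_utf8_bom(s, len) ? 3 : 0;
}
```

A caller that wants to tolerate BOMs written by editors could advance its input pointer by `utf8_bom_length(buf, len)`; a strict validator could instead treat `has_utf8_bom()` returning true as an error.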
On Thu, Sep 24, 2009 at 09:42:32PM +0300, Peter Eisentraut wrote:
> On Wed, 2009-09-23 at 22:46 +0300, Marko Kreen wrote:
[...]
> Good idea. This could also check for other invalid things like
> byte-order marks in UTF-8.
But watch out. Microsoft a
On Wed, 2009-09-23 at 22:46 +0300, Marko Kreen wrote:
> I looked at your code for U& and saw that you allow standalone
> second half of the surrogate pair there, although you error
> out on first half. Was that deliberate?
No.
> Perhaps pg_verifymbstr() should be made to check for such values,
>
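For illustration, a small sketch of symmetric surrogate handling: a first half (0xD800..0xDBFF) must be followed by a second half (0xDC00..0xDFFF), and a second half appearing on its own is rejected as well. All function names here are hypothetical and this is not the code from the committed patch; it only shows the check being discussed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, named for this sketch only. */
bool is_high_surrogate(uint32_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool is_low_surrogate(uint32_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

/* Combine a valid surrogate pair into one code point above U+FFFF. */
uint32_t combine_surrogates(uint32_t high, uint32_t low)
{
    return ((high & 0x3FF) << 10) + 0x10000 + (low & 0x3FF);
}

/*
 * Convert a sequence of escape values into code points.
 * Returns the number of code points written to out, or -1 if an
 * unpaired surrogate (first or second half) is found.
 * out must have room for at least n entries.
 */
int decode_units(const uint32_t *units, int n, uint32_t *out)
{
    int i = 0;
    int written = 0;

    while (i < n)
    {
        uint32_t c = units[i];

        if (is_high_surrogate(c))
        {
            /* A first half must be followed by a second half. */
            if (i + 1 >= n || !is_low_surrogate(units[i + 1]))
                return -1;
            out[written++] = combine_surrogates(c, units[i + 1]);
            i += 2;
        }
        else if (is_low_surrogate(c))
        {
            /* A standalone second half is rejected too. */
            return -1;
        }
        else
        {
            out[written++] = c;
            i++;
        }
    }
    return written;
}
```

With this shape, the pair D83D DE00 decodes to the single code point U+1F600, while either half on its own produces an error.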
On 9/23/09, Peter Eisentraut wrote:
> On Wed, 2009-09-09 at 18:26 +0300, Marko Kreen wrote:
> > Unicode escapes for extended strings.
>
> Committed.
Thank you for handling the patch.
I looked at your code for U& and saw that you allow standalone
second half of the surrogate pair there, although
On Wed, 2009-09-09 at 18:26 +0300, Marko Kreen wrote:
> Unicode escapes for extended strings.
>
> On 4/16/09, Marko Kreen wrote:
> > Reasons:
> >
> > - More people are familiar with \u escaping, as it's standard
> > in Java/C#/Python, probably more..
> > - U& strings will not work when stdstr
Unicode escapes for extended strings.
On 4/16/09, Marko Kreen wrote:
> Reasons:
>
> - More people are familiar with \u escaping, as it's standard
> in Java/C#/Python, probably more..
> - U& strings will not work when stdstr=off.
>
> Syntax:
>
> \u - 16-bit value
> \U -
Marko Kreen writes:
> On 4/18/09, Tom Lane wrote:
>> The point has come up before, and I kinda thought we *had* changed the
>> lexer to reject \000. I see we haven't though. Curiously, this
>> does fail:
>>
>> regression=# select U&'abc\xyz';
>> ERROR: invalid byte sequence for encoding "
On 4/18/09, Tom Lane wrote:
> Sam Mason writes:
> > On Fri, Apr 17, 2009 at 07:01:47PM +0200, Martijn van Oosterhout wrote:
> >> On Fri, Apr 17, 2009 at 07:07:31PM +0300, Marko Kreen wrote:
> >>> Btw, is there any good reason why we don't reject \000, \x00
> >>> in text strings?
> >>
> >> W
Tom Lane wrote:
> The lexer is *not* allowed to invoke any database operations
> (such as pg_conversion lookups)
I certainly hope it's not!
> so it cannot perform arbitrary encoding conversions.
I was more questioning whether we should be looking at character
encodings at all at that point
On 4/18/09, Tom Lane wrote:
> "Kevin Grittner" writes:
> > Andrew Dunstan wrote:
> >> ISTM that one of the uses of this is to say "store the character
> >> that corresponds to this Unicode code point in whatever the database
> >> encoding is"
>
> > I would think you're right. As long as th
Sam Mason writes:
> On Fri, Apr 17, 2009 at 07:01:47PM +0200, Martijn van Oosterhout wrote:
>> On Fri, Apr 17, 2009 at 07:07:31PM +0300, Marko Kreen wrote:
>>> Btw, is there any good reason why we don't reject \000, \x00
>>> in text strings?
>>
>> Why forbid nulls in text strings?
> As far as I
"Kevin Grittner" writes:
> Andrew Dunstan wrote:
>> ISTM that one of the uses of this is to say "store the character
>> that corresponds to this Unicode code point in whatever the database
>> encoding is"
> I would think you're right. As long as the given character is in the
> user's character
Marko Kreen wrote:
On 4/17/09, Kevin Grittner wrote:
Andrew Dunstan wrote:
> ISTM that one of the uses of this is to say "store the character
> that corresponds to this Unicode code point in whatever the database
> encoding is"
I would think you're right. As long as the given charact
On 4/17/09, Kevin Grittner wrote:
> Andrew Dunstan wrote:
> > ISTM that one of the uses of this is to say "store the character
> > that corresponds to this Unicode code point in whatever the database
> > encoding is"
>
> I would think you're right. As long as the given character is in the
>
Andrew Dunstan wrote:
> ISTM that one of the uses of this is to say "store the character
> that corresponds to this Unicode code point in whatever the database
> encoding is"
I would think you're right. As long as the given character is in the
user's character set, we should allow it. Presum
Marko Kreen wrote:
+ if (c > 0x7F)
+ {
+ if (GetDatabaseEncoding() != PG_UTF8)
+ yyerror("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8");
+ saw_high_bit = true;
+ }
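To make the rule in the hunk above concrete, here is a standalone sketch of turning an escape value into bytes: ASCII values pass through under any server encoding, while values above 0x7F are only accepted when the server encoding is UTF-8, in which case they are written as a multi-byte UTF-8 sequence. The function name `encode_escape_value` and the `server_is_utf8` flag are assumptions made for this example; the real patch uses `GetDatabaseEncoding()` and reports errors through `yyerror()`.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: encode code point c into buf (at least 4 bytes).
 * Returns the number of bytes written, or 0 if the value is rejected:
 *   - above U+10FFFF or inside the surrogate range, or
 *   - above 0x7F while the server encoding is not UTF-8.
 */
size_t encode_escape_value(uint32_t c, int server_is_utf8, unsigned char *buf)
{
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    if (c > 0x7F && !server_is_utf8)
        return 0;

    if (c <= 0x7F)
    {
        buf[0] = (unsigned char) c;
        return 1;
    }
    if (c <= 0x7FF)
    {
        buf[0] = (unsigned char) (0xC0 | (c >> 6));
        buf[1] = (unsigned char) (0x80 | (c & 0x3F));
        return 2;
    }
    if (c <= 0xFFFF)
    {
        buf[0] = (unsigned char) (0xE0 | (c >> 12));
        buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (c & 0x3F));
        return 3;
    }
    buf[0] = (unsigned char) (0xF0 | (c >> 18));
    buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
    buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
    buf[3] = (unsigned char) (0x80 | (c & 0x3F));
    return 4;
}
```

Any output longer than one byte contains bytes with the high bit set, which is the situation the `saw_high_bit` flag in the hunk records so that the resulting string gets verified later.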
On Fri, Apr 17, 2009 at 07:01:47PM +0200, Martijn van Oosterhout wrote:
> On Fri, Apr 17, 2009 at 07:07:31PM +0300, Marko Kreen wrote:
> > Btw, is there any good reason why we don't reject \000, \x00
> > in text strings?
>
> Why forbid nulls in text strings?
As far as I know, PG assumes, like mos
On Fri, Apr 17, 2009 at 07:07:31PM +0300, Marko Kreen wrote:
> Btw, is there any good reason why we don't reject \000, \x00
> in text strings?
Why forbid nulls in text strings?
Have a nice day,
--
Martijn van Oosterhout http://svana.org/kleptog/
> Please line up in a tree and maintain the h
On 4/16/09, Marko Kreen wrote:
> It's up to UTF8 validator whether to consider non-characters as error.
I checked, and it did not work well, as addunicode() did not set
the saw_high_bit variable when outputting UTF8. Attached patch fixes it.
Currently it would be a NOP as pg_verifymbstr() only ch
On 4/16/09, Sam Mason wrote:
> On Thu, Apr 16, 2009 at 08:48:58PM +0300, Marko Kreen wrote:
> > Seems I'm bad at communicating in English,
>
> I hope you're not saying this because of my misunderstandings!
>
> > so here is C variant of
> > my proposal to bring \u escaping into extended stri
On Thu, Apr 16, 2009 at 03:04:37PM -0400, Andrew Dunstan wrote:
> Sam Mason wrote:
> >Are you sure that this handling of surrogates is correct? The best
> >answer I've managed to find on the Unicode consortium's site is:
> >
> > http://unicode.org/faq/utf_bom.html#utf16-7
> >
> >it says:
> >
> >
Sam Mason wrote:
Are you sure that this handling of surrogates is correct? The best
answer I've managed to find on the Unicode consortium's site is:
http://unicode.org/faq/utf_bom.html#utf16-7
it says:
They are invalid in interchange, but may be freely used internal to an
implementati
On Thu, Apr 16, 2009 at 08:48:58PM +0300, Marko Kreen wrote:
> Seems I'm bad at communicating in English,
I hope you're not saying this because of my misunderstandings!
> so here is C variant of
> my proposal to bring \u escaping into extended strings. Reasons:
>
> - More people are familiar wi
Seems I'm bad at communicating in English, so here is a C variant of
my proposal to bring \u escaping into extended strings. Reasons:
- More people are familiar with \u escaping, as it's standard
in Java/C#/Python, probably more..
- U& strings will not work when stdstr=off.
Syntax:
\u