date:20090416

Re: [HACKERS] New trigger option of pg_standby

2009-04-16 Thread Fujii Masao

Hi Simon,

Thanks for the comments!

On Thu, Apr 16, 2009 at 2:56 AM, Simon Riggs  wrote:
>
> On Wed, 2009-04-15 at 17:02 +0900, Fujii Masao wrote:
>
>> On Tue, Apr 14, 2009 at 2:41 PM, Fujii Masao  wrote:
>> > I'd like to propose another simple idea; pg_standby deletes the
>> > trigger file *whenever* the nextWALfile is a timeline history file.
>> > A timeline history file is restored at the end of recovery, so it's
>> > guaranteed that the trigger file is deleted whether nextWALfile
>> > exists or not.
>> >
>> > A timeline history file is restored also at the beginning of
>> > recovery, so the accidentally remaining trigger file is deleted
>> > in early warm-standby as a side-effect of this idea.
>>
>> Here is the revised patch as above.
>>
>> If you notice something, please feel free to comment.
>
> Deleting the trigger file when we request a history file works in most
> cases, but not in all. We also request a history file when we switch
> timelines, so code comments need slight modification.
>
> If take a base backup, switchover and then try to regen the primary from
> the base backup we would need to switch timelines, which could be
> problematic. That is unlikely, so we should at least very clearly
> document the actual behaviour, as we do in the code comments.

"switch timelines" means that a new timeline ID is assigned at
the end of archive recovery? If so, even in this case, there is
no problem with deleting the trigger file, I think.
Or, am I misunderstanding?

> I think your wording that smart mode guarantees no data will be lost is
> a little strong. I'd say "on successful completion all WAL records will
> be replayed resulting in zero data loss".

Sounds good. I'll change the wording.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Postgres SQL specification (tests)

2009-04-16 Thread mito


Hi,

i have built an SQL interface using rule system which supports row 
versioning and i would like to test it against Postgres SQL specification.


Is there something like test cases for postgres SQL interface?

Or do you have any ideas how to build a group of all possible cases of 
table structure and generate appropriate queries, which can be tested 
against postgres?



thanks

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Unicode string literals versus the world

2009-04-16 Thread Sam Mason

On Wed, Apr 15, 2009 at 11:19:42PM +0300, Marko Kreen wrote:
> On 4/15/09, Tom Lane  wrote:
> > Given Martijn's complaint about more-than-16-bit code points, I think
> >  the \u proposal is not mature enough to go into 8.4.  We can think
> >  about some version of that later, if there's enough interest.
> 
> I think it would be good idea. Basically we should pick one from
> couple of pre-existing sane schemes.  Here is quick summary
> of Python, Perl and Java:
> 
> Python [1]:
> 
>   \u - 16-bit codepoint
>   \U - 32-bit codepoint
>   \N{char-name}  - Characted by name

Microsoft have also gone this way in C#, named code points are not
supported however.

> Perl [2]:
> 
>   \x{..} - {} contains hexadecimal codepoint
>   \N{char-name}  - Unicode char name

Looks OK, but the 'x' seems somewhat redundant.  Why not just:

  \{}

This would be following the BitC[2] project, especially if it was more
like:

  \{U+}

e.g.

  \{U+03BB}

would be the lowercase lambda character.  Added appeal is in the fact
that this (i.e. U+03BB) is how the Unicode consortium spells code
points.

> Java [3]:
> 
>   \u - 16-bit codepoint

AFAIK, Java isn't the best reference to choose; it assumed from an early
point in its design that Unicode characters were at most 16bits and
hence had to switch its internal representation to UTF-16.  I don't
program much Java these days to know how it's all worked out, but it
would be interesting to hear from people who regularly have to deal with
characters outside the BMP (i.e. code points greater than 65535).

-- 
  Sam  http://samason.me.uk/

 [1] http://msdn.microsoft.com/en-us/library/aa664669(VS.71).aspx
 [2] http://www.bitc-lang.org/docs/bitc/spec.html#stringlit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] Performance of full outer join in 8.3

2009-04-16 Thread Robert Haas

2009/4/16 Simon Riggs :
> On Wed, 2009-04-15 at 20:58 -0400, Robert Haas wrote:
>> On Wed, Apr 15, 2009 at 7:39 PM, Tom Lane  wrote:
>> > The output of EXPLAIN is nowhere near stable enough to use within the
>> > current exact-match regression test framework.  I'm not sure it would
>> > be stable even if we suppressed the rowcount and cost figures.  Those
>> > figures vary across platforms (because of alignment effects and probably
>> > other things) and are also sensitive to the timing of autovacuums.  It
>> > is known that a nontrivial fraction of the existing regression test
>> > cases do suffer from uninteresting plan changes across platforms or
>> > as a result of various phase-of-the-moon effects; that's why we keep
>> > having to add "ORDER BY" clauses now and then.
>>
>> Interesting.  I suppose you could insulate yourself from this somewhat
>> by populating pg_statistic with a particular set of values rather than
>> relying on ANALYZE to gather them, but this would have the substantial
>> downside of being way more work to maintain, especially if anyone ever
>> changed pg_statistic.
>>
>> On a more practical level, I do think we need to give real
>> consideration to some kind of options syntax for EXPLAIN, maybe
>> something as simple as:
>>
>> EXPLAIN (option_name, ...) query
>>
>> Or maybe:
>>
>> EXPLAIN (option_name = value, ...) query
>>
>> It may or may not be the case that generating a useful regression test
>> suite for the planner is too much work for anyone to bother, but they
>> certainly won't if the tools aren't available.  It seems we get at
>> least one request a month for some kind of explain-output option:
>> suppress row counts, suppress costs, gather I/O statistics, show
>> outputs, show # of batches for a hash join, and on and on and on.  I
>> think we should implement a very basic version that maybe does nothing
>> more than let you optionally suppress some of the existing output, but
>> which provides an extensible syntax for others to build on.
>
> I think the way to do this is to introduce plan output in XML (that
> matches the node structure of the plan). We can then filter away any
> junk we don't want to see for regression tests, or better still augment
> the exact-match framework with a fuzzy-match spec that allows us to
> specify a range of values.

I think XML explain output is a good idea, but I don't think it's a
substitute for better options to control the human-readable form.  But
the nice thing is that with an extensible syntax, this is not an
either/or proposition.

> The skill would be in constructing a set of tests that was not sensitive
> to minor changes. The OP's join for example had a huge cost range
> difference that would have clearly shown up in a regression test.
>
> This will only move forward if it adds value directly for Tom, so if
> it's worth doing then he needs to specify it and ask for someone to do
> it. There will be someone available if the task is well defined.

I'm not sure if by this you mean the EXPLAIN changes or the regression
tests, but either way I think you're half right: it's probably not
necessary for Tom to provide the spec, but it would sure be nice if he
could at least indicate his lack of objection to accepting a
well-designed patch in one of these areas - because no one is going to
want to go to the trouble of doing either of these things and then
have Tom say "well, I never liked that idea anyway".

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Unicode string literals versus the world

2009-04-16 Thread Marko Kreen

On 4/16/09, Sam Mason  wrote:
> On Wed, Apr 15, 2009 at 11:19:42PM +0300, Marko Kreen wrote:
>  > On 4/15/09, Tom Lane  wrote:
>
> > > Given Martijn's complaint about more-than-16-bit code points, I think
>  > >  the \u proposal is not mature enough to go into 8.4.  We can think
>  > >  about some version of that later, if there's enough interest.
>  >
>  > I think it would be good idea. Basically we should pick one from
>  > couple of pre-existing sane schemes.  Here is quick summary
>  > of Python, Perl and Java:
>  >
>  > Python [1]:
>  >
>  >   \u - 16-bit codepoint
>  >   \U - 32-bit codepoint
>  >   \N{char-name}  - Characted by name
>
>
> Microsoft have also gone this way in C#, named code points are not
>  supported however.

And it handles also non-BMP codepoints with \u escape similarly:

  http://en.csharp-online.net/ECMA-334:_9.4.1_Unicode_escape_sequences

This makes it even more standard.

>  > Perl [2]:
>  >
>  >   \x{..} - {} contains hexadecimal codepoint
>  >   \N{char-name}  - Unicode char name
>
>
> Looks OK, but the 'x' seems somewhat redundant.  Why not just:
>
>   \{}
>
>  This would be following the BitC[2] project, especially if it was more
>  like:
>
>   \{U+}
>
>  e.g.
>
>   \{U+03BB}
>
>  would be the lowercase lambda character.  Added appeal is in the fact
>  that this (i.e. U+03BB) is how the Unicode consortium spells code
>  points.

We already got yet-another-unique-way-of-escaping-unicode with U&.

Now let's try to support some actual standard also.

>  > Java [3]:
>  >
>  >   \u - 16-bit codepoint
>
>
> AFAIK, Java isn't the best reference to choose; it assumed from an early
>  point in its design that Unicode characters were at most 16bits and
>  hence had to switch its internal representation to UTF-16.  I don't
>  program much Java these days to know how it's all worked out, but it
>  would be interesting to hear from people who regularly have to deal with
>  characters outside the BMP (i.e. code points greater than 65535).

You did not read my mail carefully enough - the Java and also Python/C#
already support non-BMP chars with '\u' and exactly the same (utf16) way.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Unicode string literals versus the world

2009-04-16 Thread Andrew Dunstan




Tatsuo Ishii wrote:
I could live with either. Wikipedia says: "The characters outside the 
first plane usually have very specialized or rare use." For years we 
rejected all characters beyond the first plane, and while that's fixed 
now, the volume of complaints wasn't huge.
  

I you mean "first plane" as BMP (i.e. 16bit range), above is not true
for PostgreSQL 7.3 or later at least.



Oops. I meant 8.2 or later.

  


Umm, that's what I said. We used to do it like that, but we don't any more.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Postgres SQL specification (tests)

2009-04-16 Thread Kevin Field

On Apr 16, 3:14 am, mito  wrote:
> Hi,
>
> i have built an SQL interface using rule system which supports row
> versioning and i would like to test it against Postgres SQL specification.
>
> Is there something like test cases for postgres SQL interface?
>
> Or do you have any ideas how to build a group of all possible cases of
> table structure and generate appropriate queries, which can be tested
> against postgres?
>
> thanks

For unit testing, I use pgTap, it's pretty handy for scripting your
tests.

What do you mean by all possible cases of table structure?  That would
be infinite...

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] NOTIFY / LISTEN silently parses and discards schema-ish portion of notification name ...

2009-04-16 Thread James Robinson



On Apr 16, 2009, at 6:51 AM, pgsql-hackers-ow...@postgresql.org wrote:



Considered unexpected behavior, or at least in its undocumented form.
If value given to NOTIFY seems schema-qualified, the schema
qualification is eroded by the time it is presented to the listener  
--


See:

http://archives.postgresql.org//pgsql-patches/2007-02/msg00437.php

Pretty sure 8.4 will not use "relation".


Thanks for pointer on a proposed patch, Greg. That patch looks like it  
hasn't been applied to the 8.2 maintenance stream.


Looks like in 8.3 the grammar changed the name argument to ColId  
production, which goes unparsed. Shame on me for using such an old  
version [ 8.2 ], but hey, it works.





James Robinson
Socialserve.com

Re: [HACKERS] Yet another regression issue with Fedora-10 + PG 8.4 beta1

2009-04-16 Thread Devrim GÜNDÜZ

On Tue, 2009-04-14 at 18:32 -0400, Tom Lane wrote:
> Any ideas what might have caused it?
> 
> QEMU screwing up the handling of floating-point traps, perhaps?

Probably. Let me dig it.

Thanks.
-- 
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org


signature.asc
Description: This is a digitally signed message part

Re: [HACKERS] Unicode string literals versus the world

2009-04-16 Thread Sam Mason

On Thu, Apr 16, 2009 at 02:47:20PM +0300, Marko Kreen wrote:
> On 4/16/09, Sam Mason  wrote:
> > Microsoft have also gone this way in C#, named code points are not
> > supported however.
> 
> And it handles also non-BMP codepoints with \u escape similarly:
> 
>   http://en.csharp-online.net/ECMA-334:_9.4.1_Unicode_escape_sequences
> 
> This makes it even more standard.

I fail to see what you're pointing out here; as far as I understand it,
\u is for BMP code points and \U extends the range out to 32bit code
points.  I can't see anything about non-BMP and \u in the above link,
you appear free to write your own surrogate pairs but that seems like an
independent issue.

I'd not realised before that C# is specified to use UTF-16 as its
internal encoding.

> >  This would be following the BitC[2] project, especially if it was more
> >  like:
> >
> >   \{U+}
> 
> We already got yet-another-unique-way-of-escaping-unicode with U&.
> 
> Now let's try to support some actual standard also.

That comes across *very* negatively; I hope it's just a language issue.

I read your parent post as soliciting opinions on possible ways to
encode Unicode characters in PG's literals.  The U&'lit' was criticised,
you posted some suggestions, I followed up with what I hoped to be a
useful addition.  It seems useful here to separate "de jure" from "de
facto" standards; implementing U&'lit' would be following the de jure
standard, anything else would be de facto.

A survey of existing SQL implementations would seem to be more appropriate
as well:

Oracle: UNISTR(string-literal) and \

  It looks as though Oracle originally used UCS-2 internally (i.e. BMP
  only) but more recently Unicode support has been improved to allow
  other planes.

MS-SQL Server: 

  can't find anything remotely useful; best seems to be to use
  NCHAR(integer-expression) which looks somewhat unmaintainable.

DB2: U&string-literal and \xx

  i.e. it follows the SQL-2003 spec

FireBird:

  can't find much either; support looks somewhat low on the ground

MySQL:

  same again, seems to assume query is encoded in UTF-8

Summary seems to be that either I'm bad at searching or support for
Unicode doesn't seem very complete in the database world and people work
around it somehow.

> You did not read my mail carefully enough - the Java and also Python/C#
> already support non-BMP chars with '\u' and exactly the same (utf16) way.

Again, I think this may be a language issue; if not then more verbose
explanations help, maybe something like "sorry, I obviously didn't
explain that very well".  You will of course felt you explained it
perfectly well, but everybody enters a discussion with different
intuitions and biases, email has a nasty habit of accentuating these
differences and compounding them with language problems.

I'd never heard of UTF-16 surrogate pairs before this discussion and
hence didn't realise that it's valid to have a surrogate pair in place
of a single code point.  The docs say that  corresponds to
U+10302, Python would appear to follow my intuitions in that:

  ord(u'\uD800\uDF02')

results in an error instead of giving back 66306, as I'd expect.  Is
this a bug in Python, my understanding, or something else?

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Unicode string literals versus the world

2009-04-16 Thread Tom Lane

Sam Mason  writes:
> I'd never heard of UTF-16 surrogate pairs before this discussion and
> hence didn't realise that it's valid to have a surrogate pair in place
> of a single code point.  The docs say that  corresponds to
> U+10302, Python would appear to follow my intuitions in that:

>   ord(u'\uD800\uDF02')

> results in an error instead of giving back 66306, as I'd expect.  Is
> this a bug in Python, my understanding, or something else?

I might be wrong, but I think surrogate pairs are expressly forbidden in
all representations other than UTF16/UCS2.  We definitely forbid them
when validating UTF-8 strings --- that's per an RFC recommendation.
It sounds like Python is doing the same.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] Performance of full outer join in 8.3

2009-04-16 Thread David Fetter

On Thu, Apr 16, 2009 at 06:12:10AM +0100, Simon Riggs wrote:
> > 
> > EXPLAIN (option_name, ...) query
> > 
> > Or maybe:
> > 
> > EXPLAIN (option_name = value, ...) query
> > 
> > It may or may not be the case that generating a useful regression
> > test suite for the planner is too much work for anyone to bother,
> > but they certainly won't if the tools aren't available.  It seems
> > we get at least one request a month for some kind of
> > explain-output option: suppress row counts, suppress costs, gather
> > I/O statistics, show outputs, show # of batches for a hash join,
> > and on and on and on.  I think we should implement a very basic
> > version that maybe does nothing more than let you optionally
> > suppress some of the existing output, but which provides an
> > extensible syntax for others to build on.
> 
> I think the way to do this is to introduce plan output in XML

If we're going with a serialization, which I think would be an
excellent idea, how about one that's light-weight and human-readable
like JSON?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Unicode string literals versus the world

2009-04-16 Thread Sam Mason

On Thu, Apr 16, 2009 at 10:54:16AM -0400, Tom Lane wrote:
> Sam Mason  writes:
> > I'd never heard of UTF-16 surrogate pairs before this discussion and
> > hence didn't realise that it's valid to have a surrogate pair in place
> > of a single code point.  The docs say that  corresponds to
> > U+10302, Python would appear to follow my intuitions in that:
> 
> >   ord(u'\uD800\uDF02')
> 
> > results in an error instead of giving back 66306, as I'd expect.  Is
> > this a bug in Python, my understanding, or something else?
> 
> I might be wrong, but I think surrogate pairs are expressly forbidden in
> all representations other than UTF16/UCS2.  We definitely forbid them
> when validating UTF-8 strings --- that's per an RFC recommendation.
> It sounds like Python is doing the same.

OK, that's good.  I thought I was missing something.  A minor point is
that in UCS2 each 16bit value is exactly one character and characters
outside the BMP aren't supported, hence the need for UTF-16.

I've failed to keep up with the discussion so I'm not sure where this
conversation has got to!  Is the consensus for 8.4 to enable SQL2003
style U&lit escaped literals if and only if standard_conforming_strings
is set?  This seems easiest for client code as it can use this
exclusively for knowing what to do with backslashes.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Unicode string literals versus the world

2009-04-16 Thread Marko Kreen

On 4/16/09, Sam Mason  wrote:
> On Thu, Apr 16, 2009 at 02:47:20PM +0300, Marko Kreen wrote:
>  > On 4/16/09, Sam Mason  wrote:
> > > Microsoft have also gone this way in C#, named code points are not
>  > > supported however.
>  >
>  > And it handles also non-BMP codepoints with \u escape similarly:
>  >
>  >   http://en.csharp-online.net/ECMA-334:_9.4.1_Unicode_escape_sequences
>  >
>  > This makes it even more standard.
>
>
> I fail to see what you're pointing out here; as far as I understand it,
>  \u is for BMP code points and \U extends the range out to 32bit code
>  points.  I can't see anything about non-BMP and \u in the above link,
>  you appear free to write your own surrogate pairs but that seems like an
>  independent issue.

Ok, maybe I glanced too quickly over that page.

I can't find definite deference only hint on several pages:

  \U \U Unicode escape sequence for surrogate pairs.

Which hints that you can aswell enter the pairs directly: \uxx\uxx.
If I'd be language designer, I would not see any reason to disallow it.

And anyway, at least mono seems to support it:

using System;
public class HelloWorld {
public static void Main() {
Console.WriteLine("<\uD800\uDF02>\n");
}
}

It will output single UTF8 character.  I think this should settle it.

>  I'd not realised before that C# is specified to use UTF-16 as its
>  internal encoding.
>
>  > >  This would be following the BitC[2] project, especially if it was more
>  > >  like:
>  > >
>  > >   \{U+}
>  >
>
> > We already got yet-another-unique-way-of-escaping-unicode with U&.
>  >
>  > Now let's try to support some actual standard also.
>
>
> That comes across *very* negatively; I hope it's just a language issue.
>
>  I read your parent post as soliciting opinions on possible ways to
>  encode Unicode characters in PG's literals.  The U&'lit' was criticised,
>  you posted some suggestions, I followed up with what I hoped to be a
>  useful addition.  It seems useful here to separate "de jure" from "de
>  facto" standards; implementing U&'lit' would be following the de jure
>  standard, anything else would be de facto.
>
>  A survey of existing SQL implementations would seem to be more appropriate
>  as well:
>
>  Oracle: UNISTR(string-literal) and \
>
>   It looks as though Oracle originally used UCS-2 internally (i.e. BMP
>   only) but more recently Unicode support has been improved to allow
>   other planes.
>
>  MS-SQL Server:
>
>   can't find anything remotely useful; best seems to be to use
>   NCHAR(integer-expression) which looks somewhat unmaintainable.
>
>  DB2: U&string-literal and \xx
>
>   i.e. it follows the SQL-2003 spec
>
>  FireBird:
>
>   can't find much either; support looks somewhat low on the ground
>
>  MySQL:
>
>   same again, seems to assume query is encoded in UTF-8
>
>  Summary seems to be that either I'm bad at searching or support for
>  Unicode doesn't seem very complete in the database world and people work
>  around it somehow.

The de-facto about Postgres is stdstr=off.  Even if not, E'' strings
are still better for various things, so it would be good if they also
aquired unicode-capabilities.

>  > You did not read my mail carefully enough - the Java and also Python/C#
>  > already support non-BMP chars with '\u' and exactly the same (utf16) way.
>
>
> Again, I think this may be a language issue; if not then more verbose
>  explanations help, maybe something like "sorry, I obviously didn't
>  explain that very well".  You will of course felt you explained it
>  perfectly well, but everybody enters a discussion with different
>  intuitions and biases, email has a nasty habit of accentuating these
>  differences and compounding them with language problems.
>
>  I'd never heard of UTF-16 surrogate pairs before this discussion and
>  hence didn't realise that it's valid to have a surrogate pair in place
>  of a single code point.  The docs say that  corresponds to
>  U+10302, Python would appear to follow my intuitions in that:
>
>   ord(u'\uD800\uDF02')
>
>  results in an error instead of giving back 66306, as I'd expect.  Is
>  this a bug in Python, my understanding, or something else?

Python's internal representation is *not* UTF-16, but plain UCS2/UCS4,
that is - plain 16 or 32-bit values.  Seems your python is compiled with
UCS2, not UCS4.  As I understand, in UCS2 mode it simply takes surrogate
values as-is.  From ord() docs:

  If a unicode argument is given and Python was built with UCS2 Unicode,
  then the character’s code point must be in the range [0..65535]
  inclusive; otherwise the string length is two, and a TypeError will
  be raised.

So only in UCS4 mode it detects surrogates and converts them to internal
representation.  (Which in Postgres case would be UTF8.)

Or perhaps it is partially UTF16 aware - eg. I/O routines do unterstand
UTF16 but low-level string routines do not:

  print "<%s>" % u'\uD800\uDF02'

seems to handle it properly.

-- 
marko

-- 
Se

Re: [HACKERS] Unicode string literals versus the world

2009-04-16 Thread Andrew Dunstan




Tom Lane wrote:

Sam Mason  writes:
  

I'd never heard of UTF-16 surrogate pairs before this discussion and
hence didn't realise that it's valid to have a surrogate pair in place
of a single code point.  The docs say that  corresponds to
U+10302, Python would appear to follow my intuitions in that:



  

  ord(u'\uD800\uDF02')



  

results in an error instead of giving back 66306, as I'd expect.  Is
this a bug in Python, my understanding, or something else?



I might be wrong, but I think surrogate pairs are expressly forbidden in
all representations other than UTF16/UCS2.  We definitely forbid them
when validating UTF-8 strings --- that's per an RFC recommendation.
It sounds like Python is doing the same.


  


You mustn't encode the surrogate, but it's up to us how we allow people 
to designate a given code point.


Frankly, I think we shouldn't provide for using surrogates at all. I 
would prefer something like \u for BMP items and \U as the 
straight 32bit designation of a higher codepoint.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] Performance of full outer join in 8.3

2009-04-16 Thread Robert Haas

On Thu, Apr 16, 2009 at 11:21 AM, David Fetter  wrote:
> If we're going with a serialization, which I think would be an
> excellent idea, how about one that's light-weight and human-readable
> like JSON?

Wow, that's a great idea for another option to EXPLAIN.  Wouldn't it
be nice if EXPLAIN supported an options syntax?!!!

:-)

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Postgres SQL specification (tests)

2009-04-16 Thread mito


By table structure i mean table definition options.

Kevin Field wrote:

On Apr 16, 3:14 am, mito  wrote:

Hi,

i have built an SQL interface using rule system which supports row
versioning and i would like to test it against Postgres SQL specification.

Is there something like test cases for postgres SQL interface?

Or do you have any ideas how to build a group of all possible cases of
table structure and generate appropriate queries, which can be tested
against postgres?

thanks


For unit testing, I use pgTap, it's pretty handy for scripting your
tests.

What do you mean by all possible cases of table structure?  That would
be infinite...


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Unicode string literals versus the world

2009-04-16 Thread Marko Kreen

On 4/16/09, Tom Lane  wrote:
> Sam Mason  writes:
>  > I'd never heard of UTF-16 surrogate pairs before this discussion and
>  > hence didn't realise that it's valid to have a surrogate pair in place
>  > of a single code point.  The docs say that  corresponds to
>  > U+10302, Python would appear to follow my intuitions in that:
>
>  >   ord(u'\uD800\uDF02')
>
>  > results in an error instead of giving back 66306, as I'd expect.  Is
>  > this a bug in Python, my understanding, or something else?
>
>
> I might be wrong, but I think surrogate pairs are expressly forbidden in
>  all representations other than UTF16/UCS2.  We definitely forbid them
>  when validating UTF-8 strings --- that's per an RFC recommendation.
>  It sounds like Python is doing the same.

The point here is that Python/Java/C# allow them for escaping non-BMP
unicode values, irrespective of their interal encoding.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] Performance of full outer join in 8.3

2009-04-16 Thread Tom Lane

Robert Haas  writes:
> I think XML explain output is a good idea, but I don't think it's a
> substitute for better options to control the human-readable form.

Yeah.  I think a well-designed XML output format for EXPLAIN is a fine
thing to work on, but I don't believe it would make the "create a
planner test suite" problem noticeably easier.

I see the purpose of an XML format as being to allow tools like
Red Hat's old Visual Explain (now maintained by EDB IIRC) to parse
EXPLAIN's output with somewhat better odds of not breaking from
one release to the next.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Unicode string literals versus the world

2009-04-16 Thread Sam Mason

On Thu, Apr 16, 2009 at 06:34:06PM +0300, Marko Kreen wrote:
> Which hints that you can aswell enter the pairs directly: \uxx\uxx.
> If I'd be language designer, I would not see any reason to disallow it.
> 
> And anyway, at least mono seems to support it:
> 
> using System;
> public class HelloWorld {
> public static void Main() {
> Console.WriteLine("<\uD800\uDF02>\n");
> }
> }
> 
> It will output single UTF8 character.  I think this should settle it.

I don't have any .net stuff installed so can't test; but C# is defined
to use UTF-16 as its internal representation so it would make sense if
the above gets treated as a single character internally.  However, if it
used any other encoding the above should be treated as an error.

> The de-facto about Postgres is stdstr=off.  Even if not, E'' strings
> are still better for various things, so it would be good if they also
> aquired unicode-capabilities.

OK, this seems independent of the U&'lit' discussion that started the
thread.  Note that PG already supports UTF8; if you want the character
I've been using in my examples up-thread, you can do:

  SELECT E'\xF0\x90\x8C\x82';

I have a feeling that this is predicated on the server_encoding being
set to "utf8" and this can only be done at database creation time.
Another alternative would be to use the convert_from function, i.e:

  SELECT convert_from(E'\xF0\x90\x8C\x82', 'UTF8');

Never had to do this though, so there may be better options available.

> Python's internal representation is *not* UTF-16, but plain UCS2/UCS4,
> that is - plain 16 or 32-bit values.  Seems your python is compiled with
> UCS2, not UCS4.

Cool, I didn't know that.  I believe mine is UCS4 as I can do:

  ord(u'\U00010302')

and I get 66306 back rather than an error.

> As I understand, in UCS2 mode it simply takes surrogate
> values as-is.

UCS2 doesn't have surrogate pairs, or at least I believe it's considered
a bug if you don't get an error when you present it with one.

> From ord() docs:
> 
>   If a unicode argument is given and Python was built with UCS2 Unicode,
>   then the character’s code point must be in the range [0..65535]
>   inclusive; otherwise the string length is two, and a TypeError will
>   be raised.
> 
> So only in UCS4 mode it detects surrogates and converts them to internal
> representation.  (Which in Postgres case would be UTF8.)

I think you mean UTF-16 instead of UCS4; but otherwise, yes.

> Or perhaps it is partially UTF16 aware - eg. I/O routines do unterstand
> UTF16 but low-level string routines do not:
> 
>   print "<%s>" % u'\uD800\uDF02'
> 
> seems to handle it properly.

Yes, I get this as well.  It's all a bit weird, which is why I was
asking if "this a bug in Python, my understanding, or something else".

When I do:

  python

Re: [HACKERS] Unicode string literals versus the world

2009-04-16 Thread Tom Lane

Sam Mason  writes:
> I've failed to keep up with the discussion so I'm not sure where this
> conversation has got to!  Is the consensus for 8.4 to enable SQL2003
> style U&lit escaped literals if and only if standard_conforming_strings
> is set?

That was Peter's proposal, and no one's shot a hole in it yet ...

I think the discussion about whether/how to add a Unicode extension to
E''-style literals is 8.5 material.  We are in beta so now is not
the time to add new features, especially ones that weren't even on the
TODO list before.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [rfc] unicode escapes for extended strings

2009-04-16 Thread Marko Kreen

Seems I'm bad at communicating in english, so here is C variant of
my proposal to bring \u escaping into extended strings.  Reasons:

- More people are familiar with \u escaping, as it's standard
  in Java/C#/Python, probably more..
- U& strings will not work when stdstr=off.

Syntax:

  \u  - 16-bit value
  \U  - 32-bit value

Additionally, both \u and \U can be used to specify UTF-16 surrogate
pairs to encode characters with value > 0x.  This is exact behaviour
used by Java/C#/Python.  (except that Java does not have \U)


I'm ok with this patch left to 8.5.

-- 
marko
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index a559d75..fdb0cc5 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -394,6 +394,14 @@ SELECT 'foo'  'bar';
 
 hexadecimal byte value

+   
+
+ \u,
+ \U
+ (x = 0 - 9, A - F)
+
+16 or 32-bit hexadecimal Unicode character value.
+   
   
   
  
@@ -407,6 +415,14 @@ SELECT 'foo'  'bar';
 
 
 
+	 The Unicode escape syntax works fully only when the server encoding is UTF8.
+	 When other server encodings are used, only code points in the ASCII range
+	 (up to \u007F) can be specified.  Both \u and \U
+	 can also be used to specify UTF-16 surrogate pair to escape characters
+	 with value larger than \u.
+	
+
+
  It is your responsibility that the byte sequences you create are
  valid characters in the server character set encoding.  When the
  server encoding is UTF-8, then the alternative Unicode escape
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index a070e85..c0695f1 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -98,6 +98,11 @@ static char *scanbuf;
 
 static unsigned char unescape_single_char(unsigned char c);
 
+/* first part of unicode surrogate */
+static unsigned long xeu_surrogate1;
+
+static void addunicode(pg_wchar c);
+
 %}
 
 %option 8bit
@@ -128,6 +133,7 @@ static unsigned char unescape_single_char(unsigned char c);
  *   $foo$ quoted strings
  *   quoted identifier with Unicode escapes
  *   quoted string with Unicode escapes
+ *   Unicode surrogate escape in extended string
  */
 
 %x xb
@@ -139,6 +145,7 @@ static unsigned char unescape_single_char(unsigned char c);
 %x xdolq
 %x xui
 %x xus
+%x xeu
 
 /*
  * In order to make the world safe for Windows and Mac clients as well as
@@ -217,6 +224,7 @@ xeinside		[^\\']+
 xeescape		[\\][^0-7]
 xeoctesc		[\\][0-7]{1,3}
 xehexesc		[\\]x[0-9A-Fa-f]{1,2}
+xeunicode		[\\](u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})
 
 /* Extended quote
  * xqdouble implements embedded quote, 
@@ -506,6 +514,37 @@ other			.
 {xeinside}  {
 	addlit(yytext, yyleng);
 }
+{xeunicode} {
+	pg_wchar c = strtoul(yytext+2, NULL, 16);
+
+	check_escape_warning();
+
+	if (c >= 0xD800 && c < 0xDC00)
+	{
+		xeu_surrogate1 = c;
+		BEGIN(xeu);
+	}
+	else if (c >= 0xDC00 && c < 0xE000)
+		yyerror("invalid Unicode escape value");
+
+	addunicode(c);
+}
+{xeunicode} {
+	pg_wchar c = strtoul(yytext+2, NULL, 16);
+
+	if (c < 0xDC00 || c >= 0xE000)
+		yyerror("invalid Unicode surrogate pair");
+
+	c = ((xeu_surrogate1 & 0x3FF) << 10) | (c & 0x3FF);
+
+	addunicode(c + 0x1);
+
+	BEGIN(xe);
+}
+.			{
+	yyerror("invalid Unicode surrogate pair");
+}
+
 {xeescape}  {
 	if (yytext[1] == '\'')
 	{
@@ -1153,3 +1192,18 @@ check_escape_warning(void)
  lexer_errposition()));
 	warn_on_first_escape = false;	/* warn only once per string */
 }
+
+static void
+addunicode(pg_wchar c)
+{
+	char buf[8];
+
+	if (c == 0 || c > 0x10)
+		yyerror("invalid Unicode escape value");
+	if (c > 0x7F && GetDatabaseEncoding() != PG_UTF8)
+		yyerror("Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8");
+
+	unicode_to_utf8(c, (unsigned char *)buf);
+	addlit(buf, pg_mblen(buf));
+}
+

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] Performance of full outer join in 8.3

2009-04-16 Thread Grzegorz Jaskiewicz



On 16 Apr 2009, at 16:21, David Fetter wrote:


On Thu, Apr 16, 2009 at 06:12:10AM +0100, Simon Riggs wrote:


I think the way to do this is to introduce plan output in XML


If we're going with a serialization, which I think would be an
excellent idea, how about one that's light-weight and human-readable
like JSON?

+1

xml/json is machine readable.
I don't think, personaly that explain (analyze) is not easy to read by  
human, quite contrary.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] Performance of full outer join in 8.3

2009-04-16 Thread Merlin Moncure

On Thu, Apr 16, 2009 at 2:04 PM, Grzegorz Jaskiewicz
 wrote:
>
> On 16 Apr 2009, at 16:21, David Fetter wrote:
>
>> On Thu, Apr 16, 2009 at 06:12:10AM +0100, Simon Riggs wrote:
>>>
>>> I think the way to do this is to introduce plan output in XML
>>
>> If we're going with a serialization, which I think would be an
>> excellent idea, how about one that's light-weight and human-readable
>> like JSON?
>
> +1
>
> xml/json is machine readable.
> I don't think, personaly that explain (analyze) is not easy to read by
> human, quite contrary.

Is that because of how the output is formatted though, or because the
concepts are difficult to express? (I agree though, json is better
especially for structures that are possibly highly nested).

merlni

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [rfc] unicode escapes for extended strings

2009-04-16 Thread Sam Mason

On Thu, Apr 16, 2009 at 08:48:58PM +0300, Marko Kreen wrote:
> Seems I'm bad at communicating in english,

I hope you're not saying this because of my misunderstandings!

> so here is C variant of
> my proposal to bring \u escaping into extended strings.  Reasons:
> 
> - More people are familiar with \u escaping, as it's standard
>   in Java/C#/Python, probably more..
> - U& strings will not work when stdstr=off.
> 
> Syntax:
> 
>   \u  - 16-bit value
>   \U  - 32-bit value
> 
> Additionally, both \u and \U can be used to specify UTF-16 surrogate
> pairs to encode characters with value > 0x.  This is exact behaviour
> used by Java/C#/Python.  (except that Java does not have \U)

Are you sure that this handling of surrogates is correct?  The best
answer I've managed to find on the Unicode consortium's site is:

  http://unicode.org/faq/utf_bom.html#utf16-7

it says:

  They are invalid in interchange, but may be freely used internal to an
  implementation.

I think this means they consider the handling of them you noted above,
in other languages, to be an error.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] Performance of full outer join in 8.3

2009-04-16 Thread Grzegorz Jaskiewicz



On 16 Apr 2009, at 19:41, Merlin Moncure wrote:


Is that because of how the output is formatted though, or because the
concepts are difficult to express? (I agree though, json is better
especially for structures that are possibly highly nested).
What I mean is that what postgresql displays currently as  
explain(analyze[verbose]) is clear and understandable.
Also, it is getting better and better from version to version. So I  
don't personally agree, that it is unreadable - and I am up for (and I  
am sure many users like me are) JSON, or XML output.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [rfc] unicode escapes for extended strings

2009-04-16 Thread Andrew Dunstan




Sam Mason wrote:

Are you sure that this handling of surrogates is correct?  The best
answer I've managed to find on the Unicode consortium's site is:

  http://unicode.org/faq/utf_bom.html#utf16-7

it says:

  They are invalid in interchange, but may be freely used internal to an
  implementation.
  




It says that about non-characters, not about the use of surrogate pairs, 
unless I am misreading it.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [rfc] unicode escapes for extended strings

2009-04-16 Thread Sam Mason

On Thu, Apr 16, 2009 at 03:04:37PM -0400, Andrew Dunstan wrote:
> Sam Mason wrote:
> >Are you sure that this handling of surrogates is correct?  The best
> >answer I've managed to find on the Unicode consortium's site is:
> >
> >  http://unicode.org/faq/utf_bom.html#utf16-7
> >
> >it says:
> >
> >  They are invalid in interchange, but may be freely used internal to an
> >  implementation.
> 
> It says that about non-characters, not about the use of surrogate pairs, 
> unless I am misreading it.

No, I think you're probably right and I was misreading it.  I went
back and forth several times to explicitly check I was interpreting
this correctly and still failed to get it right.  Not sure what I was
thinking and sorry for the hassle Marko!

I've already asked on the Unicode list about this (no response yet), but
I have a feeling I'm getting worked up over nothing.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [rfc] unicode escapes for extended strings

2009-04-16 Thread Marko Kreen

On 4/16/09, Sam Mason  wrote:
> On Thu, Apr 16, 2009 at 08:48:58PM +0300, Marko Kreen wrote:
>  > Seems I'm bad at communicating in english,
>
>
> I hope you're not saying this because of my misunderstandings!
>
>
>  > so here is C variant of
>  > my proposal to bring \u escaping into extended strings.  Reasons:
>  >
>  > - More people are familiar with \u escaping, as it's standard
>  >   in Java/C#/Python, probably more..
>  > - U& strings will not work when stdstr=off.
>  >
>  > Syntax:
>  >
>  >   \u  - 16-bit value
>  >   \U  - 32-bit value
>  >
>  > Additionally, both \u and \U can be used to specify UTF-16 surrogate
>  > pairs to encode characters with value > 0x.  This is exact behaviour
>  > used by Java/C#/Python.  (except that Java does not have \U)
>
>
> Are you sure that this handling of surrogates is correct?  The best
>  answer I've managed to find on the Unicode consortium's site is:
>
>   http://unicode.org/faq/utf_bom.html#utf16-7
>
>  it says:
>
>   They are invalid in interchange, but may be freely used internal to an
>   implementation.
>
>  I think this means they consider the handling of them you noted above,
>  in other languages, to be an error.

It's up to UTF8 validator whether to consider non-characters as error.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] postgres 8.4 beta1

2009-04-16 Thread Tiramisu Mokka

FYI,

compiles on OpenSuSE 11.1
client works against 8.3.7

more information as soon as i will migrate date.

-- kofeman

Re: [HACKERS] HashJoin w/option to unique-ify inner rel

2009-04-16 Thread Lawrence, Ramon

> Upon further review, it appears that a big part of this problem is
> that cost_hashjoin() doesn't understand that it needs cost semi-joins
> differently from inner or left joins.  The bogus logic looks to be
> right here:
> startup_cost += hash_qual_cost.startup;
> run_cost += hash_qual_cost.per_tuple *
> outer_path_rows * clamp_row_est(inner_path_rows *
> innerbucketsize) * 0.5;
> 
> Of course, when the join type is JOIN_SEMI, we're going to stop
> looking after we find the first match, so this estimate is really far
> off.

The 8.3 version of cost_hashjoin() had a line like this:

joininfactor = join_in_selectivity(&path->jpath, root);

and a cost function like this:

run_cost += hash_qual_cost.per_tuple *
outer_path_rows * clamp_row_est(inner_path_rows *
innerbucketsize) * joininfactor * 0.5;

This compensated for IN joins being able to stop scanning a bucket once
a match is found.  You may consider something similar for a semi-join.
Having experimented with a lot of this code recently, there is some
potential for improvement on the bucket sizes, etc., but it is a
non-trivial problem.

I tested a similar query on TPCH 100M1Z on version 8.3:

select * from customer where c_custkey in (select o_custkey from orders)

and found that hash aggregate was marginally faster.  If you turn off
aggregation, it selects an IN hash join which is about 5% slower and the
planner is not too far off.  So, it would be definitely possible to
modify the cost function appropriately.

> >>> It's tempting to have Hash cheat and just peek at the node beneath
it
> >>> to see if it's a HashAggregate, in which case it could call a
special
> >>> method to request the whole hash. But it would have to know that
it's
> >>> just a plain uniquify and not implementing a GROUP BY.

If HashAggregate is faster, then the question is can you make it better
by avoiding building the hash structure twice.  I haven't considered all
the possibilities, but the situation you have used as an example, an IN
query, seems workable.  Instead of translating to a hash
aggregate/hash/hash join query plan, it may be possible to create a
special hash join node that does uniquefy.  

The benefit is that the planner knows about it (instead of changing the
execution plan), you can be more accurate on costs for the hash join,
and you can optimize by using only one hash table construction. A
challenge that must be dealt with is handling the multi-batch case.  It
appears that hash aggregate does not currently handle this, but I may be
mistaken.

--
Ramon Lawrence

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] HashJoin w/option to unique-ify inner rel

2009-04-16 Thread Tom Lane

Robert Haas  writes:
> Upon further review, it appears that a big part of this problem is
> that cost_hashjoin() doesn't understand that it needs cost semi-joins
> differently from inner or left joins.

Yeah, I have a note to look into that before 8.4 final.  The same is
true for nestloops: stopping after hitting one match to the current
outer can make a big difference, and that's not reflected in costsize.c
yet.  I'm not sure whether cost_mergejoin needs to care.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] HashJoin w/option to unique-ify inner rel

2009-04-16 Thread Robert Haas

On Thu, Apr 16, 2009 at 7:26 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Upon further review, it appears that a big part of this problem is
>> that cost_hashjoin() doesn't understand that it needs cost semi-joins
>> differently from inner or left joins.
>
> Yeah, I have a note to look into that before 8.4 final.  The same is
> true for nestloops: stopping after hitting one match to the current
> outer can make a big difference, and that's not reflected in costsize.c
> yet.  I'm not sure whether cost_mergejoin needs to care.

Cool.  It's worth noting that this is also a problem for anti-joins,
which can also stop after hitting one match.

For merge join, it looks like you might save something if there are
duplicates in both the inner and outer lists.  If the outer side
contains 1,1,1,1,1,2 and the inner side contains 1,1,1,2,2, then an
INNER JOIN or LEFT JOIN will rescan the first three tuples of the
inner side three times, whereas a SEMI or ANTI JOIN will scan the
first tuple three times and then the next two only once.  But I'm not
sure if we can estimate this accurately enough to matter.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] HashJoin w/option to unique-ify inner rel

2009-04-16 Thread Robert Haas

> If HashAggregate is faster, then the question is can you make it better
> by avoiding building the hash structure twice.  I haven't considered all
> the possibilities, but the situation you have used as an example, an IN
> query, seems workable.  Instead of translating to a hash
> aggregate/hash/hash join query plan, it may be possible to create a
> special hash join node that does uniquefy.

Yeah, that's what I was looking at.  The problem is that unique-ify is
not free either - we have to invoke the appropriate comparison
operators for every tuple in the bucket for which the hash values
match exactly.  So, for example if the input has K copies each of N
items, I'll need to do (K - 1) * N comparisons, assuming no hash
collisions.  In return, the number of tuples in each bucket will be
reduced by a factor of K, but that doesn't actually save very much,
because I can reject all of those with an integer comparison anyway,
again assuming no hash collisions, so it's pretty cheap.

If the hash join was on track to go multi-batch, then unique-ifying it
on the fly makes a lot of sense...  otherwise, I'm not sure it's
really going to be a win.  Anyhow, further analysis needed...

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Lifetime of FmgrInfo

2009-04-16 Thread Joshua Tolley

I was browsing PL/pgSQL source, and saw this line (pl_comp.c:151):

function = (PLpgSQL_function *) fcinfo->flinfo->fn_extra

It then does some work to determine whether the result in "function" is
valid or not. So I got to wondering, what's the lifetime of the
FunctionCallInfoinfo object passed to the call handler function? Or in
other words, what memory context is it in? And is there some way I could
find that out more easily than digging through the source?

- Josh / eggyknap


signature.asc
Description: Digital signature

[HACKERS] [PATCH] unalias of ACL_SELECT_FOR_UPDATE

2009-04-16 Thread KaiGai Kohei

Currently, the ACL_SELECT_FOR_UPDATE privilege is defined as an alias
of ACL_UPDATE as follows:

  at src/include/nodes/parsenodes.h:
   :
  /* Currently, SELECT ... FOR UPDATE/FOR SHARE requires UPDATE privileges */
  #define ACL_SELECT_FOR_UPDATE   ACL_UPDATE
   :

It is unconfortable for us because SE-PostgreSQL have two individual
permissions for updates (db_table:{update}) and explicit table locks
(db_table:{lock}), but it unables to discriminate whether the given
relation is actually used for UPDATE or SELECT FOR UPDATE.
So, I would like to define the bit as (1<<12) which is not in use.

I think we have two options to solve the matter.

The one is to check ACL_SELECT_FOR_UPDATE by UPDATE privilege, as
the attached patch doing. It implicitly expands the users privilege
when he has ACL_UPDATE, so it enables GRANT UPDATE to cover both of
ACL_UPDATE and ACL_SELECT_FOR_UPDATE as the current implementation
doing.

The other is to add a new privilege for explicit table locks, such
as something like LOCK privilege. It is a straightforward approach,
but we need to have a user visible changes.

Which is more preferable design?

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei 
*** base/src/include/nodes/parsenodes.h	2009-04-09 00:13:21.0 +0900
--- base_ex/src/include/nodes/parsenodes.h	2009-04-09 00:41:07.0 +0900
*** typedef uint32 AclMode;			/* a bitmask o
*** 71,80 
  #define ACL_CREATE		(1<<9)	/* for namespaces and databases */
  #define ACL_CREATE_TEMP (1<<10) /* for databases */
  #define ACL_CONNECT		(1<<11) /* for databases */
- #define N_ACL_RIGHTS	12		/* 1 plus the last 1ai_grantee == roleid)
  		{
! 			result |= aidata->ai_privs & mask;
  			if ((how == ACLMASK_ALL) ? (result == mask) : (result != 0))
  return result;
  		}
--- 1141,1156 
  	for (i = 0; i < num; i++)
  	{
  		AclItem*aidata = &aidat[i];
+ 		AclMode		aiprivs = aidata->ai_privs;
+ 
+ 		/* fixup ACL_SELECT_FOR_UPDATE */
+ 		if (aiprivs & ACL_UPDATE)
+ 			aiprivs |= ACL_SELECT_FOR_UPDATE;
  
  		if (aidata->ai_grantee == ACL_ID_PUBLIC ||
  			aidata->ai_grantee == roleid)
  		{
! 			result |= aiprivs & mask;
  			if ((how == ACLMASK_ALL) ? (result == mask) : (result != 0))
  return result;
  		}
*** aclmask(const Acl *acl, Oid roleid, Oid 
*** 1162,1167 
--- 1167,1177 
  	for (i = 0; i < num; i++)
  	{
  		AclItem*aidata = &aidat[i];
+ 		AclMode		aiprivs = aidata->ai_privs;
+ 
+ 		/* fixup ACL_SELECT_FOR_UPDATE */
+ 		if (aiprivs & ACL_UPDATE)
+ 			aiprivs |= ACL_SELECT_FOR_UPDATE;
  
  		if (aidata->ai_grantee == ACL_ID_PUBLIC ||
  			aidata->ai_grantee == roleid)
*** aclmask(const Acl *acl, Oid roleid, Oid 
*** 1170,1176 
  		if ((aidata->ai_privs & remaining) &&
  			has_privs_of_role(roleid, aidata->ai_grantee))
  		{
! 			result |= aidata->ai_privs & mask;
  			if ((how == ACLMASK_ALL) ? (result == mask) : (result != 0))
  return result;
  			remaining = mask & ~result;
--- 1180,1186 
  		if ((aidata->ai_privs & remaining) &&
  			has_privs_of_role(roleid, aidata->ai_grantee))
  		{
! 			result |= aiprivs & mask;
  			if ((how == ACLMASK_ALL) ? (result == mask) : (result != 0))
  return result;
  			remaining = mask & ~result;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Lifetime of FmgrInfo

2009-04-16 Thread Tom Lane

Joshua Tolley  writes:
> It then does some work to determine whether the result in "function" is
> valid or not. So I got to wondering, what's the lifetime of the
> FunctionCallInfoinfo object passed to the call handler function?

Query lifespan, usually.  There are counterexamples on both sides,
but as a rule it's expected that it's worthwhile for a function to
cache anything it can in the fcinfo->flinfo->fn_extra structure.
If it's not worthwhile, that's the caller's fault not the function's.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] unalias of ACL_SELECT_FOR_UPDATE

2009-04-16 Thread Heikki Linnakangas

KaiGai Kohei wrote:
> Currently, the ACL_SELECT_FOR_UPDATE privilege is defined as an alias
> of ACL_UPDATE as follows:
> 
>   at src/include/nodes/parsenodes.h:
>:
>   /* Currently, SELECT ... FOR UPDATE/FOR SHARE requires UPDATE privileges */
>   #define ACL_SELECT_FOR_UPDATE   ACL_UPDATE
>:
> 
> It is unconfortable for us because SE-PostgreSQL have two individual
> permissions for updates (db_table:{update}) and explicit table locks
> (db_table:{lock}), but it unables to discriminate whether the given
> relation is actually used for UPDATE or SELECT FOR UPDATE.

What's the point of doing SELECT FOR UPDATE if you're not actually going
to UPDATE the row? Having separate permissions for SELECT FOR UPDATE and
UPDATE seems useless.

A separate permission for SELECT FOR SHARE makes more sense, though.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] unalias of ACL_SELECT_FOR_UPDATE

2009-04-16 Thread KaiGai Kohei

Heikki Linnakangas wrote:
> KaiGai Kohei wrote:
>> Currently, the ACL_SELECT_FOR_UPDATE privilege is defined as an alias
>> of ACL_UPDATE as follows:
>>
>>   at src/include/nodes/parsenodes.h:
>>:
>>   /* Currently, SELECT ... FOR UPDATE/FOR SHARE requires UPDATE privileges */
>>   #define ACL_SELECT_FOR_UPDATE   ACL_UPDATE
>>:
>>
>> It is unconfortable for us because SE-PostgreSQL have two individual
>> permissions for updates (db_table:{update}) and explicit table locks
>> (db_table:{lock}), but it unables to discriminate whether the given
>> relation is actually used for UPDATE or SELECT FOR UPDATE.
> 
> What's the point of doing SELECT FOR UPDATE if you're not actually going
> to UPDATE the row? Having separate permissions for SELECT FOR UPDATE and
> UPDATE seems useless.

I wonder why SELECT FOR UPDATE need ACL_UPDATE, although the statement
itself does not modify any of the given relation.
Indeed, it normally leads UPDATE statements, but I think ACL_UPDATE
should be checked on the later phase.

> A separate permission for SELECT FOR SHARE makes more sense, though.

It is my major concern rather than exclusive locks.
The SELECT FOR SHARE statement also requires ACL_SELECT_FOR_UPDATE,
although it is a read only operation. It makes us hard to set up
a table with foreign-key which refers a primary-key on read-only
table, for example.

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [PATCH] SE-PostgreSQL for v8.5 development (r1819)

2009-04-16 Thread KaiGai Kohei

The following list of patches are the latest SE-PostgreSQL (r1819).

http://sepgsql.googlecode.com/files/sepgsql-01-sysatt-8.4beta1-r1819.patch
http://sepgsql.googlecode.com/files/sepgsql-02-core-8.4beta1-r1819.patch
http://sepgsql.googlecode.com/files/sepgsql-03-writable-8.4beta1-r1819.patch
http://sepgsql.googlecode.com/files/sepgsql-04-rowlevel-8.4beta1-r1819.patch
http://sepgsql.googlecode.com/files/sepgsql-05-perms-8.4beta1-r1819.patch
http://sepgsql.googlecode.com/files/sepgsql-06-utils-8.4beta1-r1819.patch
http://sepgsql.googlecode.com/files/sepgsql-07-tests-8.4beta1-r1819.patch
http://sepgsql.googlecode.com/files/sepgsql-08-docs-8.4beta1-r1819.patch

List of updates:
* The base version was updated to the latest CVS HEAD.
* The code to receice notifications from the kernelspace via netlink
  socket was simplified using the new avc_netlink_xxx() APIs.
* It enables to handle permissive domain on the upcoming linux-2.6.31.
* It enables to handle undefined permissions in the policy correctly.

The purpose of every patches are not changed.

Thanks,

KaiGai Kohei wrote:
> The following list of patches are the initial revision of SE-PostgreSQL
> on the v8.5 development cycle.
> These are separated into several functional components to help review
> and commit in earlier phase. Every patches (except for the core) have
> abour 1KL scales. It is far smaller than them in a year ago. :-)
> 
>   http://sepgsql.googlecode.com/files/sepgsql-01-sysatt-8.4devel-r1769.patch
>   http://sepgsql.googlecode.com/files/sepgsql-02-core-8.4devel-r1769.patch
>   http://sepgsql.googlecode.com/files/sepgsql-03-writable-8.4devel-r1769.patch
>   http://sepgsql.googlecode.com/files/sepgsql-04-rowlevel-8.4devel-r1769.patch
>   http://sepgsql.googlecode.com/files/sepgsql-05-perms-8.4devel-r1769.patch
>   http://sepgsql.googlecode.com/files/sepgsql-06-utils-8.4devel-r1769.patch
>   http://sepgsql.googlecode.com/files/sepgsql-07-tests-8.4devel-r1769.patch
>   http://sepgsql.googlecode.com/files/sepgsql-08-docs-8.4devel-r1769.patch
> 
> Needless to say, it is now designed on 8.4devel tree, so anyone who want
> to build/install SE-PostgreSQL can apply these patches by hand.
> I'll also update and fix them with the progress of v8.4 development.
> Before you apply them, please confirm whether they are the latest, or not.
> 
> Bruice,
> | KaiGai-san, the only option I can offer is perhaps to list a URL for
> | your SE-PostgreSQL patch to be applied by people who want to use SE-PG.
> 
> Does it mean I need to submit a patch to add an introduction under doc/ ?
> If so, I'll submit it as soon as possible.
> 
> Thanks,
> 
> 
> 01) Security system attribute support
> scale: 38 files changed, 853 insertions(+), 1 deletion(-), 113 
> modifications(!)
>   This patch adds a new system catalog "pg_security" and enables to store
>   security identifier associated to a text representation within padding
>   area of HeapTupleHeader, as object identifier doing.
>   It is a foundation of any other facilities.
> 
> 02) Core facilities of SE-PostgreSQL
> scale: 55 files changed, 3588 insertions(+), 10 deletions(-), 736 
> modifications(!)
>   This patch adds a mandatory access control feature collaborating with
>   SELinux in table, column, procedure level granurality. Most of this
>   patch is same as I proposed in the v8.4 development cycle, except for
>   it is designed on the basis of security system attribute support.
> 
> 03) Writable system column support
> scale: 7 files changed, 298 insertions(+), 199 modifications(!)
>   This patch enables users to update/insert on system columns 
> ("security_label"
>   and "security_acl") with explicit values. This feature is necessary to 
> provide
>   a user interface for row-level access controls.
> 
> 04) Row-level access controls support
> scale: 31 files changed, 1101 insertions(+), 231 modifications(!)
>   This patch enables to apply mandatory/discretionary access control in 
> row-level
>   granularity also.
> 
> 05) Advanced permission checks support
> scale: 18 files changed, 858 insertions(+), 3 deletions(-), 43 
> modifications(!)
>   This patch add some of advanced permission checks:
>- file:{read write} on server side filesystem accesses
>- db_procedure:{install} on user defined functions as system internal ones
>- db_database:{load_module install_module} on binary shared library files
>   In the v8.4 development, these are suggested to separate from the core.
> 
> 06) Security options in utilities
> scale: 4 files changed, 95 insertions(+), 116 modifications(!)
>   This patch adds options on utilities
>- "--enable-selinux" option for initdb
>- "--security-label" option for pg_dump and pg_dumpall
> 
> 07) Testcases of SE-PostgreSQL
> scale: 18 files changed, 1819 insertions(+), 2 modifications(!)
>   This patch adds testcases for SE-PostgreSQL.
> 
> 08) Documentation of SE-PostgreSQL
> scale: 16 files changed, 1595 insertions(+), 42 modifications

40 matches

Mail list logo