[HACKERS] Postgresql10 Bug report. (pg_catalog.pg_statistic_ext does not exist)

2017-04-06 Thread mark
Apologies if someone has already reported this.

Steps to reproduce:

1. Install the PG10 RPMs.
2. Create a table.
3. Describe the table with \d using psql 10.
4. Note the error below.



m=# create table mytable (myid serial, mytext text);
CREATE TABLE
m=# \d mytable
ERROR:  relation "pg_catalog.pg_statistic_ext" does not exist
LINE 8: FROM pg_catalog.pg_statistic_ext stat WHERE starelid  = '163...
 ^
m=# \q



List of my RPM versions:

[postgres@localhost ~]$ rpm -qa | grep postgresql10
postgresql10-test-10.0-20170406_1PGDG.rhel7.1.x86_64
postgresql10-devel-10.0-20170406_1PGDG.rhel7.1.x86_64
postgresql10-10.0-20170406_1PGDG.rhel7.1.x86_64
postgresql10-contrib-10.0-20170406_1PGDG.rhel7.1.x86_64
postgresql10-server-10.0-20170406_1PGDG.rhel7.1.x86_64
postgresql10-libs-10.0-20170406_1PGDG.rhel7.1.x86_64
postgresql10-debuginfo-10.0-20170406_1PGDG.rhel7.1.x86_64


thanks

-Mark


Re: [HACKERS] Postgresql10 Bug report. (pg_catalog.pg_statistic_ext does not exist)

2017-04-06 Thread mark
It would appear that the server didn't restart when I thought it had via the
service command.

Apologies; I'm not able to reproduce this anymore after restarting things.
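For anyone else who hits this, a quick sanity check (a hypothetical session,
not from my original report) to confirm that the running server really
matches the freshly installed binaries:

m=# SELECT version();   -- version of the server actually running
m=# \! psql -V          -- version of the psql client binary

If the two disagree on the development snapshot, a restart (or a restart
that silently failed) is the likely culprit.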


On Thu, Apr 6, 2017 at 11:27 AM, Tom Lane  wrote:

> Alvaro Herrera  writes:
> > mark wrote:
> >> m=# create table mytable (myid serial, mytext text);
> >> CREATE TABLE
> >> m=# \d mytable
> >> ERROR:  relation "pg_catalog.pg_statistic_ext" does not exist
>
> > Ah, what happens is you're using a new psql with a pre-10 server.  Yeah,
> > this is a bug since psql is supposed to work fine with older servers
> > too.  Thanks for the report, will fix.
>
> No, there is a version test there, and it works fine for me.  I think
> the OP is somehow using a very recent psql with a v10, but not so recent,
> server.  Perhaps he didn't restart his server after updating RPMs.
>
> regards, tom lane
>


Re: [HACKERS] [GENERAL] Insert result does not match record count

2015-05-04 Thread mark
Did this ever go any further?


I wrote a data transformation script at work, and after seeing "with
count -2017657667" (and similar) in my script's log I got a bit worried.
I've seen elsewhere that folks just run a full count(*) later to check the
counts, but that takes even MORE time. I was thinking it was a psycopg2
problem, but it seems there are issues with the internal counters in PG as
well when tracking "large" changes.
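For what it's worth, the negative value is consistent with a signed 32-bit
counter wrapping around; a quick back-of-the-envelope check (my own, not
from the thread):

SELECT (-2017657667)::bigint + 4294967296;  -- 2277309629, i.e. ~2.28 billion rows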

thanks,

Mark

On Sun, Feb 2, 2014 at 9:12 AM, Tom Lane  wrote:

> Vik Fearing  writes:
> > Without re-doing the work, my IRC logs show that I was bothered by this
> > in src/backend/tcop/postgres.c:
>
> > max_rows = pq_getmsgint(&input_message, 4);
>
> > I needed to change max_rows to int64 which meant I had to change
> > pq_getmsgint to pq_getmsgint64 which made me a little worried.
>
> As well you should be, because we are *not* doing that.  That would be
> a guaranteed-incompatible protocol change.  Fortunately, I don't see
> any functional need for widening the row-limit field in execute messages;
> how likely is it that someone wants to fetch exactly 3 billion rows?
> The practical use-cases for nonzero row limits generally involve fetching
> a bufferload worth of data at a time, so that the restriction to getting
> no more than INT_MAX rows at once is several orders of magnitude away
> from being a problem.
>
> The same goes for internal uses of row limits, which makes it
> questionable whether it's worth changing the width of ExecutorRun's
> count parameter, which is what I assume you were on about here.  But
> in any case, if we did that we'd not try to reflect it as far as here,
> because the message format specs can't change.
>
> regards, tom lane
>


[HACKERS] xslt_process deprecated?

2014-09-03 Thread Mark

Hi,
I'd like to use the xslt_process function, but it is documented in a section 
of the documentation that is marked deprecated.  I don't want to rely on 
something that is going to disappear, and if there is a better alternative 
I'd like to use it; however, I cannot find an equivalent in the 
documentation.  I could well be looking in the wrong place, so apologies if 
I've just been too blind to see it.
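For context, this is the call in question; a minimal sketch assuming the
contrib xml2 module is installed (the document and stylesheet here are
made up):

SELECT xslt_process(
    '<doc><greeting>hello</greeting></doc>',
    '<xsl:stylesheet version="1.0"
         xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
       <xsl:template match="/">
         <out><xsl:value-of select="/doc/greeting"/></out>
       </xsl:template>
     </xsl:stylesheet>');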


Can anyone help?

Thanks,
Mark.


[HACKERS] issue with smlar extension and gist index creation (9.2Beta1)

2012-06-05 Thread mark
Hi All,

I am playing around with 9.2Beta1 and the smlar extension that was presented
at PGCon. It looks like a lot of great work has gone into both, so thanks to
everyone involved. 


I did run into an issue while creating a GiST index using the _text_sml_ops operator class.

I am getting "ERROR: failed to re-find parent for block 25399". 


Create index statement:

CREATE INDEX my_msubject_array_idx ON message_20120605
    USING gist (regexp_split_to_array(msubject, E'\\s+') _text_sml_ops);

(msubject is a varchar(255))
Encoding is UTF8


The table already has data in it. I know this isn't the preferred method of
creating this kind of index, but I don't see why it should fail with this
error.

I am working to see if I can find a reproducible case that doesn't involve
19GB of table data, but figured I might as well throw this out to the list
tonight. I understand that smlar is very new, not even a contrib module yet.


If anyone wants more details please let me know and I will try and get them
to you. 


..: Mark 





Re: [HACKERS] issue with smlar extension and gist index creation (9.2Beta1)

2012-06-06 Thread mark


> -Original Message-
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: Tuesday, June 05, 2012 9:22 PM
> To: mark
> Cc: 'pgsql-hackers'
> Subject: Re: [HACKERS] issue with smlar extension and gist index
> creation (9.2Beta1)
> 
> "mark"  writes:
> > I am playing around with 9.2Beta1 and the smlar extension that was
> > presented at pgcon. Looks like a lot of great work has gone into both
> > - so thanks to everyone for all the great work.
> > I did run into an issue while creating a GIST index using the
> _text_sml_ops.
> > I am getting "ERROR: failed to re-find parent for block 25399".
> 
> Try beta2 ... this sounds suspiciously like the bug Heikki fixed last
> week:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d1996ed5e8bfaf1314e7817015668029c07d3b9b
> 
>   regards, tom lane


Thanks, I will give that a shot and see if I still have the same issue. 

..: Mark




Re: [HACKERS] [ADMIN] locales and encodings Oh MY!

2010-11-12 Thread mark
On Fri, Nov 12, 2010 at 12:45 AM, Gabriele Bartolini
 wrote:
> Hi Mark,
>
> On 12/11/10 03:31, mark wrote:
>>
>> I have listed what I think I will be doing with regard to initdb. If
>> anyone sees problems with the following mixture during my
>> dump -> init -> restore, I would be most keen to hear about it.
>
> FYI, PostgreSQL 8.4 introduced database level collation. Before that, yes,
> you need to re-init your data dir.

Guess I missed that. I have 8.3.x boxes in production and 9.0.1 boxes
in dev, so only some of them will require a re-init.

>>
>> initdb /path/to/data/dir --lc_ctype=C --lc_collation=C
>> --lc_message=en_US.UTF8 --lc_monetary=en_US.UTF8 --lc_numeric=en_US.UTF8
>> --lc_time=en_US.UTF8 -E UTF8
>
> Maybe you meant --lc_collate ?
>

Yes, I did mean lc_collate. Thanks.

> Cheers,
> Gabriele
>

Given how similar straight C and en_US.UTF8 are, it was suggested to me by
people in my office who are far more knowledgeable about C than I am that
this is something the PG community could "fix". A "fix" meaning that
"col LIKE 'foo%'" could use btree indexes in locales like en_US.UTF8
(and probably some others).


@hackers: is the request unreasonable? Anyone have an idea of the price tag
to make that happen?



> --
>  Gabriele Bartolini - 2ndQuadrant Italia
>  PostgreSQL Training, Services and Support
>  gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it
>
>



Re: [HACKERS] [ADMIN] locales and encodings Oh MY!

2010-11-12 Thread mark
On Fri, Nov 12, 2010 at 12:03 PM, Tom Lane  wrote:
> Robert Haas  writes:
>>> With how similar straight C and en_US.UTF8 are it was suggested to me,
>>> by persons who are far more C knowledgeable then I in my office, that
>>> this is something the PG community could "fix" . A "fix" being so that
>>> "col LIKE 'foo%' " could use btree indexes in locales like en_US.UTF8
>>> (and probably some others).
>
>>> is the request unreasonable ? anyone got any idea of the price tag to
>>> make that happen ?
>
>> I thought it already did that.
>
> No, and the odds of it ever happening are insignificant.  The sort order
> associated with en_US (and other "dictionary order" locales) is just too
> randomly different from what you need to optimize a LIKE search.
> (Whoever told you en_US sorts similarly to C is nuts.)
>
> The solution if you want the database's prevailing sort order to be en_US
> is to put an extra text_pattern_ops index on the column you want to do
> LIKE searches on.  We might eventually have the ability to spell that
> "put a C-locale index on the column", but text_pattern_ops is the way to
> do it today.
>
>                        regards, tom lane
>

OK, I hear you loud and clear. I am going to eat the overhead until
I get to 9.0.1; I'm currently on 8.3.x in some places.

I will either take an outage and do a dump -> re-init -> restore, or do an
in-place upgrade and then some locking, copying, dropping the old database,
and renaming the new db path.

thanks all.
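In case it helps anyone searching the archives, Tom's suggestion looks
roughly like this (a sketch; the table and column names are made up):

-- the database default collation stays en_US.UTF8 for ordinary sorting;
-- the extra opclass index serves LIKE 'foo%' prefix searches:
CREATE INDEX mytable_col_pattern_idx ON mytable (col text_pattern_ops);
SELECT * FROM mytable WHERE col LIKE 'foo%';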


..: Mark



[HACKERS] ts_rank

2011-05-10 Thread Mark
Could somebody explain to me which methods ts_rank is based on and how it works?
I would appreciate some articles, if any exist. 
Thanks a lot for any reply.
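For reference, a minimal call to the function I'm asking about (my own
example):

SELECT ts_rank(to_tsvector('english', 'the quick brown fox'),
               to_tsquery('english', 'fox'));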
Mark



Re: [HACKERS] ts_rank

2011-05-21 Thread Mark
>There's some potentially useful information here:
>http://www.postgresql.org/docs/9.0/interactive/textsearch-controls.html#TEXTSEARCH-RANKING

Thanks for the reply. I have read the PostgreSQL documentation, but it does
not name the methods used; everywhere it just says that ts_rank uses a
standard ranking function, and it is difficult to tell which function that is.
Somewhere I found that it may be based on the vector space model, and that
seems plausible, because the code in tsrank.c counts the frequency of
words, but I am not sure of that :-(





Re: [HACKERS] Copyrights on files

2007-03-25 Thread mark
On Sat, Mar 24, 2007 at 11:28:26PM -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > I have emailed Andrew Yu to see if we can remove his line, but I
> > question whether the other people can be reached.
> > How should we handle this?
> If they are released under the BSD license, why do we care about it?

Only the GPL has an explicit clause to the effect of "or any future version
of this license". This clause was designed specifically for this purpose:
to allow the GPL to be updated and automatically apply to all software.
Except it doesn't always work out that way. Linux, for example, at least for
some time, had an explicit note that Linus Torvalds did not agree with this
ability of the FSF to change the terms under which Linux is available. He
wrote words at the top of the license to the effect of: "You may *only* use
version 2 of the GPL."

Not that I have ever chosen to use the BSD license as a template for any
of my software, but if I did, the University of California, Berkeley could
not change the terms of *my* software. The university can only change
the terms of software it actually owns.

Cheers,
mark



Re: [HACKERS] like/ilike improvements

2007-05-22 Thread mark
On Tue, May 22, 2007 at 12:12:51PM -0400, Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
> > ... It turns out (according to the analysis) that the 
> > only time we actually need to use NextChar is when we are matching an 
> > "_" in a like/ilike pattern.
> I thought we'd determined that advancing bytewise for "%" was also risky,
> in two cases:
> 1. Multibyte character set that is not UTF8 (more specifically, does not
> have a guarantee that first bytes and not-first bytes are distinct)
> 2. "_" immediately follows the "%".

Have you considered a two-pass approach? First pass: match on bytes.
Only if the first pass finds a match would you start a second pass to
do a 'safe' check.

Are there optimizations to recognize whether the index was created as
lower(field) or upper(field), and translate ILIKE to the appropriate
one?

Cheers,
mark



Re: [HACKERS] like/ilike improvements

2007-05-24 Thread mark
On Thu, May 24, 2007 at 11:20:51PM -0400, Tom Lane wrote:
> I wrote:
> > Andrew Dunstan <[EMAIL PROTECTED]> writes:
> >> Yes, I agree completely. However it looks to me like IsFirstByte will in 
> >> fact always be true when we get to call NextChar for matching "_" for UTF8.
> > If that's true, the patch is failing to achieve its goal of treating %
> > bytewise ...
> OK, I studied it a bit more and now see what you're driving at: in this
> form of the patch, we treat % bytewise unless it is followed by _, in
> which case we treat it char-wise.  That seems a good tradeoff,
> considering that such a pattern is probably pretty uncommon --- we
> should be willing to handle it a bit slower to simplify other cases.

Is it worth the effort to pre-process the pattern?

For example:

%% -> %
%_ -> _%

If applied recursively, this would automatically cover:

%_%  -> _%
_%_  -> __%

The 'benefit' would be that the pattern matching code would not
need an inner if statement?
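A quick sanity check in psql that these rewrites really are equivalences
(my own check, not from the patch):

SELECT 'abc' LIKE '%%' AS a,  -- true: '%%' matches the same strings as '%'
       'abc' LIKE '%'  AS b,  -- true
       ''    LIKE '%_' AS c,  -- false: '%_' needs at least one character
       ''    LIKE '_%' AS d;  -- false: same set of strings as '%_'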

Also, I didn't see a response to my query with regard to treating UTF-8
matching as a two-pass operation: a first pass treating the pattern as
bytes, and, only if the first pass matches, a second pass doing a full
analysis. In the case of low selectivity, this would be a win, as the
primary filter would be the full-speed byte-based matching.

I had also asked why the focus would be on high selectivity. Why would
the primary filter criterion for a properly designed SELECT statement be
a LIKE with high selectivity? The only time I have ever used LIKE is
when I expect low selectivity. Is there a reasonable case I am missing?

Cheers,
mark



Re: [HACKERS] Change sort order on UUIDs?

2007-06-14 Thread mark
On Thu, Jun 14, 2007 at 03:38:44PM -0400, Robert Wojciechowski wrote:
> I've been testing the new UUID functionality in 8.3dev and noticed that
> UUIDs are sorted using memcmp in their default in-memory layout, which
> is:
>  struct uuid {
>  uint32_ttime_low;
>  uint16_ttime_mid;
>  uint16_ttime_hi_and_version;
>  uint8_t clock_seq_hi_and_reserved;
>  uint8_t clock_seq_low;
>  uint8_t node[_UUID_NODE_LEN];
>  };
> When done that way, you're going to see a lot of index B-tree
> fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs,
> as described above. With random (version 4) or hashed based (version 3
> or 5) UUIDs there's nothing that can be done to improve the situation,
> obviously.

I suggest that treating a UUID as anything other than a unique random
value is a mistake. Users should make no assumptions about how the
values are ordered. Also, since time-based UUID generation is always in
sequence, it seems to me that sorting by UUID time would have the effect
of inserts always going to the end of the index. While this might pack
tightly, wouldn't it hurt concurrency? It's a trade-off of random access
versus sequential performance, and for UUIDs I would value random access
over sequential performance. Why would anybody scan UUIDs through the
index in "sequential" order?

Cheers,
mark



Re: [HACKERS] Change sort order on UUIDs?

2007-06-15 Thread mark
On Fri, Jun 15, 2007 at 09:40:29AM -0500, Michael Glaesemann wrote:
> On Jun 14, 2007, at 19:04 , [EMAIL PROTECTED] wrote:
> >For UUID, I
> >would value random access before sequential performance. Why would
> >anybody scan UUID through the index in "sequential" order?
> AIUI, to allow UUID columns to be indexed using BTREE, there needs to  
> be some ordering defined. So regardless of what this ordering is,  
> doesn't there need to be some order? And as a (primary?) purpose of  
> UUIDs is to be (universally) unique, and the implementation of  
> uniqueness constraints in PostgreSQL is based on BTREE indexes, this  
> makes the necessity of ordering doubly so. Or have I missed something?

The BTREE needs a comparator function, yes. The BTREE comparator
function need not match any expectation of the caller.

For example, if Robert is correct that indexes on UUIDs become
fragmented over time with the current comparator, then he is providing
only a half solution. He would have a non-fragmented time-based UUID
index, while everybody else continues to have fragmented non-time-based
UUID indexes. The logic does not make sense. Either fragmentation
is an issue, or it is not. Concurrency is either an issue, or it is not.

I do not believe that it has been demonstrated that there is a
fragmentation issue, nor do I believe it has been shown what effect
would occur on concurrency.

Personally, my preference is for the BTREE comparator to be the fastest
comparison possible, to allow the quickest scanning of the index entries.
I don't believe fragmentation is a serious issue, and I believe there are
concurrency benefits to inserting into different index pages, rather than
always adding to the end.

Cheers,
mark



Re: [HACKERS] Change sort order on UUIDs?

2007-06-15 Thread mark
On Fri, Jun 15, 2007 at 11:05:01AM -0400, Robert Wojciechowski wrote:
> Also, treating UUIDs as time based is completely valid -- that is the
> point of version 1 UUIDs. They have quite a few advantages over random UUIDs.

It's a leap from extracting the UUID as time, to sorting by UUID for
results, or an assumption that reduced fragmentation would increase
performance for regular loads.

> Ordering random UUIDs as if they were time-based will still result in
> random access on the b-tree, so again luckily this just relies on how
> you generate the UUIDs not how you order them.

Therefore fragmented by your assertion, therefore a performance
problem. If your premise is correct, the conclusion that it leads to
is that UUIDv1 will perform better under PostgreSQL.

> So far this ordering method seems to satisfy both needs, but my
> performance analysis will reveal more.

The word "satisfy" above is loosely defined. The existing model
"satisfies" both needs. You believe you have a case that would prove
that the existing model is sub-optimal. You would like to make a
change which you believe would optimize your own use, at the minimal
expense of other users. The minimal expense comes from the more
complicated comparator function, and the confusion of any who see
their non-UUIDv1 UUID's sorted by some apparently arbitrary scheme
that seems to have a history of assuming that UUIDv1 will be used.

Cheers,
mark



Re: [HACKERS] Change sort order on UUIDs?

2007-06-15 Thread mark
If index lookup speed or packing truly were the primary concern, people
would use a suitably sized SEQUENCE. They would not use UUID.

I believe the last time I calculated this, the result was that you
could fit 50% more entries in the index if you use a 32-bit sequence
number instead of a 128-bit UUID.

I've been mixed on my personal use of UUID. The main benefits of UUID
for me have been:

   1) Different sites can generate UUID values without communicating
  according to a well known standard formula, and have an excellent
  chance of always generating unique values for a particular application
  that can be later merged together without conflict.

   2) The numbers are difficult to predict by an end user of your
  application. If exposing your row id to the world, a UUID is
  more obscure and difficult to predict than a sequence number.

The rest of the benefits are slim or non-existent. It's not more
efficient to store or access. It's not easier to process, memorize, or
type. Any significance of the values, such as the time and MAC address
embedded in a UUIDv1, is of only limited value, as it relies on the
generators following the expected algorithm and having the same time
(with different allowances for error).

For example, if one truly wished to have an efficient lookup key, and
record creation time, why not use two fields? 1 x 32-bit sequence number,
and 1 x 64-bit timestamp. This gives portability and makes the purpose of
the fields clear. It gives flexibility to create the index on either/or.
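A minimal sketch of that two-field alternative (table and column names
made up):

CREATE TABLE mydata (
    id         serial PRIMARY KEY,                 -- compact lookup key
    created_at timestamptz NOT NULL DEFAULT now()  -- record creation time
);
CREATE INDEX mydata_created_at_idx ON mydata (created_at);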

For read-only data, I've taken to using the SHA1 sum of the data as
the unique id instead of UUID or SEQUENCE. Works pretty good... :-)
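For example, with the contrib pgcrypto module installed, something like
this (a sketch):

SELECT encode(digest('the read-only payload', 'sha1'), 'hex') AS id;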

Cheers,
mark



[HACKERS] vacuum problems

2001-07-11 Thread Mark


Quick rundown of our configuration:
Red Hat 7.1 (no changes or extras added by us)
Postgresql 7.1.2 and CVS HEAD from 07/10/2001
3.8 gb database size

I included two pgsql versions because this happens on both.

Here's the problem we're having:

We run a vacuumdb from the server on the entire database.  Some large tables 
are vacuumed very quickly, but the vacuum process hangs or takes more than a 
few hours on a specific table (we haven't let it finish before).  The vacuum 
process works quickly on a table (loginhistory) with 2.8 million records, but 
is extremely slow on a table (inbox) with 1.1 million records (the table with 
1.1 million records is actually larger in kb size than the other table).

We've tried to vacuum the inbox table separately ('vacuum inbox' within 
psql), but this still takes hours (again, we have never let it complete; we 
need to use the database for development as well).

We noticed 2 things that are significant to this situation:
The server logs the following:


DEBUG:  --Relation msginbox--
DEBUG:  Pages 129921: Changed 26735, reaped 85786, Empty 0, New 0; Tup 
1129861: Vac 560327, Keep/VTL 0/0, Crash 0, UnUsed 51549, MinLen 100,
MaxLen 2032; Re-using: Free/Avail. Space 359061488/359059332;
EndEmpty/Avail. Pages 0/85785. CPU 11.18s/5.32u sec.
DEBUG:  Index msginbox_pkey: Pages 4749; Tuples 1129861: Deleted 76360.
CPU 0.47s/6.70u sec.
DEBUG:  Index msginbox_fromto: Pages 5978; Tuples 1129861: Deleted 0.
CPU 0.37s/6.15u sec.
DEBUG:  Index msginbox_search: Pages 4536; Tuples 1129861: Deleted 0.
CPU 0.32s/6.30u sec.
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES

The last few lines (XLogWrite ...) repeat forever. With 7.1.2 this never
stops unless we run out of disk space or cancel the query. With CVS HEAD it
still continues; the log files don't consume all disk space, but we still
have to cancel it or it might run forever.

Perhaps we need to let it run until it completes, but we thought that we 
might be doing something wrong or have some data (we're converting data from 
MS SQL Server) that isn't friendly.

The major issue we're facing with this is that any read or write access to 
the table being vacuumed times out (obviously because the table is still 
locked).  We plan to use PostgreSQL in our production service, but we can't 
until we get this resolved.

We're at a loss, not being familiar enough with PostgreSQL and its source 
code.  Can anyone please offer some advice or suggestions?

Thanks,

Mark




Re: [HACKERS] vacuum problems

2001-07-11 Thread Mark


We increased shared memory in the Linux kernel, which decreased the vacuumdb 
time from 40 minutes to 14 minutes on a 450 MHz processor.  We calculate that 
on our dual 1 GHz box with a gigabit Ethernet SAN connection this will go down 
to under 5 minutes.  This is acceptable to us.  Sorry about the unnecessary post.





[HACKERS] dynamic #include's?

2001-10-23 Thread Mark

I have an idea for a Perl script, but I just wanted to make
sure that no one has already created something similar.
When doing a full-row select, it's necessary to declare all the bind
variables and then write a column-by-column select statement.  Ugly.  I
want to automagically create an include file that I could place
in the DECLARE SECTION and also use in the SELECT statements.
All of this is based on my prior experience with embedded
SQL, so if Postgres has a better method, or something new has come up,
please let me know.
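For reference, this is the kind of boilerplate I mean; a hand-written
sketch of what the script would generate (table and variable names made up):

EXEC SQL BEGIN DECLARE SECTION;
    int  myid;
    char mytext[256];
EXEC SQL END DECLARE SECTION;

EXEC SQL SELECT myid, mytext INTO :myid, :mytext
         FROM mytable WHERE myid = 1;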
Thanks!
Mark




Re: [HACKERS] Postgresql Caching

2006-10-15 Thread mark
On Sun, Oct 15, 2006 at 03:41:25AM -0700, Anon Mous wrote:
> However, the problem is surmountable and has been figured out by Oracle,
> although I don't know how they did it:
> 
> http://www.oracle.com/technology/products/ias/joc/index.html 

I'm pretty sure this is application-side caching. The application is
able to cache intelligently and efficiently, as it is able to make
assumptions. It's only seeing one view of the data. The view is
internally consistent, and only the application is making changes
to the view that it sees.

On the rest of your thinking:

Are people comfortable with the belief that tables that do not change
are a valuable target for caching?

Cheers,
mark



Re: [HACKERS] Postgresql Caching

2006-10-15 Thread mark
On Sun, Oct 15, 2006 at 08:31:36PM +0530, Merlin Moncure wrote:
> On 10/15/06, Anon Mous <[EMAIL PROTECTED]> wrote:
> > Would it be possible to combine a special memcache implementation of
> > memcache with a Postgresql interface wrapper?
> have you seen
> http://people.freebsd.org/~seanc/pgmemcache/

Interesting. I note that they don't address the view consistency
problem any better than an application using memcached directly.
And that's the real problem with memcached, and why people are
tempted to 'indulge' by relying on PostgreSQL. Some people value
the consistency. Others don't. memcached, whether application-side,
or whether automatically invoked by triggers (pgmemcache) is a
decision to ignore the consistency.

Using memcache, I've had problems with consistency brought right to
the front. Both of these have failed me:

1) When updating a PostgreSQL record, I invalidate the memcache record.
   If another process comes along in parallel before I commit, notices
   that the memcache record is invalidated, it queries the data from
   SQL, and updates the memcache record back to the old value. :-(

2) When updating a PostgreSQL record, I updated the memcache record
   to the new value. If another process comes along in parallel before
   I commit, that is still looking at an older view, cross-referencing
   may not work as expected.

I'm currently settled on 2), but setting a short timeout (5 seconds) on
the data. Still an imperfect compromise between speed and accuracy, but
it isn't causing me problems... yet.

I don't see memcache as a general solution to query plan or query
result caching. Along these lines, I would look more towards having
the query plans or query results stored in cache along with the
transaction numbers that would let us know whether either is valid.

Consistency is very valuable to me. If it wasn't for memcache being
hundreds or more times faster, I wouldn't use it in the cases I do.
It can be dangerous.

Cheers,
mark



Re: [HACKERS] Postgresql Caching

2006-10-15 Thread mark
On Sun, Oct 15, 2006 at 06:43:43PM +0100, Lexington Luthor wrote:
> [EMAIL PROTECTED] wrote:
> >Using memcache, I've had problems with consistency brought right to
> >the front. Both of these have failed me:
> >1) When updating a PostgreSQL record, I invalidate the memcache record.
> >   If another process comes along in parallel before I commit, notices
> >   that the memcache record is invalidated, it queries the data from
> >   SQL, and updates the memcache record back to the old value. :-(
> How can this fail? The PostgreSQL MVCC will hold the second transaction 
> back until the effect on the tuple is known (i.e. after the first 
> transaction is over). Have you not been using SERIALIZABLE transactions?

I don't *want* to use SERIALIZABLE transactions. That would grind my
application to a halt.

Consistency isn't the same as serializable. Memcache offers *NEITHER*.

> With a bit of careful planning (and a few SELECT FOR UPDATE queries to 
> prevent deadlock), having perfect consistency and correct caching is 
> possible.

Your conclusion is false for all cases except data that will never change.

You can never have perfect consistency across different systems (memcache /
postgresql), and especially not when their visibility rules differ. What is
visible via memcache is always the latest uncommitted value. What is
visible in PostgreSQL is something less than that. Consistency is not
possible. Correct caching is therefore also not possible unless you define
correct as 'latest', and even then you have problems if memcache expires
the record before the real record has been committed into PostgreSQL.

Under a significant enough load, it becomes visible.

> >2) When updating a PostgreSQL record, I updated the memcache record
> >   to the new value. If another process comes along in parallel before
> >   I commit, that is still looking at an older view, cross-referencing
> >   may not work as expected.
> This breaks integrity, and all bets are off.

Both break integrity.

> >I'm currently settled on 2), but setting a short timeout (5 seconds) on
> >the data. Still an imperfect compromise between speed and accuracy, but
> >it isn't causing me problems... yet.
> What exactly does your application do about the possibility of incorrect 
> data?

Right now? I've limited the impact to display problems, i.e. wrong counts.
I think I tracked down all the significant problems. For me, "latest" is
often as good as "consistent", where memcache gives "latest" and
PostgreSQL gives "consistent".

> >Consistency is very valuable to me. If it wasn't for memcache being
> >hundreds or more times faster, I wouldn't use it in the cases I do.
> >It can be dangerous.
> Consistency and caching are not mutually exclusive, and there are many 
> frameworks that handle the burden of maintaining both for you.

Consistency and memcached *are* mutually exclusive.

memcached provides no mechanisms for consistency.

Close may be good enough for many. Close is the best that it can do.

Cheers,
mark



Re: [HACKERS] Postgresql Caching

2006-10-15 Thread mark
On Mon, Oct 16, 2006 at 03:08:39AM +0930, Shane Ambler wrote:
> Just throwing some ideas around -
> What if we could do something like
> CREATE TABLESPACE myramcache LOCATION MEMORY(2GB);
> CREATE TABLE mycache (
> id as integer, data as varchar(50))
> USING TABLESPACE myramcache;
> INSERT INTO mycache SELECT id,data FROM myrealtable;
> ...
> You could setup a table in memory to contain known popular data, you 
> could also use this to create a temporary table in memory to speed up 
> multiple intermediate calculations without touching disks.

I'm not sure what this would save. If the table is read-only, there
shouldn't be writes happening. If it's small, and frequently accessed,
it should fit in the buffer cache.

None of this avoids the cost of query planning, or query execution.

Cheers,
mark



Re: [HACKERS] Postgresql Caching

2006-10-15 Thread mark
On Sun, Oct 15, 2006 at 06:43:43PM +0100, Lexington Luthor wrote:
> With a bit of careful planning (and a few SELECT FOR UPDATE queries to 
> prevent deadlock), having perfect consistency and correct caching is 
> possible.

I didn't respond directly to this claim of yours.

SELECT FOR UPDATE is only useful if I'm going to do SELECT. If I am
using memcache, one would presume that I am using it in place of
SELECT, to improve performance. If I'm going to SELECT and then
memcache, I haven't gained anything.

Cheers,
mark



Re: [HACKERS] Postgresql Caching

2006-10-15 Thread mark
On Mon, Oct 16, 2006 at 05:14:59AM +0930, Shane Ambler wrote:
> [EMAIL PROTECTED] wrote:
> >On Mon, Oct 16, 2006 at 03:08:39AM +0930, Shane Ambler wrote:
> >>You could setup a table in memory to contain known popular data, you 
> >>could also use this to create a temporary table in memory to speed up 
> >>multiple intermediate calculations without touching disks.
> >I'm not sure what this would save. If the table is read-only, there
> >shouldn't be writes happening. If it's small, and frequently accessed,
> >it should fit in the buffer cache.
> Because it is frequently accessed doesn't mean that it is small - the 
> main point is control over what is cached and a starting point for other 
> options mentioned later.

This presumes that your instructions will do a better job than it is
already doing. You are telling it "use this much cache memory". Only,
if you specify more than the amount of RAM you have, this will be
swapped to disk, and you won't have avoided a disk access. If you
specify less than the amount of RAM you have, you are preventing
PostgreSQL or the kernel from deciding that another page is more
valuable than your "large static table".

> >None of this avoids the cost of query planning, or query execution.
> No but you can avoid costly disk access and still have the postgres 
> level of integrity and integration that memcached doesn't offer.

If PostgreSQL or the kernel is caching it, this is zero.

As a thought experiment, I'm not seeing the benefit. I think if you
could prove a benefit, then any proof you provided could be used to
improve the already existing caching layers, and would apply equally
to read-only or read-write pages. For example, why not be able to
hint to PostgreSQL that a disk-based table should be considered a
priority to keep in RAM. That way, PostgreSQL would avoid pushing
pages from this table out.

I'm not convinced that this would be a gain, though. I highly expect
that an LRU rule is better than a statically defined "keep this in
RAM" rule.

Cheers,
mark



Re: [HACKERS] Postgresql Caching

2006-10-15 Thread mark
On Sun, Oct 15, 2006 at 02:39:36PM -0700, Josh Berkus wrote:
> > Using memcache, I've had problems with consistency brought right to
> > the front. Both of these have failed me:
> > 1) When updating a PostgreSQL record, I invalidate the memcache record.
> >If another process comes along in parallel before I commit, notices
> >that the memcache record is invalidated, it queries the data from
> >SQL, and updates the memcache record back to the old value. :-(
> >
> > 2) When updating a PostgreSQL record, I updated the memcache record
> >to the new value. If another process comes along in parallel before
> >I commit, that is still looking at an older view, cross-referencing
> >may not work as expected.
> The answer is that cached values are not expected to be consistent.  If they 
> were, then they'd have to have all of the transaction overhead which 
> PostgreSQL has, and lose any gain in efficiency.

Agree.

> Generally, memcached is best used for values that don't get kept in the 
> database at all.  Example (how I use it)
> 
> Using pgmemcache for user session information:
> 
> 1) User logs in.  Their session information is stored in the permanent 
> user_session table and the pgmemcache pseudotable, user_session_cache.
> 2) User browses the website.  This requires updating their session every time
> a page is loaded with their last activity timestamp, their page path (for 
> backtracking) and their ip information (for hijack prevention).  This 
> informaiton is recorded in user_session_cache *only*, with the presumption 
> that it will be lost if the server goes down.
> 3) User logs out (or is auto-logged-out).  Keys are deleted from 
> user_session_cache and their exit information is written to the permanent 
> table user_session.

Is there a benefit here to doing this with pgmemcache over the application
calling into memcache directly?

Are you able to effectively and efficiently include memcache-derived
information within SELECT queries that also pull information out of
the database?

I like the part where logout commits the information; it's something I've
been meaning to do, but haven't gotten around to.

> The advantage of this is that it allows lowering the amount of write activity
> to the user_session table by 95% with no loss of information we care about.  
> Of course, if you are using a Java or .NET application server, it probably 
> provides the above functionality itself.

Agree.

Cheers,
mark



Re: [HACKERS] Postgresql Caching

2006-10-15 Thread mark
On Mon, Oct 16, 2006 at 07:00:20AM +0930, Shane Ambler wrote:
> [EMAIL PROTECTED] wrote:
> >As a thought experiment, I'm not seeing the benefit. I think if you
> >could prove a benefit, then any proof you provided could be used to
> >improve the already existing caching layers, and would apply equally
> >to read-only or read-write pages. For example, why not be able to
> >hint to PostgreSQL that a disk-based table should be considered a
> >priority to keep in RAM. That way, PostgreSQL would avoid pushing
> >pages from this table out.
> If memcached (or pgmemcached implemented in triggers) can show a speed 
> improvement using ram based caching (even with network overhead) of 
> specific data then it stands to reason that this ram based cache can be 
> integrated into postgres with better integration that will overcome the 
> issues that pgmemcached has.

I think you might misunderstand how memcache gets its speed. It gets
away from reading or writing to disk because it makes no promise that
the data is safe, even in RAM. memcache may choose to overwrite the
data at any time.

It makes no consistency guarantees. There are no read/write
boundaries. You and I can both read, and both write, and the result
is indeterminate.

It limits itself to only two fields per row. A key and a value. Both
fields must be string types.

Based on all of the assumptions above, all read and write operations are
fast enough, that they can be serialized. This allows for the process to
be single-process, single-thread, with no requirement for co-ordination
between these processes or threads. There is no need for locking any
data structures, or waiting or any resources. Requests can be dispatched
immediately.

What of the above fits into PostgreSQL? Can PostgreSQL choose to remove
records on a whim? Can PostgreSQL get away with removing transaction
boundaries, and making specific tables always read and write to latest?
Can PostgreSQL tables be limited to two fields? Can PostgreSQL get away
with zero synchronization between processes or threads?

The architectures are almost opposite of each other. I don't see how you
could combine the architectures. Effectively, you would need to have two
engines inside PostgreSQL, with the engine type selected from the table
type. MySQL sort of does this. In MySQL, some tables support transactions
while others do not. Is that what you are asking for?

> My original thoughts were integrating it into the sql level to allow the 
> database structure to define what we would want to cache in ram, which 
> is similar to what is happening with using pgmemcached.

In my experience, the most costly part of SQL, for very simple queries,
is the query planning. As soon as you have more than one key and one value,
you require query planning of some sort. I believe this is the SQL overhead.
Parsing the SQL, and determining how to best execute it.
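For simple key/value lookups, that planning cost can at least be paid once
per session with a prepared statement; a sketch against a hypothetical
kv table:

PREPARE get_val (text) AS
    SELECT value FROM kv WHERE key = $1;  -- parsed and planned once
EXECUTE get_val('some-key');              -- planning skipped on later calls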

Lighter weight databases, such as BerkeleyDB already exist to do what
you are asking for. memcache makes few guarantees. BerkeleyDB and
similar gives you transactions. PostgreSQL and similar give you SQL.
Each level of abstraction costs.

> Expanding create table to specify that a table gets priority in cache or 
> allocate x amount of cache to be used by table y could be a better way 
> than saying all of this table in ram.

Or, it could be a worse way. Where is the evidence that it would be better?

> I think the main benefit of my first ideas would come from the later 
> examples I gave where create memory tablespace with slaves would allow 
> the use of extra machines, effectively increasing the ram available 
> outside the current postgres setup.

MySQL has some sort of distributed scheme like this, based on a
partitioning of the keys. I don't know how great it is. Other than
the sales pitch we were given when MySQL gave us a presentation, I
haven't heard of it in use by others.

> Maybe implementing this idea as a way of increasing the current postgres 
> caching would be a better implementation than the memory tablespaces 
> idea. As in integrating a version of pgmemcached as an option into the 
> current caching layers. Thus implementing it at the config level instead 
> of the structure design. Although defining tables to get priority or 
> allocated space in the ram cache would fit well with that.

If there is a problem with the current cache algorithm, it should be fixed.

What is the problem with it?

I think the memcache people are thinking that the cost of PostgreSQL is
about the disk. Although the disk plays a part, I'm pretty sure it's
only a fraction. Not providing transaction guarantees, not providing an
SQL level abstraction, and not having multiple processes or threads
plays a much bigger part.

Cheers,
mark


Re: [HACKERS] Postgresql Caching

2006-10-16 Thread mark
On Sun, Oct 15, 2006 at 06:33:36PM -0700, Jeremy Drake wrote:
> > 2) When updating a PostgreSQL record, I updated the memcache record
> >to the new value. If another process comes along in parallel before
> >I commit, that is still looking at an older view, cross-referencing
> >may not work as expected.
> Shouldn't you be able to use 2-stage commit for this?  Prepare to commit,
> update the memcache record, then commit?  Or am I thinking of something
> else?

Two-stage commit makes the window of error smaller; it can't eliminate it.

I believe the window would be reduced to:

1) the transactions that are currently looking at older data, and

2) any memcache query that happens between the time of the PostgreSQL
   commit and the memcache commit.
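For clarity, the sequence being discussed would look roughly like this on
the PostgreSQL side (a sketch; requires max_prepared_transactions > 0, and
the table is made up):

BEGIN;
UPDATE sometable SET val = 'new' WHERE id = 1;
PREPARE TRANSACTION 'tx1';    -- phase one: durable, but not yet visible
-- update the memcache record here
COMMIT PREPARED 'tx1';        -- phase two: the change becomes visible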

It's a pretty small window. The question for my use case would be whether,
with hundreds of people clicking on web links per second, somebody might
happen to hit the window. By setting the memcache expiry to 5 seconds
instead of the regular 60+, I seem to have eliminated all reported cases of
the problem. If the stored data is invalid, it only stays invalid for a
short time. I'm compromising accuracy for efficiency.

The thing about all of this is: if what memcache is doing could be
done with consistency, PostgreSQL would probably already be doing it
right now...

Cheers,
mark



Re: [HACKERS] Postgresql Caching

2006-10-16 Thread mark
On Mon, Oct 16, 2006 at 05:59:05PM +0930, Shane Ambler wrote:
> > Registering each cache entry by the tables included in the query and
> > invalidating the cache on a committed update or insert transaction to
> > any of the tables would, transparently, solve the consistency problem.
> That was part of my thinking when I made the suggestion of adding 
> something like memcached into postgres.
> That was part of my thinking when I made the suggestion of adding 
> something like memcached into postgres.

There is a valid suggestion in here, but I think it's the caching of
query plans, and the caching of query plan results, where the
PostgreSQL gain would be.

The query-to-query-plan cache could map SQL statements (with parameters
specified) to a query plan, and be invalidated upon changes to the
statistical composition of any of the involved tables. The
query-plan-to-query-results cache would keep the results, plus the
first and last transaction ids for which the results are valid.
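
A minimal C sketch of what one entry in such a cache might carry - all
of these names are hypothetical, not actual PostgreSQL structures:

    typedef struct CachedQueryResult
    {
        char        *sql_text;   /* normalized SQL, parameters specified */
        void        *plan;       /* the cached query plan (opaque here) */
        void        *tuples;     /* the materialized query results */
        unsigned int first_xid;  /* first transaction id the results hold for */
        unsigned int last_xid;   /* last transaction id the results hold for */
    } CachedQueryResult;

Invalidation would mean dropping the plan whenever the statistics of an
involved table change, and dropping the tuples on any later commit that
touches one of those tables.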

Although it sounds simple, I believe the above to be very complicated to
pursue. The real PostgreSQL hackers (not me) have talked about it at
length over the time I've been reading their mailing lists. They've come
up with good ideas, not all of which have been shot down. Nobody is willing
to tackle it, because it seems like a lot of effort for a problem that
can be mostly solved by application-side caching.

It's a subject that interests me - but it would take a lot of time, and
that's the thing that few of us have. Time sucks. :-)

Cheers,
mark



Re: [HACKERS] Postgresql Caching

2006-10-16 Thread mark
On Mon, Oct 16, 2006 at 12:40:44PM -0400, Neil Conway wrote:
> On Mon, 2006-10-16 at 13:59 +0200, Markus Schaber wrote:
> > It's already possible to do this, just create the TABLESPACE in a
> > ramdisk / tmpfs or whatever is available for your OS.
> This is not an ideal solution: if the machine reboots, the content of
> the tablespace will disappear, requiring manual administrator
> intervention to get Postgres running again.

It's enough to show whether disk read/write is the crux of this issue
or not. I suspect not.
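
For anyone who wants to collect numbers, a rough recipe (paths and
sizes made up; needs the tablespace support from 8.0 on):

    # as root: a RAM-backed filesystem
    mount -t tmpfs -o size=512M tmpfs /mnt/pgram
    chown postgres /mnt/pgram

    -- in psql, as a superuser
    CREATE TABLESPACE ramspace LOCATION '/mnt/pgram';
    CREATE TABLE hot_data (k integer PRIMARY KEY, v text)
        TABLESPACE ramspace;

Run the same workload against a disk tablespace and compare. Neil's
objection stands, of course: everything in /mnt/pgram is gone after a
reboot.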

Anybody have numbers?

Cheers,
mark



Re: [SPAM?] Re: [HACKERS] Asynchronous I/O Support

2006-10-20 Thread mark
On Fri, Oct 20, 2006 at 11:13:33AM +0530, NikhilS wrote:
> >Good idea, but async i/o is generally poorly supported.

> Async i/o is stably supported on most *nix (apart from Linux 2.6.*) plus
> Windows.
> Guess it would be still worth it, since one fine day 2.6.* will start
> supporting it properly too.

Only if it can be shown that async I/O actually results in an improvement.

Currently, it's speculation, with the one trial implementation showing
little to no improvement. Support is a big word in the face of this
initial evidence... :-)

It's possible that the PostgreSQL design limits the effectiveness of
such things. It's possible that PostgreSQL, having been optimized to not
use features such as these, has found a way of operating better,
contrary to those who believe that async I/O, threads, and so on, are
faster. It's possible that async I/O is supported, but poorly implemented
on most systems.

Take into account that async I/O doesn't guarantee parallel I/O. The
concept of async I/O is that an application can proceed to work on other
items while waiting for scheduled work in the background. This can be
achieved with a background system thread (GLIBC?). There is no requirement
that it actually process the requests in parallel. In fact, any system that
did process the requests in parallel would be easier to grind to a halt.
For example, for the many systems that do not use RAID, we would potentially
end up with scattered reads across the disk all running in parallel, with
no priority on the reads, which could mean that data we do not yet need
is returned first, causing PostgreSQL to be unable to move forwards. If
the process is CPU bound at all, this could be an overall loss.
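
For reference, the POSIX flavour of the "schedule now, collect later"
pattern looks roughly like this (a minimal sketch; the file name is
made up, error handling is elided, and Linux wants -lrt):

    #include <aio.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        static char buf[8192];
        struct aiocb cb;
        const struct aiocb *list[1];
        int fd = open("datafile", O_RDONLY);

        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_offset = 0;

        aio_read(&cb);              /* schedule the read... */
        /* ...do other work while it is (maybe) in flight... */
        list[0] = &cb;
        aio_suspend(list, 1, NULL); /* block until it completes */

        close(fd);
        return aio_return(&cb) < 0; /* bytes read, or -1 on error */
    }

Nothing in the standard promises the request actually runs in parallel
with the "other work" - an implementation is free to do all of it
inside aio_suspend().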

Point being, async I/O isn't a magic bullet. There is no evidence that it
would improve the situation on any platform.

One would need to consider the PostgreSQL architecture, determine where
the bottleneck actually is, and understand why it is a bottleneck fully,
before one could decide how to fix it. So, what is the bottleneck? Is
PostgreSQL unable to max out the I/O bandwidth? Where? Why?

Cheers,
mark



Re: [SPAM?] Re: [HACKERS] Asynchronous I/O Support

2006-10-20 Thread mark
On Fri, Oct 20, 2006 at 05:37:48PM +0200, Zeugswetter Andreas ADI SD wrote:
> Yup, that would be the scenario where it helps (provided that you have
> a smart disk or a disk array and an intelligent OS aio implementation).
> It would be used to fetch the data pages pointed at from an index leaf,
> or the next level index pages.
> We measured the IO bandwidth difference on Windows with EMC as being 
> nearly proportional to parallel outstanding requests up to at least

Measured it using what? I was under the impression only one
proof-of-concept implementation existed, and that the scenarios and
configuration of the person who wrote it did not show significant
improvement.

You have PostgreSQL on Windows with EMC with async I/O support to
test with?

Cheers,
mark



Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-22 Thread mark
On Sun, Oct 22, 2006 at 06:06:13PM -0400, Tom Lane wrote:
> Intel Xeon EM64T (Fedora Core 5)
> 
> 8192 bytes    4.420879    7.633120
> 1024 bytes    0.571794    0.819372
> 64 bytes      0.047354    0.071906
> 
> Intel Pentium 4 (Fedora Core 5)
> 
> 8192 bytes    6.942324    28.848572   (yes, really)
> 1024 bytes    0.905259    3.625360
> 64 bytes      0.068065    0.260224

AMDX2 3800+ (Fedora Core 5)

              STD CRC     SLICE8 CRC

  8192 bytes  8.576635    7.170038
  1024 bytes  1.504361    1.402446
  64 bytes    0.154459    0.144209


Odd that the AMD shows opposite of the two Intel numbers above, and
that it was an "Intel engineer" who wrote it. My first speculation is
that you did your Intel numbers backwards. My second speculation is
that you already thought of that and confirmed before posting. :-)

So yeah - not too impressive...

Cheers,
mark



Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread mark
On Tue, Oct 24, 2006 at 02:52:36PM +0100, Simon Riggs wrote:
> On Tue, 2006-10-24 at 09:37 -0400, Tom Lane wrote:
> > "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > > On Mon, 2006-10-23 at 15:12 -0400, Tom Lane wrote:
> > >> There are actually three checks used to detect end of WAL: zero record
> > >> length, invalid checksum, and incorrect back-pointer.  Zero length is
> > >> the first and cleanest-looking test, but AFAICS we have to have both of
> > >> the others to avoid obvious failure modes.
> > > The checksum protects from torn pages and disk errors. If you have
> > > full_page_writes set then you already believe yourself safe from torn
> > > pages and your disks could also already be CRC-checking the data.
> > No, because unlike tuples, WAL records can and do cross page boundaries.

> But not that often, with full_page_writes = off. So we could get away
> with just CRC checking the page-spanning ones and mark the records to
> show whether they have been CRC checked or not and need to be rechecked
> at recovery time. That would reduce the CRC overhead to about 1-5% of
> what it is now (as an option).

WAL pages are 8 Kbytes, and disk sectors 512 bytes, correct? I don't see a
guarantee in here that the 8 Kbytes worth of data will be written as
sequential writes, nor that the write of all 8 Kbytes will necessarily
complete.

If the operating system uses 8 Kbyte pages, or the RAID system uses 8
Kbytes or larger chunks, and they guarantee sequential writes, perhaps
it is ok. Still, if the power goes out after writing the first 512
bytes, 2048 bytes, or 4096 bytes, then what? With RAID involved it
might get better or worse, depending on the RAID configuration.

I'm almost wondering whether the three checks are enough. I'm too busy
to sketch it all down and predict failure points... :-)

> Just a thought: Would there be benefit in not allowing page-spanning WAL
> records, when they are small? That would tend to reduce the number of
> WAL writes, even if it did cause some space wastage on disk. That would
> reduce the number of same block re-writes and might improve the
> sequential behaviour of WAL access. We never needed to think about this
> while full_page_writes=on was the only option.

Probably. Might not be much though.

> > > CRC-checked disks are actually the industry norm and have been for
> > > around 5 years.
> > Huh?  Disks have ALWAYS had CRCs, and this is in any case utterly
> > irrelevant to the database-crash risk.
> According to the man from Seagate...

Once upon a time, bits were stored the size of smarties (exaggeration).
Back in those days (before my time), they liked to use things like parity.

I didn't read the page, but perhaps there is some confusion between
CRC and error correction codes. Obviously, technologies were
introduced over time. I don't remember ever having a hard disk that
didn't have some sort of error detection. The 5 year claim seems
decades too short unless they are talking about a newer technology.
Even the old 5.25" DOS floppies seemed to be able to detect errors
rather than return invalid corrupted bits.

Cheers,
mark



Re: [HACKERS] New CRC algorithm: Slicing by 8

2006-10-24 Thread mark
On Tue, Oct 24, 2006 at 05:05:58PM +0100, Simon Riggs wrote:
> On Tue, 2006-10-24 at 10:18 -0400, [EMAIL PROTECTED] wrote:
> > On Tue, Oct 24, 2006 at 02:52:36PM +0100, Simon Riggs wrote:
> > > On Tue, 2006-10-24 at 09:37 -0400, Tom Lane wrote:
> > > > No, because unlike tuples, WAL records can and do cross page boundaries.
> > > But not that often, with full_page_writes = off. So we could get away
> > > with just CRC checking the page-spanning ones and mark the records to
> > > show whether they have been CRC checked or not and need to be rechecked
> > > at recovery time. That would reduce the CRC overhead to about 1-5% of
> > > what it is now (as an option).
> > 
> > WAL pages 8 Kbytes, and disk pages 512 bytes, correct? I don't see a
> > guarantee in here that the 8 Kbytes worth of data will be written as
> > sequential writes, nor that the 8 Kbytes of data will necessarily
> > finish.
> > 
> > If the operating system uses 8 Kbyte pages, or the RAID system uses 8
> > Kbytes or larger chunks, and they guarantee sequential writes, perhaps
> > it is ok. Still, if the power goes out after writing the first 512
> > bytes, 2048 bytes, or 4096 bytes, then what? With RAID involved it
> > might get better or worse, depending on the RAID configuration.
> 
> That is the torn-page problem. If your system doesn't already protect
> you against this you have no business turning off full_page_writes,
> which was one of my starting assumptions.

I wasn't aware that a system could protect against this. :-)

I write 8 Kbytes - how can I guarantee that the underlying disk writes
all 8 Kbytes before it loses power? And why isn't the CRC a valid means
of dealing with this? :-)

If I'm wrong on one of these assumptions, I'm open to being educated.
My opinion as of a few seconds ago, is that a write to a single disk
sector is safe, but that a write that extends across several sectors
is not.

Cheers,
mark



Re: [HACKERS] Design Considerations for New Authentication Methods

2006-11-02 Thread mark
On Thu, Nov 02, 2006 at 10:48:29PM +0100, Magnus Hagander wrote:
> The same could apply to SSL cert based authentication, for users
> connecting from machines outside of my realm. Once you have "unlocked"
> the certificate, you can authenticate any number of times to any number
> of services that will accept this certificate *without* having to
> re-enter your password.

Why would you need to unlock it? An SSL certificate is effectively a
password stored in a file, 1024 bits long or whatever.

> This is both a convenience for the user, and a requirement if you use
> OTPs.

I don't understand the OTP part. Single sign-on is only a convenience.
The ability to resume a session (provided by SSL), or the ability to
log in using a smaller authentication token than a certificate, can be
used to provide a performance improvement.

If the requirement is that no password is provided, password + SSL
certificate is not an improvement. Any token based authentication system
should allow for the token to become invalid at any time, and require
re-authentication using the primary mechanism.

The benefit to kerberos, from my perspective, is that it already exists,
and is widely used.

I prefer SSL certificates alone myself. All of my db passwords are randomly
generated anyways, and a 1024-bit randomly generated password is better than
a 64-bit password picked by a person at a keyboard. Having both isn't an
improvement I think. My own system at home uses RSA keys for SSH entry. I
don't bother with passwords anymore. If they can access my password, they
can access my certificate. If they can access my certificate, they can access
my password. Dual authentication models work better with very different
systems. For example, a USB key or digital display that I possess, and a
password that I know or have stored in a file.

Cheers,
mark



Re: [HACKERS] Design Considerations for New Authentication Methods

2006-11-02 Thread mark
On Thu, Nov 02, 2006 at 07:49:01PM -0800, Joshua D. Drake wrote:
> To be honest, I have often wondered *why* we support kerberos outside of
> the uber l33t geek factor. I have not once in a commercial deployment
> had a business requirement for the beast. LDAP? Now that is a whole
> other issue :)

Isn't NFSv4 a big application that uses Kerberos? I seem to recall that
AFS may have been a large user as well.

The only reason it isn't widely used is because companies are slow to
change. We still use NIS for host names in too many places!

Cheers,
mark



Re: [HACKERS] Design Considerations for New Authentication Methods

2006-11-02 Thread mark
On Fri, Nov 03, 2006 at 08:05:05AM +0100, Magnus Hagander wrote:
> > > The same could apply to SSL cert based authentication, for users 
> > > connecting from machines outside of my realm. Once you have 
> > "unlocked"
> > > the certificate, you can authenticate any number of times to any 
> > > number of services that will accept this certificate 
> > *without* having 
> > > to re-enter your password.
> > Why would you need to unlock it? SSL certificate is 
> > effectively a password stored in a file of length 1024 bits 
> > or whatever.
> Because if someone can access this file, I don't want them to
> automatically "be me". Say this file is on my smartcard - I most certainly
> want a PIN code before it logs me in.
> Now, if I trust my local machine reasonably well, this "unlock" can
> equal the "local login". But there's still an unlock sequence.

Yes - local login. I didn't think of it in that context, as I think
more of batch processes, or servers accessing the database. A person
accessing it directly just doesn't seem significant to me. It's more of
a special case. :-)

I prefer to use PostgreSQL with a local socket, and passing of UNIX
credentials over the socket. If you can log in to the account, you
can access all of the scripts owned by that account that have a
PostgreSQL username/password embedded within them anyways - so why
embed at all? Obviously, for remote database access, or for any system
with load sharing across systems accessing the same database, this
doesn't work too well, and an alternative such as SSL certificates
becomes desirable. The effect is the same, though.
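
For what it's worth, the pg_hba.conf line that gives this behaviour in
the 8.x series looks roughly like this (the role name is made up):

    # TYPE   DATABASE   USER      METHOD
    local    all        appuser   ident sameuser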

> > I don't understand the OTP part. Single signon is only a convenience.
> > Ability to resume a session (provided by SSL) or ability to 
> > login using a smaller authentication token than a certificate 
> > can be used to provide performance improvement.
> OTP can certainly be a *lot* more secure, when used in the right way.
> This of course rquires you use a two-factor system such as a token based
> key or challenge/response system. 

Not sure why it would be more secure by using a smaller key on second
entry. Sure the smaller key times out, but effectively you now have
two or more keys instead of one. :-)

> And it's not just a convenience. SSL reusme session doesn't work if the
> first login is to my fileserver, the second to my maliserver and the
> third to my database server. I would then require three separate OTP
> logins.

*nod*

> Since they would normally have a time-window, it will also
> noticably slow down the process since I'd have to wait for a new key
> before accessing each resource.

This presumes that you use a key system. An SSL certificate is adequate
on its own for many uses... :-)

> > The benefit to kerberos, from my perspective, is that it 
> > already exists, and is widely used.
> Yes, that is a huge benefit.

Ignoring my liking of SSL certificates, and my defense of them, I agree
it is a huge benefit to support Kerberos for these reasons.

> > My own system at home uses RSA keys or 
> > SSH entry. I don't bother with passwords anymore. If they can 
> > access my password, they can access my certificate. If they 
> > can access my certificate, they can access my password. Dual 
> > authentication models work better with very different 
> > systems. For example, a USB key or digital display that I 
> > possess, and a password that I know or have stored in a file.
> Well, you know how to deal with passwords and authentication. Most users
> don't. Therefore using things like smartcard+PIN can *both* increase
> security *and* make things easier for them. To make it work in reality,
> that means you need to support whatever infrastructure standard other
> systems use, and that's most commonly Kerberos today. And second most
> common I would believe is SSL/TLS certs.

*nod*

I agree.

Cheers,
mark



[HACKERS] PostgreSQL 8.2 (from CVS devel) first impressions

2006-11-04 Thread mark
Compiled fine. Still a few warnings (using Fedora Core 6 / AMD64).

The new PG_MODULE_MAGIC requirement threw me for a loop. I expect it
will catch others off guard as well.

One of my complicated queries that I threw at it seems to run about
10% - 20% faster now, which is pretty sweet.

The multiline wrap text editor in psql works really well, except it
seems to screw up if I resize the terminal. If I restore the terminal
to its original size, and refresh, it fixes itself.

I'm using it on one of my productions system now, and nothing has
failed yet. :-)

I'm pretty happy.

Good job people.

Cheers,
mark



Re: [HACKERS] PostgreSQL 8.2 (from CVS devel) first impressions

2006-11-04 Thread mark
On Sun, Nov 05, 2006 at 01:15:51AM -0500, [EMAIL PROTECTED] wrote:
> One of my complicated queries that I threw at it seems to run about
> 10% - 20% faster now, which is pretty sweet.

I take this back. I forgot to 'analyze'. After 'analyze', the times
returned to the slower 8.1 times. :-(

I will have to investigate. The generated plan is more complex after
'analyze'...

Cheers,
mark



Re: [HACKERS] PostgreSQL 8.2 (from CVS devel) first impressions

2006-11-05 Thread mark
On Sun, Nov 05, 2006 at 09:11:07AM +, Simon Riggs wrote:
> On Sun, 2006-11-05 at 01:15 -0500, [EMAIL PROTECTED] wrote:
> > The new PG_MODULE_MAGIC requirement threw me for a loop. I expect it
> > will catch others off guard as well.
> Did you find the documentation adequate? Could you locate what you
> needed to know quickly and accurately? Do you think the change was
> adequately explained?

The documentation was good - once I checked developer.postgresql.org
for the latest copy.

> If that hit you then we're gonna get a few more people trip the same
> way. Do you have any suggestions as to how to avoid that experience for
> others?

I believe the release notes are inadequate. I've read them three times
before and it never stood out for me.

Currently it is referenced under:

E.1.3.16. Source Code Changes

   - Add PG_MODULE_MAGIC header blocks to all shared object files
   (Martijn van Ousterhout)

  The magic block prevents version mismatches between loadable object
  files and servers.


I believe I read this all three times as an "internal PostgreSQL change
for developers only". But PG_MODULE_MAGIC has a wider impact than what
is listed above.

I suspect this should be listed under "Migration to 8.2" or "Server
Changes" as "loadable modules are now required to include PG_MODULE_MAGIC
to protect the server from accidentally loading an incompatible module -
all loadable module authors must implement the change described in the
documentation to allow their loadable modules to work properly in 8.2".
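
For anyone else who trips over it, the fix on the module side is tiny -
this is the documented pattern, with the #ifdef keeping the file
compilable against pre-8.2 headers:

    #include "postgres.h"
    #include "fmgr.h"

    #ifdef PG_MODULE_MAGIC
    PG_MODULE_MAGIC;
    #endif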

Cheers,
mark



Re: [HACKERS] PostgreSQL 8.2 (from CVS devel) first impressions

2006-11-05 Thread mark
On Sun, Nov 05, 2006 at 11:01:40AM -0500, Neil Conway wrote:
> On Sun, 2006-11-05 at 01:15 -0500, [EMAIL PROTECTED] wrote:
> > Compiled fine. Still a few warnings (using Fedora Core 6 / AMD64).
> Presumably those are just the standard warnings we can't easiy
> eliminate. If not, can you post them please?

They all appear harmless. For the uninitialized warnings, the compiler
is not able to prove that tm2timestamp, numericvar_to_int8, or
cost_bitmap_tree_node always stores to the final argument and does not
fetch from the final argument. An '= 0' would get rid of each of the
warnings, and might simplify some code (that does conditional
assignment to 0), although perhaps the goal is to improve performance
and avoid an assignment to '= 0' if not necessary.

I suspect initialization would have no measurable performance impact,
and would improve the maintainability of the code: one less warning
for people to become trained to ignore. And if tm2timestamp ever
did not assign to the final argument, the value would be zero, and
not random data from the stack.
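
A generic illustration of the warning class and the '= 0' fix - not the
actual PostgreSQL code:

    static void maybe_store(int cond, int *out)
    {
        if (cond)
            *out = 42;  /* gcc can't prove this branch is always taken */
    }

    int caller(int cond)
    {
        int val;        /* 'val' may be used uninitialized here... */
        maybe_store(cond, &val);
        return val;     /* ...and 'int val = 0;' would silence it */
    }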

For the label warning, I think it might be generated by bison/yacc,
and no REJECT rule is used?

I don't know about the nbtinsert.c warnings. It looks like part of a
structure is initialized, and then the structure is used. A little odd.

I've included them all below. Pretty few for an open source project. :-)

Cheers,
mark


gcc -O3 -march=athlon64 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -I../../../../src/include -D_GNU_SOURCE -c -o nbtinsert.o nbtinsert.c
nbtinsert.c: In function ‘_bt_insertonpg’:
nbtinsert.c:1065: warning: ‘state.best_delta’ may be used uninitialized in this function
nbtinsert.c:1065: warning: ‘state.newitemonleft’ may be used uninitialized in this function
nbtinsert.c:1065: warning: ‘state.firstright’ may be used uninitialized in this function

gcc -O3 -march=athlon64 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -I../../../../src/include -D_GNU_SOURCE -c -o costsize.o costsize.c
costsize.c: In function ‘cost_bitmap_or_node’:
costsize.c:707: warning: ‘subselec’ may be used uninitialized in this function
costsize.c:706: warning: ‘subCost’ may be used uninitialized in this function
costsize.c: In function ‘cost_bitmap_and_node’:
costsize.c:663: warning: ‘subselec’ may be used uninitialized in this function
costsize.c:662: warning: ‘subCost’ may be used uninitialized in this function
costsize.c: In function ‘cost_bitmap_heap_scan’:
costsize.c:514: warning: ‘indexSelectivity’ may be used uninitialized in this function
costsize.c:513: warning: ‘indexTotalCost’ may be used uninitialized in this function

gcc -O3 -march=athlon64 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -I../../../../src/include -D_GNU_SOURCE -c -o numeric.o numeric.c
numeric.c: In function ‘numericvar_to_int4’:
numeric.c:1777: warning: ‘val’ may be used uninitialized in this function
numeric.c: In function ‘numeric_int2’:
numeric.c:1867: warning: ‘val’ may be used uninitialized in this function
numeric.c: In function ‘numeric_int8’:
numeric.c:1820: warning: ‘result’ may be used uninitialized in this function

gcc -O3 -march=athlon64 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -I../../../../src/include -D_GNU_SOURCE -c -o timestamp.o timestamp.c
timestamp.c: In function ‘timestamptz_zone’:
timestamp.c:4388: warning: ‘result’ may be used uninitialized in this function
timestamp.c: In function ‘timestamptz_timestamp’:
timestamp.c:4356: warning: ‘result’ may be used uninitialized in this function
timestamp.c: In function ‘timestamp2timestamptz’:
timestamp.c:4323: warning: ‘result’ may be used uninitialized in this function
timestamp.c: In function ‘timestamp_zone’:
timestamp.c:4215: warning: ‘result’ may be used uninitialized in this function
timestamp.c: In function ‘timestamptz_trunc’:
timestamp.c:3254: warning: ‘result’ may be used uninitialized in this function
timestamp.c: In function ‘SetEpochTimestamp’:
timestamp.c:1424: warning: ‘dt’ may be used uninitialized in this function
timestamp.c: In function ‘timestamptz_part’:
timestamp.c:1424: warning: ‘dt’ may be used uninitialized in this function
timestamp.c: In function ‘timestamp_part’:
timestamp.c:3799: warning: ‘timestamptz’ may be used uninitialized in this function
timestamp.c:1424: warning: ‘dt’ may be used uninitialized in this function
timestamp.c: In function ‘timestamp_in’:
timestamp.c:1424: warning: ‘dt’ may be used uninitialized in this function
timestamp.c: In function ‘timestamptz_in’:
timestamp.c:1424: warning: ‘dt’ may be used uninitialized in this function

gcc -O3 -march=athlon64 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -W

Re: [HACKERS] PostgreSQL 8.2 (from CVS devel) first impressions

2006-11-10 Thread mark
On Fri, Nov 10, 2006 at 08:17:09PM -0500, Tom Lane wrote:
> [EMAIL PROTECTED] writes:
> > On Sun, Nov 05, 2006 at 11:01:40AM -0500, Neil Conway wrote:
> >> Presumably those are just the standard warnings we can't easiy
> >> eliminate. If not, can you post them please?
> > They all appear harmless.
> The reason those "uninitialized variable" warnings got away from us is
> that gcc doesn't emit them at -O2 or below, so most of us never saw 'em
> before.  I've cleaned them up.

Cool. Thanks. I like my compiles warnings free. :-)

> The "find_rule" gripe is really a flex bug :-( ... not easy to avoid.

*nod*

Cheers,
mark



Re: [HACKERS] Bitmap index thoughts

2006-12-27 Thread mark
On Wed, Dec 27, 2006 at 10:16:54PM +1100, Gavin Sherry wrote:
> On Wed, 27 Dec 2006, Heikki Linnakangas wrote:
> > But actually I'm not convinced we need to worry about efficient storage
> > of small bitmaps at all. The typical use case for bitmap indexes is
> > large tables with small number of distinct values, and the problem
> > doesn't really arise in that scenario. Let's keep it simple for now, we
> > can enhance it in later releases.
> The scenario I'm concerned about is where a sales data base, say, has
> 100,000 products. However, only 500 or 1000 products are popular. They
> dominate, say >99% of the sales. The other 99,900 products consume a
> little bit over 8K each for very little benefit :-(.
> This is pretty contrived but it seems real world enough...

Seems like a good candidate for CREATE INDEX WHERE :-)
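
Something along these lines, with a made-up table and a made-up list
of popular products, so the index only carries the rare ones:

    CREATE INDEX sales_rare_products_idx ON sales (product_id)
        WHERE product_id NOT IN (101, 102, 103);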

I wonder what would happen if somebody implemented automatic index
exclusion conditions after use of an INDEX proved to be in the realm
of the worst case scenario? :-)

Cheers,
mark



Re: [HACKERS] Bitmap index thoughts

2006-12-27 Thread mark
On Wed, Dec 27, 2006 at 03:42:36PM +, Heikki Linnakangas wrote:
> >I wonder what would happen if somebody implemented automatic index
> >exclusion conditions after use of an INDEX proved to be in the realm
> >of the worst case scenario? :-)
> I'm sorry, I don't understand that sentence...

I was joking about a rather magical automatic INDEX restriction
modifier. For example, if the index becomes large enough to matter
(100 Mbytes?) and any single key takes up more than, say, 20% of the
index, it might be cool if it would automatically add the value to
the restriction list, and prune the now wasted index records.

Sorry for inserting a silly joke in a serious discussion... :-)

Cheers,
mark



Re: [HACKERS] TODO: GNU TLS

2006-12-28 Thread mark
On Thu, Dec 28, 2006 at 10:13:14AM -0800, Joshua D. Drake wrote:
> On Thu, 2006-12-28 at 13:01 -0500, Stephen Frost wrote:
> > * Joshua D. Drake ([EMAIL PROTECTED]) wrote:
> > > What is the consideration here? I read the thread and it appears that
> > > OpenSSL is not compatible with GPL? But we don't care about that right?
> > > The OpenSSL looks pretty BSDish to me, expect the advertising clause (is
> > > that what caused XFree86.org to fork?).
> > OpenSSL isn't compatible with the GPL. 

With few exceptions, you cannot derive or include GPL software in your
non-GPL software. The GPL works very hard to maintain this position to
"protect" the freedom of the user.

The GPL cannot control how OpenSSL is distributed, though. The OpenSSL
license controls this. I don't see any place in the (short and sweet!)
OpenSSL license that prevents it from being used in GPL software. Are
you reading some particular point in the OpenSSL license that I am not?
PostgreSQL isn't GPL software anyways, and there is certainly nothing
in the OpenSSL license preventing it from being used in PostgreSQL.

If the argument is that the 'whole derived product' must fit the
outermost provided license, then I think you should consider that
PostgreSQL should not include *ANY* GPL software, as any user of
PostgreSQL cannot be guaranteed of the generous freedoms provided by
the PostgreSQL license. Some components are covered by GPL, which is
restrictive compared to PostgreSQL.

Down this path is the impractical, and silly conclusion that all
software must be licensed under the exact same license to be
distributed. An all GPL distribution, for example. While those with a
political agenda such as Richard Stallman would cheer at this result,
those people do not have the power to force this will on us.

The Free Software Foundation provides an LGPL that has fewer restrictions
than GPL, out of recognition that their political goals are not practical
for all uses. LGPL software may be linked with GPL software without
invalidating the GPL rights of the user. GPL applies to the GPL part,
and LGPL applies to the LGPL part. All is well in the world.

In conclusion - I'll restate. The only license that can restrict the
distribution of OpenSSL, is the OpenSSL license. The GPL is not relevant
in determining where OpenSSL may be distributed to.

Anybody who believes OpenSSL is a problem, must be aware of some
software distribution for which the OpenSSL licensing terms are
unreasonable. I'm not sure who that would be. They ask for
attribution. They ask that their name not be used to promote another
product. They ask that their name not be used in the name of another
product. All of these terms seem fair to me.

> The original discussion stated that well placed attorneys in the market
> feel that the FSF is trying to reach beyond the hands of god on this one
> and that we don't need to worry about it (my words, for actual quote
> read thread ;)).

I agree with Tom - if they really want people to use GNUTLS, why did
they make it have such a different interface?

I recently had to choose between the two for a product at work, and
GNUTLS seemed to fall short in several areas. It was a race between
GNUTLS seeming to have superior documentation vs OpenSSL seeming to
have superior functionality. For my rather complicated requirements,
OpenSSL won out (function is more important than documentation), and
the product using it is about 90% complete.  It includes such ugliness
as OpenSSL/C code that needs to load the encrypted private key and
client/server x509 certificates from a Java Keystore (JKS)... Total
fun... :-)

Cheers,
mark



Re: [HACKERS] TODO: GNU TLS

2006-12-28 Thread mark
On Thu, Dec 28, 2006 at 02:48:56PM -0500, Stephen Frost wrote:
> I disagree that the only way Postgres should support multiple
> libraries for a given component is if they provide the same API- we
> wouldn't have much in the way of authentication options if that was
> really the case.

I don't believe that was said. However, using two very different APIs
for the exact same purpose, providing the exact same functionality
would seem to require a business case. If fear of litigation over
what seems to be a non-existent point is the only business case, the
position deserves to be challenged.

There are other elements that could be included in the business case.
For example, the documentation for GNUTLS seems to be significantly
better.

I don't like fear mongering. It smells like FUD. :-)

Cheers,
mark



Re: [HACKERS] TODO: GNU TLS

2006-12-28 Thread mark
at much
> more reason.  I didn't really expect a reaction of "there isn't a
> licenseing issue so we shouldn't add support for another library".  I
> could understand if people don't care about the licensing aspect but
> I don't really get this one-size-fits-all mentality when it comes to an
> SSL library.  We don't seem to feel that way about authentication
> mechanisms, operating systems, or even client libraries.

They're entirely different discussions. One is about politics. One is
about practical application.

With regard to practical application, I agree with you.

With regard to giving in to legal FUD, I feel it is my duty to stand up
to it as best as I can, to prevent it from becoming widely accepted.
Nothing personal, and we're both entitled to our own opinions on the matter.
You expressed yours. I've expressed mine. Hopefully truth is found from
the reading of both.

Cheers,
mark



Re: [HACKERS] TODO: GNU TLS

2006-12-28 Thread mark
legal or not. Richard Stallman doesn't want people to do
this, because it hurts his cause, but legally, he can't stop them,
as it is a necessary part of life with software. The product is always
derived from at least one non-GPL product, even if only the hardware
that it sits on. That is, until the FSF achieves its agenda of having
a 100% GPL system.

> c) Authors of GPL code are very unlikely to prosecute such a claim
>since OpenSSL *is* OSS and does abide by the DFSG (I wonder if the
>FSF might find someone some day willing to, though they might not be
>interested due to possible bad press, who knows...)

My opinion: This usually takes the form of 1) somebody forgetting to
honour a part of the license, 2) somebody pointing it out, 3) somebody
grudgingly obeying. This might be a GPL-derived product that was released
from the authors as binary only, where the software is grudgingly released
upon request (perhaps request by the FSF). This might be an OpenSSL-derived
product that was released from the authors without attributing OpenSSL
in the documentation, in which case they add a note in the documentation.
This could be a BSD-derived product that forgets to include a copy of
the BSD license in the distribution, in which case the license is
grudgingly added back.

It doesn't matter what the license is. People know they should honour
the license. Ignorance is not an excuse. Choosing to ignore the license
is not honourable.

> d) PostgreSQL can be configured to not use SSL when it ships and
>therefore as shipped there isn't a problem (again, kind of a stretch)

This goes back to a). If the distribution includes OpenSSL, it must
honour the OpenSSL license. If PostgreSQL is not shipped with OpenSSL,
but allows it to be linked with OpenSSL, then the responsibility is
moved to the person who links the application and redistributes the
linked application. If the end user does it, then it is not a
redistribution, so copyright law cannot apply.

> e) The GPL doesn't apply to shared library linking

My opinion: I don't think this is true. LGPL doesn't apply to shared
library linking. GPL does.

> f) The OpenSSL falls under the 'distributed with the OS' clause of the
>GPL and therefore it isn't a problem that there are additional
>restrictions in its license (probably the most practical argument,
>but not sure it'd hold up when that'd basically make all of Debian
>exempt since it can all be considered 'part of the OS'...)

My opinion: Same as a). The responsibility is moved to the distributor.
The OS distribution must honour the license.

> Your comments above about a PostgreSQL obligation just don't make sense
> and so I have to conclude that you don't understand the issue, which may
> be my fault for not explaining it clearly enough but there's only so
> much I can do..  Hopefully this will help...

I think we've both concluded that the other person doesn't understand the
issue. Although it may be clear to you or I that the other is wrong, I
think the strength of the arguments will stand on their own as to whether
other people choose to believe you, I, or some third interpretation.

I've tried not to be presuming. I ask of you the same.

Cheers,
mark



Re: [HACKERS] TODO: GNU TLS

2006-12-29 Thread mark
On Fri, Dec 29, 2006 at 08:31:34PM +1300, Mark Kirkwood wrote:
> [EMAIL PROTECTED] wrote:
> >I will try again. It is a difficult subject for many.
> >GPL software derived from PostgreSQL must honour the restrictions defined
> >by the PostgreSQL (BSD) license.
> >GPL software derived from OpenSSL must honour the restrictions defined
> >by the OpenSSL license.
> >What is the difference? Do you see it? You speak of "compatibility" as
> >if it means that the above are different in some technical way. They
> >are NOT different. Just because the GPL >= the PostgreSQL license,
> >does not allow you to disobey the PostgreSQL license restrictions. You
> >*cannot* release your entire derived GPL product as GPL, if it is
> >distributed with PostgreSQL. The PostgreSQL component retains the
> >PostgreSQL licensing restrictions, The GPL restrictions do not
> >supercede or replace the PostgreSQL component and there is NOTHING the
> >GPL can do to change this.
> I think the issue revolves around the conditions that GPL stipulates 
> about "linking against" libraries requiring the entire product to be 
> *distributed* as GPL, even if components have differing licenses. This 
> is the so-called "viral" clause that gets much attention!

*nod*

This is what people are thinking. It's about where the grey zone starts
to occur. The GPL is viral in that a GPL-derived product must be GPL.

We're not talking about including GPL code in OpenSSL, though. This is
about OpenSSL as the base library. The GPL cannot stipulate that a GPL
program may only be linked against GPL libraries, or used on a GPL
operating system, on GPL hardware. For example, if GNU coreutils is
compiled for HP-UX. This is "linking against" software that is much
more restrictive than the OpenSSL license. Is Stephen, or the people
whose opinion he is repeating, going to claim that the FSF considers
this an illegal distribution of GNU coreutils?

The GPL can only require that any product *derived* from the work
licensed under the GPL must itself be GPL.

I'm not that stunned that this is confusing to people. There is so much
invalid information available, and most of us try to stay away from law
as much as possible. It can be overwhelming. :-)

> Now as Tom pointed out, I dunno why OpenSSL suddenly gets so much 
> attention, but anyway, just trying to clarify why *in principle* that 
> Stephen F is talking about a valid *possible* interpretation of the 
> licensing maze...

*nod*

Thanks for the input.

Cheers,
mark



Re: [HACKERS] TODO: GNU TLS

2006-12-29 Thread mark
 camp,
regardless of what is said. We all have a tendency to read each other's
words, looking only for fault in what is said, purposefully choosing not
to assimilate the other person's contribution. It's called the "I am right,
you are wrong" syndrome, and I'm not exempt from it.

I hope I provided value to this discussion. If not, I apologize.

Cheers,
mark



Re: [HACKERS] TODO: GNU TLS

2006-12-29 Thread mark
On Fri, Dec 29, 2006 at 10:32:34AM -0800, Joshua D. Drake wrote:
> Currently there has not been one technical argument that is valid to
> have us include GNU TLS.

1) The normal freedom that not being tied down to a single product
   provides. The same reason somebody might build MySQL + PostgreSQL
   support into their product. It usually forces a generic abstraction
   to be used, which may be a long term investment into a better code
   base within PostgreSQL.

2) Documentation is much better in GNUTLS. When using OpenSSL, I find
   myself frequently referring to the source code itself, as the best
   documentation available is for the now-possibly-out-of-date SSLeay.

3) Due to various political agendas, and legal confusion, GNUTLS has
   been steadily growing in popularity. One day it may be that GNUTLS
   is better maintained and well known than OpenSSL, at which point
   it might be a practical choice to only support GNUTLS, and drop
   support for OpenSSL entirely.

4) GNUTLS development seems more active? OpenSSL has been in a frozen/mature
   state for a while. I don't understand why OpenSSL is still labelled as
   0.9.x, which might suggest alpha quality, still under heavy development.

I don't find the reasons too compelling - but they are points to
consider.

Cheers,
mark



Re: [HACKERS] TODO: GNU TLS

2006-12-30 Thread mark
On Sat, Dec 30, 2006 at 06:05:14PM +0100, Martijn van Oosterhout wrote:
> Except tht X.509 is already done (in a sense). The client can supply a
> certificate that the server can check, and vice-versa. You can't link
> this with the postgresql username yet, but I haven't seen any proposals
> about how to do that.

I suggest associating the SHA-1 fingerprint with the ROLE. I would love
to have this.
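
The fingerprint itself is cheap to obtain from an existing certificate,
e.g. (file name made up):

    openssl x509 -in client-cert.pem -noout -fingerprint -sha1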

> The reason I wanted to use PGP is that I already have a PGP key. X.509
> certificates are far too complicated (a certificate authority is a
> useless extra step in my case).

I prefer to allow self-signed certificates approved by fingerprint,
rather than content - having a central authority vouch for a person's
right to use my system does not appeal to me. Yes, this does make
X.509 far too complicated. I have a tendency to put garbage in the
X.509 fields, and use only the private key / public key / fingerprint
of public certificate, which would match your use of PGP keys... :-)
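
Generating such a throwaway self-signed certificate is a one-liner
(file names made up) - the DN prompts are exactly the fields I fill
with garbage:

    openssl req -x509 -newkey rsa:1024 -days 3650 -nodes \
        -keyout client-key.pem -out client-cert.pem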

Cheers,
mark



Re: [HACKERS] TODO: GNU TLS

2006-12-30 Thread mark
On Sat, Dec 30, 2006 at 05:03:23PM -0500, Bruce Momjian wrote:
> Stephen Frost wrote:
> > I appreciate your pedantry but in the end it really doesn't matter very
> > much.  This is, aiui anyway, the way Debian interprets the various
> > licenses.  You're welcome to your own interpretation.
> That was my point --- that it isn't clear what "additional restrictions"
> are, and that an advertising clause or additional license can be
> interpreted as the same thing.

Breaking my claim of silence - I think Stephen has presented the Debian
interpretation, but qualified several times that Debian may be incorrect.

I spent quite a lot of thread space on exactly this issue - that forcing
an additional license to be imposed on the user is indeed an additional
restriction. People either get it or they don't. It would either stand up
in court or it wouldn't. :-)

Cheers,
mark



Re: [HACKERS] TODO: GNU TLS

2006-12-31 Thread mark
On Sun, Dec 31, 2006 at 03:59:29PM +0100, Martijn van Oosterhout wrote:
> Please read the OpenSSL-GPL FAQ. They themselves acknowledge it's a
> problem, but claim they fall under the "operating system exception",
> which is fine for everyone except the distributor of the operating
> system.
> 
> http://www.openssl.org/support/faq.html#LEGAL2
> 
> They recommend that if you want to use OpenSSL, use a licence other
> than the GPL.

I believe this to be a slight misrepresentation. The section before
states "The OpenSSL team does not offer legal advice." The section you
quote then goes on to contradict this, by stating a position much more
conservative than your summary:

On many systems including the major Linux and BSD distributions,
yes (the GPL does not place restrictions on using libraries that
are part of the normal operating system distribution).

On other systems, the situation is less clear. Some GPL software
copyright holders claim that you infringe on their rights if you
use OpenSSL with their software on operating systems that don't
normally include OpenSSL.

If you develop open source software that uses OpenSSL, you may
find it useful to choose an other license than the GPL, or state
explicitly that "This program is released under the GPL with the
additional exemption that compiling, linking, and/or using OpenSSL
is allowed." If you are using GPL software developed by others,
you may want to ask the copyright holder for permission to use
their software with OpenSSL.

It seems your interpretation of the OpenSSL "position" is as
questionable as your interpretation of the GPL, and what the GPL can
legally require. :-)

Nobody has proven an issue exists. The only way to prove it would be
for an actual court case to set the precedent.

Cheers,
mark



Re: [HACKERS] [COMMITTERS] pgsql: StrNCpy -> strlcpy (not complete)

2007-02-10 Thread mark
On Sat, Feb 10, 2007 at 09:21:04PM -0500, Bruce Momjian wrote:
> Woh.  Peter, you realize one of the reasons we use StrNCpy as a macro is
> for performance.  I don't see strlcpy as a macro.  Are you going to
> change all call locations to strlcpy?  If so, have you measured the
> performance impact?

I think we had this discussion already. strncpy() always writes all N
bytes, zero-padding past the end of a shorter string, whereas strlcpy()
copies only as many bytes as necessary. For short strings in larger
buffers, strlcpy() wins. It's understood that in many cases in
PostgreSQL, the expectation is for short strings, and the trailing
bytes are not required to be '\0'.

I assume Peter is only changing the provably good uses? :-)

Cheers,
mark


> Peter Eisentraut wrote:
> > Log Message:
> > ---
> > StrNCpy -> strlcpy (not complete)
> > 
> > Modified Files:
> > --
> > pgsql/src/backend/bootstrap:
> > bootstrap.c (r1.229 -> r1.230)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/bootstrap/bootstrap.c.diff?r1=1.229&r2=1.230)
> > pgsql/src/backend/libpq:
> > crypt.c (r1.72 -> r1.73)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/libpq/crypt.c.diff?r1=1.72&r2=1.73)
> > hba.c (r1.159 -> r1.160)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/libpq/hba.c.diff?r1=1.159&r2=1.160)
> > ip.c (r1.39 -> r1.40)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/libpq/ip.c.diff?r1=1.39&r2=1.40)
> > pgsql/src/backend/nodes:
> > print.c (r1.83 -> r1.84)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/nodes/print.c.diff?r1=1.83&r2=1.84)
> > pgsql/src/backend/postmaster:
> > pgarch.c (r1.28 -> r1.29)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/postmaster/pgarch.c.diff?r1=1.28&r2=1.29)
> > postmaster.c (r1.518 -> r1.519)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/postmaster/postmaster.c.diff?r1=1.518&r2=1.519)
> > pgsql/src/backend/tcop:
> > postgres.c (r1.521 -> r1.522)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/tcop/postgres.c.diff?r1=1.521&r2=1.522)
> > pgsql/src/backend/utils/misc:
> > guc-file.l (r1.46 -> r1.47)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/misc/guc-file.l.diff?r1=1.46&r2=1.47)
> > pgsql/src/bin/initdb:
> > initdb.c (r1.131 -> r1.132)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/bin/initdb/initdb.c.diff?r1=1.131&r2=1.132)
> > pgsql/src/bin/pg_ctl:
> > pg_ctl.c (r1.77 -> r1.78)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/bin/pg_ctl/pg_ctl.c.diff?r1=1.77&r2=1.78)
> > pgsql/src/bin/pg_dump:
> > pg_dumpall.c (r1.89 -> r1.90)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/bin/pg_dump/pg_dumpall.c.diff?r1=1.89&r2=1.90)
> > pgsql/src/bin/pg_resetxlog:
> > pg_resetxlog.c (r1.56 -> r1.57)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/bin/pg_resetxlog/pg_resetxlog.c.diff?r1=1.56&r2=1.57)
> > pgsql/src/interfaces/libpq:
> > fe-auth.c (r1.122 -> r1.123)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/interfaces/libpq/fe-auth.c.diff?r1=1.122&r2=1.123)
> > fe-connect.c (r1.342 -> r1.343)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/interfaces/libpq/fe-connect.c.diff?r1=1.342&r2=1.343)
> > pgsql/src/timezone:
> > pgtz.c (r1.49 -> r1.50)
> > 
> > (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/timezone/pgtz.c.diff?r1=1.49&r2=1.50)


Re: [HACKERS] HOT for PostgreSQL 8.3

2007-02-12 Thread mark
On Mon, Feb 12, 2007 at 12:48:06AM -0500, Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > How about adding a new 2-byte field to header for in-page c_tid poiner
> > for HOT ?
> We just finished sweating blood to get the tuple header size down to 23
> bytes from 27 (which saves 8 bytes not 4 if MAXALIGN=8).  We are not
> going to blow that again on HOT.

I haven't had enough time to follow all of the details here - but the
ability to update a row with minimal overhead, as long as there is extra
room in the same block, sounds like a great idea (it appeals to me). Could
it be done with just a 1-byte list? 24 bytes instead of 23 for the tuple header.

I'll try to catch up at some point. :-)

Cheers,
mark



Re: [HACKERS] Variable length varlena headers redux

2007-02-13 Thread mark
Hey all:

This is a randomly inserted distraction to tell you that I really like to
read about these ideas. No input from myself at this point. I'm happy with
the direction you are taking.

Thanks,
mark



Re: [HACKERS] HOT WIP Patch - version 1

2007-02-14 Thread mark
On Wed, Feb 14, 2007 at 01:56:03PM -0500, Bruce Momjian wrote:
> Could we insert index tuples for the live tuple and then remove the root
> tuple, perhaps later?  So basically we break the chain at that time. 
> The problem there is that we basically have nothing better than what we
> have now --- we are just delaying the index insert, and I don't see what
> that buys us.

At some point - inserting into the block would not be possible, as
there is no free space. Would that be a good time to do the index
insert? Then, a later vacuum would eventually prune out the whole old
chain. As long as vacuuming the intermediate entries in the chain
keeps the block with free space, there is no need to remove the root
tuple. If space ever runs out (vacuum not running frequently enough -
many updates performed in the same interval) - fall back to the
mechanism that is being used today.

I see it buying increased performance for rows that are frequently
updated.  If it can delay modifying the indices to only once every 10
or more updates, it seems to me that the improvement should be
significant. Perhaps PostgreSQL could be used for page hit counters
again... :-)

Cheers,
mark



Re: [HACKERS] Plan for compressed varlena headers

2007-02-15 Thread mark
On Thu, Feb 15, 2007 at 10:42:49AM -0500, Tom Lane wrote:
> > #define VARDATA_4B(PTR) ((PTR)->va_4byte.va_data)
> > #define VARDATA_2B(PTR) ((PTR)->va_2byte.va_data)
> > #define VARDATA_1B(PTR) ((PTR)->va_1byte.va_data)
> I thought we had abandoned the 2-byte-header variant?  Maybe you need to
> start a separate thread about exactly which of the bit-level proposals
> you want to implement.  There were quite a few tradeoffs discussed in
> the previous thread IIRC.

I agreed with Tom in the last thread. The 2 byte case doesn't seem like
good value for the return.

Simpler analysis results in easier to optimize code for the compiler,
and less complexity stored on disk.

Please remove 2B. :-)

Cheers,
mark



Re: [HACKERS] HOT WIP Patch - version 2

2007-02-20 Thread mark
On Tue, Feb 20, 2007 at 08:31:45PM +0530, Pavan Deolasee wrote:
> I see your point, but as you mentioned do we really care ? The chain
> needs to be broken so that the intermediate DEAD tuples can be
> vacuumed. We can't vacuum them normally because they could
> be a part of live HOT-update chain. Resetting the HOT-updated
> status of the root tuple helps to mark the index entry LP_DELETE
> once the entire HOT-update chain is dead.
> ...

For some reason this paragraph raised a query in my mind. Will we
be able to toggle this new "hot update" code at configure time, so
that we can measure what sort of effect this change has once it is
complete?

Even if only during the early testing cycles for the next release, I
think it would be useful.

Cheers,
mark



Re: [HACKERS] SCMS question

2007-02-26 Thread mark
On Mon, Feb 26, 2007 at 11:07:01AM +0100, Markus Schiltknecht wrote:
> Tom Lane wrote:
> >Yah know, the one bit of these pitches that always sounds like pure
> >snake oil is the claim that they offer some kind of mechanical solution
> >to merge conflicts.  AFAICS that has nothing to do with the SCMS in use
> >and everything to do with whether your "diff" command is AI-complete.
> 
> You should have said "merge" command. Ever tried that? Or kdiff3? Try 
> it, you will be surprised!

I'll have to try kdiff3 - as for the "merge" command, although it often
works, I strongly dislike how it marks up the lines as "there was a
conflict here" and leaves you three files in the directory to choose to
start from. That is far too manual, which invites mistakes. If kdiff3 is
more like the ClearCase graphical merge utility, I would far prefer it.
Can you say "I want change 2 followed by change 3" with checkboxes, with a
live view of the final version, and the ability to manually type or
adjust lines in that final version?

Cheers,
mark



Re: [HACKERS] SCMS question

2007-02-26 Thread mark
On Mon, Feb 26, 2007 at 02:57:03PM -0500, Andrew Dunstan wrote:
> Robert Treat wrote:
> >FWIW ClearCase also offers a command line version of its merge tool, where 
> >it shows three columns (a la diff --side-by-side) and allows you to pick 
> >which column you want to merge in (repo, change1, or change2 for example). 
> >It's a nice attempt at doing it on the command line, but the graphical 
> >version is so much better it's worth it to work out remote X and use that 
> >instead :-)
> If ClearCase is held up as a model SCM system the God help us.
> True story: a few years back I had a team that where some of them had 
> been using ClearCase, and when they joined our team switched to CVS. A 
> few months later the company made a company-wide decision to standardise 
> on ClearCase. Within weeks I had them begging and pleading with me to be 
> allowed to go back to CVS.

This shouldn't be a religious discussion - but I can't let the above go.
Anybody who prefers CVS over ClearCase for any reasons *other* than
financial reasons - doesn't understand SCM.

CVS is pretty close to the bottom for me - below it is SCCS/RCS.

Cheers,
mark



Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-14 Thread mark
On Wed, Mar 14, 2007 at 02:28:03PM +, Gregory Stark wrote:
> "David Fetter" <[EMAIL PROTECTED]> writes:
> > CREATE TABLE symptom (
> > symptom_id SERIAL PRIMARY KEY, /* See above. */
> > ...
> > );
> >
> > CREATE TABLE patient_presents_with (
> > patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
> > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
> > UNIQUE(patient_id, symptom_id)
> > );
> 
> I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are
> all boolean values.

Where is the boolean above? It is M:N, with each having whatever data
is required.

The issue I have with the above is that it seems unnecessarily
inefficient.  Whenever mapping from a patient to a symptom, or a
symptom to a patient, it requires searching the indexes of three tables.
Perhaps this would work well if there was heavy overlap of symptoms
for different patients. For the cases I have hit this problem,
however, there may be overlap, but it is not easy to detect, and even
if it was detected, we would end with some sort of garbage collection
requirements where symptoms are removed once all references to the
symptoms are removed.

The case most familiar to me, is a set of viewing preferences for web
pages. Some users specify no preferences, while others have dozens of
preferences. As I have no requirements to search for users with a
particular preference, I chose to solve this by packing many of the
preferences together into a TEXT field, and having the application
pack/unpack the data. I still have tables that map object id to
attribute/value, but they are used for the data that can require
longer queries. Without clustering the data, searching for a dozen
of these attributes requires either querying all attributes, where
the attributes could be scattered throughout the table, or querying
them one by one, which is worse.
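
For concreteness, the attribute/value tables I mean are shaped roughly
like this - names invented for illustration:

    CREATE TABLE obj_attr (
        obj_id integer NOT NULL,  -- the object (user, page, ...)
        attr   integer NOT NULL,  -- attribute type
        value  text    NOT NULL,
        PRIMARY KEY (obj_id, attr)
    );

    -- all attributes of one object is a single index probe; a dozen
    -- specific attributes means a dozen separate probes, or a scan:
    SELECT attr, value FROM obj_attr WHERE obj_id = 42;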

If there was an efficient way to do this for both of my use cases,
I would be strongly tempted to use it. :-)

Cheers,
mark



Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

2007-03-14 Thread mark
On Wed, Mar 14, 2007 at 03:25:48PM +, Gregory Stark wrote:
> <[EMAIL PROTECTED]> writes:
> > On Wed, Mar 14, 2007 at 02:28:03PM +, Gregory Stark wrote:
> >> "David Fetter" <[EMAIL PROTECTED]> writes:
> >> > CREATE TABLE symptom (
> >> > symptom_id SERIAL PRIMARY KEY, /* See above. */
> >> > ...
> >> > );
> >> > CREATE TABLE patient_presents_with (
> >> > patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
> >> > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
> >> > UNIQUE(patient_id, symptom_id)
> >> > );
> >> I'm just glad I don't have your doctor. I hope mine doesn't think 
> >> symptoms are all boolean values.
> > Where is the boolean above? It is M:N, with each having whatever data
> > is required.
> No, the above schema can only show whether a patient has or doesn't have a
> symptom. There is nowhere to store *where* the pain, inflammation, swelling,
> aneurism, etc is, or how severe it is, or when it occurred, etc.

What do you think the '...' is in the symptom table? :-)

I'm not saying I would do it that way - but it seems to me that this
is a bit of a religious debate, without people listening to each
other. For one person to suggest a workable solution, and another
person to write it off as quickly as you did, misunderstanding, or
misrepresenting it, is what I would call "not listening". :-)

> In any case the above arguably *is* an EA schema anyways. Your "symptom" is
> just as much an abstract meaningless concept from a database point of view as
> the questionnaire's "answer" or the bug tracker's "tag". Especially once you
> start actually having to record information *about* the symptom.

> This is a silly argument. The only reasonable conclusion is that any dogmatic
> principle that doesn't take into account the application requirements is
> wrong. In some cases you want a flexible abstract schema because the
> application is flexible and abstract, in others you need the database schema
> to understand your specific data structures so it can help you manipulate it.
> You have to pick which is more useful for your application, you can't have
> your cake and eat it too.

No. Another reasonable conclusion is that the answer is not simple. This
doesn't mean the answer is undesirable. It means that people need more
time. :-)

I prefer abstractions, especially if they are more efficient than if I were
to roll my own each time. Relational databases do a good job today. They can
do a better job tomorrow.

Cheers,
mark



Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread mark
On Fri, Jun 23, 2006 at 10:24:06AM -0400, Alvaro Herrera wrote:
> Mark Woodward wrote:
> > > In case of the number of actively modified rows being in only tens or
> > > low hundreds of thousands of rows, (i.e. the modified set fits in
> > > memory) the continuous vacuum process shows up as just another backend,
> > > not really taking order of magnitude more resources. It mainly generates
> > > WAL traffic, as modified pages are already in memory/cache and are
> > > mostly synced by background writer and/or checkpoint.
> > > Of course you have to adjust vacuum_cost_* variables so as to not
> > > saturate IO.
> > These sort of solutions, IMHO, don't show how good PostgreSQL is, but show
> > where it is very lacking.
> We all know Postgres is lacking; some of us try to improve it (some with
> more success than others).  People who know the current limitations but
> like the capabilities, try to find workarounds to the problems. What
> surprises me is that, if you have such a low opinion of Postgres, you
> still use it.

If everybody had good opinions, where would the development come from?

It's the parts that suck that need fixing the most... :-)

Cheers,
mark



Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread mark
On Fri, Jun 23, 2006 at 03:08:34PM -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > ...
> > suggesting.  We're having a hard enough time debugging and optimizing
> > *one* storage model.  I think the correct path forward is to stick with
> > the same basic storage model and vacuuming concept, and address the
> > known performance issues with better-optimized vacuuming.  No, it will
> > never be perfect for every scenario, but we can certainly make it much
> > better than it is now, without risking killing the project by
> > introducing undebuggable, unmaintainable complexity.
> Well, are you suggesting we just stop improving the database?  I am sure
> not.  But, your suggestion is that we can't do better without incurring
> more complexity (true), and that complexity will not be worth it.  I
> don't agree with that until I see some proposals, and shutting down
> discussion because they will add complexity or are fruitless seems
> unwise.

It sounds like everybody agrees that things need to be fixed, and genuinely
motivated people are trying to offer what they have to the table.

Tom already has enough on his plate, as do most others here - so unless
a competent champion can take up the challenge, discussion is all we have.

I'm not liking the "we should do it this way," "no, we should do it that."
My read of the situation is that both may be useful, and that both should
be pursued. But one set of people can't pursue both.

Is anyone who is able willing to take up this challenge? Perhaps more than one,
from both major directions? (vacuum on one side, and improved storage on
the other) Somebody with the time and skill, who can work through the
design discussions on one of the aspects?

I want to contribute soon, and this is the sort of thing that interests me -
but I still don't have time yet, and there would be no guarantee that I
succeeded. Somebody else? :-)

Cheers,
mark



Re: [HACKERS] vacuum, performance, and MVCC

2006-06-24 Thread mark
On Sat, Jun 24, 2006 at 03:29:47AM -0400, Jan Wieck wrote:
> >It sounds like everybody agrees that things need to be fixed, and genuinely
> >motivated people are trying to offer what they have to the table.
> One singe core team member responds vaguely in a way, you feel being 
> supportive of your case, and you conclude that "everybody agrees"? 
> Sorry, x'use me?

> There are a couple of side effects on this "update in place" issue that 
> aren't even mentioned yet. Nobody with any significant in depth 
> knowledge of the Postgres non-overwriting storage engine concept seems 
> to suggest any move towards a storage system, that does updates in place 
> that require "undo" operations in case of a process/system failure. 
> You're ready to "fix" all those places to support the undo you need? You 
> must have a big table.

Jan: Who on the list has claimed that nothing is broken and nothing needs
to be improved? Are you making this claim?

Shutting down ideas is not constructive. Encouraging those with ideas to
step up and do more than talk could be.

Cheers,
mark



Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as primary keys

2006-06-28 Thread mark
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
> Instead, I would like to humbly request the inclusion of a UUID datatype 
> (or an opaque 128 bit datatype) in the core package. It's increasingly 
> common and some databases (MS SQLServer) already have built in support for 
> it.

We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the
type has value, that it is generic, and that it would be widely used
without being abused. All those geometric types in core that I'll never
use, with few or no users, complete with functions to operate on them -
and no UUID type... Hehe... To me, that's irony... :-)

Cheers,
mark



Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-06-28 Thread mark
On Wed, Jun 28, 2006 at 12:38:50PM -0500, Jim C. Nasby wrote:
> On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote:
> > On Wed, June 28, 2006 1:14 pm, [EMAIL PROTECTED] wrote:
> > > On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
> > >> Instead, I would like to humbly request the inclusion of a UUID
> > >> datatype (or an opaque 128 bit datatype) in the core package. It's
> > >> increasingly common and some databases (MS SQLServer) already have built
> > >> in support for it.
> > > We have it. We're just not putting in the effort required to have it
> > > included in core, as it's too much effort to convince people that the type
> > > has value, that is is generic, and would be widely used without being
> > > abused. All the geometric types that I'll never use in core, with few or
> > > no uses, including functions to operate on these types, and no UUID
> > > type... Hehe... To me, that's irony... :-)
> > Is it on pgfoundry? From past discussions, the new criteria for getting
> > something into core is to first determine if it is successful on
> > pgfoundry.
> If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever
> downloaded it. But I find that exceptionally hard to believe...
> 
> Looking back through the list archives I think you'd find this comes up
> at least every few months.

I've downloaded the version off pgfoundry.org. It is broken. It leaks
memory, and if memory serves it can cause the client to core dump.

Two of us worked on a re-write based off a different UUID system library,
and I've been happily using it in production for a year or so. I don't
believe either of us have bothered to market it. Each time it comes up,
a number of people on this list shut it down, and it doesn't seem worth
the effort to convince them otherwise. They can have their ivory tower,
and I can have my plugin.

Cheers,
mark



Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-06-28 Thread mark
On Wed, Jun 28, 2006 at 01:12:17PM -0500, Jim C. Nasby wrote:
> On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote:
> > Personally I don't buy the misuse objection - we already have plenty of 
> > things that can be misused. As long as there is a reasonable valid use 
> > and we can make it portable enough, I think there is a good case for 
> > including it.
> Well, since Mark has one, how about we consider adding it in?
> If nothing else, can you please put your stuff on pgFoundry so others
> can find it, Mark?

It was written by Nathan Wagner <[EMAIL PROTECTED]> and myself, and
is based off the OSSP ( http://www.ossp.org/ ) UUID implementation.
I'm not an expert on the license, but it seems acceptable to me:

 "Permission to use, copy, modify, and distribute this software for
  any purpose with or without fee is hereby granted, provided that
  the above copyright notice and this permission notice appear in all
  copies."

I haven't tested to see how portable the OSSP UUID implementation is.
In their own words:

 "OSSP uuid was already written with maximum portability in mind, so
  there should be no great effort required to get it running on any Unix
  platform with a reasonable POSIX API. Additionally, the portability
  was tested by successfully building and running it on the following
  particular Unix platforms (syntax is "<cpu>-<os><version> (<compiler>)"):

  alpha-tru644.0 (cc)
  alpha-tru645.1 (gcc, cc)
  hppa-hpux11.11 (cc)
  ia64-hpux11.23 (cc)
  ix86-debian2.2 (gcc, icc)
  ix86-debian3.0 (gcc)
  ix86-debian3.1 (gcc)
  ix86-freebsd4.9 (gcc)
  ix86-freebsd5.2 (gcc, icc)
  ix86-netbsd1.6 (gcc)
  ix86-qnx6.2 (gcc)
  ix86-solaris10 (gcc)
  ix86-unixware7.1.3 (cc)
  mips64-irix6.5 (gcc)
  sparc64-solaris8 (gcc, forte)
  sparc64-solaris9 (gcc)"

I've put it through a fair amount of testing, including using it
within compound indexes, expecting the index to be used for at
least '=', constructing many UUIDs quickly, in a sequence, and
converting it to and from string form. We chose to implement our
own encode / decode routines for performance reasons. With the
exception of testing it on a wider range of platforms, I would
call the module stable.

If there is interest - I'm sure Nathan and I would be willing to put
it on pgfoundry, and at some point give it up for inclusion into
PostgreSQL.

Cheers,
mark



Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-06-28 Thread mark
On Thu, Jun 29, 2006 at 02:02:32AM -0400, [EMAIL PROTECTED] wrote:
> It was written by Nathan Wagner <[EMAIL PROTECTED]> and myself, and
> is based off the OSSP ( http://www.ossp.org/ ) UUID implementation.
> I'm not an expert on the license, but it seems acceptable to me:
> ...
> If there is interest - I'm sure Nathan and I would be willing to put
> it on pgfoundry, and at some point give it up for inclusion into
> PostgreSQL.

This might require a little bit of research. It appears that the
development version of OSSP UUID may provide its own PostgreSQL
'bindings'. I may try to contact the author of OSSP UUID and
see whether any changes we have that he does not can be rolled
into his version...

Cheers,
mark



Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-06-30 Thread mark
On Fri, Jun 30, 2006 at 08:53:28AM +0200, Thomas Hallgren wrote:
> Josh Berkus wrote:
> >> I agree about splitting the utilities, except that I think the database
> >> should be able to generate UUIDs somehow.
> > There is a GUID add-in, and someone is working on a 2nd one.  UUIDs 
> > are not part of the SQL standard, and we've only seen sporadic demand 
> > for them (and different types each time) so I can't imagine one making 
> > it further than contrib real soon.
> >
> > Also, one could argue that UUIDs are a foot gun, so they're not 
> > exactly the type of thing we want to advocate in advance of demand.

Josh: Although PostgreSQL is easy to extend - it is still beyond many
people to put in the time required to learn how. The demand exists.
It's the supply that doesn't. People work around the problem.
Perhaps they are using one of the sequence number 'tricks' such as
having each site be allocated a range, or modulus. I was willing to
learn how to implement a UUID type, and not willing to use one of
these sequence number hacks.

If you want to call UUID a foot gun - then please call sequence
number hacks a foot gun as well, to be fair, and then we can start
to talk about how a theoretically perfect system should work.

If UUID existed in core, people would use it. It would be used, and it
would be abused - like most other PostgreSQL features. There would be
nothing unique about this. And what's the real danger anyways? People
who should be using 4 byte or 8 byte sequences, find that they lose
a little performance, and that their databases are larger than they
expected? UUID is designed not to collide. So what is the real danger
you are speaking about? How bad can they shoot themselves?

> Martijn van Oosterhout wrote:
> > It seems to me that maybe the backend should include a 16-byte fixed
> > length object (after all, we've got 1, 2, 4 and 8 bytes already) and
> > then people can use that to build whatever they like, using domains,
> > for example...
> So how about the split? I.e. just add a 16 byte data type and forget all 
> about UUID's for now.

Martijn: Were you thinking that it would look like a really big integer,
displayed by default as a decimal string in the client?

This makes sense to me.

If it was a full data type - could it be passed around in memory by
value, and not as a pointer? Or is 16 bytes too big to pass around by
value?

Cheers,
mark



Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-06-30 Thread mark
On Fri, Jun 30, 2006 at 10:38:49AM +0200, Martijn van Oosterhout wrote:
> On Fri, Jun 30, 2006 at 04:04:19AM -0400, [EMAIL PROTECTED] wrote:
> > > > It seems to me that maybe the backend should include a 16-byte fixed
> > > > length object (after all, we've got 1, 2, 4 and 8 bytes already) and
> > > > then people can use that to build whatever they like, using domains,
> > > > for example...
> > > So how about the split? I.e. just add a 16 byte data type and forget all 
> > > about UUID's for now.
> > Martijn: Were you thinking that it would look like a really big integer,
> > displayed by default as a decimal string in the client?
> > This makes sense to me.
> Either that, or a hex string. My problem with displaying as integer is
> that not many clients will be able to parse (or print) a 16-byte
> integer (the C library doesn't do it), but anyone can write a
> hex-to-binary converter, or convince scanf/printf to do it for them.

No real preference here. I'd be happy to have a native 16-byte type.

> > If it was a full data type - could it be passed around in memory by
> > value, and not as a pointer? Or is 16 bytes too big to pass around by
> > value?
> You can't pass it by value (doesn't fit in a register on the CPU and
> there is no corrosponding C type), and I'm not sure you'd want to. A
> pointer is much easier and faster to pass around.

It depends on how it is used. If memory needs to be allocated for a
value that is only used a few times, the overhead of allocation and
indirection can be more expensive than copying. If the value is used
many times, though, then the reduction in copying can make the pointer
faster. 64-bit and 128-bit values sit right on the line, not clearly
one or the other. It was just a thought though. The PostgreSQL API seemed pretty
fixed the last time I looked at this stuff.

> The other thing I was thinking of is a type generator, like so:
> # select make_raw_hex_type(16,'uuid');
> NOTICE: Created raw hex type 'uuid' of fixed length 16
>  make_raw_hex_type
> ---
> (0 rows)
> # select '1234FF'::uuid;
> ERROR: Bad length for type 'uuid'
> # select 'hello world'::uuid;
> ERROR: Invalid characters for type 'uuid'
> # select '1234567890abcdef'::uuid;
>  ?column?
> --
>  1234567890ABCDEF
> (1 row)

> Only this could be used to create other types too, for cryptographic
> functions for example. PostgreSQL doesn't have any type generators yet,
> so I'm unsure whether a patch creating one would be accepted for core.

Not sure what I think of this. I suppose the intention would be for it
to work for lengths other than 16? I can see people wanting to use such
a generalized function for char as well as bytea, for at least latin1
characters...

If people agree on a generic 16-byte type, or a fixed-length hex type
with a defined set of standard functions and index operators it should
support, and nobody more qualified wants to make the patch - I'll step
up.

Cheers,
mark



Re: [HACKERS] update/insert,

2006-07-05 Thread mark
On Wed, Jul 05, 2006 at 04:59:52PM +0200, Zeugswetter Andreas DCP SD wrote:
> > OK, but the point of the question is that constantly updating 
> > a single row steadily degrades performance, would 
> > delete/insert also do the same?
> Yes, there is currently no difference (so you should do the update).
> Of course performance only degrades if vaccuum is not setup correctly.

As Martijn pointed out, there are two differences. One is almost
insignificant, having to do with internal linkage. The other is that
multiple queries are being executed - I would presume with separate
query plans, and so on - therefore you should do the update.

For the case you are talking about, the difference is:

 1) Delete which will always succeed
 2) Insert that will probably succeed

Vs:

 1) Update which if it succeeds, will stop
 2) Insert that will probably succeed

In the first case, you are always executing two queries. In the second,
you can sometimes get away with only one query.

Note what other people mentioned, though, that neither of the above is
safe against parallel transactions updating or inserting rows with the
same key.

In both cases, a 'safe' implementation should loop if 2) fails and
restart the operation.
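
A minimal sketch of such a loop in PL/pgSQL - my own illustration, not
anything proposed in this thread, and with table and column names
invented:

    -- assumes: CREATE TABLE t (k integer PRIMARY KEY, v text);
    CREATE FUNCTION upsert_t(key integer, val text) RETURNS void AS $$
    BEGIN
        LOOP
            UPDATE t SET v = val WHERE k = key;
            IF FOUND THEN
                RETURN;            -- 1) the update succeeded; stop
            END IF;
            BEGIN
                INSERT INTO t (k, v) VALUES (key, val);
                RETURN;            -- 2) the insert succeeded; stop
            EXCEPTION WHEN unique_violation THEN
                -- a parallel transaction inserted the same key first;
                -- loop around and retry the UPDATE
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;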

Cheers,
mark



Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-07-06 Thread mark
On Thu, Jun 29, 2006 at 06:47:17PM +0200, Martijn van Oosterhout wrote:
> It seems to me that maybe the backend should include a 16-byte fixed
> length object (after all, we've got 1, 2, 4 and 8 bytes already) and
> then people can use that to build whatever they like, using domains,
> for example...

So... Back to this.

It won't happen unless somebody does it - and I realize that people
are busy with their own projects, so unless somebody more willing and
better suited will step up, I'm going to take a stab at getting
advanced consensus.

Please answer the below questions, and state whether your opinion is
just an opinion, or whether you are stating it as a PostgreSQL
maintainer and it is law. If you wish, you can rank preferences.

1) The added 128-bit type should take the form of:

a) UUID, with all functions
b) UUID, with only basic generation functions + encode/decode/indexable
c) UUID, with only encode/decode/indexable - generic except for the
   name of the type, and the encoding format.
d) Generic 128-bit type - same as c) except may not encode or decode
   as UUID (dashes). Either a large number (hex string?), or binary data.
e) Generic n-byte binary data type generator. Not sure of feasibility
   of this at this point. See thread.

2) According to your answer in 1), the added 128-bit type should be:

a) In core first.
b) In contrib first.
c) In pgfoundry first.


Thanks,
mark



Re: [HACKERS] A couple thoughts about btree fillfactor

2006-07-10 Thread mark
On Mon, Jul 10, 2006 at 12:36:34PM -0400, Tom Lane wrote:
> Now that the index options infrastructure is in, I am having a couple of
> second thoughts about the specific behavior that's been implemented,
> particularly for btree fillfactor.
> 1. The btree build code (nbtsort.c) is dependent on the assumption that
> the fillfactor is at least 2/3rds.  This is because we must put at least
> two keys in each page, and with maximally sized keys (1/3rd page) it
> might try to put only 0 or 1 tuple in a page if fillfactor is small.
> However, maximally sized keys are certainly a corner case, and in more
> usual situations a smaller fillfactor could be useful.  I'm thinking
> we could change the nbtsort.c code to work like "stop filling page
> when fillfactor is exceeded AND there are at least two entries already".
> Then any old fillfactor would work.

I like the idea. Do you think there should be a way of packing certain
indexes tighter, once they are known to be mostly read only? For
example, an option on REINDEX? This would free PostgreSQL to use a
smaller fillfactor while still allowing people to optimize those of
their tables that would benefit from a higher fillfactor once they
become mostly static?

> 3. What should the minimum fillfactor be?  The patch as submitted
> set the minimum to 50% for all relation types.  I'm inclined to
> think we should allow much lower fillfactors, maybe down to 10%.
> A really low fillfactor could be a good idea in a heavily updated
> table --- at least, I don't think we have any evidence to prove
> that it's not sane to want a fillfactor below 50%.

If there was a way of packing relations tighter, allowing much lower
fillfactors should be fine.

Cheers,
mark



Re: [HACKERS] A couple thoughts about btree fillfactor

2006-07-10 Thread mark
On Mon, Jul 10, 2006 at 03:17:01PM -0400, Tom Lane wrote:
> [EMAIL PROTECTED] writes:
> > ... Do you think there should be a way of packing certain
> > indexes tighter, once they are known to be mostly read only? For
> > example, an option on REINDEX? This would free PostgreSQL to use a
> > smaller fillfactor while still allowing people to optimize those of
> > their tables that would benefit from a higher fillfactor once they
> > become mostly static?
> Isn't it sufficient to change the fillfactor and REINDEX?

I've never tried that - if it works sure... :-)
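
Presumably something along these lines - untested, and assuming the
index-options syntax from this patch, with an index name invented for
illustration:

    ALTER INDEX my_idx SET (fillfactor = 100);  -- pack pages tight
    REINDEX INDEX my_idx;                       -- rebuild with the new setting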

Thanks,
mark



Re: [HACKERS] More nuclear options

2006-07-11 Thread mark
On Tue, Jul 11, 2006 at 03:53:26PM -0400, Josh Berkus wrote:
> >All I am saying is that it couldn't hurt to put the information out 
> >there...   we're not hurting for disk space and none of this stuff appears 
> >inherently wrong, just outdated, but it might still prove useful for some 
> >people.   
> Again, it's the same question.  If *you* want to be the maintainer, I'll 
> put it on pgfoundry.  Otherwise, you're asking me to be responsible for 
> the code because you don't want to throw it away.

To present a somewhat external opinion - I've looked at pgfoundry in
the past and been both confused and disappointed. Code that doesn't
compile, with no maintainer, gives me a dirty taste in my mouth, and
my inner voice says "what the heck is this crap?"

There are several open source projects that give me this taste. Please
help PostgreSQL not be on that list by pruning dead projects, or
poor-quality projects, from the public image. It's EMBARRASSING! :-)

Thanks.

Cheers,
mark



Re: [HACKERS] Three weeks left until feature freeze

2006-07-13 Thread mark
On Thu, Jul 13, 2006 at 09:29:06AM -0400, Jonah H. Harris wrote:
> I'm being objective here, and PL/J is not nearly as stable or
> well-maintained... that means a lot to me or to anyone who looks at
> using a Java PL.  Do we intend to ship both and say that one is less
> capable?  Have you used either of them?  Don't get me wrong, I like
> PL/J in concept... but it's just not even close to production-ready
> yet.  I know of no one using PL/J in production and about 40 or so
> people using PL/Java.

On the subject of 38K lines of code, much of which isn't C (going by memory;
I apologize if this is wrong), how many of these lines could or should be
shared between PL/Java and PL/J? It seems to me that the general concepts
should be in common, and that it is only how the Java interfaces with the
backend that changes. Could they not be one PL, with two mechanisms for
speaking to the backend?

I agree with competition to improve quality, but at some point, with
too few maintainers, and one project clearly more advanced in terms of
maturity than the other, that perhaps having two separate projects
does not make sense. It sounds to me like PL/Java is rich in terms of
PostgreSQL abstractions, and that this shouldn't be a reason to
penalize it. Does it really matter how much Java code there is in it?
It's only the C code that needs to interface with the backend.

Or perhaps I'm out to lunch, and the PL/Java abstractions are tightly
tied to the backend API, and there are thousands of lines of unnecessary
code.

Now you are going to make me try them both out. I have not tried either.

Cheers,
mark



Re: [HACKERS] Three weeks left until feature freeze

2006-07-13 Thread mark
On Thu, Jul 13, 2006 at 11:03:27AM -0400, Tom Lane wrote:
> The only argument I find interesting for including the PLs in core
> (which has zilch to do with how any particular packager ships them)
> is that it's easier to do maintenance that way: if we make a change in
> an API that affects the PLs, we can change the PLs at the same time.

> However, that argument only holds water if the core developers are
> able/willing to make the corresponding changes.  And in that light,
> the fact that PL/Java includes a huge whack of non-C code is very
> significant.  *I* won't take responsibility for fixing PL/Java when
> I break it, because I don't know Java well enough.  I don't know what
> other people who do core development feel about that --- but I dislike
> the idea that when someone changes such an API, the buildfarm will go
> all red because there's only one person with the ability to fix PL/Java.

Tom:

Currently, the PL implementations combine both language-specific glue
and language-specific abstractions. In light of your comments
below, and the opinion I expressed in my previous response, I find
myself wondering whether this architecture is contributing to the
problem.

Would it make sense for this architecture to be split?

My thinking is that much of the code in the Perl, TCL, Java, ... PL
implementations is related to language-specific abstractions and
documentation, and does not need to be bundled with the core, nor
does it need to be tested as a part of the core. For example, I imagine
that many of the lines in PL/Java could be distributed as a single
hardware-independent .jar file separate from the core, if the core
exposed the required API to Java.

Where this could go, is that the core developers would only be
responsible for ensuring that the backend API functions as documented,
without needing to understand how these functions are exposed to the
user. You agree to maintain Java interfaces to the C functions. No more,
no less. If somebody else wants to build complicated abstractions on top,
or wants to provide thousands of pages of documentation - this is their
choice, but would be distributed separate from the core, but would be
simple to plug in.

Am I just spouting crazy talk, or is this making sense?

Cheers,
mark



Re: [HACKERS] Three weeks left until feature freeze

2006-07-13 Thread mark
On Thu, Jul 13, 2006 at 01:02:16PM -0400, Tom Lane wrote:
> PL/Java will improve the visibility of PL/Java to people who won't go
> looking for it.  That's fine, but ultimately that's a packaging argument
> not a development argument.  The people who think PL/Java is an
> essential checklist item undoubtedly also think JDBC is an essential
> checklist item, but I'm not seeing any groundswell of support for
> putting JDBC back into core.  Instead we expect packagers (like the RPM
> set) to make JDBC available alongside the core postgres packages.
> That's how PL/Java ought to be handled, too, IMHO.

JDBC is different, in that it doesn't require the PostgreSQL core to
build. It's 100% native Java, and as such, I see benefit to it being
distributed separately.

This is why I was thinking that the problem is that the backend (SPI?) 
API isn't exposed as native methods in the required languages. If just
the SPI API were exposed from the core to the languages, the
maintenance effort and size should be less, and the add-ons would not
need to be built with the PostgreSQL core, making it easy to
integrate them after the fact.
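
As a rough sketch of "SPI exposed as native methods", assuming a
hypothetical Java class org.postgresql.pl.SPI (this is not how
PL/Java is actually structured):

#include <jni.h>
#include "postgres.h"
#include "executor/spi.h"

/* Hypothetical JNI binding exposing SPI_execute() to Java as
 * org.postgresql.pl.SPI.exec(String sql, boolean readOnly).
 * Returns the raw SPI status code; result-set access would need
 * further bindings, and error/memory-context handling is omitted. */
JNIEXPORT jint JNICALL
Java_org_postgresql_pl_SPI_exec(JNIEnv *env, jclass cls,
                                jstring sql, jboolean readOnly)
{
    const char *query = (*env)->GetStringUTFChars(env, sql, NULL);
    int         ret = SPI_execute(query, readOnly, 0);

    (*env)->ReleaseStringUTFChars(env, sql, query);
    return (jint) ret;
}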

If this is just crazy talk - please stop me.

Cheers,
mark



Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread mark
On Mon, Jul 24, 2006 at 09:04:28PM -0400, Bruce Momjian wrote:
> Jie Zhang wrote:
> > > IIRC they quoted the cardinality of 1 as something that is still
> > > faster than btree for several usecases.
> > > 
> > > And also for AND-s of several indexes, where indexes are BIG, your btree
> > > indexes may be almost as big as tables but the resulting set of pages is
> > > small.
> > Yeah, Hannu points it out very well -- the bitmap index works very well
> > when columns have low cardinalities and AND operations will produce small
> > number of results.
> What operations on columns of low cardinality produce a small number of
> results?  That seems contradictory.

Not necessarily. Cardinality 2 means each value selects roughly 1/2 of
the table. 3 means 1/3. 4 means 1/4. Etc.

Reading 1/4 of a larger table has a good chance of being faster than
reading 4/4 of it. :-)

No opinion on whether such tables exist in the real world - but the
concept itself seems sound.

> > Also, the bitmap index is very small in low cardinality cases, where the
> > btree tends to take up at least 10 times more space.
> Also, is adding/changing rows more expensive with bitmaps than
> btrees?

Without looking at the code, but having read the Oracle docs, I would
guess yes. Every vacuum/delete would need to clear all of the bits for
the row. Every insert/update would need to allocate a bit in at least
the bitmap tree for the row inserted/updated. Seems like more pages
may need to be written. Although perhaps some clever optimization
would limit this. :-)

It seems interesting though. We won't really know until we see the
benchmarks. I see it as a form of working directly with the
intermediate form used by the bitmap index scanner. If the existing
scanner, building its bitmaps on the fly, can out-perform the regular
index scanner, I see potential in a pre-built bitmap.
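
Part of the appeal is that, with pre-built bitmaps, ANDing several
predicates together is just a word-at-a-time operation. A rough
illustration of the concept - not the patch's actual on-disk format:

#include <stddef.h>
#include <stdint.h>

/* Concept sketch: ANDing two bitmap indexes handles 64 candidate
 * rows per iteration, which is why multi-predicate AND stays cheap
 * even on big tables. */
static void
bitmap_and(const uint64_t *a, const uint64_t *b,
           uint64_t *result, size_t nwords)
{
    size_t i;

    for (i = 0; i < nwords; i++)
        result[i] = a[i] & b[i];
}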

Obviously, it isn't the answer to everything. The use I see for it is
a few of my 1:N object attribute tables. Such a table has an object
identifier, and a column indicating the attribute type that the value
is for. I have 20 or more attribute type values; however, most
objects include rows for most attribute types, so my regular index
ends up repeating the attribute type for every row. If I want to scan
the table for all rows that have a particular attribute type with a
particular value, it's a seqscan right now. With the bitmap scanner,
knowing which rows to skip to would be immediately available.

Will it be worth it or not? I won't know until I try it. :-)

Cheers,
mark



Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread mark
On Tue, Jul 25, 2006 at 12:36:42AM -0400, Tom Lane wrote:
> [EMAIL PROTECTED] writes:
> > Reading 1/4, for a larger table, has a good chance of being faster than
> > reading 4/4 of the table. :-)
> Really?
> 
> If you have to hit one tuple out of four, it's pretty much guaranteed
> that you will need to fetch every heap page.  So using an index provides
> zero I/O savings on the heap side, and any fetches needed to read the
> index are pure cost.  Now you have to demonstrate that the CPU costs
> involved in processing the index are significantly cheaper than the cost
> of just testing the WHERE qual at every heap tuple --- not a bet that's
> likely to win at a one-in-four ratio.

Haha. Of course - but that's assuming uniform spread of the values.
Next I would try clustering the table on the bitmap index... :-)
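
For the uniform-spread case the arithmetic really is stark. A sketch,
using a made-up figure of 100 tuples per heap page:

#include <math.h>
#include <stdio.h>

/* Back-of-envelope estimate: with selectivity s and uniformly spread
 * matches, a heap page of t tuples contains no match with probability
 * (1 - s)^t, so the expected fraction of heap pages touched is
 * 1 - (1 - s)^t. The numbers are illustrative, not from PostgreSQL. */
int
main(void)
{
    double s = 0.25;            /* one tuple in four matches */
    double t = 100.0;           /* assumed tuples per heap page */

    printf("expected fraction of heap pages read: %f\n",
           1.0 - pow(1.0 - s, t));
    /* prints ~1.000000: at one-in-four, essentially every heap page
     * is hit, so the index saves no heap I/O at all */
    return 0;
}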

My databases aren't as large as many of yours. Most or all of them
will fit in 1 Gbytes of RAM. The I/O cost isn't substantial for these,
but the WHERE clause might be.

But yeah - we don't know. Waste of code or performance boost.

Cheers,
mark



Re: [HACKERS] On-disk bitmap index patch

2006-07-28 Thread mark
On Fri, Jul 28, 2006 at 02:43:23PM -0700, Luke Lonergan wrote:
> On 7/28/06 1:25 PM, "Bruce Momjian" <[EMAIL PROTECTED]> wrote:
> > What we don't want to happen is for us to release bitmapped indexes, and
> > find out later that btree is better in all cases.  Then we have to tell
> > people not to use bitmapped indexes until we fix it in the next major
> release.  FYI, that is basically where we are right now with hash
> > indexes.
> On this thread people have presented results that show clear and irrefutable
> evidence that there are use cases where bitmap indexes outperform Btree for
> many datatypes on realistic problems, including the TPC-H benchmark.

Irrefutable is a little optimistic, don't you think? :-)

There is reason to believe that a bitmap index is useful in some
scenarios. We're not yet clear on what these are, whether they apply
to production use scenarios, or whether b-tree could not be optimized
to be better.

I support you - I want to see these great things for myself.

But irrefutable? Irrefutable is not true. :-)

Cheers,
mark



Re: [HACKERS] interesting article: Leverage your PostgreSQL V8.1 skills to learn DB2, Version

2006-08-05 Thread mark
On Sat, Aug 05, 2006 at 10:01:30AM +0200, Pavel Stehule wrote:
> I found maybe interesting article 
> http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0603wasserman2/

Thanks for the article. I had been meaning to look at DB2, and it gave
me a quick summary.

What I get from the article, though, is that DB2 is more modularized
than PostgreSQL, but that it has concurrency issues. Is this true? Is
anybody allowed to comment on the performance of small (~1 million row)
databases?

I'm not seeing a compelling reason to switch.

Cheers,
mark



Re: [HACKERS] [BUGS] numerics lose scale and precision in views of unions

2006-08-10 Thread mark
On Wed, Aug 09, 2006 at 11:35:48PM -0400, Tom Lane wrote:
> Stephen Frost <[EMAIL PROTECTED]> writes:
> > * Tom Lane ([EMAIL PROTECTED]) wrote:
> >> Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
> >> could it say?  "less" and "greater" are surely wrong).  So you need to
> > It could say "not equal" pretty reasonably as the scale is
> > different.
> Nope, there are exactly three options: equal, less, greater.
> btree doesn't understand anything else.

Mathematically, 1.0 covers everything from 0.9500... up to 1.05000...

In theory, B-Tree could be fine with this. As long as the operators
for =, <>, <, and > are made to consistently understand this principle.
For example:

1.0 = 0.95
1.0 = 1.05
1.0 <> 0.94
1.0 <> 1.06
0.94 < 1.0
1.06 > 1.0

I'm not sure that it warrants being changed, though. Storing mixed
precision values in a column and expecting the database to understand
rounding rules does not appeal to me. Too magical.

Cheers,
mark



Re: [HACKERS] [BUGS] numerics lose scale and precision in views of unions

2006-08-10 Thread mark
On Thu, Aug 10, 2006 at 03:40:11AM -, Andrew - Supernews wrote:
> On 2006-08-10, Stephen Frost <[EMAIL PROTECTED]> wrote:
> >> Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
> >> could it say?  "less" and "greater" are surely wrong).  So you need to
> > It could say "not equal" pretty reasonably as the scale is
> > different.
> The comparison function must be trichotomous, that is the possible results
> are "equal", "greater", and "less". There is no other "not equal" result.

"Not equal" would be wrong - because they are equal. But by that
rounding logic, 1.0 is also equal to 1.01. We shouldn't go there. :-)

Cheers,
mark



Re: [HACKERS] [BUGS] numerics lose scale and precision in views of unions

2006-08-10 Thread mark
On Thu, Aug 10, 2006 at 09:20:09AM -0400, [EMAIL PROTECTED] wrote:
> Mathematically, 1.0 covers everything from 0.9500... up to 1.05000...
> 
> In theory, B-Tree could be fine with this. As long as the operators
> for =, <>, <, and > are made to consistently understand this principle.
> For example:
> 
> 1.0 = 0.95
> 1.0 = 1.05
> 1.0 <> 0.94
> 1.0 <> 1.06
> 0.94 < 1.0
> 1.06 > 1.0
> 
> I'm not sure that it warrants being changed, though. Storing mixed
> precision values in a column and expecting the database to understand
> rounding rules does not appeal to me. Too magical.

I take this back. It's magical enough to be impossible.

If 1.0 is inserted, and then 1.05, it could consider them 'equal' and
put them together. If 1.1 and 1.14 were inserted, they could be 'equal'
and put together. Then an insertion of '1' would have to re-group them
all together, making the B-Tree ineffective.
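
A tiny sketch of the underlying problem: "equal within a tolerance"
is not transitive, and a btree comparison function has to be.

#include <math.h>
#include <stdio.h>

/* Sketch: "equal within 0.05" cannot serve as a btree comparator.
 * The values below are illustrative only. */
static int
fuzzy_cmp(double a, double b)
{
    if (fabs(a - b) < 0.05)
        return 0;               /* "equal" */
    return (a < b) ? -1 : 1;
}

int
main(void)
{
    /* 1.0 ~ 1.04 and 1.04 ~ 1.08, yet 1.0 < 1.08: which values end
     * up grouped as "equal" depends on insertion order */
    printf("%d %d %d\n",
           fuzzy_cmp(1.0, 1.04),        /* 0  */
           fuzzy_cmp(1.04, 1.08),       /* 0  */
           fuzzy_cmp(1.0, 1.08));       /* -1 */
    return 0;
}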

So - taking back what I say above.

Cheers,
mark



Re: [HACKERS] Coverity reports looking good

2006-08-20 Thread mark
On Sun, Aug 20, 2006 at 11:52:53AM -0400, Tom Lane wrote:
> Martijn van Oosterhout  writes:
> > Whats basically left is a large number of memory leaks in frontend
> > applications such as pg_dump, initdb, pg_ctl, etc. These haven't ever
> > really been a priority (buildACLCommands is really bad in this
> > respect).
> Well, if Coverity's idea of good programming practice is that every
> program must explicitly free everything it ever malloced before it
> terminates, then I'm afraid we'll have to agree to disagree.  The
> above is nothing but make-work as far as programs with short intended
> runtimes go.  A leak in psql would be interesting if it can accumulate
> across command-execution cycles, but I have zero interest in "cleaning
> up" any of the programs mentioned above.

Each of the reported issues should be investigated, however briefly,
to see whether it is intentional or not.

For example, if the memory is allocated within a loop, or in a loop
whose bounds are not fixed, even a short-running program can benefit
from a fix.

If it is just configuration data represented in memory, created once,
who cares... :-)
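
In code terms, the distinction is roughly this (a contrived sketch):

#include <stdlib.h>
#include <string.h>

/* Contrived sketch of the two cases: the one-time allocation is a
 * non-issue for a short-lived program, while the per-iteration leak
 * grows with the input and is worth fixing even in pg_dump-style
 * tools. */
static char *config;                    /* leaked once at exit: harmless */

void
process_items(char **items, size_t n)
{
    size_t i;

    config = strdup("defaults");        /* one-time: who cares */

    for (i = 0; i < n; i++)
    {
        char *copy = strdup(items[i]);  /* leaked n times: worth fixing */

        /* ... work with copy ... */
        /* free(copy);  <- the fix */
    }
}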

Cheers,
mark



Re: [HACKERS] PostgreSQL on 64 bit Linux

2006-08-20 Thread mark
On Sun, Aug 20, 2006 at 04:46:30PM -0400, Douglas McNaught wrote:
> Naz Gassiep <[EMAIL PROTECTED]> writes:
> > I have a PostgreSQL installation on a Debian box that had the 64bit
> > SMP kernel installed before PostgreSQL was compiled and installed on
> > it. Does PostgreSQL take any advantage of the 64 bit environment or
> > have we not done anything to move into the 64 bit world yet?
> Depends on whether PG was compiled as 64-bit or 32-bit--is your
> toolchain 64-bit all the way, or is it just the kernel?

I think he means: have benchmarks or profiling been done with the
goal of specifically improving performance on 64-bit platforms?

For most applications available today, the answer is no. Compiling an
application designed for 32-bit on a 64-bit architecture does not
automatically improve performance. Too frequently, it actually
reduces performance. Pointers are larger, which means that any
application that is heavily pointer based can be forced to move
twice as much data around, which reduces the effectiveness of the
various cache levels, and of RAM itself.
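
The effect is easy to see on a pointer-heavy node type. A sketch;
the sizes assume typical ILP32 and LP64 ABIs:

#include <stdio.h>

/* A pointer-heavy node roughly doubles in size going from 32-bit to
 * 64-bit: 12 bytes (4+4+4) becomes 24 bytes (8+8+4 plus padding).
 * Fewer nodes fit per cache line, so cache effectiveness drops. */
struct node
{
    struct node *left;
    struct node *right;
    int          key;
};

int
main(void)
{
    printf("sizeof(struct node) = %zu\n", sizeof(struct node));
    return 0;
}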

Hopefully glibc helps here, in that it should contain 64-bit specific
code where it matters, so libc calls should be able to take advantage
of the 64-bit machine instructions.

Is there an interest, or any active project to examine PostgreSQL in
the area of 64-bit processors? Has it already been done? I don't recall
seeing a reference to it in my travels. I'm also not sure on what to
expect for results, as the territory is still new. 64-bit processors
have existed for a while, but 32-bit processors have been the popular
choice, making 64-bit support an afterthought?

Cheers,
mark



Re: [HACKERS] PostgreSQL on 64 bit Linux

2006-08-21 Thread mark
On Mon, Aug 21, 2006 at 02:56:10PM +1200, Andrej Ricnik-Bay wrote:
> On 8/21/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >Is there an interest, or any active project to examine PostgreSQL in
> >the area of 64-bit processors? Has it already been done? I don't recall
> >seeing a reference to it in my travels. I'm also not sure on what to
> >expect for results, as the territory is still new. 64-bit processors
> >have existed for a while, but 32-bit processors have been the popular
> >choice, making 64-bit support an after thought?
> That's certainly just a reference to the wintel world? AIX, HP-UX
> and Solaris-Sparc have been 64-bit for a while now...

I don't think so. In the Open Source world, most projects are still 32-bit
centric, regardless of how many years the products have been supported on
64-bit platforms.

What application were you thinking of that takes full advantage of
64-bit, making the 64-bit build significantly faster than the 32-bit
one? The only area I am aware of is video processing.

It often surprises people that an upgrade to 64-bit, regardless of
CPU architecture, can end up slower rather than faster.

Cheers,
mark



Re: [HACKERS] PostgreSQL on 64 bit Linux

2006-08-21 Thread mark
On Sun, Aug 20, 2006 at 11:00:26PM -0400, Douglas McNaught wrote:
> [EMAIL PROTECTED] writes:
> > Is there an interest, or any active project to examine PostgreSQL in
> > the area of 64-bit processors? Has it already been done? I don't recall
> > seeing a reference to it in my travels. I'm also not sure on what to
> > expect for results, as the territory is still new. 64-bit processors
> > have existed for a while, but 32-bit processors have been the popular
> > choice, making 64-bit support an after thought?
> I find this question a bit amusing, since PG has run on 64-bit
> architectures such as MIPS, Sparc, Alpha and PA-RISC for quite a while
> now. :)

I don't think so. Software can be designed to take best advantage of
hardware. Recompiling it for a different architecture, running test
cases, and declaring support is not the same as optimizing for it.

> As I said in a private email to Naz, the main advantage I think you'd
> see from 64-bit is the ability to run with more than 2GB or so of
> shared buffers on a system with lots of RAM.  Whether you'd want to do
> that, or let the OS do most of the buffering, is an open question...

This is what I mean by afterthought. PostgreSQL is designed for
32-bit processors. Which is fine. I'm not complaining. The question
was whether there is an interest in pursuing 64-bit specific
optimizations. In the PostgreSQL code, a quick check points me only to
"has long int 64" as a 64-bit source code #ifdef. Of the six places
that reference this, five of them actually slow down the code, as they
check for overflow of the 'long int' result beyond 4 bytes of
data. The sixth place is used to define the 64-bit type in use by
PostgreSQL, which I suspect is infrequently used.

I believe the answer is no. No or few 64-bit optimization possibilities
have been chased down, probably because some or many of these would:

1) require significant re-architecture

2) reduce the performance in a 32-bit world

It's a question that only half interests me. As with most projects, I
don't think this one is ready to re-architect for this
purpose. Perhaps once 50%+ of people are running PostgreSQL in 64-bit
mode, the question will be more serious to more people.

As a half interesting question, I'm defending it as a valid question.
Please don't write it off, but it is fine to say "not yet, we have more
important things to work on".

Cheers,
mark



Re: [HACKERS] PostgreSQL on 64 bit Linux

2006-08-21 Thread mark
On Mon, Aug 21, 2006 at 10:23:16AM -0400, Douglas McNaught wrote:
> [EMAIL PROTECTED] writes:
> > I believe the answer is no. No or few 64-bit optimization possibilities
> > have been chased down, probably because some or many of these would:
> > 1) require significant re-architecture
> > 2) reduce the performance in a 32-bit world
> Honestly, I think the main "optimizations" happen automatically: the
> compiler uses the extra registers (if present) and pointers in LP64
> mode are automatically 64-bit, which allows much larger memory areas.
> That's probably 95% of the win right there.  What other
> "optimizations" need doing?

Compilers are getting smarter, but code that has been profiled and
hand-tuned for 32 bits does not automatically compile, under a 64-bit
compiler, into code that is profiled and hand-tuned for 64 bits.

An example of this is in how algorithms are implemented. For
Intel-style processors at least, using array offsets is cheaper than
using many pointers. For 64-bit processors, 32-bit array offsets would
use less memory.
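
Sketched in C, the offsets-instead-of-pointers idea looks like this:

#include <stddef.h>
#include <stdint.h>

/* Sketch: link nodes by 32-bit index into a pool rather than by
 * native pointer. On a 64-bit machine each link is then 4 bytes
 * instead of 8, halving the cache footprint of the links. */
#define NIL UINT32_MAX

typedef struct
{
    uint32_t left;              /* index into pool[], not a pointer */
    uint32_t right;
    int      key;
} inode;

static inode pool[1024];

static inode *
follow(uint32_t idx)
{
    return (idx == NIL) ? NULL : &pool[idx];
}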

For re-architecture, the first thing that came to mind was Datum
increasing in size. Copy by value instead of allocating a pointer,
and passing the pointer. The pointer will be 64-bits, so Datum
could support 64-bits as well. Tom caught this, but he did point
out what I missed. This would only benefit data types that are
64-bits in length, which are still likely uncommon (most people
would use serial, not bigserial).
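
Roughly, the idea is the following - a simplified sketch, not
PostgreSQL's actual Datum machinery:

#include <stdint.h>
#include <stdlib.h>

/* Simplified sketch: if the generic value slot is pointer-sized,
 * then on a 64-bit build an int64 fits in it directly and needs no
 * allocation or indirection at all. */
typedef uintptr_t datum;

datum
int64_to_datum(int64_t v)
{
#if UINTPTR_MAX >= UINT64_MAX
    return (datum) v;                   /* by value: no allocation */
#else
    int64_t *p = malloc(sizeof(*p));    /* by reference on 32-bit */

    *p = v;
    return (datum) p;
#endif
}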

Has anybody taken any of the 64-bit processor optimization manuals
and walked through them, cross-referencing against the bottleneck
code in PostgreSQL? As a counterpoint, I suggest to you that nearly
every attempt to optimize PostgreSQL for performance considered
variations in algorithms measured on a 32-bit processor, finally
selecting the version that was simplest, and best performing, on a
32-bit processor. This is normal evolution for a piece of software.
It is naturally optimized for the processor most used by its users.

I'm not the expert on the matter. I don't see evidence that anybody
has taken this route though.

> People have been running PG with big databases on 64-bit iron for
> years; don't you think any glaring infelicities would have been
> addressed?

I doubt there are 'glaring infelicities'. I doubt any changes required
to provide a noticeable speed improvement would be one liners. They will
not be obvious. They will require a complete understanding of the many
architectural differences between common 32-bit processors, and common
64-bit processors. Few have this information in their head. I don't.
I suspect that anybody who claims that the only difference is the amount
of addressable memory, and the number of registers doesn't. :-)

> > It's a question that only half interests me. As with most projects, I
> > don't think the projects are ready to re-architect for this
> > purpose.
> What re-architecting would be needed?

I'm asking that it be admitted that it has not been looked at. Not
seriously. Nor is this unique to PostgreSQL. I expect the performance
for Linux applications in general to slowly improve on 64-bit
processors as more and more people begin to use it. I recall a few
threads on other mailing lists where the primary developers admitted
to not having ready access to a 64-bit machine. Until they do, the
chance that these applications are hand-tuned for 64-bit is unlikely.

I'm not claiming that I have the answers. I am claiming that a few of
you share this weakness, and that an assumption that PostgreSQL is
designed for 64-bit would be incorrect. :-)

Cheers,
mark



Re: [HACKERS] PostgreSQL on 64 bit Linux

2006-08-21 Thread mark
On Mon, Aug 21, 2006 at 12:00:58PM +0300, Alexander Kirpa wrote:
> > WRT 64-bit and Postgres, it depends on the CPU as to whether you
> > see a simple performance benefit.  On the Opteron you will see a
> > benefit when doing CPU bound work.  When doing the CPU portion, the
> > additional registers of the Opteron running in 64-bit mode are used
> > by the compiler to produce a 20-30% boost in performance.  On the
> > Xeon in 64-bit mode, the same regions of execution will slow down
> > by about 5%.
> 
> > Postgres benefits automatically from the larger memory addressing
> > of the 64-bit kernel by using the larger I/O cache of Linux.
> 
> Main benefit Postgres in 64-bit mode possible only in case dedicated 
> DB server on system with RAM > 3GB and use most part of RAM for 
> shared buffers and avoid persistent moving buffers between OS cache 
> and shared memory. On system with RAM below 2-3GB to difficult found 
> serious gain of performance.

This is the main difference for PostgreSQL today - designed for
32-bit - when recompiled with a 64-bit compiler.

The additional registers are barely enough to counter the increased
cost of processing in 64-bits.

Cheers,
mark



Re: [HACKERS] PostgreSQL on 64 bit Linux

2006-08-21 Thread mark
On Mon, Aug 21, 2006 at 05:13:58PM +0200, Martijn van Oosterhout wrote:
> On the whole, PostgreSQL is still in the phase where we're trying to
> find the best algorithms. When that's done (if ever), we can start
> worrying about processor optimisations...

I don't disagree. :-)

You claimed that PostgreSQL is not tuned to a specific processor, and
relies on the kernel, the compiler, and libc to perform some tuning. I
agree with this as well.

Cheers,
mark



Re: [HACKERS] PostgreSQL on 64 bit Linux

2006-08-21 Thread mark
On Mon, Aug 21, 2006 at 01:05:13PM -0400, Tom Lane wrote:
> [EMAIL PROTECTED] writes:
> >> What re-architecting would be needed?
> 
> > I'm asking that it be admitted that it has not been looked at. Not
> > seriously. Nor is this unique to PostgreSQL. I expect the performance
> > for Linux applications in general to slowly improve on 64-bit
> > processors as more and more people begin to use it. I recall a few
> > threads on other mailing lists where the primary developers admitted
> > to not having ready access to a 64-bit machine. Until they do, the
> > chance that these applications are hand-tuned for 64-bit is unlikely.
> 
> What you seem to be asking for is that we engage in a lot of
> machine-specific tuning for some particular processor or other.
> That has not happened and will not happen.  This project believes
> in portable code, not "optimized for Xeon-of-the-month" code.

Hi Tom.

I'm asking for a statement such as the above, that you provide. :-)

The question was whether PostgreSQL was optimized for 64-bit Linux.
The answer is "not specifically".

The answer is quite acceptable to me. It's not the original answer that
was given to the original poster though... :-)

Cheers,
mark



Re: [HACKERS] jabber.postgresql.org is up

2006-08-27 Thread mark
On Sun, Aug 27, 2006 at 08:49:21PM -0700, Joshua D. Drake wrote:
> AgentM wrote:
> >Sorry, but I don't get it. Why offer a closed forum for an open project?

Open source developers already send private emails, private instant
messages, or join private chat rooms.

> Because jabber is a live medium, unlike email. I don't want people 
> pinging me, out of the blue. It is the whole reason I don't use any of 
> the public networks. The point is for the people who are actually part 
> of the project infrastructure to be able to communicate.

> In the end it will likely be opened up more, but for now we are taking 
> baby steps.

Assuming the conversations are not secret, and they are being used for an
official purpose, should they not be displayed as a public log, accessible
from www.postgresql.org?

Baby steps that are away from an open model would serve to discourage
public knowledge or public contribution. I already find myself
surprised by features finding their way into PostgreSQL. Most often
the surprise
is pleasant, however, it seems there is a disconnect in the communication
if somebody monitoring the mailing lists cannot determine what will be
included in, say, PostgreSQL 8.2, without asking "what will be in
PostgreSQL 8.2?"

Cheers,
mark



Re: [HACKERS] Prepared statements considered harmful

2006-08-31 Thread mark
On Thu, Aug 31, 2006 at 01:56:29PM +0200, Peter Eisentraut wrote:
> With time, it becomes ever clearer to me that prepared SQL
> statements are just a really bad idea.  On some days, it seems like
> half the performance problems in PostgreSQL-using systems are
> because a bad plan was cached somewhere.  I'd say, in the majority
> of cases the time you save parsing and planning is irrelevant
> compared to the possibly disastrous effects of wrong or suboptimal
> plans.  I wonder if other people have similar experiences.
> ...
> Comments?

Hello.

I'm attempting to understand why prepared statements would be kept
around long enough for tables to change to the point that a given
plan goes from 'optimal' to 'disastrous'.

Wouldn't this require that the tables are completely re-written, or
that their data is drastically updated? For my own tables, most of the
data remains static for months on end. Data is accumulated. Small
changes are made. I don't see why a prepared statement used over a
24 hour period would ever become disastrous.

This suggests to me that you are doing either:

   1) Maintaining prepared statements for weeks or months at a time.

   2) Churning your tables up into a froth.

I'm guessing, as you mentioned JDBC, that you might be hitting 1), in
the context of JDBC being used from a Web Application, where the
application server holds a connection open for weeks or months at a
time. If so, it does sound as if JDBC is doing wrong by keeping
prepared queries around for that long. A time limit of an hour, or
even a few minutes would make sense.
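
A sketch of that kind of driver-side time limit, written against
libpq directly; the statement, the table, and the one-hour figure
are all illustrative:

#include <time.h>
#include <libpq-fe.h>

/* Sketch: re-prepare a cached statement once it is older than an
 * hour, so a stale plan cannot persist for weeks. Real drivers
 * would also need error handling and per-connection state. */
#define MAX_PLAN_AGE 3600               /* seconds */

static time_t prepared_at = 0;

PGresult *
run_lookup(PGconn *conn, const char *id)
{
    const char *const params[1] = { id };

    if (prepared_at == 0 || time(NULL) - prepared_at > MAX_PLAN_AGE)
    {
        if (prepared_at != 0)
            PQclear(PQexec(conn, "DEALLOCATE lookup"));
        PQclear(PQprepare(conn, "lookup",
                          "SELECT * FROM t WHERE id = $1", 1, NULL));
        prepared_at = time(NULL);
    }
    return PQexecPrepared(conn, "lookup", 1, params, NULL, NULL, 0);
}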

My experience does not match yours. Prepared queries have always
significantly improved my execution times. They do have a place.
Whatever scenarios you are hitting should be dealt with, possibly
in JDBC.

Cheers,
mark


