Re: [PERFORM] Database restore speed

2005-12-03 Thread David Lang

On Fri, 2 Dec 2005, Luke Lonergan wrote:


Stephen,

On 12/2/05 12:18 PM, "Stephen Frost" <[EMAIL PROTECTED]> wrote:


Just a thought, but couldn't psql be made to use the binary mode of
libpq and do at least some of the conversion on the client side?  Or
does binary mode not work with copy (that wouldn't surprise me, but
perhaps copy could be made to support it)?


Yes - I think this idea is implicit in what David suggested, and my response
as well.  The problem is that the way the client does conversions can
potentially differ from the way the backend does.  Some of the types in
Postgres are machine intrinsic and the encoding conversions use on-machine
libraries, each of which precludes the use of client conversion methods
(without a lot of restructuring).  We'd tackled this problem in the past and
concluded that the parse / convert stage really belongs in the backend.


I'll bet this parsing cost varies greatly with the data types used. I'm
also willing to bet that for the data types that have different
encodings on different systems, there could be an intermediate encoding
that is far faster to parse than ASCII text is.


For example (and I know nothing about the data storage itself, so this
is just an example): if the issue was storing numeric values on
big-endian and little-endian systems (and 32-bit vs 64-bit systems, to
end up with 4 ways of holding the data), you have a substantial cost in
parsing the ASCII and converting it to a binary value, but the client
can't (and shouldn't) know which endianness and word size the server
uses. It could, however, create a big-endian multi-precision encoding
that would then be very cheap for the server to split and flip as
necessary. Yes, this means more work is done overall, but it's split
between different machines, and the binary representation of the data
will probably reduce your network traffic as a side effect.
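
To make that concrete, here is a rough sketch of such a helper (the
fixed 8-byte big-endian form and the names are invented for
illustration, not an existing Postgres wire format):

#include <stdint.h>
#include <stddef.h>

/* Client side: write a host integer in a fixed 8-byte big-endian form. */
static void encode_be64(uint64_t v, unsigned char buf[8])
{
    for (int i = 7; i >= 0; i--)
    {
        buf[i] = (unsigned char) (v & 0xff);
        v >>= 8;
    }
}

/* Server side: reassemble the value with a few shifts and ORs.  The same
   loop works on big-endian and little-endian hosts of any word size, so
   the client never has to know what the server is. */
static uint64_t decode_be64(const unsigned char buf[8])
{
    uint64_t v = 0;

    for (size_t i = 0; i < 8; i++)
        v = (v << 8) | buf[i];
    return v;
}

The decode side costs a handful of shift/OR operations per value, which
is the cheap "split and flip" step described above.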


And for things like dates, which get parsed in multiple ways until one
is found that seems sane, there's a significant amount of work that the
server could avoid.


David Lang


The other thought, of course, is that you could use PITR for your
backups instead of pgdump...


Totally - great idea, if this is actually a backup / restore then PITR plus
filesystem copy (tarball) is hugely faster than dump / restore.
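
For concreteness, the base-backup half of that procedure could be
sketched from libpq roughly like this (the connection string, backup
label, and paths are placeholders, and continuous WAL archiving is
assumed to already be configured):

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn   *conn = PQconnectdb("dbname=postgres");
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Mark the start of a base backup; WAL archiving must already be on. */
    res = PQexec(conn, "SELECT pg_start_backup('nightly');");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "pg_start_backup failed: %s", PQerrorMessage(conn));
        return 1;
    }
    PQclear(res);

    /* Copy the cluster directory while the server keeps running. */
    if (system("tar czf /backup/base.tar.gz /var/lib/pgsql/data") != 0)
        fprintf(stderr, "tar reported a problem\n");

    res = PQexec(conn, "SELECT pg_stop_backup();");
    PQclear(res);
    PQfinish(conn);
    return 0;
}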

- Luke



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster





Re: [PERFORM] Database restore speed

2005-12-03 Thread David Lang

On Fri, 2 Dec 2005, Michael Stone wrote:


On Fri, Dec 02, 2005 at 01:24:31PM -0800, Luke Lonergan wrote:

From a performance standpoint no argument, although you're betting that you
can do parsing / conversion faster than the COPY core in the backend can 


Not necessarily; you may be betting that it's more *efficient* to do the
parsing on a bunch of lightly loaded clients than your server. Even if
you're using the same code this may be a big win.


It's a lot easier to throw hardware at the problem by splitting your
incoming data between multiple machines and having them all work in
parallel throwing the data at one database than it is to throw more
hardware at the database server to speed it up (and yes, assuming that
MPP splits the parsing costs as well, it can be an answer for some
types of systems).


David Lang

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Database restore speed

2005-12-03 Thread David Lang

On Fri, 2 Dec 2005, Luke Lonergan wrote:


Stephen,

On 12/2/05 1:19 PM, "Stephen Frost" <[EMAIL PROTECTED]> wrote:



I've used the binary mode stuff before, sure, Postgres may have to
convert some things but I have a hard time believing it'd be more
expensive to do a network_encoding -> host_encoding (or toasting, or
whatever) than to do the ascii -> binary change.


From a performance standpoint no argument, although you're betting that you
can do parsing / conversion faster than the COPY core in the backend can (I
know *we* can :-).  It's a matter of safety and generality - in general you
can't be sure that client machines / OS'es will render the same conversions
that the backend does in all cases IMO.


One more thing - this is really about the lack of a cross-platform binary
input standard for Postgres IMO.  If there were such a thing, it *would* be
safe to do this.  The current Binary spec is not cross-platform AFAICS, it
embeds native representations of the DATUMs, and does not specify a
universal binary representation of same.

For instance - when representing a float, is it an IEEE 32-bit floating
point number in little endian byte ordering? Or is it IEEE 64-bit?  With
libpq, we could do something like an XDR implementation, but the machinery
isn't there AFAICS.


This makes sense; however, it then raises the question of how much
effort it would take to define such a standard and implement the shim
layer needed to accept the connections, versus how much of a speedup it
would result in (the gain could probably be approximated with just a
little hacking to use the existing binary format between two machines
of the same type).


As for the standards, standard network byte order is big-endian, so
that should be the standard used (in spite of the quantity of x86
machines out there). For the size of the data elements, using the
largest size of each will probably still be a win in size compared to
ASCII. Converting between binary formats is usually a matter of a few
AND and shift opcodes (and with the CPU core so much faster than its
memory, you can afford to do quite a few of these on each chunk of data
without it being measurable in your overall time).


An alternative would be to add a 1-byte type code before each data
element to specify its type, but then the server-side code would have
to be smarter to deal with the additional possibilities.
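
Purely as an illustration of the two layouts being weighed here (the
tag values and widths are invented for the sketch):

#include <stdint.h>

/* The "largest size of each" scheme: every integer travels as a fixed
   8-byte big-endian field in column order, with nothing in between. */
struct wide_int_field
{
    uint8_t payload[8];        /* most significant byte first */
};

/* The tagged alternative: one extra byte ahead of every datum tells the
   server what follows, at the cost of a byte and a branch per value. */
enum wire_tag { WIRE_INT64 = 1, WIRE_FLOAT64 = 2, WIRE_TEXT = 3 };

struct tagged_datum
{
    uint8_t tag;               /* one of wire_tag                     */
    uint8_t payload[8];        /* value, most significant byte first  */
};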


David Lang

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Database restore speed

2005-12-03 Thread David Lang

On Fri, 2 Dec 2005, Luke Lonergan wrote:


Michael,

On 12/2/05 1:46 PM, "Michael Stone" <[EMAIL PROTECTED]> wrote:


Not necessarily; you may be betting that it's more *efficient* to do the
parsing on a bunch of lightly loaded clients than your server. Even if
you're using the same code this may be a big win.


If it were possible in light of the issues on client parse / convert, then
we should analyze whether it's a performance win.

In the restore case, where we've got a dedicated server with a dedicated
client machine, I don't see why there would be a speed benefit from running
the same parse / convert code on the client versus running it on the server.
Imagine a pipeline with a bottleneck; moving the bottleneck to a
different machine doesn't make it less of a bottleneck.


Your database server needs to use its CPU for other things besides the
parsing. You could buy a bigger machine, but it's usually far cheaper
to buy two dual-proc machines than one quad-proc machine (and if your
load is such that you already have an 8-proc machine as the database,
swallow hard when you ask for the price of a 16-proc machine). In
addition, there is a substantial efficiency loss in multi-proc machines
(some software, some hardware), so the multiple small machines may give
you more available work cycles.


If you can remove almost all of the parsing load (CPU cycles, memory
footprint, and cache-thrashing effects), then that box can do the rest
of its work more efficiently; meanwhile the client can use what would
otherwise be idle CPU to do the parsing.


If you only have a 1:1 relationship, it's a good question whether it's
a win (it depends on how much other work each box is doing to support
this), but if you allow for multiple clients it easily becomes a win.


David Lang

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Database restore speed

2005-12-03 Thread David Lang

On Fri, 2 Dec 2005, Luke Lonergan wrote:

And how do we compose the binary data on the client?  Do we trust that 
the client encoding conversion logic is identical to the backend's?  If 
there is a difference, what happens if the same file loaded from 
different client machines has different results?  Key conflicts when 
loading a restore from one machine and not from another? - Luke


The same way you deal with text data that could be in different
encodings: you tag your message with the format version you are using
and throw an error if you get a format you don't understand how to deal
with.
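
For instance, a stream header along these lines (the magic string and
version number are made up for the sketch) would let the server refuse
a format it doesn't know how to decode before touching any data:

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define LOAD_MAGIC   "PGBLOAD"     /* hypothetical stream signature    */
#define LOAD_VERSION 1             /* bump whenever the layout changes */

struct load_header
{
    char     magic[8];             /* "PGBLOAD\0"            */
    uint32_t version;              /* big-endian on the wire */
};

/* Return 0 if we know how to decode this stream, -1 to reject the load. */
static int check_load_header(const struct load_header *h)
{
    if (memcmp(h->magic, LOAD_MAGIC, sizeof(LOAD_MAGIC)) != 0)
        return -1;                 /* not our format at all             */
    if (ntohl(h->version) != LOAD_VERSION)
        return -1;                 /* known format, unsupported version */
    return 0;
}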


If a client claims to be using one format but is instead doing
something different, you will be in deep trouble anyway.


Remember, we aren't talking about random application code here; we are
talking about Postgres client code and libraries. If the library is
incorrect then it's a bug; parsing bugs could happen in the server as
well. (In fact, the server could parse things to the intermediate
format and then convert them. This sounds expensive, but given the high
clock multipliers in use, it may not end up being measurable.)


David Lang

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Database restore speed

2005-12-03 Thread Simon Riggs
On Fri, 2005-12-02 at 15:18 -0500, Stephen Frost wrote:

> The other thought, of course, is that you could use PITR for your
> backups instead of pgdump...

Yes, it is much faster that way.

Over on -hackers a few optimizations of COPY have been discussed; one of
those is to optimize COPY when it is loading into a table created within
the same transaction as the COPY. This would allow pg_dumps to be
restored much faster, since no WAL need be written in this case.
I hope to work on this fairly soon.
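
For reference, the load pattern that optimization would reward is just
a CREATE TABLE and COPY inside one transaction. A minimal libpq sketch
(the connection string, table, and data are placeholders, and the WAL
skipping itself would happen inside the backend, not in this code):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libpq-fe.h>

static void die(PGconn *conn, const char *msg)
{
    fprintf(stderr, "%s: %s", msg, PQerrorMessage(conn));
    PQfinish(conn);
    exit(1);
}

int main(void)
{
    PGconn     *conn = PQconnectdb("dbname=restore_target");
    PGresult   *res;
    const char *row = "1\tfirst row\n";

    if (PQstatus(conn) != CONNECTION_OK)
        die(conn, "connect failed");

    /* Create the table and load it inside the same transaction. */
    res = PQexec(conn, "BEGIN; CREATE TABLE restore_demo (id int, val text);");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        die(conn, "setup failed");
    PQclear(res);

    res = PQexec(conn, "COPY restore_demo FROM STDIN;");
    if (PQresultStatus(res) != PGRES_COPY_IN)
        die(conn, "COPY failed");
    PQclear(res);

    PQputCopyData(conn, row, (int) strlen(row));
    PQputCopyEnd(conn, NULL);

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            die(conn, "COPY data failed");
        PQclear(res);
    }

    res = PQexec(conn, "COMMIT;");
    PQclear(res);
    PQfinish(conn);
    return 0;
}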

Dumping/restoring data with pg_dump has wider uses than backup for
data protection.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] 15,000 tables - next step

2005-12-03 Thread Jan Wieck

On 12/2/2005 6:01 PM, Michael Riess wrote:


Hi,

thanks for your comments so far - I appreciate it. I'd like to narrow 
down my problem a bit:


As I said in the other thread, I estimate that only 20% of the 15,000 
tables are accessed regularly. So I don't think that vacuuming or the 
number of file handles is a problem. Have a look at this:


What makes you think that? Have you at least tried to adjust your shared 
buffers, freespace map settings and background writer options to values 
that match your DB? How does increasing the kernel file descriptor
limit (try the current limit times 5 or 10) affect your performance?



Jan






content2=# select relpages, relname from pg_class order by relpages desc 
limit 20;

  relpages | relname
----------+---------------------------------
 11867 | pg_attribute
 10893 | pg_attribute_relid_attnam_index
  3719 | pg_class_relname_nsp_index
  3310 | wsobjects_types
  3103 | pg_class
  2933 | wsobjects_types_fields
  2903 | wsod_133143
  2719 | pg_attribute_relid_attnum_index
  2712 | wsod_109727
  2666 | pg_toast_98845
  2601 | pg_toast_9139566
  1876 | wsod_32168
  1837 | pg_toast_138780
  1678 | pg_toast_101427
  1409 | wsobjects_types_fields_idx
  1088 | wso_log
   943 | pg_depend
   797 | pg_depend_depender_index
   737 | wsod_3100
   716 | wp_hp_zen

I don't think that postgres was designed for a situation like this, 
where a system table that should be fairly small (pg_attribute) is this 
large.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match



--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] 15,000 tables - next step

2005-12-03 Thread Michael Riess

Jan Wieck schrieb:

On 12/2/2005 6:01 PM, Michael Riess wrote:


Hi,

thanks for your comments so far - I appreciate it. I'd like to narrow 
down my problem a bit:


As I said in the other thread, I estimate that only 20% of the 15,000 
tables are accessed regularly. So I don't think that vacuuming or the 
number of file handles is a problem. Have a look at this:


What makes you think that? Have you at least tried to adjust your shared 
buffers, freespace map settings and background writer options to values 
that match your DB? How does increasing the kernel file descriptor
limit (try the current limit times 5 or 10) affect your performance?





Of course I tried to tune these settings. You should take into account
that the majority of the tables are rarely, if ever, modified, so I
don't think that I need a gigantic freespace map. And the background
writer never complained.


Shared memory ... I currently use 1500 buffers for 50 connections, and 
performance really suffered when I used 3000 buffers. The problem is 
that it is a 1GB machine, and Apache + Tomcat need about 400MB.


But thanks for your suggestions! I guess that I'll have to find a way to 
reduce the number of tables. Unfortunately my application needs them, so 
I'll have to find a way to delete rarely used tables and create them on 
the fly when they're accessed again. But this will really make my 
application much more complex and error-prone, and I had hoped that the 
database system could take care of that. I still think that a database 
system's performance should not suffer from the mere presence of unused 
tables.


Mike

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] 15,000 tables - next step

2005-12-03 Thread Alvaro Herrera
Michael Riess wrote:

> Shared memory ... I currently use 1500 buffers for 50 connections, and 
> performance really suffered when I used 3000 buffers. The problem is 
> that it is a 1GB machine, and Apache + Tomcat need about 400MB.

Well, I'd think that's where your problem is.  Not only do you have a
(relatively speaking) small server -- you also share it with other
very-memory-hungry services!  That's not a situation I'd like to be in.
Try putting Apache and Tomcat elsewhere, and leave the bulk of the 1GB
to Postgres.  With 1500 shared buffers you are not really going
anywhere -- you should have ten times that at the very least.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] 15,000 tables - next step

2005-12-03 Thread Michael Riess

Alvaro Herrera schrieb:

Michael Riess wrote:

Shared memory ... I currently use 1500 buffers for 50 connections, and 
performance really suffered when I used 3000 buffers. The problem is 
that it is a 1GB machine, and Apache + Tomcat need about 400MB.


Well, I'd think that's where your problem is.  Not only do you have a
(relatively speaking) small server -- you also share it with other
very-memory-hungry services!  That's not a situation I'd like to be in.
Try putting Apache and Tomcat elsewhere, and leave the bulk of the 1GB
to Postgres. 


No can do. I can try to switch to a 2GB machine, but I will not use 
several machines. Not for a 5GB database. ;-)



With 1500 shared buffers you are not really going
anywhere -- you should have ten times that at the very least.



Like I said - I tried to double the buffers and the performance did not 
improve in the least. And I also tried this on a 2GB machine, and 
swapping was not a problem. If I used 10x more buffers, I would in 
essence remove the OS buffers.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] 15,000 tables - next step

2005-12-03 Thread Jaime Casanova
On 12/3/05, Michael Riess <[EMAIL PROTECTED]> wrote:
> Alvaro Herrera schrieb:
> > Michael Riess wrote:
> >
> >> Shared memory ... I currently use 1500 buffers for 50 connections, and
> >> performance really suffered when I used 3000 buffers. The problem is
> >> that it is a 1GB machine, and Apache + Tomcat need about 400MB.
> >
> > Well, I'd think that's where your problem is.  Not only do you have a
> > (relatively speaking) small server -- you also share it with other
> > very-memory-hungry services!  That's not a situation I'd like to be in.
> > Try putting Apache and Tomcat elsewhere, and leave the bulk of the 1GB
> > to Postgres.
>
> No can do. I can try to switch to a 2GB machine, but I will not use
> several machines. Not for a 5GB database. ;-)
>

Not for the 5GB database, but because of the other services you have running.

> > With 1500 shared buffers you are not really going
> > anywhere -- you should have ten times that at the very least.
> >
>
> Like I said - I tried to double the buffers and the performance did not
> improve in the least. And I also tried this on a 2GB machine, and
> swapping was not a problem. If I used 10x more buffers, I would in
> essence remove the OS buffers.
>

How many disks do you have?  (I wonder if you'll say 1)
- in most cases it's a good idea to have the WAL files on another disk...

What type of disks (IDE, SCSI, etc.)?
How many processors?

What other services (or applications) do you have in that machine?

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] 15,000 tables - next step

2005-12-03 Thread Jan Wieck

On 12/3/2005 11:41 AM, Michael Riess wrote:


Alvaro Herrera schrieb:

Michael Riess wrote:

Shared memory ... I currently use 1500 buffers for 50 connections, and 
performance really suffered when I used 3000 buffers. The problem is 
that it is a 1GB machine, and Apache + Tomcat need about 400MB.


Well, I'd think that's where your problem is.  Not only do you have a
(relatively speaking) small server -- you also share it with other
very-memory-hungry services!  That's not a situation I'd like to be in.
Try putting Apache and Tomcat elsewhere, and leave the bulk of the 1GB
to Postgres. 


No can do. I can try to switch to a 2GB machine, but I will not use 
several machines. Not for a 5GB database. ;-)


What version of PostgreSQL are we talking about? If it is anything older 
than 8.0, you should upgrade at least to that. With 8.0 or better try
20000 shared buffers or more. It is well possible that going from 1500
to 3000 buffers made things worse. Your buffer cache can't even hold the 
system catalog in shared memory. If those 50 backends serve all those 
500 apps at the same time, they suffer from constant catalog cache 
misses and don't find the entries in the shared buffers either.



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Database restore speed

2005-12-03 Thread Luke Lonergan
Tom,

On 12/2/05 3:00 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:
> 
> Sure it does ... at least as long as you are willing to assume everybody
> uses IEEE floats, and if they don't you have semantic problems
> translating float datums anyhow.
> 
> What we lack is documentation, more than functionality.

Cool - sounds like the transport part might be there - the thing we desire
is a file format that allows for efficient representation of portable binary
datums.

Last I looked at the Postgres binary dump format, it was not portable or
efficient enough to suit the need.  The efficiency problem with it was that
there was descriptive information attached to each individual data item, as
compared to the approach where that information is specified once for the
data group as a template for input.

Oracle's format allows for the expression of fixed width fields within the
input file, and specifies the data type of the fields in the metadata.  We
could choose to support exactly the specification of the SQL*Loader format,
which would certainly be general enough, and would have the advantage of
providing a compatibility option with Oracle SQL*Loader input.

Note that Oracle does not provide a similar functionality for the expression
of *output* files, those that can be dumped from an Oracle database.  Their
mechanism for database dump is the exp/imp utility pair, and it is a
proprietary "shifting sands" specification AFAIK.  This limits the benefit
of implementing the Oracle SQL*Loader compatibility to those customers who
have designed utilities to emit that format, which may still be valuable.

The alternative is to design a Postgres portable binary input file format.
I'd like to see a record oriented format like that of FORTRAN unformatted,
which uses bookends around each record to identify the length of each
record.  This allows for fast record oriented positioning within the file,
and provides some self-description for integrity checking, etc.
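
A rough sketch of such bookended records, purely to illustrate the
framing (this is not an existing Postgres format):

#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>

/* Append one record as <length><payload bytes><length>, lengths in
   network byte order.  A reader can skip forward from the leading
   length, skip backward from the trailing one, and cross-check the two
   as an integrity test. */
static int write_record(FILE *f, const void *payload, uint32_t len)
{
    uint32_t be_len = htonl(len);

    if (fwrite(&be_len, sizeof(be_len), 1, f) != 1)
        return -1;
    if (len > 0 && fwrite(payload, len, 1, f) != 1)
        return -1;
    if (fwrite(&be_len, sizeof(be_len), 1, f) != 1)
        return -1;
    return 0;
}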

- Luke  



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Database restore speed

2005-12-03 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes:
> Last I looked at the Postgres binary dump format, it was not portable or
> efficient enough to suit the need.  The efficiency problem with it was that
> there was descriptive information attached to each individual data item, as
> compared to the approach where that information is specified once for the
> data group as a template for input.

Are you complaining about the length words?  Get real...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Database restore speed

2005-12-03 Thread Luke Lonergan
Tom,

On 12/3/05 12:32 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> "Luke Lonergan" <[EMAIL PROTECTED]> writes:
>> Last I looked at the Postgres binary dump format, it was not portable or
>> efficient enough to suit the need.  The efficiency problem with it was that
>> there was descriptive information attached to each individual data item, as
>> compared to the approach where that information is specified once for the
>> data group as a template for input.
> 
> Are you complaining about the length words?  Get real...

Hmm - "" repeat, efficiency is 1/2 of "" repeat.  I
think that's worth complaining about.

- Luke



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] Faster db architecture for a twisted table.

2005-12-03 Thread Rodrigo Madera
Imagine a table named Person with "first_name" and "age".

Now let's make it fancy and put a "mother" and "father" field that is
a reference to the own table (Person). And to get even fuzzier, let's
drop in some siblings:

CREATE TABLE person(
   id bigint PRIMARY KEY,
   first_name TEXT,
   age INT,
   mother bigint REFERENCES person,
   father bigint REFERENCES person,
   siblings bigint[]  -- (don't remember the syntax, but you get the point)
);

Well, this is OK, but imagine a search for "brothers of person id
34". We would have to search inside the record's 'siblings' array. Is
this a bad design? Is this going to be slow?

What would be a better design for these kinds of relationships
(where you need several references to rows inside the same table)?

Thanks for any help,
Rodrigo

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Database restore speed

2005-12-03 Thread Mitch Skinner
On Fri, 2005-12-02 at 23:03 -0500, Luke Lonergan wrote:
> And how do we compose the binary data on the client?  Do we trust that the 
> client encoding conversion logic is identical to the backend's?

Well, my newbieness is undoubtedly showing already, so I might as well
continue with my line of dumb questions.  I did a little mail archive
searching, but had a hard time coming up with unique query terms.

This is a slight digression, but my question about binary format query
results wasn't rhetorical.  Do I have to worry about different platforms
when I'm getting binary RowData(s) back from the server?  Or when I'm
sending binary bind messages?

Regarding whether or not the client has identical encoding/conversion
logic, how about a fast path that starts out by checking for
compatibility?  In addition to a BOM, you could add a "float format
mark" that was an array of things like +0.0, -0.0, min, max, +Inf, -Inf,
NaN, etc.
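
A sketch of what that probe could look like (hypothetical, not an
existing protocol feature):

#include <float.h>
#include <math.h>
#include <string.h>

#define N_PROBES 6

/* The sender emits a fixed set of well-known values... */
static void fill_float_probes(double out[N_PROBES])
{
    out[0] = +0.0;
    out[1] = -0.0;
    out[2] = DBL_MIN;
    out[3] = DBL_MAX;
    out[4] = INFINITY;
    out[5] = -INFINITY;    /* NaN left out here: its bit pattern can vary */
}

/* ...and the receiver compares them bit-for-bit against its own,
   falling back to the slow conversion path on any mismatch. */
static int float_formats_compatible(const double peer[N_PROBES])
{
    double mine[N_PROBES];

    fill_float_probes(mine);
    return memcmp(mine, peer, sizeof(mine)) == 0;
}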

It looks like XDR specifies byte order for floats and otherwise punts to
IEEE.  I have no experience with SQL*Loader, but a quick read of the
docs appears to divide data types into "portable" and "nonportable"
groups, where loading nonportable data types requires extra care.

This may be overkill, but have you looked at HDF5?  Only one hit came up
in the mail archives.
http://hdf.ncsa.uiuc.edu/HDF5/doc/H5.format.html
For (e.g.) floats, the format includes metadata that specifies byte
order, padding, normalization, the location of the sign, exponent, and
mantissa, and the size of the exponent and mantissa.  The format appears
not to require length information on a per-datum basis.  A cursory look
at the data format page gives me the impression that there's a useful
streamable subset.  The license of the implementation is BSD-style (no
advertising clause), and it appears to support a large variety of
platforms.  Currently, the format spec only mentions ASCII, but since
the library doesn't do any actual string manipulation (just storage and
retrieval, AFAICS) it may be UTF-8 clean.

Mitch

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Faster db architecture for a twisted table.

2005-12-03 Thread Andreas Pflug

Rodrigo Madera wrote:


Imagine a table named Person with "first_name" and "age".

Now let's make it fancy and put a "mother" and "father" field that is
a reference to the own table (Person). And to get even fuzzier, let's
drop in some siblings:

CREATE TABLE person(
  id bigint PRIMARY KEY,
  first_name TEXT,
  age INT,
  mother bigint REFERENCES person,
  father bigint REFERENCES person,
  siblings bigint[]  -- (don't remember the syntax, but you get the point)
);

Well, this is OK, but imagine a search for "brothers of person id
34". We would have to search inside the record's 'siblings' array. Is
this a bad design? Is this going to be slow?

What would be a better design for these kinds of relationships
(where you need several references to rows inside the same table)?



Create a table "sibling" with parent_id, sibling_id and appropriate FKs, 
allowing the model to reflect the relation. At the same time, you can 
drop "mother" and "father", because this relation is covered too.


Regards,
Andreas


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Database restore speed

2005-12-03 Thread David Lang

On Sat, 3 Dec 2005, Luke Lonergan wrote:


Tom,

On 12/3/05 12:32 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:


"Luke Lonergan" <[EMAIL PROTECTED]> writes:

Last I looked at the Postgres binary dump format, it was not portable or
efficient enough to suit the need.  The efficiency problem with it was that
there was descriptive information attached to each individual data item, as
compared to the approach where that information is specified once for the
data group as a template for input.


Are you complaining about the length words?  Get real...


Hmm - "" repeat, efficiency is 1/2 of "" repeat.  I
think that's worth complaining about.


But how does it compare to the ASCII representation of that int?
(remember to include your separator characters as well)


Yes, it seems less efficient, and it may be better to do something
like send a record description header that gives the sizes of each
item and then send the records following that without the size items,
but either way it should still be an advantage over the existing ASCII
messages.
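
Something along these lines, purely as a sketch of that
header-then-records layout (the field-count cap and the widths are
invented for the example):

#include <stdint.h>
#include <stddef.h>

/* Sent once per load stream: how many columns and how wide each one is
   (capped at 16 columns in this toy example). */
struct record_desc
{
    uint16_t ncols;            /* number of fields per record         */
    uint16_t width[16];        /* bytes per field, fixed for the load */
};

/* Each following record is just the concatenated fields with no
   per-datum length words; its size is known from the header alone. */
static size_t record_size(const struct record_desc *d)
{
    size_t total = 0;

    for (uint16_t i = 0; i < d->ncols; i++)
        total += d->width[i];
    return total;
}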


Also, how large is the length word in the message?

There are other optimizations that can be done as well, but if there's
still a question about whether it's worth it to do the parsing on the
client, then a first implementation should be done without making too
many changes, to test things.


Also, some of the optimizations need to have measurements done to see
if they are worth it (even something that seems as obvious as
separating the size information from the data itself, as you suggest
above, has a penalty: it spreads the data that needs to be accessed to
process a line across different cache lines, so in some cases it won't
be worth it).


David Lang

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Faster db architecture for a twisted table.

2005-12-03 Thread Hélder M. Vieira


- Original Message - 
From: "Andreas Pflug" <[EMAIL PROTECTED]>


Create a table "sibling" with parent_id, sibling_id and appropriate FKs, 
allowing the model to reflect the relation. At the same time, you can drop 
"mother" and "father", because this relation is covered too



Something like a table describing relationships and a table reflecting 
relationships from both sides, I guess:



create table relationship_type
(
relationship_type_id serial,
relationship_type_description varchar(20)
);

populated with values such as:
1 Child_of
2 Father_of
3 Brother_of
4 Sister_of
...


And then


create table person_relationships
(
source_person_id int4,
relationship_type_id int4,
target_person_id int4
);

populated with values such as:
1 1 2  (person 1 is child of person 2)
2 2 1  (person 2 is father of person 1)
...


It requires careful maintenance, as almost all (I'd stick with ALL)
relationships will require a person to appear twice (as source and as
target), but it is flexible and easy to query.



Helder M. Vieira






---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Very slow queries - please help

2005-12-03 Thread Thomas F. O'Connell


On Nov 24, 2005, at 12:14 PM, Bealach-na Bo wrote:


The consensus seems to be that I need more indexes and I also need to
look into the NOT IN statement as a possible bottleneck. I've
introduced the indexes, which has led to a DRAMATIC change in response
time. Now I have to experiment with INNER JOIN -> OUTER JOIN
variations and SET ENABLE_SEQSCAN=OFF.

Forgive me for not mentioning each person individually and by name.
You have all contributed to confirming what I had suspected (and
hoped): that *I* have a lot to learn!

I'm attaching table descriptions, the first few lines of top output
while the queries were running, index lists, sample queries and
EXPLAIN ANALYSE output BEFORE and AFTER the introduction of the
indexes. As I said, DRAMATIC :) I notice that the CPU usage does not
vary very much (it's nearly 100% anyway), but the memory usage drops
markedly, which is another very nice result of the index introduction.

Any more comments and tips would be very welcome.


You might find the following resources from techdocs instructive:

http://techdocs.postgresql.org/redir.php?link=/techdocs/pgsqladventuresep2.php


http://techdocs.postgresql.org/redir.php?link=/techdocs/pgsqladventuresep3.php


These documents provide some guidance into the process of index  
selection. It seems like you could still stand to benefit from more  
indexes based on your queries, table definitions, and current indexes.


--
Thomas F. O'Connell
Database Architecture and Programming
Co-Founder
Sitening, LLC

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005 (cell)
615-469-5150 (office)
615-469-5151 (fax)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster