Re: [GENERAL] Linux v.s. Mac OS-X Performance

2007-11-26 Thread Wes
On 11/13/07 10:02 AM, "Scott Ribe" <[EMAIL PROTECTED]> wrote:

> What you're referring to must be that the kernel was essentially
> single-threaded, with a single "kernel-funnel" lock. (Because the OS
> certainly supported threads, and it was certainly possible to write
> highly-threaded applications, and I don't know of any performance problems
> with threaded applications.)
> 
> This has been getting progressively better, with each release adding more
> in-kernel concurrency. Which means that 10.5 probably obsoletes all prior
> postgres benchmarks on OS X.

While I've never seen this documented anywhere, it empirically looks like
10.5 also (finally) adds CPU affinity to better utilize instruction caching.
On a dual-CPU system under 10.4, one CPU-bound process would use two CPUs
at 50%.  Under 10.5 it uses one CPU at 100%.

I never saw any resolution to this thread - were the original tests on the
Opteron and OS X identical, or were they two different workloads?

Wes



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Linux v.s. Mac OS-X Performance

2007-11-29 Thread Wes
Regarding the various kernel bottlenecks, have there been any tests with
Google's malloc (libtcmalloc)?  Perhaps PostgreSQL isn't heavily threaded
enough to make a difference, but on one of our heavily threaded applications
(unrelated to Postgres), it made a night and day difference.  Instead of
memory and CPU usage growing and growing, both stabilized quickly at less
than half of what the Linux malloc produced.  The Linux (2.6) RH malloc stinks
in heavily threaded applications.  The benchmark that got us looking at this
was a MySQL benchmark showing performance scaling by number of threads on
various Linux distributions.  The difference in our application from simply
preloading libtcmalloc at run time (LD_PRELOAD) was astounding.

Wes



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] REINDEX on large DB vs. DROP INDEX/CREATE INDEX

2008-01-24 Thread Wes
Seems like it would be a common question, but I'm having problems finding an
answer in the archives on this...

I have a large database (now about 2 billion records), and about once a year
I have been dropping and recreating the indexes.  Recreating the foreign key
constraints takes as long or longer than the index creation.  Last year the
whole process took about 72 hours.  This year the DB is bigger.

I'm running 8.1.4.  Assume I have exclusive access to the DB.
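
(For clarity, the two approaches I'm weighing are roughly the following;
database, table, index, and constraint names are just placeholders:)

  -- Option 1: rebuild all indexes in place
  REINDEX DATABASE mydb;

  -- Option 2: drop and recreate each index by hand, then re-add the
  -- foreign key constraints
  DROP INDEX some_index;
  CREATE INDEX some_index ON some_table (some_column);
  ALTER TABLE child_table ADD CONSTRAINT child_fk
    FOREIGN KEY (parent_key) REFERENCES some_table;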

1. Is there any advantage to doing the DROP/CREATE over just doing a REINDEX
DATABASE.  I seem to remember DROP/CREATE had some advantages, but can't
find the information.  Is there a performance hit with REINDEX during
creation because of locking issues?

2. I'm assuming REINDEX would avoid the time involved in recreating the
foreign key constraints?

3. With a REINDEX DATABASE, how can I monitor progress?

Wes



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] REINDEX on large DB vs. DROP INDEX/CREATE INDEX

2008-01-24 Thread Wes
On 1/24/08 12:48 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> Wes <[EMAIL PROTECTED]> writes:
>> I'm running 8.1.4.  Assume I have exclusive access to the DB.
> 
> You really ought to update to 8.1.something-newer, but I digress.

I was planning on upgrading to 8.x at the same time as this reindex - just
do a dump/reload, but...

I guess I've been asleep at the wheel and didn't realize 8.1.11 was out.
Since that wouldn't require a DB reload, I guess that would be highly
recommended?

>> 1. Is there any advantage to doing the DROP/CREATE over just doing a REINDEX
>> DATABASE.
> 
> No, not if you don't mind exclusive locks.  DROP together with CREATE
> INDEX CONCURRENTLY might be nicer if you were trying to do this without
> completely shutting down the DB, but if you aren't running normal
> operations then just use REINDEX.

I was thinking there was some cleanup that didn't happen with REINDEX,
related to disk space allocation maybe?  Perhaps that was just on older 7.x
versions - something I vaguely recall from when I was having some corruption
issues under 7.x.

>> 3. With a REINDEX DATABASE, how can I monitor progress?
> 
> It should give you a NOTICE after each table.

Is there anything that shows up in ps for each index it is working on?

> BTW, what have you got maintenance_work_mem set to?

It is currently set to 983025.  Not sure where I got that strange number
from.  It's a 2 GB machine.  I've been trying to get more, but when it runs
fine day to day, it's kind of hard to justify.  Lots of disks, not so much
memory.

I guess I should also turn off fsync for the duration.
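
(In other words, the session doing the rebuild can override it before kicking
things off - database name is a placeholder, and the value is whatever memory
allows:)

  SET maintenance_work_mem = 983025;   -- units are kB, session-local
  REINDEX DATABASE mydb;
  -- fsync, by contrast, is a postgresql.conf setting, so turning it off for
  -- the duration means editing the config file and reloading.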

Wes



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] REINDEX on large DB vs. DROP INDEX/CREATE INDEX

2008-01-25 Thread Wes
On 1/25/08 5:40 AM, "Gregory Stark" <[EMAIL PROTECTED]> wrote:

> It shouldn't make a big difference. fsync only happens at the end of a
> transaction or at a checkpoint.
> 
> Since you're concerned with very long operations the slowdown at the end of
> the transaction won't make a big difference.

Ok, I didn't realize that.

> Checkpoints could be an issue, but you would be better off just raising
> checkpoint_segments and/or checkpoint_timeout to make sure you don't get one
> more often than once every few minutes when you're doing large operations like
> this.

I have checkpoint_segments set to 60, and no warnings showing up in the log.

Wes



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] REINDEX on large DB vs. DROP INDEX/CREATE INDEX

2008-02-04 Thread Wes
Just a follow-up on this...  The REINDEX took about 2 1/2 days.  I didn't
gain much disk space back - a full backup takes just as long as before, but
the vacuum time dropped from 30 hours to 3 hours.

Wes

>> 1. Is there any advantage to doing the DROP/CREATE over just doing a REINDEX
>> DATABASE.
> 
> No, not if you don't mind exclusive locks.  DROP together with CREATE
> INDEX CONCURRENTLY might be nicer if you were trying to do this without
> completely shutting down the DB, but if you aren't running normal
> operations then just use REINDEX.
> 
>> 2. I'm assuming REINDEX would avoid the time involved in recreating the
>> foreign key constraints?
> 
> Right, that's one reason to do it that way.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] REINDEX on large DB vs. DROP INDEX/CREATE INDEX

2008-02-04 Thread Wes
On 2/4/08 9:53 AM, "Vivek Khera" <[EMAIL PROTECTED]> wrote:

> what you need to do is compare the relpages from the pg_class table
> for that index before and after.
> 
> if you didn't get much disk space back, make sure you have no long
> running transactions that may have kept some older files open.

I can check that next time, but I only reindex about once a year.  There
definitely should be no outstanding transactions.
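
(For next time, I take it the check is something along these lines, run
before and after the reindex - the index name is just an example:)

  SELECT relname, relpages
  FROM pg_class
  WHERE relname = 'some_index_name';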

The reason for the huge change in the vacuum time is that the indexes are
scanned in index order instead of disk order.  I understand that is fixed in
8.2 or 8.3 (don't recall which I saw it in), but have never gotten
confirmation from anyone on that.

Wes



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Vacuum time degrading

2005-02-28 Thread Wes
We've been doing a vacuum every night.  With about 254 million rows, it was
taking about 0.8 hours.  Now a few months later with about 456 million rows
it is taking 4.8 hours.  I did check the archives and didn't find anything
like this.

Why is the vacuum time not going up linearly?  Since vacuum has to
sequentially read the entire database, I would have expected a linear
increase - about 1.5 hours now.

There are currently no deletes or modifies to the database - only inserts.

This is on PostgreSQL 7.4.5, RedHat ES 3.0.

Wes



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [GENERAL] Vacuum time degrading

2005-03-01 Thread Wes
On 2/28/05 6:53 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> If you are suffering bloat, the fastest route to a solution would
> probably be to CLUSTER your larger tables.  Although VACUUM FULL
> would work, it's likely to be very slow.

How can there be bloat if there are no deletes or modifies?

Even if there were deletes or modifies (there will be in about another year
and a half), if a vacuum is being performed every night, how can there be
bloat?  The vacuum should release the dead space and it should be reused.
Am I missing something?

>> There are currently no deletes or modifies to the database - only inserts.
> 
> You *certain* about that?  It's hard to see how the vacuum time wouldn't
> be linear in table size if there's nothing to do and no dead space.

Absolutely sure.  The only case that would approach a delete is if a batch
load fails and the transaction is rolled back.  That very seldom happens.

Why am I running vacuum nightly if I have no deletes or updates, you ask?
Two reasons - to have it in the cron schedule for when there are deletes
(there will never be updates), and as a check on database integrity.  If
there is a database problem, vacuum at least has a chance of flagging it
since it reads the entire database.  This was instigated after we had a
couple of instances of corruption a while back that went undetected for too
long.  I'm also doing a weekly pg_dumpall as an additional check/fallback.

> Again, VACUUM VERBOSE info would be informative (it's sufficient to look
> at your larger tables for this).

I'll set that up to run tonight and see if it gives any clues.  Last night,
vacuum ran over 5 hours.

Wes



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Vacuum time degrading

2005-03-01 Thread Wes
On 2/28/05 6:53 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> It's hard to see how the vacuum time wouldn't
> be linear in table size if there's nothing to do and no dead space.

I am doing 'vacuum analyze' rather than just 'vacuum'.  Could that have
anything to do with the non-linear behavior?

Wes



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] Vacuum time degrading

2005-03-02 Thread Wes
On 2/28/05 6:53 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> Again, VACUUM VERBOSE info would be informative (it's sufficient to look
> at your larger tables for this).

It took 5.2 hours again tonight to do the vacuum.  I don't see anything out
of the ordinary - no explanation for the non-linear increases in vacuum
time.

This is what shows up at the end:

INFO:  free space map: 93 relations, 282 pages stored; 1712 total pages
needed
DETAIL:  Allocated FSM size: 1000 relations + 1000000 pages = 5920 kB shared
memory.

The tables all show something like:

INFO:  "blah": found 0 removable, 366326534 nonremovable row versions in
3241829 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.

Wes



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] Vacuum time degrading

2005-03-02 Thread Wes
On 3/2/05 12:16 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> Would you post the complete VACUUM VERBOSE log?  The CPU/elapsed time lines
> would help us identify where the time is going.

I'll send it to you directly - it's rather long.

>> DETAIL:  Allocated FSM size: 1000 relations + 1000000 pages = 5920 kB shared
>> memory.
> 
> Well, you don't have a problem with FSM being too small anyway ;-)

Nope...  Preparation for when deletes start kicking in down the road.  If I
can only do a vacuum once a week, I've got to have lots of space.

Wes



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] Vacuum time degrading

2005-03-02 Thread Wes
On 3/2/05 12:16 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> Would you post the complete VACUUM VERBOSE log?  The CPU/elapsed time lines
> would help us identify where the time is going.

Mailed.

I do see stats like:

CPU 518.88s/25.17u sec elapsed 10825.33 sec.
CPU 884.96s/64.35u sec elapsed 13793.13 sec.
CPU 132.46s/6.66u sec elapsed 2435.42 sec.
CPU 49.25s/4.15u sec elapsed 414.71 sec.

This is a dual-CPU hyperthreaded (which probably makes little difference here)
2.4 GHz machine running RedHat 3.0.  The database is on an 8-disk SCSI hardware RAID 5
with 10k rpm disks.  Pg_xlog is on a separate volume.

I thought it was a 2.6 kernel, but it looks like it is 2.4.20.  I need to
monitor the system when the vacuum is running to see if sar/top show
anything.  I wonder if it's hitting the kswapd thrashing problem?

Wes



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [GENERAL] Vacuum time degrading

2005-03-02 Thread Wes
Watching the system as vacuum is running, I can see that we are encountering
the kswapd/kscand problem in the 2.4.20 kernel.  This could very well
account for the non-linear increase in vacuum time.

This problem is fixed in the 2.6 kernel, but we can't upgrade because Dell
is dragging their feet in releasing hardware monitoring compatible with the
2.6 kernel.  So, we're going to try a 2.4.29 kernel and hope that the problem is
fixed there.  With any luck, by Friday I'll know if the kswapd problem is
fixed in 2.4.29 and if that solves the excessive vacuum times.

Wes



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Vacuum time degrading

2005-03-08 Thread Wes
Well, the good news is that the 2.4.29 kernel solved the kswapd problem.
The bad news is that it didn't help the vacuum time.  In fact, the vacuum
time is now over 6 hours instead of 5 hours.  Whether that is a direct
result of the 2.4.29 kernel, or a coincidence, I don't know at this time.

I guess the next step is to try reindexing a couple of the big indexes and
see if that helps.

Wes



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] Recovering real disk space

2005-04-02 Thread Wes
On 3/30/05 12:09 PM, "Adam Siegel" <[EMAIL PROTECTED]> wrote:

> How can we physically recover "real" disk space for the rows that were
> deleted from the table?

vacuum full
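
(e.g. something like the following, with a made-up table name - keeping in
mind that it takes an exclusive lock on the table and can be very slow on a
big table:)

  VACUUM FULL VERBOSE some_table;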

Wes



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] Vacuum time degrading

2005-04-04 Thread Wes
On 3/2/05 10:50 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> It wouldn't be easy --- there are some locking considerations that say
> btbulkdelete needs to scan the index in the same order that an ordinary
> scan would do.  See the nbtree README for details.

Just a follow-up on this..

The vacuum time has been steadily increasing at a seemingly increasing rate,
although there are no deletes or updates to the database.  The current DB
size is just over 500 million rows.  Last week it was up to 6.84 hours to do
a vacuum.  Over the weekend I reindexed all the major indexes.  The two
largest indexes together took about 10 hours to reindex.  After the reindexing,
the vacuum took only 1.44 hours.  This is pretty much a linear scaling from
the original vacuum time I reported.

So, the increasing vacuum times would appear to be as Tom suggested - due to
the fact that vacuum processes indexes in index order, not physical disk
order.  I guess we add a periodic reindex to our maintenance procedures...

Wes



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] Vacuum time degrading

2005-04-05 Thread Wes
On 4/4/05 8:50 AM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> That doesn't follow from what you said.  Did you check that the physical
> sizes of the indexes were comparable before and after the reindex?

No, how do I do that (or where is it documented how to do it)?

How is it not consistent?  I believe you suggested the reindex.  The initial
timing was shortly after a database reload.  The index would have been built
in sorted order, correct?  This was the 1 hour time.  After a period of
months, the index values have been inserted in random order.  The DB size is
up 50% but the vacuum time is up a factor of 6+.  Presumably the index is
being read by moving the heads all over the place.  I reindex, and the index
is rebuilt in sorted order.  Vacuum is now down to 1.5 hours - a linear
scaling from the original numbers.  The locality of reference in reading the
indexes in order should be much better.

Wes



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [GENERAL] Vacuum time degrading

2005-04-05 Thread Wes
On 4/5/05 11:15 AM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> I didn't say it wasn't consistent, just that it doesn't prove the
> point.  The speedup you saw could have been from elimination of index
> bloat more than from bringing the index into physically sorted order.
> An estimate of the overall database size doesn't really tell us how
> much this particular table's indexes changed in size.

Ok, now I follow.  Taking the biggest indexes:

The weekend before:

INFO:  index "message_recipients_i_recip_date" now contains 393961361 row
versions in 2435100 pages

INFO:  index "message_recipients_i_message" now contains 393934394 row
versions in 1499853 pages

After reindex:

INFO:  index "message_recipients_i_recip_date" now contains 401798357 row
versions in 1765613 pages

INFO:  index "message_recipients_i_message" now contains 401787237 row
versions in 1322974 pages



Wes



---(end of broadcast)---
TIP 8: explain analyze is your friend


[GENERAL] To SPAM or not to SPAM...

2005-06-07 Thread Wes
It seems that for some time this list has been mirrored to Usenet without
changing the originating email addresses.  Since Usenet is spammers' Nirvana,
that one little act results in everyone who posts to the postgres lists
becoming SPAM fodder.

Can something not be changed in the mailing list-to-Usenet gateway such that
the originating email addresses are either removed or scrambled, so that
posting to the mailing list doesn't result in your email address being
plastered all over Usenet?  People who intentionally post to Usenet generally
don't use a replyable email address.

Wes



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[GENERAL] 8.1 'make check' fails

2005-10-01 Thread Wes
System: Mac OS X 10.4.2
PostgreSQL: 8.1 b2

Running 'make check', I get the following failure:

rm -rf ./testtablespace
mkdir ./testtablespace
/bin/sh ./pg_regress --temp-install --top-builddir=../../..
--temp-port=55432 --schedule=./parallel_schedule --multibyte=SQL_ASCII
--load-language=plpgsql
== creating temporary installation==

pg_regress: installation failed
Examine ./log/install.log for the reason.

make[2]: *** [check] Error 2
make[1]: *** [check] Error 2
make: *** [check] Error 2


Looking at the install log, I see it apparently does not handle directories
that contain blanks - the paths are not being quoted:

gzip -d -c ./postgres.tar.gz | ( cd /Volumes/G4
Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regress/./tmp_che
ck/install/usr/local/pgsql8.1b2/doc/html && /usr/bin/tar xf - )
for file in man1/*.1 man7/*.7 ; do \
  /bin/sh ../config/install-sh -c -m 644 $file /Volumes/G4
Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regress/./tmp_che
ck/install/usr/local/pgsql8.1b2/man/$file || exit; \
done
cp: /Volumes/G4 is a directory (not copied).
make[4]: *** [install] Error 1
make[3]: *** [install] Error 2



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] 8.1 'make check' fails

2005-10-02 Thread Wes
On 10/2/05 7:48 AM, "William ZHANG" <[EMAIL PROTECTED]> wrote:

> Yes, the Makefiles cannot deal with spaces correctly.
> Seems we should avoid use the `complicated' path.

Such paths are normal on systems with a GUI interface.  They are not out of
the ordinary nor complicated.

Wes



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] 8.1 'make check' fails

2005-10-02 Thread Wes
On 10/2/05 4:13 PM, "Martijn van Oosterhout"  wrote:

> Problem is, the space is also the word separator. So if you have a
> space-separated list of words (ie normal for make), names with spaces
> are a pain.
> 
> It occurred to me that you might be able to replace the spaces in the
> paths with question marks. Then any invocation of the shell will expand
> the question mark back to a space as part of parameter expansion.

The normal build seems to work, as does 'make install' (although the install
path is /usr/local).

I would have thought it was just a matter of quoting file paths in all the
makefiles.  Instead of

gzip -d -c $(srcdir)/postgres.tar.gz | ( cd $(DESTDIR)$(docdir)/html...

It should be

gzip -d -c "$(srcdir)/postgres.tar.gz" | ( cd "$(DESTDIR)$(docdir)/html"...

But it's not quite that simple.  See below.  To resolve my immediate
problem, I can just temporarily rename my hard drive.  I'm just reporting
this as a problem that should be fixed.

Wes

install.log has:

make -C doc install
make[4]: Nothing to be done for `install'.
make -C src install
/bin/sh ../config/mkinstalldirs "/Volumes/G4
Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regress/./tmp_che
ck/install/usr/local/pgsql8.1b2/lib/pgxs/src"
mkdir /Volumes/G4/Boot
mkdir /Volumes/G4/Boot/Users
mkdir /Volumes/G4/Boot/Users/pgsql
mkdir /Volumes/G4/Boot/Users/pgsql/src
mkdir /Volumes/G4/Boot/Users/pgsql/src/postgres
mkdir /Volumes/G4/Boot/Users/pgsql/src/postgres/postgresql-8.1beta2
mkdir /Volumes/G4/Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src
mkdir /Volumes/G4/Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test
mkdir 
/Volumes/G4/Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regre
ss
mkdir 
/Volumes/G4/Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regre
ss/./tmp_check
mkdir 
/Volumes/G4/Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regre
ss/./tmp_check/install
mkdir 
/Volumes/G4/Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regre
ss/./tmp_check/install/usr
mkdir 
/Volumes/G4/Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regre
ss/./tmp_check/install/usr/local
mkdir 
/Volumes/G4/Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regre
ss/./tmp_check/install/usr/local/pgsql8.1b2
mkdir 
/Volumes/G4/Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regre
ss/./tmp_check/install/usr/local/pgsql8.1b2/lib
mkdir 
/Volumes/G4/Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regre
ss/./tmp_check/install/usr/local/pgsql8.1b2/lib/pgxs
mkdir 
/Volumes/G4/Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regre
ss/./tmp_check/install/usr/local/pgsql8.1b2/lib/pgxs/src
/bin/sh ../config/install-sh -c -m 644 Makefile.global "/Volumes/G4
Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regress/./tmp_che
ck/install/usr/local/pgsql8.1b2/lib/pgxs/src/Makefile.global"
/bin/sh ../config/install-sh -c -m 644 Makefile.port "/Volumes/G4
Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regress/./tmp_che
ck/install/usr/local/pgsql8.1b2/lib/pgxs/src/Makefile.port"
/bin/sh ../config/install-sh -c -m 644 "./Makefile.shlib" "/Volumes/G4
Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regress/./tmp_che
ck/install/usr/local/pgsql8.1b2/lib/pgxs/src/Makefile.shlib"
/bin/sh ../config/install-sh -c -m 644 "./nls-global.mk" "/Volumes/G4
Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regress/./tmp_che
ck/install/usr/local/pgsql8.1b2/lib/pgxs/src/nls-global.mk"
make -C port install
/bin/sh ../../config/install-sh -c -m 644  libpgport.a /Volumes/G4
Boot/Users/pgsql/src/postgres/postgresql-8.1beta2/src/test/regress/./tmp_che
ck/install/usr/local/pgsql8.1b2/lib
cp: /Volumes/G4 is a directory (not copied).
make[5]: *** [install] Error 1
make[4]: *** [install] Error 2
make[3]: *** [install] Error 2



---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] Database-based alternatives to tsearch2?

2006-12-12 Thread Wes
I've seen questions asked on the list about alternatives to tsearch2, but
not for the type of full text indexing I'm looking for.

I'm looking for a non-index-based full text indexing solution - one that stores the
information as table data instead of index data.  I do not need to implement
SQL operators for searches.  The application library would need to implement
the actual word search.

Indexes are too fragile.  Our documents will be offline, and re-indexing
would be impossible.  Additionally, as I understand it, tsearch2 doesn't
scale to the numbers I need (hundreds of millions of documents).

Is anyone aware of any such solutions for PostgreSQL, open source or
otherwise?

Thanks

Wes



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


[GENERAL] Index vacuum improvements in 8.2

2006-12-29 Thread Wes
From 8.2 release notes:
Speed up vacuuming of B-Tree indexes (Heikki Linnakangas, Tom)

From <http://archives.postgresql.org/pgsql-patches/2006-05/msg7.php>

"2. Vacuum is now done in one phase, scanning the index in physical order.
That significantly speeds up index vacuums of large indexes that don't fit
into memory. However, btbulkdelete doesn't know if the vacuum is a full or
lazy one. The patch just assumes it's a lazy vacuum, but the API really
needs to be changed to pass that information earlier than at
vacuum_cleanup."

Are these talking about the same thing - 8.2 now vacuums indexes in disk
order instead of index sort order?  If so, that will make a tremendous
difference in vacuum speed for us (when we can afford the downtime to do a
dump/upgrade/reload).  For us, vacuum starts at about 4 hours after a
reindex, then goes downhill.  We reindex when it starts taking 24 hours to
vacuum.

The above referenced posting is from May 1, 2006.  Is pgsql 8.2 the first
release to have this code?

Wes



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Index vacuum improvements in 8.2

2007-01-05 Thread Wes
From 8.2 release notes:
Speed up vacuuming of B-Tree indexes (Heikki Linnakangas, Tom)

From <http://archives.postgresql.org/pgsql-patches/2006-05/msg7.php>

"2. Vacuum is now done in one phase, scanning the index in physical order.
That significantly speeds up index vacuums of large indexes that don't fit
into memory. However, btbulkdelete doesn't know if the vacuum is a full or
lazy one. The patch just assumes it's a lazy vacuum, but the API really
needs to be changed to pass that information earlier than at
vacuum_cleanup."

Are these talking about the same thing - 8.2 now vacuums indexes in disk
order instead of index sort order?  If so, that will make a tremendous
difference in vacuum speed for us (when we can afford the downtime to do a
dump/upgrade/reload).  For us, vacuum starts at about 4 hours after a
reindex, then goes downhill.  We reindex when it starts taking 24 hours to
vacuum.

The above referenced posting is from May 1, 2006.  Is pgsql 8.2 the first
release to have this code?

Wes




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Ridiculous load

2004-12-11 Thread Wes
On 12/9/04 9:23 AM, "Peter Haworth" <[EMAIL PROTECTED]> wrote:

> It runs RHEL ES v3, kernel 2.4.21-20.ELsmp
> It's generally a very stable box which runs a number of postgresql
> instances.  But last night we hit very high load averages - 10+, vs the
> normal 0-2.
> The culprit appeared to be kswapd, which was using huge amounts of cpu.
> I'd like to know why!

There were some major changes in swap management somewhere in the Linux 2.4
kernel (also RH 3.0 ES).  I don't offhand remember exactly which level.
Under a heavy I/O load, we also saw kswapd going nuts with the 2.4.21
kernel, destroying system performance.  The only solution we found was to
upgrade to a 2.6 kernel.  The problem has not recurred since then.  Our
database is currently about 67 gigabytes and is growing at about 1 million
records per day.  We are using PostgreSQL 7.4.x.

Wes


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] OSX 10.3.7 broke Postgresql 8.0.0b5?

2004-12-28 Thread Wes
On 12/18/04 11:03 AM, "Scott Ribe" <[EMAIL PROTECTED]> wrote:

> All the recent OS X .x updates have replaced /etc/rc. For all I know they
> may always have done so, but of course prior to 10.3 we didn't have to
> modify /etc/rc directly.

You shouldn't be modifying /etc/rc directly anyway.  Create a startup script
in /Library/StartupItems, or /etc/mach_init.d.  I believe the latter is new
with 10.3 and is now the preferred method.  Granted, modifying /etc/rc is a
lot easier, but you pay for it later when an update wipes out your changes
and you beat your head against the wall trying to find the problem.

Wes


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [GENERAL] hundreds of millions row dBs

2005-01-04 Thread Wes
> We're getting about 64 million rows inserted in about 1.5 hrs into a
> table with a multiple-column primary key - that's the only index.
> That's seems pretty good to me - SQL Loader takes about 4 hrs to do the
> same job.

As I recall, the last time we rebuilt our database, it took about 3 hours to
import 265 million rows of data.  It then took another 16 hours to rebuild
all the indexes.  I think the entire pg_dumpall/reload process took about 21
hours +/-.  I wonder what it will be like with 1.5 billion rows...

Wes



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] hundreds of millions row dBs

2005-01-05 Thread Wes
> Out of curiosity, what value of sort_mem were you using?
> 
> (In PG 8.0, the sort memory setting used by CREATE INDEX will be
> maintenance_work_mem not work_mem, which should help in getting larger
> values to be used.  But in existing releases you usually need to think
> about a manual tweak.)

Normally it is set to 65535.  However, during the load I bump it up to
655350.  The system has 2GB ECC memory.

> Here is an instance where a really big ram disk might be handy.
> You could create a database on a big ram disk and load it, then build
> the indexes.

I'm afraid we don't have quite that much RAM...  With just under 400 million
rows right now, it is 74 GB.  That will probably grow to around 300 GB or so
before it stabilizes.

> Actually, if you have a RAM disk, just change the $PGDATA/base/nnn/pgsql_tmp
> subdirectory into a symlink to some temp directory on the RAM disk.
> Should get you pretty much all the win with no need to move stuff around
> afterwards.
> 
> You have to be sure the RAM disk is bigger than your biggest index though.

Hmm.  That's a thought.  I expect our largest index will still be bigger
than available RAM though.  How can I check index sizes?

We already have pg_xlog on a dedicated mirrored disk.  Would it help
significantly to give pgsql_tmp its own mirrored disk?  PGDATA is on an 8
disk hardware RAID 5.

Wes



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [GENERAL] [HACKERS] Much Ado About COUNT(*)

2005-01-13 Thread Wes
On 1/13/05 9:50 AM, "Greg Stark" <[EMAIL PROTECTED]> wrote:

> I think the Oracle syntax looks like
> 
>   SELECT * FROM foo SAMPLE (0.1)
> 
> I don't think I would have picked this syntax but it seems like a better idea
> to copy the existing practice rather than invent a new one.

Of course, in Oracle 'count(*)' is instantaneous.  It doesn't have to count
the physical records one by one.

Wes



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [GENERAL] [HACKERS] Much Ado About COUNT(*)

2005-01-14 Thread Wes
On 1/13/05 6:44 PM, "Greg Stark" <[EMAIL PROTECTED]> wrote:

> That's simply false. Oracle does indeed have to count the records one by one.
> 
> It doesn't have to read and ignore the dead records since they're in a
> separate place (but on the other hand it sometimes have to go read that
> separate place when it sees records that were committed after your
> transaction).
> 
> It can also do index-only scans, which often helps, but it's still not
> instantaneous.

Ok, I stand corrected - I was given some wrong information.  However, my
experience has been that count(*) on Oracle is a whole lot faster than
PostgreSQL - what appeared instantaneous on Oracle took some time on
PostgreSQL.  That was one of the first things I noticed when moving a
database application to PostgreSQL.  I've since disposed of the Oracle
database, so can't go back and retest.

Wes



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] [HACKERS] Much Ado About COUNT(*)

2005-01-14 Thread Wes
On 1/14/05 12:47 PM, "Frank D. Engel, Jr." <[EMAIL PROTECTED]> wrote:

> It's probably too messy to be worthwhile this
> way, though.  More trouble than it would be worth.

It would be rather useful if there was a way to get a reasonably accurate
count (better than analyze provides) in a very short period.  When you've
got a relatively wide table that has hundreds of millions to over a billion
rows, and you need to report on how many rows are in the table, that can take a
long time.

Wes



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] [HACKERS] Much Ado About COUNT(*)

2005-01-14 Thread Wes
On 1/14/05 5:37 PM, "Martijn van Oosterhout"  wrote:

> If that's all you want, what about the row estimate from pg_class? It
> has the number of rows active at last vacuum... For really large tables
> I imagine it'd be easily close enough...

For showing the changes in a given day (or even week), that isn't accurate
enough.
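
(For reference, the estimate being referred to is the one you can pull with
something like the following - table name is just an example:)

  SELECT reltuples::bigint AS estimated_rows
  FROM pg_class
  WHERE relname = 'some_table';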

Wes



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[GENERAL] Shared memory and Mac OS X

2005-01-20 Thread Wes
The problem with not being able to set shmmax and shmall in a startup script
in Mac OS X is not that you are setting them too late in the boot process.
It is that you can set them only once.  In fact, you can set them from a
terminal window after booting and logging in - as long as they haven't
already been set.

If you comment out the sysctl's for shmall and shmmax in /etc/rc, reboot,
then log in and bring up a terminal window, you will see that they are set
to -1.  You can then set them to whatever you want (as root of course).
However, if you try to set the value again, the first value remains set.

Unfortunately, this doesn't help much, as Apple's default /etc/rc sets the
values.  If you comment those out, you can set the values in a startup
script, but you're still screwed when the next update re-enables the
settings in /etc/rc, which are set before SystemStarter is called.  I guess
the best you can do is have a startup script that notifies you if the values
are wrong.

Wes



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [GENERAL] Shared memory and Mac OS X

2005-01-20 Thread Wes
On 1/20/05 10:27 PM, "Jonel Rienton" <[EMAIL PROTECTED]> wrote:

> have you tried using /etc/sysctl.conf and saving the shmax value there?

Unfortunately, the -p parameter does not appear to be valid, nor does
'strings -a' show 'conf' in the binary (unlike the RedHat sysctl).

Wes



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [GENERAL] Shared memory and Mac OS X

2005-01-21 Thread Wes
On 1/21/05 1:50 AM, "Jonel Rienton" <[EMAIL PROTECTED]> wrote:

> I have found this at the docs of Postgres:

Yes, I knew about that.  My email was in regards to an earlier discussion on
why you had to update /etc/rc as of 10.3 (probably should have dug that one
up and replied to it).  The previous statement had been that they had to be
in /etc/rc because startup scripts were now too late in the boot process.  I
was just clarifying the reason.

Wes



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[GENERAL] How are foreign key constraints built?

2005-01-23 Thread Wes
How are foreign key constraints built?  In loading my database into
PostgreSQL 8.0, on the command:

ALTER TABLE ONLY TABLEA
ADD CONSTRAINT "$1" FOREIGN KEY (mkey) REFERENCES tableb(mkey) ON DELETE
CASCADE;

I ended up with the following in pgsql_tmp while it was adding the constraint:

-rw-------  1 postgres  admin  1073741824 23 Jan 06:09 pgsql_tmp2559.25
-rw-------  1 postgres  admin  1073741824 23 Jan 06:24 pgsql_tmp2559.26
-rw-------  1 postgres  admin  1073741824 23 Jan 06:39 pgsql_tmp2559.27
-rw-------  1 postgres  admin   636526592 23 Jan 06:48 pgsql_tmp2559.28
-rw-------  1 postgres  admin  1073741824 23 Jan 11:51 pgsql_tmp2559.29
-rw-------  1 postgres  admin  1073741824 23 Jan 11:34 pgsql_tmp2559.30
-rw-------  1 postgres  admin  1073741824 23 Jan 11:36 pgsql_tmp2559.31
-rw-------  1 postgres  admin  1073741824 23 Jan 11:37 pgsql_tmp2559.32
-rw-------  1 postgres  admin  1073741824 23 Jan 11:38 pgsql_tmp2559.33
-rw-------  1 postgres  admin  1073741824 23 Jan 11:39 pgsql_tmp2559.34
-rw-------  1 postgres  admin  1073741824 23 Jan 11:41 pgsql_tmp2559.35
-rw-------  1 postgres  admin  1073741824 23 Jan 11:42 pgsql_tmp2559.36
-rw-------  1 postgres  admin  1073741824 23 Jan 11:43 pgsql_tmp2559.37
-rw-------  1 postgres  admin  1073741824 23 Jan 11:45 pgsql_tmp2559.38
-rw-------  1 postgres  admin  1073741824 23 Jan 11:46 pgsql_tmp2559.39
-rw-------  1 postgres  admin  1073741824 23 Jan 11:47 pgsql_tmp2559.40
-rw-------  1 postgres  admin  1073741824 23 Jan 11:49 pgsql_tmp2559.41
-rw-------  1 postgres  admin  1073741824 23 Jan 11:50 pgsql_tmp2559.42
-rw-------  1 postgres  admin   603136000 23 Jan 11:51 pgsql_tmp2559.43

I believe files 25-28 are about the size of one of the indexes, but 29-43 is
about 35% larger than the tmp files when building the other index (the other
index is about 10GB).

There's no problem here, I'd just like to understand what it is doing.  I
expected adding the foreign key constraint would just use the existing
indexes to verify the database is currently consistent.  Is this just
working space to more efficiently build the initial constraint, or does it
actually write this to the database?

Wes



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[GENERAL] Tablespaces and primary keys

2005-01-24 Thread Wes
Implicit indexes created by a constraint do not appear to honor the default
tablespace.  The index gets created in the "null" tablespace.

I took pg_dumpall output and modified the schema to place everything in
specific tablespaces.  When the statement:

  ALTER TABLE ONLY addresses
ADD CONSTRAINT addresses_pkey PRIMARY KEY (address_key);

is executed, this results in an implicitly created index:

   NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
  "addresses_pkey" for table "addresses"

However, it ignores my previous command:

  SET default_tablespace = indexes;

and creates the index without a tablespace.  Of course, I can do an

  ALTER INDEX addresses_pkey SET tablespace indexes;

but that's a bit of a hassle where there are a number of very large indexes.

I can't pre-create the index, as the constraint will try to create its own
index anyway.

Is there something I'm overlooking, or is this a bug?

Wes



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] How are foreign key constraints built?

2005-01-25 Thread Wes
On 1/23/05 1:01 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> If you like you can try the operation with "set
> enable_seqscan = off", but I bet it will take longer.

Ouch!  That was a big mistake.  Since inquiring minds want to know, I
decided to give that a try.  The expected outcome was that it would beat the
heck out of the index disks as it read one index and referenced the other to see if the
value existed.  What appears to have happened is that it went through the
same process as before, but read each data record via the index.  It still
created all the pgsql_tmp files, the data disk was still the heaviest hit
(expected no or little access there), and it beat the living daylights out
of my swap - pageins/outs like crazy.  The I/O on the index disks was
negligible compared to the data and swap disks.  I won't try that again...

Wes



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] Indexed leading substring searches - worked, now doesn't

2005-02-03 Thread Wes
I know my leading substring searches used to be done via indexes.  We
specifically tested that.  Since the last time I tested it, the database has
probably been reloaded to fix a corruption problem.  Now all I can get is
sequential leading substring searches.  In the examples below, the database
was vacuumed last night.

The database is very large (currently about 100 GB, and will be 100's of
gigabytes), so performance is important.  This particular table in the
example has only about 2.8 million rows.

PostgreSQL version is 7.4.5.  Prior to the reload, it was 7.4.1.

The locale is showing up as en_US.iso885915.  As far as I know, it was
always this (default RedHat install), so I don't understand why it worked
before.  Did something change between 7.4.1 and 7.4.5?  I supposed it's
possible that I specified locale=C on the original database and don't
remember that...

I'm not going to have to "initdb --locale=C", am I?  I looked at operator
classes, and that doesn't appear to be something I want to do, due to
performance.  What kind of performance hit do you actually take by using an
operator class?

Wes


Pg_controldata shows:

Maximum length of locale name:128
LC_COLLATE:   en_US.iso885915
LC_CTYPE: en_US.iso885915


narc=> \d addresses
             Table "public.addresses"
   Column    |          Type          | Modifiers
-------------+------------------------+-----------
 address_key | numeric(12,0)          | not null
 address     | character varying(255) | not null
Indexes:
    "addresses_pkey" primary key, btree (address_key)
    "addresses_i_address" btree (address)

narc=> select count(*) from addresses;
  count
---------
 2829640
(1 row)

narc=> explain select * from addresses where address = 'blah';
  QUERY PLAN

--
 Index Scan using addresses_i_address on addresses  (cost=0.00..2.81 rows=1
width=40)
   Index Cond: ((address)::text = 'blah'::text)
(2 rows)

narc=> explain select * from addresses where address like 'blah%';
  QUERY PLAN
--
 Seq Scan on addresses  (cost=0.00..61244.68 rows=2 width=40)
   Filter: ((address)::text ~~ 'blah%'::text)
(2 rows)

narc=> explain analyze select * from addresses where address like 'blah%';
  QUERY PLAN

--
 Seq Scan on addresses  (cost=0.00..61244.68 rows=2 width=40) (actual
time=1445.386..8913.435 rows=6 loops=1)
   Filter: ((address)::text ~~ 'blah%'::text)
 Total runtime: 8913.504 ms
(3 rows)


Something else doesn't make sense..  I did the same query a few minutes ago
and the cost was totally different:


narc=> explain select * from addresses where address like 'blah%';
QUERY PLAN
--
 Seq Scan on addresses  (cost=1.00..100061244.67 rows=2 width=40)
   Filter: ((address)::text ~~ 'blah%'::text)
(2 rows)




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [GENERAL] Indexed leading substring searches - worked, now doesn't

2005-02-03 Thread Wes
On 2/3/05 2:29 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> You could use the alternative operator classes for pattern searches, but
> if you want the same indexes to also serve for normal text sorting,
> C locale is the better bet.

As an interim solution until I can reload the database (takes a weekend),
can I define two indexes on the same field, one using operator classes and
one not (and have them automatically used as appropriate)?
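
(Concretely, what I have in mind is adding something like the following
alongside the existing plain index - the index name is made up:)

  CREATE INDEX addresses_i_address_pattern
  ON addresses (address varchar_pattern_ops);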

Because of the time involved, I'm trying to hold off on another reload until
we upgrade to 8.x.

Wes



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[GENERAL] Deadlock Detected (revisited)

2005-11-24 Thread Wes
I haven't seen anything to indicate that 8.x improves foreign key
reference locking and fixes the foreign key 'deadlock detected' issue.
Has that been addressed in 8.1?

I sort my records before loading, but am still getting 'deadlock detected'
(still running 7.3.4).

I have

  B references C
  B references A
  A references C

Any new records for C are loaded first (both A and B depend on C).

A is loaded next (B depends on A).

B is loaded last.

Both A and B are loaded in 'Table C' sorted order.  However, a deadlock can
occur with one process's A records and another's B records, due to the fact
that both have a foreign key reference to C.
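
(A minimal sketch of the reference structure, with made-up table and column
names:)

  CREATE TABLE c (c_key numeric(12,0) PRIMARY KEY);
  CREATE TABLE a (a_key numeric(12,0) PRIMARY KEY,
                  c_key numeric(12,0) REFERENCES c);
  CREATE TABLE b (b_key numeric(12,0) PRIMARY KEY,
                  a_key numeric(12,0) REFERENCES a,
                  c_key numeric(12,0) REFERENCES c);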

The only way I know of to eliminate the deadlocks without serializing the
processes is to remove the foreign key reference from either A or B.
However, that jeopardizes referential integrity if something goes wrong with the
load process.

I've never quite understood why a READ of a record with a foreign key
reference results in the referenced record being locked with more than a
shared lock.

Wes



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Deadlock Detected (revisited)

2005-11-25 Thread Wes
On 11/25/05 2:40 AM, "Stephan Szabo" <[EMAIL PROTECTED]> wrote:

> 8.1 should be using the new shared row locks for doing the checks.  This
> should fix the case mentioned.

Hmm.  I'm looking in the "What's new" for 8.1 and see that now.  I could
have sworn it wasn't there the last time I looked.  Sorry.

> Up until now, we didn't have one to get on a per-record basis.

Ok, I guess I'm confused...  From the 7.3.2 manual (6.5 updates):

"Multiversion concurrency control(MVCC) This removes our old table-level
locking, and replaces it with a locking system that is superior to most
commercial database systems. In a traditional system, each row that is
modified is locked until committed, preventing reads by other users. MVCC
uses the natural multiversion nature of PostgreSQL to allow readers to
continue reading consistent data during writer activity. Writers continue to
use the compact pg_log transaction system. This is all performed without
having to allocate a lock for every row like traditional database systems.
So, basically, we no longer are restricted by simple table-level locking; we
have something better than row-level locking."

So, until 8.1 PostgreSQL had "something better than row-level locking" for
some things, but no row locking when needed?  Or was it that row locking was
there, but just no shared row locking?

Wes



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Deadlock Detected (revisited)

2005-11-25 Thread Wes
On 11/25/05 9:12 PM, "Stephan Szabo" <[EMAIL PROTECTED]> wrote:

> The latter, the row locks before were single owner and were such that a
> second lock request for the same row would wait for the first to be
> released. Now effectively you have two levels of locks at the row level,
> the weaker of which conflicts with the stronger but not with itself.

Now I understand.  Thanks for clarifying.
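
(So, in lock terms - using made-up names - the FK checks used to have to do
the equivalent of the first statement below, and as of 8.1 can do the
second:)

  -- pre-8.1: exclusive row lock on the referenced row
  SELECT 1 FROM c WHERE c_key = 42 FOR UPDATE;
  -- 8.1: shared row lock, which doesn't conflict with itself
  SELECT 1 FROM c WHERE c_key = 42 FOR SHARE;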

It sounds like there may be some performance improvements that will be
beneficial to us also.  Hopefully I can convince them to agree to the 8.1
upgrade.  Besides verifying our application, it takes a full weekend to do
the dump/load - the database is just under a billion rows.

Wes



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] Excessive vacuum times

2005-12-12 Thread Wes
Some time ago I reported a problem where right after an index rebuild vacuum
might take a couple of hours, but gets slower and slower at a seemingly
increasing rate as time goes on.  What took a couple of hours after a full
index rebuild now takes 12 hours or more.

The problem was determined to be due to the fact that indexes are vacuumed
in index order, not in disk storage order.  I don't see anything about this
in the "What's new" for 8.1.  Has anything been done to resolve this?

Scheduled index rebuilds are not feasible.  I really don't even want to take
a chance on starting one.  The database is just a hair under one billion
rows, and could take the entire weekend or more to rebuild.

Wes



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Excessive vacuum times

2005-12-13 Thread Wes
On 12/12/05 5:26 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

>> The problem was determined to be due to the fact that indexes are vacuumed
>> in index order, not in disk storage order.  I don't see anything about this
>> in the "What's new" for 8.1.  Has anything been done to resolve this?
> 
> No.  Avoiding that would require a new approach to
> vacuum-vs-ordinary-indexscan interlocking, so it won't happen until
> someone has a Bright Idea (tm).

Any ideas on how I might reconfigure to mitigate the issue?  Separating
the most offending indexes to separate drives probably isn't an option.

Wes



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] ./configure --with-openssl=path fails

2006-01-06 Thread Wes
Because of our build environment, we need to build PostgreSQL specifying the
location of the openssl libraries to ensure everyone is using the same
libraries, regardless of the system the software is built and executed on.  At
7.4.5, we used:

 ./configure --with-openssl=path

This no longer works with 8.x. See:

 <http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php>

What is the correct way to work around this restriction?  Set
LD_LIBRARY_PATH?

This is on a variety of unix platforms.

Wes



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] ./configure --with-openssl=path fails

2006-01-06 Thread Wes
On 1/6/06 2:47 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

>>  ./configure --with-openssl=path
>> This no longer works with 8.x. See:
>>  <http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php>
>> What is the correct way to work around this restriction?
> 
> Use --with-includes and --with-libraries as needed.

That allows me to build, but doesn't set the library search path in the
binaries ("-Wl,-rpath /blah/blah" on linux or "-R /blah/blah" for Solaris).
This prevents programs from loading, since they can't find the libraries if
they are not in the default library search path of the user.

Wes



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] ./configure --with-openssl=path fails

2006-01-11 Thread Wes
On 1/6/06 2:47 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

>>  ./configure --with-openssl=path
>> This no longer works with 8.x. See:
>>  <http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php>
>> What is the correct way to work around this restriction?
> 
> Use --with-includes and --with-libraries as needed.

That doesn't get the library paths into the binary.  If the libraries are
not in the default system search path, the user is screwed.  Is there a way
to solve this?  I think the previous --with-openssl=path set the search path
so the libraries would always be found.

Wes



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] Finding orphan records

2006-01-11 Thread Wes
I'm trying to find/delete all records in table A that are no longer
referenced by tables B or C.  There are about 4 million records in table A,
and several hundred million in tables B and C.

Is there something more efficient than:

select address_key, address from addresses where ( not exists(select 1 from
B where BField=addresses.address_key limit 1) ) and ( not exists(select 1
from C where CField=addresses.address_key limit 1) )

Of course, all fields above are indexed.
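
(And the eventual delete would presumably just reuse the same predicate,
along the lines of:)

  DELETE FROM addresses
  WHERE not exists (select 1 from B where BField = addresses.address_key)
    AND not exists (select 1 from C where CField = addresses.address_key);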

There are foreign key references in B and C to A.  Is there some way to
safely leverage that?

Wes



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Finding orphan records

2006-01-11 Thread Wes
On 1/12/06 12:23 AM, "Jonel Rienton" <[EMAIL PROTECTED]> wrote:

> Resending sample query, darn where clause didn't wrap
> 
> select a.*,b.* from a
> left outer join b on a.id = b.a_id
> where b.id is null;

I tried something along those lines a while back, and it was orders of
magnitude slower.  The above produces:

explain select address_key, address from addresses left outer join messages
on addresses.address_key=originator where originator is null;
   QUERY PLAN

-
 Merge Left Join  (cost=35684870.14..38457295.97 rows=4090203 width=40)
   Merge Cond: ("outer".address_key = "inner".originator)
   Filter: ("inner".originator IS NULL)
   ->  Index Scan using addresses_pkey on addresses  (cost=0.00..97213.17
rows=4090203 width=40)
   ->  Sort  (cost=35684870.14..36129462.74 rows=177837040 width=11)
 Sort Key: messages.originator
 ->  Seq Scan on messages  (cost=0.00..7215718.40 rows=177837040
width=11)


This appears to be very inefficient.  B is almost two orders of magnitude
larger than A.  C is about 3-4 times as big as B (record count).  My
statement (with the same single 'B' table as above) produces:

narc=> explain select address_key, address from addresses where ( not
exists(select 1 from messages where originator=addresses.address_key limit
1) );
 QUERY PLAN


 Seq Scan on addresses  (cost=0.00..3398462.98 rows=2045102 width=40)
   Filter: (NOT (subplan))
   SubPlan
 ->  Limit  (cost=0.00..0.81 rows=1 width=0)
   ->  Index Scan using messages_i_orig_mdate on messages
(cost=0.00..35148.46 rows=43301 width=0)
         Index Cond: (originator = $0)


Which seems like it should be much more efficient.

Wes



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Finding orphan records

2006-01-12 Thread Wes
On 1/12/06 10:37 AM, "Greg Stark" <[EMAIL PROTECTED]> wrote:

> If it's only a factor of 3-4 then the merge join should be faster. If it's
> really two orders of magnitude (100x?) then the nested loop below would be
> faster. I think in 8.1 (and I think in 8.0 too) the planner is capable of
> coming up with both plans for the NOT IN query though.

Yep, it's really two orders of magnitude.  A is about 4 million records.  B
is around 200 million, and C is about 800 million records.  That's why I
figured a sequential pass on A with index lookups on B and C would be
fastest.  It took about 3.25 hours to complete the SELECT.

I'm working on the 8.1 upgrade now.  My biggest concern is making sure no
bugs have crept into ecpg that would affect us (bitten once before, so
being careful).  So far everything else is looking good with 8.1.

I'm pondering dumping the keys for A to a file, and B and C to another file,
sorting with uniqueness on the B/C file, then programmatically determining
which keys have been freed.  I'm pretty sure this will be much faster, but I
had hoped to avoid an external process like this.
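
(Roughly, the export side of that would be something like the following -
output paths are placeholders, and the third table would be handled the same
way - with the sort/uniq and the comparison done outside the database:)

  COPY addresses (address_key) TO '/tmp/a.keys';
  COPY messages (originator) TO '/tmp/b.keys';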

Wes



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] ECPG and COPY

2006-02-23 Thread Wes
I found this in the archives:

  <http://archives.postgresql.org/pgsql-interfaces/2004-04/msg5.php>

With 8.x, is it still true that with ECPG, it is not possible to use COPY
inline - I'd have to write the data to a file, then 'COPY FROM filename' (or
use direct libpq API calls instead of ECPG)?

Wes



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] ECPG and COPY and PQputCopyData - don't get errors

2006-02-25 Thread Wes
Since I can't use ECPG COPY, I'm trying to use libpq PQputCopyData.  Not
realizing I had to use CSV format to include column header names, I used:

   PQexec(conn, "COPY detail FROM stdin")

I sent the column headers as the first line of the input.  While I now know
this is wrong, the problem here is that I got no error response back.
In the postmaster log, I see:

ERROR:  invalid input syntax for type numeric: "messageid"
CONTEXT:  COPY detail, line 1, column messagesize: "messageid"

However, every call to PQputCopyData, as well as PQputCopyEnd returned a
result of 1.

Ok, so I fixed the above error, but any other data load errors are only
showing up in the postgres log file.  I tried defining

  PQsetNoticeReceiver(conn, &pgsqlNoticeReceiver, (void *) &dummy);

For testing, pgsqlNoticeReceiver just writes a message and aborts.  However,
pgsqlNoticeReceiver is never called.

What am I missing?  I need to abort the transaction on any errors in the
COPY.

Wes



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] ECPG and COPY and PQputCopyData - don't get errors

2006-02-26 Thread Wes
On 2/26/06 12:12 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> That just indicates that the data was sent to the backend without error.
> You forgot to check the copy command result afterward.

Ok, after re-reading the docs for the 10th time, I see I missed the paragraph
about calling 'PQgetResult' after the PQputCopyEnd.  I was just
checking the result and PQerrorMessage.  I think what threw me off was:

"If a COPY command is issued via PQexec in a string that could contain
additional commands, the application must continue fetching results via
PQgetResult after completing the COPY sequence. Only when PQgetResult
returns NULL is it certain that the PQexec command string is done and it is
safe to issue more commands."

(looked like this was needed only if multiple commands were in the PQexec)

and

"PQgetResult
Waits for the next result from a prior PQsendQuery, PQsendQueryParams,
PQsendPrepare, or PQsendQueryPrepared call, and returns it."

(doesn't mention COPY).

I have it working now.  Thanks.

Wes



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Wisconsin Circuit Court Access (WCCA) on

2006-03-14 Thread Wes
On 3/13/06 5:50 PM, "Kevin Grittner" <[EMAIL PROTECTED]> wrote:

> The license for the commercial product contains a clause which
> prohibits disclosing benchmarks of their product without their written
> permission.  (Heaven only knows why they would include such a clause.)
> My comment is not based on any formal benchmarks, but on the graphs
> produced by our monitoring of the production software under real loads,
> and on the query metrics from our middle tier software in the production
> environment.  Even though it would be a huge stretch to call the
> comparison a benchmark under these conditions, this is a litigious
> society.  I'm sure you understand my concern.
> 
> Short of being compelled by law to open our records, I'm not
> comfortable providing any performance comparison which names the
> vendor.

Hmm..  What vendor throws such a clause into all their licensing agreements?
Sounds suspiciously like Microsoft...

I'm not a lawyer, but I believe things such as what vendors a government
entity is using is required by law to be public information.

Wes



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] Build failures on RedHat 3.0 with openssl/kerberos

2006-03-14 Thread Wes
I try to build 8.1.3 with:

  ./configure --prefix=/usr/local/pgsql8.1.3 --with-openssl --with-pam
--enable-thread-safety

It fails the openssl test, saying openssl/ssl.h is unavailable.  Digging
deeper, I find that it is because the test program with

  #include <openssl/ssl.h>

is failing because it can't include krb5.h.

Based on another post, I tried adding "--with-krb5".  That explicitly
aborted with it unable to find krb5.h.  I then tried:

./configure --prefix=/usr/local/pgsql8.1.3 --with-openssl --with-pam
--enable-thread-safety --with-krb5 --with-includes=/usr/kerberos/include

Now it gets past both the openssl and kerberos, but bites the dust with:

configure: error:
*** Thread test program failed.  Your platform is not thread-safe.
*** Check the file 'config.log'for the exact reason.
***
*** You can use the configure option --enable-thread-safety-force
*** to force threads to be enabled.  However, you must then run
*** the program in src/tools/thread and add locking function calls
*** to your applications to guarantee thread safety.

If I remove the --with-krb5, it works.  Why does enabling Kerberos break
threads?

I haven't been able to find any issues in the archives with krb5 and
threads.  Am I missing something here?

Wes



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Build failures on RedHat 3.0 with openssl/kerberos

2006-03-14 Thread Wes
On 3/14/06 2:55 PM, "Louis Gonzales" <[EMAIL PROTECTED]> wrote:

> Did you try to ./configure w/out  "--enable-thread-safety?"  I recently
> compiled postgreSQL 8.0.1 on Solaris and _needed_ --enable-thread-safety
> strictly for building Slony-I against postgresql with that feature enabled.
> 
> What is the reason you are compiling this _with_ the feature?
> If it's necessary, then you may need to --with-includes= and/or --with-libs=
> with additional include directories, such as   /usr/include:/usr/include/sys
> where-ever the thread .h files are for your OS.
> 
> This configure attempt could be failing, because it can't locate the
> correct thread headers and/or libraries

Why would I not want to specify enable-thread-safety?  I want to be able to
write threaded programs.

--enable-thread-safety works fine until I enable --with-krb5, so it is
finding the thread libraries.

Wes



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] invalid UTF-8 byte sequence detected

2006-03-15 Thread Wes
Based on a couple of other posts, I think I'm out of luck, but I'm hoping
something might have changed recently.

I'm loading a very high volume of data with COPY using libpq - about 100+
million rows per day. The problem is that the COPY sometimes aborts with

  invalid UTF-8 byte sequence detected

I'd prefer not to add to the overhead by pre-validating every string, since
PostgreSQL validates it already.  Is there a way to get the server to
replace invalid characters with something like blank instead of generating a
fatal error?

Wes



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] invalid UTF-8 byte sequence detected

2006-03-15 Thread Wes
On 3/15/06 11:42 PM, "mike" <[EMAIL PROTECTED]> wrote:

> If you don't care to store those characters then maybe you should change
> the database character set to use SQL_ASCII instead of UTF-8.  I believe
> ASCII will quietly discard those characters when converting from UTF-8.

I thought about that, but I do want to allow UTF-8 to be stored.  I just
want it to replace illegal characters with some valid character so that
invalid records will load.   Even if I preprocess the data, I can do no more
than that.

Wes



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] invalid UTF-8 byte sequence detected

2006-03-15 Thread Wes
On 3/16/06 12:13 AM, "Michael Glaesemann" <[EMAIL PROTECTED]> wrote:

> SQL_ASCII is *not* ASCII. See the "What's a good default encoding?"
> thread on this same list from today.
> 
> http://archives.postgresql.org/pgsql-general/2006-03/msg00685.php
> 
> I don't believe it will discard anything on import if the database is
> SQL_ASCII encoded.

That might be worth a shot.  I don't really understand the ramifications,
though, especially given Tom's warning.  I guess as long as I don't care
about things like sort order for those fields, it may not matter much.

Wes



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] invalid UTF-8 byte sequence detected

2006-03-15 Thread Wes
On 3/16/06 12:41 AM, "Michael Glaesemann" <[EMAIL PROTECTED]> wrote:

> I hope you didn't take my comments as a suggestion: I think you
> *should* preprocess your data and use UTF8 as the default encoding
> (or another encoding that includes all of the characters you hope to
> use) for your database.

Yeah, after thinking about it some more, it probably isn't such a good idea
- it would make the database act rather strange with non-U.S. data.  I
really hate to have to burn the cycles to validate anything that looks like
extended UTF-8 characters, only to have Postgres do it again.  But maybe
there's no choice.

Wes



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] ERROR: end-of-copy marker corrupt

2006-03-21 Thread Wes
Version: 8.1.3
System: RedHat Linux 3.0 ES

In doing a bulk load with libpq (PQexec, PQputCopyData, PQputCopyEnd), the
COPY is failing with:

  COPY (d) command did not complete successfully:
  ERROR:  end-of-copy marker corrupt CONTEXT:  COPY detail, line 47201: ""

There are 50,000 lines in this transaction, so this is not the last line.
Other postings indicate this is from a '\.'.  On line 47201, I do see a '\.'
embedded in the middle of the line.  The documentation states that this
should be a problem only if those are the only two characters on a line.

A posting from December 2005 against 8.1.0 (bug #2114) seems to indicate
this should be fixed.

Am I missing something, or is there still a problem?  Do I still need to
change '\.' to '\\.'?

Should I be using PQescapeString on strings being passed to COPY?

Wes



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] Partitioning - when is it too many tables?

2006-03-23 Thread Wes
I'm working on a database that will (at current load) add 100 million
records per day to a database, and needs to keep around 6 months of data
online.  Of course, we don't want the loads to be running all day while
queries are going on.

Using COPY with indexes active runs great with an empty database, then drops
precipitously as the index size grows.  It looks like I should be able to
hit the numbers if I insert the data using COPY with no indexes, then add
the indexes.  I'm looking at partitioning with one table per day.  So, we'd
be looking at about 180 tables with 100 million rows each.  Searches would
typically be within a single day, although they could span multiple days.

This keeps the indexes a more or less reasonable size, and allows quick
deleting of the old data.  Is there any problem with 180 child tables?  How
many would be too many (e.g. if I did one table per 6 hours?)
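Roughly the shape I'm picturing for each day's partition ('detail' and
'mdate' are placeholder names here; untested sketch):

-- one child table per day, with a CHECK constraint so constraint_exclusion
-- can prune partitions a query doesn't touch
CREATE TABLE detail_20060324 (
    CHECK (mdate >= DATE '2006-03-24' AND mdate < DATE '2006-03-25')
) INHERITS (detail);

-- bulk load the day's data with no indexes in place, e.g.
--   COPY detail_20060324 FROM stdin;

-- then build that day's indexes once the load is done
CREATE INDEX detail_20060324_mdate_idx ON detail_20060324 (mdate);

-- expiring old data later is just a DROP TABLE of that day's partition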

Thanks

Wes



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] Where is client/server compatibility documented?

2006-04-13 Thread Wes
Where can I find documentation on what client library version interoperate
with which server versions?  I've been unable to find anything in the manual
or archives.  The only thing I've found is some information in the archives
on certain psql versions.  Is there a compatibility matrix somewhere for the
client libraries?

Specifically, I have a C application (ecpg/libpq), as well as perl DBI,
built against PostgreSQL 7.4.1.  Am I safe in upgrading the server to 8.1.3
before the application can be rebuilt?  I've seen no problems on the test
server, but...

There's no system catalog monkey business - just plain old SQL.

Wes



---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] Unexplained lock creating table

2006-04-21 Thread Wes
I have a C application (libpq) that uses partitioning.  I create parent
tables 'header' and 'detail'.  The application opens multiple
connections, creates the child tables, and uses COPY to import the data:

  open connection 1
  begin
  create table header_1
  COPY into header_1

  open connection 2
  begin
  create table header_2
  COPY into header_2

  open connection 3
  begin
  create table header_3
  COPY into header_3

   [ potentially more connections ]

  end copy 1
  end copy 2
  end copy 3

  commit 1
  commit 2
  commit 3

After the database is initialized, I run the application.  It creates table
header_1 and initiates the copy (transaction still pending).  However, when
it tries to create table header_2, it hangs.  A ps shows

   postgres: test trace [local] COPY
   postgres: test trace [local] CREATE TABLE waiting

However, if I force table header_1 to be created outside the COPY
transaction (using psql, manually committing the transaction from within
gdb, etc.), then run the application, it works regardless of the number of
open connections/transactions.

I then drop all the child tables, leaving the parent table, and rerun the
application.  It again works for all connections.

The problem occurs only when the database has been freshly initialized and
no child table has ever existed.

I confirm this by:

  1. Reinitialize database.
  2. Run application.  Verify hang occurs.  I can rerun step 2 any number of
times and it continues to hang.
  3. Create header_dummy using psql
  4. Drop header_dummy
  5. Run application - works.

I can repeat this with the 'detail' table.  It is 100% reproducible.

What's going on?

Wes



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Unexplained lock creating table

2006-04-22 Thread Wes
> If this is the first child table of "header", which it presumably is, then
> the pg_class row for "header" has to be updated to set relhassubclass. Since
> you do not commit the create before continuing, this backend continues to
> hold a row lock on the updated, uncommitted pg_class row while you do...

[snip]

> relhassubclass isn't reset to false when all child tables of a parent
> table are removed. So next time through there is no need to update the
> pg_class row for the parent table.
>
> So the simplest workaround is probably to ensure that you create at least
> one partition in each table at the outset, before trying to actually load
> any data.

That's exactly what I was looking for.  Now I know it is safe to do a
'create table', 'drop table' in my database creation script to circumvent
the problem - or to be 100% safe should I not do the 'drop table'?  I was
afraid I might have to move the 'create table' outside of the transactions.
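In other words, something like this in the creation script (sketch only -
and per the question above, the drops could be left out to be extra safe):

-- create (and optionally drop) a throwaway child of each parent table once,
-- before any of the parallel COPY transactions run, so relhassubclass is
-- already set on the parents
CREATE TABLE header_dummy () INHERITS (header);
CREATE TABLE detail_dummy () INHERITS (detail);
DROP TABLE header_dummy;
DROP TABLE detail_dummy;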

Many thanks

Wes



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] Concurrency problem building indexes

2006-04-23 Thread Wes
Postgres: 8.1.3
OS: Mac OS X 10.4.6

I've run into another concurrency issue - parallel building of indexes.
When I try to build multiple indexes at a time, I randomly get:

  ERROR:  tuple concurrently updated

The following thread talks about this, but there is no answer.

  <http://archives.postgresql.org/pgsql-hackers/2002-07/msg00969.php>

How can I safely build indexes in parallel?

At this point, I'm only trying to build two at a time.  I will be building
indexes for tables with any where from a few rows to 100 million rows on a
daily basis - I need to maximize performance.

Wes



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] Leading substrings - alternatives with 8.1.3?

2006-05-01 Thread Wes
Back in the 7.4 era, to make leading substrings be indexed, I had to set
locale=C.  See thread:

  <http://archives.postgresql.org/pgsql-general/2005-02/msg00159.php>

I couldn't find any updates to this issue in the archives since then.

With 8.1.3, the default locale for my system is en_US.UTF8.  Leading
substring searches result in a sequential search instead of an indexed
search.  Do I still have to initdb to locale=C, or is there a better option
now?

Wes



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Leading substrings - alternatives with 8.1.3?

2006-05-01 Thread Wes
On 5/1/06 12:47 PM, "Martijn van Oosterhout"  wrote:

> Check the documentation, but if you declare an index with (for example)
> text_pattern_ops, then LIKE will work even if the rest of your DB is
> UTF-8.

My understanding of using operator classes is that I'd have to create two
indexes for each column - one with and one without the operator class.  That
is also what was indicated in the original thread.  Defining multiple
indexes on a given column isn't feasible, due to the database size (100
million rows per day).
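For reference, the suggested form would be something like the sketch below
(column names are just illustrative) - which is exactly the doubling-up I
want to avoid:

-- pattern-ops index: lets LIKE 'abc%' prefix searches use an index even in
-- a UTF-8 database
CREATE INDEX addresses_address_pattern
    ON addresses (address text_pattern_ops);

-- a plain index on the same column would still be needed for ordinary
-- equality and ordering, hence two indexes per column
CREATE INDEX addresses_address ON addresses (address);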

Wes



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] Online backups and tar

2006-06-16 Thread Wes
1. Section 23.3.2 (Making a Base backup) of the 8.1.0 doc says:

"Also, some versions of GNU tar consider it an error if a file is changed
while tar is copying it. There does not seem to be any very convenient way
to distinguish this error from other types of errors, other than manual
inspection of tar¹s messages. GNU tar is therefore not the best tool for
making base backups."

What then, on linux (Red Hat 3.0 ES), is the recommended backup program for
online backups?  It appears cpio does the same thing.

2. During a restore, postmaster tries to restore a file '0001.history'.
Neither that file, nor any *.history file, is anywhere to be found.  I can
not find this documented anywhere.  What is this file?  Does it have to do
with timelines?


Wes



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] Adding foreign key constraints without integrity check?

2006-06-18 Thread Wes
Is there a way to add a foreign key constraint without having to wait for it
to check the consistency of all existing records?  If a database is being
reloaded (pg_dumpall then load), it really shouldn't be necessary to check
the referential integrity - or at least I should be able to stipulate that I
am accepting that risk.

My database reload is currently taking about 6 hours to load the data, 42
hours to reindex, and about another 40 hours or so to check the foreign key
constraints (about 1.2 billion rows).  That's a very long time to be out of
commission.  I'd really like to eliminate that second 40 hours so I can get
it down to a normal weekend.

Wes 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Adding foreign key constraints without integrity

2006-06-18 Thread Wes
On 6/18/06 10:48 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

>> My database reload is currently taking about 6 hours to load the data, 42
>> hours to reindex, and about another 40 hours or so to check the foreign key
>> constraints (about 1.2 billion rows).
> 
> What PG version is this, and what have you got maintenance_work_mem set to?
> Undersized m_w_m would hurt both index build and FK checking ...

Sorry, forgot that info..

PG is 8.1.4 (I'm testing an upgrade from 7.3.4 to 8.1.4).

maintenance_work_mem  is set to 983025 - 1.5 times the previous value of
655350.  Current RSS of postmaster is about 1.3 GB.

System memory is 2GB (would like more, but...).  Data is on one array
(hardware RAID 5 of eight 73 GB 10k SCSI drives), indexes on another
(hardware RAID 5 of five 15K 146 GB SCSI drives on a separate channel), and
pg_xlog on a third RAID 1 on a third channel).  There are two 2.4 GHz Xeon
processors).

Wes



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Adding foreign key constraints without integrity

2006-06-19 Thread Wes
> You could create the fk-constraints _first_, then disable them, load
> the data, reindex, and reenable them afterwards.
> 
> pg_dump/pg_restore can enable and disable fk-constraints before restoring
> the data, I believe. It does so by tweaking the system catalogs.

Are you referring to '--disable-triggers' on pg_dump?  Will this work for
foreign key constraints?  The doc talks about triggers, but doesn't say
anything about FK constraints (are these implemented as triggers?)  I don't
use pg_restore, just psql.

> The only problem I can see is that creating the fk-constraints might create
> some indices too. But maybe you can manually drop those indices afterwards - I
> don't know if the fk really _depends_ on the index, or if it creates it only
> for convenience.

I don't see any indexes being added to the table beyond what I add, and
those added as a primary key constraint.  Currently, pg_dump outputs the FK
constraints after the indexes are built, as the last steps.  If I try to add
the FK constraints after loading the database definitions, but without any
indexes, I'm not sure what would happen.
 
Wes



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Adding foreign key constraints without integrity

2006-06-19 Thread Wes
> What PG version is this, and what have you got maintenance_work_mem set to?
> Undersized m_w_m would hurt both index build and FK checking ...

Looking at the stats again, I did see something unexpected.  With MWM set to
983025, index builds were running about 1.3 GB for both RSS and virtual.
Adding foreign key constraints, RSS is about 1.1 GB but virtual is slightly
over 2 GB.

Wes



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Adding foreign key constraints without integrity

2006-06-19 Thread Wes
On 6/19/06 3:24 PM, "Scott Marlowe" <[EMAIL PROTECTED]> wrote:

> Are you sure that's really the problem?  Do you have indexes on the
> referring tables (i.e. the foreign key that points to the other table's
> primary key).  Not having an index on the subordinate table makes each
> and every check on the FK->PK relationship require a seq scan of the
> subordinate table.

Yes, I am sure I have the indexes for both sides.  (I just double checked).
If not, wouldn't I have abysmal load performance in production, or does that
function differently?

The constraint that just finished after 18+ hours has a non-composite index
on both sides.  The other constraint:

ADD CONSTRAINT "$2" FOREIGN KEY (recipient) REFERENCES
addresses(address_key);

has a normal index on address_key.  The recipient index is (recipient, date)
composite index.  This index has always been used when searching on just
recipient, and the last time I rebuilt the database (a year ago), the FK
addition was in line with expectations.

In every case, it takes about the same time to add the foreign key
constraint as to create the index, maybe a little more.  This is true
regardless of whether one of the indexes is composite or not.  One
constraint build just finished after a little more than 18 hours.  The table
where the constraint is being created has about 900 million rows.  The
'references x' table has about 200 million rows.  It churns along eventually
eating up about 50 GB or so in the 'base' directory.  When that stopped
growing, it took maybe another hour or two to complete.

I'd just like to be able to tell it to simply add the constraint without
doing any checking.  I know the data is consistent.  I just dumped it from a
database that has referential integrity enabled.  Even if there were an
error that crept in to the old database, I don't care - just add the
constraint so I can get back online.  Right now I'm looking at around 80-90
hours total to do a reload.  What's it going to be a year or two from now?
I could be out of commission for a week.


Wes



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Adding foreign key constraints without integrity

2006-06-19 Thread Wes
On 6/19/06 3:47 PM, "Jim Nasby" <[EMAIL PROTECTED]> wrote:

> FWIW, RAID5 isn't normally a good recipe for good database performance.

Understood, but RAID 1 isn't always feasible.  The database performs very
well with the current hardware configuration - I have no complaints.  The
only time we have issues is during index rebuilds or a reload.

> This won't help with constraint checking, but you should consider
> turning fsync off during your restore.

I didn't think about doing that, but I wonder how much it would help?  The
actual data load only takes about 6 hours or so.  It's all the index
building and constraint checking that takes the time.  This time seems to be
spent mostly writing temp files and sorting.

> Also, why are you frequently dumping and restoring?

We have to periodically reindex due to the fact that vacuum processes
indexes in index order instead of disk space order.  A 'vacuum full' is
currently taking about 24 hours.  After a reindex, I expect that to drop to
3-4 hours until the index becomes fragmented.  I don't think much can be
done about the time to reindex (or the order vacuum processes indexes).

I wouldn't say we dump/reload frequently.  Right now I'm doing a trial run
for the 8.x upgrade - I have to make sure I can complete it in the allotted
time.  However, about a year ago, I had to do multiple reloads in a 2 week
period.  We had database corruption, and had to reload from source to get a
good database - multiple times.  We had a defective disk array that wasn't
reporting any errors but kept corrupting the database.

Besides PG upgrades, I have to consider how long it takes to recover from
system failures if I can't use the offline file system backup.  I'll be
switching to online backups once we get upgraded, but if a reload fails
there, I'll again have to fall back to the weekly source backup.

Wes



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Adding foreign key constraints without integrity

2006-06-19 Thread Wes
On 6/19/06 4:48 PM, "Scott Marlowe" <[EMAIL PROTECTED]> wrote:

> 
> I think Jim was thinking of RAID 1+0.  If you gotta use RAID 5, at least
> have a contoller with several hundred megs of battery backed cache.
> Better yet, use a RAID controller with a gig or so of BBU cache and run
> RAID 1+0 on it. If you can.

Yes, I realize that is ideal (I meant to write RAID 10 not 1).  That takes
hardware I don't have, and can't justify spending the money on right now.
I/O doesn't appear to be a major issue, although I wasn't constantly
monitoring it.  Most of the time, the I/O load was not that high - CPU was
pegged.  Normal production updates and queries are well within our (not at
all low) expectations.

> Actually, the same question about why your frequently dumping and
> restoring applies to full vacuums.  Why are you doing them?  A properly
> running database should not need vacuum fulls.  Nor reindexing.

Simple...  VACUUM FULL reads the entire database.  Between a full source
dump and a full vacuum, I've got a better chance of finding and recovering
from corruption sooner.  Once bit...  There is no (last time I checked)
utility to verify full database integrity, and there are no CRC's on data
blocks.  A pg_dumpall doesn't touch the indexes.  While this won't catch
everything, seems pretty good at catching corrupted indexes.  I've seen no
more effective ways of verifying the entire database.  The kind of hardware
problem we encountered last year is rare, but it does happen.  I've seen
similar things many times over the years.  RAID doesn't help you when a
controller mirrors garbage.

> You should look into slony.  You can replicate from one version of pgsql
> to another, a feature intended to be used for updating.

I'd love to, but that requires hardware I don't have.

Wes



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Adding foreign key constraints without integrity

2006-06-19 Thread Wes
BTW, I do appreciate any and all suggestions.  I hope my last message didn't
come across otherwise.  That's not what was intended.  It's just that
adding/replacing hardware is not an option right now.   Maybe next year...
I'm still trying to dig up another 2GB memory.

The database actually performs extremely well with the current hardware in
the vast majority of cases.  I just wish I could cut off the wasted 40+
hours adding the foreign key constraints - it would cut the reload time in
half.

It's too bad the suggestion of creating the FK beforehand didn't pan out.
That looked like a good shot.

Wes



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Adding foreign key constraints without integrity

2006-06-20 Thread Wes
On 6/20/06 5:07 AM, "Florian G. Pflug" <[EMAIL PROTECTED]> wrote:

> My suggestion was to create the fk _before_ loading the data, and disable it
> similarly to what "--disable-triggers" does. It turned out, however, that a
> FK always depends on a unique index (be it a primary key, or not), which
> prevents
> my plan from working :-(

That was a great idea - too bad it didn't pan out.

I don't suppose there's any (reasonable) way to directly insert into the
system tables to create the constraint?  I could knock almost 2 days off of
the almost 4 days to reload if I could solve this.

Wes



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Adding foreign key constraints without integrity

2006-06-21 Thread Wes
On 6/20/06 8:17 PM, "Florian G. Pflug" <[EMAIL PROTECTED]> wrote:

> Maybe you could ask at some postgresql support companies how much effort it
> would
> be to add a "without check" flag to "alter table add constraint foreign key",
> and
> how much they'd charge for it...

Or if I get ambitious, dig into the code myself if I can figure out where to
start...

Wes



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Adding foreign key constraints without integrity

2006-06-23 Thread Wes
On 6/22/06 2:57 PM, "Jim Nasby" <[EMAIL PROTECTED]> wrote:

> If nothing else, you should bring it up on -hackers and ask to have
> this added as a TODO. It seems like a worth-while addition to pg_dump/
> restore to me...

Thanks for the suggestion.

> To answer another of your emails in this thread... a LAZY vacuum of
> the entire database will read every table and index in the database.
> Wanting to read the entire database to check for corruption is no
> reason to do a VACUUM FULL.

Sorry, I misspoke.  I do not do a VACUUM FULL.  I do a VACUUM VERBOSE
ANALYZE on a weekly basis.  The verbose allows me to see which
tables/indexes are getting bad (vacuum time wise).  This is currently taking
about 24 hours on a weekend (very little production load competing with it).
The time drops dramatically after a reindex, then creeps up again as the
indexes are updated in random order.

> Also, if you're curious about restore
> time for your upgrade, you should be doing the restore to an 8.1.4
> database, not to your current version. There's been a ton of
> performance improvements made. In fact, I'm wondering if constraint
> checking in restore has been improved...

Yes, that is what I did.  I'm in the process of testing an upgrade from
7.3.x to 8.1.4 - export from 7.3.x and import into 8.1.4.  Unfortunately,
I'm sitting at about 90 hours when I've got about an 80 hour window on a
long weekend...

Wes



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] Lock changes with 8.1 - what's the right lock?

2006-07-16 Thread Wes
8.1 improved locking for foreign key references but had an unexpected
consequence to our application - no parallel loads in our application.

The application does an EXCLUSIVE lock on 'addresses'.  It then gets all of
the keys from 'addresses' it needs, and adds new ones encountered in this
load.  It then completes the transaction, releases the exclusive lock, and
inserts the other table's records using the values read from/inserted into
'addresses'.

There are foreign key constraints between the various tables and 'addresses'
to insure referential integrity.

Previously (pgsql 7.4.5), multiple loads would run simultaneously - and
occasionally got 'deadlock detected' with the foreign key locks even though
they were referenced in sorted order.  When loading tables other than
'addresses', foreign key locks did not prevent other jobs from grabbing the
exclusive lock on 'addresses'.

With 8.1.4, the foreign key locks prevent other instances from grabbing the
lock, so they wait until the first job is complete - only one job loads at a
time.

About EXCLUSIVE locks, the manual says: "...only reads from the table can
proceed in parallel with a transaction holding this lock mode."

What is now the appropriate lock?  It needs to:

  1. Prevent others from updating the table
  2. Block other jobs that are requesting the same lock (if job 2 does a
SELECT and finds nothing, it will try to create the record that job 1 may
already have created in its transaction).
  3. Not conflict with foreign key reference locks

SHARE does not appear to be appropriate - it would fail #2.  Maybe "SHARE
UPDATE EXCLUSIVE"?

Wes



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Lock changes with 8.1 - what's the right lock?

2006-07-25 Thread Wes
> Did you try the example I posted?  What happens if you do the
> following?  First set up a couple of tables:

I no longer have 7.4 running.  I'll have to check and see if can test.

> Is it possible that your foreign key constraints are DEFERRED in
> 7.4 but not in 8.1?  That would change transaction behavior that
> depends on locks.

No, there was no change to the database other than to start using table
spaces in 8.1.  Previously when multiple jobs were running concurrently,
they'd block while updating 'addresses', then take off and complete - unless
they hit a foreign key that someone else had referenced in which case they'd
block again.  Now one job waits at the initial addresses lock until the
other is completely done running (i.e. not updating tables that reference
addresses).

> An ordinary SELECT acquires ACCESS SHARE, which conflicts only with
> the strongest lock, ACCESS EXCLUSIVE.  If you're modifying a table
> that has a foreign key constraint then 8.1 does a SELECT FOR SHARE
> on the referenced table and earlier versions do a SELECT FOR UPDATE;
> both cases acquire ROW SHARE on the referenced table.  Two lock
> types are stronger than SHARE ROW EXCLUSIVE: EXCLUSIVE and ACCESS
> EXCLUSIVE, both of which conflict with ROW SHARE.  So instead of
> saying that SHARE ROW EXCLUSIVE is the weakest lock that meets your
> requirements, I should have said that it's the only lock that does.

That's what I thought from reading the manual, but wasn't sure.  Thanks.

I found the information that led me to believe the locking was fixed in 8.1.
The problem I had with 7.x was the 'deadlock detected' if the foreign keys
weren't referenced in sorted order, as the records were locked with more
than a share lock.

>> I haven't seen anything to indicate that 8.x improves foreign key
>> refererence locking and fixes the foreign key 'deadlock detected' issue.
>> Has that been addressed in 8.1?
> 
> 8.1 should be using the new shared row locks for doing the checks.  This
> should fix the case mentioned.
>
>> I've never quite understood why a READ of a record with a foreign key
>> reference results in the referenced record being locked with more than a
>> shared lock.
> 
> Up until now, we didn't have one to get on a per-record basis.

and 

>> So, until 8.1 PostgreSQL had "something better than row-level locking" for
>> some things, but no row locking when needed?  Or was it row locking is
>> there, but just no shared row locking?
> 
> The latter, the row locks before were single owner and were such that a
> second lock request for the same row would wait for the first to be
> released. Now effectively you have two levels of locks at the row level,
> the weaker of which conflicts with the stronger but not with itself. The
> thing about MVCC is that readers do not have to get either lock if they
> aren't trying to prevent modifications.


Wes



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Lock changes with 8.1 - what's the right lock?

2006-07-25 Thread Wes
> Unless I'm misunderstanding you or a bug was fixed between 7.4.5
> and 7.4.13 (the version I'm running), I'm not convinced that last
> statement is true.  EXCLUSIVE conflicts with all lock types except
> ACCESS SHARE; foreign key references prior to 8.1 use SELECT FOR
> UPDATE and in 8.1 they use SELECT FOR SHARE, but in both cases they
> acquire ROW SHARE on the referenced table, which conflicts with
> EXCLUSIVE.

My apologies for being so unclear.  I had intended to just indicate that the
problem occurred when we upgraded from 7.4.5 to 8.1, as opposed to this
being fixed specifically in 8.1.  I didn't realize this was fixed in a 7.4
release - I thought it was 8.x.  The last time I'd checked, it hadn't been
fixed in 7.x.

>> What is now the appropriate lock?  It needs to:
>> 
>>   1. Prevent others from updating the table
>>   2. Block other jobs that are requesting the same lock (if job 2 does a
>> SELECT and finds nothing, it will try to create the record that job 1 may
>> already have created in its transaction).
>>   3. Not conflict with foreign key reference locks
> 
> SHARE ROW EXCLUSIVE is the weakest lock that meets these requirements.
> It conflicts with itself (#2) and with ROW EXCLUSIVE, which UPDATE,
> DELETE, and INSERT acquire (#1), but doesn't conflict with ROW SHARE,
> which is what SELECT FOR UPDATE/SHARE acquire (#3).

Thanks for the confirmation.  Is there any stronger lock that would not
block SELECT foreign key references?  I didn't find any documentation on
what type of lock is grabbed when a foreign key is referenced during
SELECT (or other).
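So the load sequence would become something like this (sketch):

BEGIN;
-- conflicts with itself (blocks other loaders) and with writers, but not
-- with the ROW SHARE taken by foreign key checks
LOCK TABLE addresses IN SHARE ROW EXCLUSIVE MODE;
-- SELECT the existing address keys needed for this load, INSERT new ones...
COMMIT;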

Wes



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] How to determine a database is intact?

2004-09-02 Thread Wes
On a nightly basis, we shut the database down and do a file system backup.

A short chronology of our database problem:

8/21 - count(*) of user tables succeeded (done once a week to get
statistics)

8/23 - A specific search on a specific value (one out of over 2 million)
caused postmaster to SEGV.  I dropped the index in question and rebuilt it.
All appeared ok.

8/28 - count(*) failed - postmaster aborted on a SEGV.  I attempted a
pg_dumpall.  Postmaster aborted with SEGV about 240 MB into the dump (the
database is about 50 GB).

I reloaded 8/25 database and attempted a pg_dumpall.  Same failure.

I reloaded from the 8/21 file system dump.  Pg_dumpall worked on this one.
I spent the next almost 48 hours dumping and reloading 240 million rows and
reprocessing the several million additions since 8/21.  I didn't dare use
the 8/21 database without reloading because I didn't know if it was good or
not.


Question:

How can we tell that a database is intact?  In the above example, pg_dumpall
worked on the 8/21 database.  Did it become corrupt between 8/21 and 8/23,
or was it already corrupt and got worse?  Pg_dumpall tells you nothing about
the condition of indexes.  Could a corrupt index corrupt data blocks?

I'm looking at doing a pg_dumpall on a weekly basis so that we have a point
in time where we know we have a recoverable database.  When the database
reaches several hundred GB and over over a billion rows, this isn't a great
solution, and doesn't address the overall database integrity.

Back to the original question...  How can I verify the complete integrity of
a database - especially a very large one where a reload or full index
rebuild could take on the order of days?

Wes 


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] How to determine a database is intact?

2004-09-03 Thread Wes
On 9/3/04 3:11 AM, "Richard Huxton" <[EMAIL PROTECTED]> wrote:

> You shouldn't have to verify anything. PG's job is to never corrupt your
> data, and providing your hardware is good it should do so. If you are
> getting problems almost daily that would suggest a RAM/disk problem to
> me (sig 11 usually implies RAM). Can't guarantee it's not PG but it's
> record of reliability is pretty good.

I believe SEGV typically just indicates it de-referenced a bad pointer (i.e.
NULL  or out of range).  The problem is not occurring on a daily basis.  The
database has been in service since December of last year.  It's just that
the symptoms progressed from no apparent symptoms, to a clearly corrupt DB.
My guess is that some minor corruption fed upon itself until the DB couldn't
even be dumped.

> Steps I'd take:
> 1. Check your version number against the release notes and see if you
> should upgrade. You don't mention your version, but it's always worth
> having the last dot-release (7.2.5, 7.3.7, 7.4.5)
> 2. Schedule time to run memory/disk tests against your hardware. Finding
> 48 hours might not be easy, but you need to know where you stand.
> 3. Setup slony or some other replication so I can schedule my downtime.

I thought I mentioned the level in my original mail - 7.4.1.  We are
planning on running some diagnostics.

Whether there is a bug in PostgreSQL, or there was a memory hit, or whatever
doesn't really matter to the original question.  The database can become
corrupt.  How can I tell that a database is fully intact at any given point
in time?  If I reload from a system backup before the known corruption, how
can I be sure that the original corruption that precipitated the failure is
not still there and will again rear its ugly head?

Wes


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [GENERAL] How to determine a database is intact?

2004-09-08 Thread Wes
On 9/5/04 9:04 AM, "Jan Wieck" <[EMAIL PROTECTED]> wrote:

> 24 hours to do what? The pg_dumpall, the restore or both?

There's more than 250 million rows.  If I remember right, it's ballpark 25%
data reload, 75% index/foreign constraint rebuild.  Pg_dumpall is something
like 3 hours or so.

Wes


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] How to determine a database is intact?

2004-09-09 Thread Wes
On 9/9/04 11:07 AM, "Bruce Momjian" <[EMAIL PROTECTED]> wrote:

> Believe it or not, we haven't gotten many requests for this feature,
> partly because such corruption is so rare.  Also, any checker isn't
> going to find a change from "Baker" to "Faker" in a text field.

Yep, unless you added a CRC (and accepted the performance hit) to each
record the best you could do is verify that the database is consistent.
That would still be quite valuable, though - all block headers are valid,
indexes don't point out into oblivion, etc.

I expect there are only a handful of huge databases running a heavy load -
the vast majority are probably no larger than tens (maybe hundreds) of
megabytes, or do not experience a heavy update load.

Wes


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [GENERAL] How to determine a database is intact?

2004-09-09 Thread Wes
On 9/9/04 9:27 AM, "Vivek Khera" <[EMAIL PROTECTED]> wrote:

> I shaved significant time from 7.4.x restores by bumping up the
> checkpoint_segments to 50.  My dumps currently take about 1.5 hours
> over a 100Mbit ethernet (server is on gig ethernet, dumping client is
> on 100Mbit).

Mine are already set to 30, but in a recent re-index, I saw warnings in the
log so need to bump it up some more.  I'm going to likely be doing a reload
this weekend, so will include this in the run.

Wes


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[GENERAL] Returning a row from a function with an appended array field

2011-11-09 Thread Wes Cravens
I have an adjacency list kind of table

CREATE TABLE thingy (
id int,
parent int
);

I'd like to be able to write a procedural function that returns a row or
rows from this table with an appended field that represents the children.

Something like this pseudo code:

FOR row IN SELECT * FROM thingy
LOOP
  RETURN NEXT row,[SELECT id FROM thingy WHERE parent_id = id]
END LOOP,
RETURN

Any help much appreciated,

Wes

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Returning a row from a function with an appended array field

2011-11-09 Thread Wes Cravens
On 11/9/2011 7:34 PM, David Johnston wrote:
> On Nov 9, 2011, at 20:19, Wes Cravens  wrote:
> 
>> I have an adjacency list kind of table
>>
>> CREATE TABLE thingy (
>>id int,
>>parent int
>> );
>>
>> I'd like to be able to write a procedural function that returns a row or
>> rows from this table with an appended field that represents the children.
>>
>> Something like this pseudo code:
>>
>> FOR row IN SELECT * FROM thingy
>> LOOP
>>  RETURN NEXT row,[SELECT id FROM thingy WHERE parent_id = id]
>> END LOOP,
>> RETURN
>>
>> Any help much appreciated,
>>
>> Wes
>>
> 
> Use "WITH RECURSIVE" instead of a function.
> 

I apologize but I don't know how that would work.  An example would help.

Also... my pseudo code above was a little flawed:
  RETURN NEXT row,[SELECT id FROM thingy WHERE parent_id = id]
should be...
  RETURN NEXT row,[SELECT id FROM thingy WHERE parent_id = row.id]


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Returning a row from a function with an appended array field

2011-11-10 Thread Wes Cravens
On 11/9/2011 7:19 PM, Wes Cravens wrote:
> I have an adjacency list kind of table
> 
> CREATE TABLE thingy (
>   id int,
>   parent int
> );
> 
> I'd like to be able to write a procedural function that returns a row or
> rows from this table with an appended field that represents the children.

Just in case someone else wants an answer to this thread... and despite
the fact that the postgresql documentation is excellent and has plenty
of examples, WITH RECURSIVE is still a bad solution...

I already needed an independent get_children function:

CREATE OR REPLACE FUNCTION get_children (
  lookup_id INT
) RETURNS int[] AS
$$
  SELECT array_agg( id )
  FROM (
    SELECT id
    FROM thingy
    WHERE parent_id = $1
    ORDER BY id
  ) t;
$$ LANGUAGE 'sql';

And I just used that in a view to get what I wanted:

CREATE OR REPLACE VIEW thingy_view AS
SELECT *,get_children(id) AS children FROM thingy;

I then updated all of my other get_ accessor postgresql functions to use
the view instead of the base table.
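Usage is then just a plain select against the view (id value made up):

SELECT * FROM thingy_view WHERE id = 42;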

FTW

Wes


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Returning a row from a function with an appended array field

2011-11-10 Thread Wes Cravens
On 11/10/2011 12:05 PM, David Johnston wrote:
> -Original Message-
> From: pgsql-general-ow...@postgresql.org
> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Wes Cravens
> Sent: Thursday, November 10, 2011 11:54 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Returning a row from a function with an appended
> array field
> 
> On 11/9/2011 7:19 PM, Wes Cravens wrote:
>> I have an adjacency list kind of table
>>
>> CREATE TABLE thingy (
>>  id int,
>>  parent int
>> );
>>
>> I'd like to be able to write a procedural function that returns a row 
>> or rows from this table with an appended field that represents the
> children.
> 
> 
> If you only care about one level of hierarchy then, yes, WITH RECURSIVE is
> overkill.  You want to use WITH RECURSIVE in those situations where the
> depth of the hierarchy is unknown.

Yes agreed... WITH RECURSIVE would be handy for something like
get_ancestors or get_descendents.
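For anyone following along, a minimal sketch of that recursive form against
the thingy table from earlier (assuming the column is named parent and an
arbitrary starting id of 1):

WITH RECURSIVE descendants AS (
    SELECT id, parent FROM thingy WHERE id = 1    -- starting row
    UNION ALL
    SELECT t.id, t.parent
    FROM thingy t
    JOIN descendants d ON t.parent = d.id
)
SELECT id FROM descendants;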

Wes

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Vacuum time degrading

2005-03-04 Thread Wes Palmer
On 3/2/05 3:51 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> I was going to suggest
> REINDEXing those indexes to see if that cuts the vacuum time at all.

The problem with that is it takes a very long time.  I've got a couple of
things to try yet on the kswapd problem.  If that doesn't work, maybe I can
rebuild one of the indexes and see how much that one improves.  I wasn't
aware that the indexes were scanned non-sequentially.  The under one hour
time was probably shortly after a full reload.  Any chance of change that
behavior to scan in physical storage order?

The index from the largest table that has:

  CPU 216.15s/18.13u sec elapsed 2110.84 sec.

is inserted in sequential order.  The index

  CPU 518.88s/25.17u sec elapsed 10825.33 sec.

has records inserted in essentially a random order, and is also something
like twice as large (key size).

We're going to try to test the 2.4.29 kernel tomorrow.

Wes


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [GENERAL] password

2005-10-21 Thread Wes Williams



What platform?  Solaris, FreeBSD, Linux, Windows?

  -Original Message-
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michael Uvhagen
  Sent: Friday, October 21, 2005 6:10 AM
  To: pgsql-general@postgresql.org
  Subject: [GENERAL] password

  Hi. I was installing postgresql 8.0 and let it generate a password for me,
  then i changed my mind and went back to change the password, but then i was
  stuck. How do i find the password it generated for me? How do i change it?
  How can i start over and install postgresql somehow? Thanks!

  /Mike


Re: [GENERAL] PostgreSQL vs mySQL, any performance difference for

2005-10-25 Thread Wes Williams
For what it may be worth, executing the same commands into MySQL
5.0.15-nt-max (Win XP Pro), the following is received:

mysql> create table test (i1 int);
Query OK, 0 rows affected (0.41 sec)

mysql> insert into test values (123913284723498723423);
ERROR 1264 (22003): Out of range value adjusted for column (i1) at row 1

mysql> select * from test;
Empty set (0.03 sec)


Finally an improvement!


-Original Message-
mysql> create table test (i1 int);
Query OK, 0 rows affected (0.07 sec)

mysql> insert into test values (123913284723498723423);
Query OK, 1 row affected, 2 warnings (0.07 sec)

mysql> select * from test;
++
| i1 |
++
| 2147483647 |
++
1 row in set (0.00 sec)

And other fun things.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] PostgreSQL vs mySQL, any performance difference for

2005-10-25 Thread Wes Williams
set sql_mode='MYSQL323';
Query OK, 0 rows affected (0.00 sec)

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Scott Marlowe
Sent: Tuesday, October 25, 2005 3:24 PM
To: Wes Williams
Cc: 'Jan'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] PostgreSQL vs mySQL, any performance difference
for


Cool.  Does it still error out after issueing:

set sql_mode='MYSQL323';

???

Just wondering if bounds checking is still optional but is now opt-out
instead of opt-in, or if it's something that you can no longer turn off.

The whole idea of correct behaviour being an option is pretty bad, but
hopefully at least the defaults are for proper behaviour, or are heading
that way.

On Tue, 2005-10-25 at 13:52, Wes Williams wrote:
> For what it may be worth, executing the same commands into MySQL
> 5.0.15-nt-max (Win XP Pro) the following it received:
> 
> mysql> create table test (i1 int);
> Query OK, 0 rows affected (0.41 sec)
> 
> mysql> insert into test values (123913284723498723423);
> ERROR 1264 (22003): Out of range value adjusted for column (i1) at row 1
> 
> mysql> select * from test;
> Empty set (0.03 sec)
> 
> 
> Finally an improvement!
> 
> 
> -Original Message-
> mysql> create table test (i1 int);
> Query OK, 0 rows affected (0.07 sec)
> 
> mysql> insert into test values (123913284723498723423);
> Query OK, 1 row affected, 2 warnings (0.07 sec)
> 
> mysql> select * from test;
> ++
> | i1 |
> ++
> | 2147483647 |
> ++
> 1 row in set (0.00 sec)
> 
> And other fun things.
> 

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Why database is corrupted after re-booting

2005-10-26 Thread Wes Williams
Type the following at the Windows command prompt (start, run, "cmd"):

convert c: /fs:ntfs /v

It will complain about locked files and perform the convert at the next
reboot, which you should do immediately.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Joshua D. Drake
Sent: Wednesday, October 26, 2005 1:10 PM
To: Andrus
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Why database is corrupted after re-booting


On Wed, 2005-10-26 at 19:14 +0300, Andrus wrote:
> > To change partition types you need to re-format (resetting partitions
> > will lose data structure - reformat required).
>
> Troy,
>
> Whole my IDE drive is 20 GB FAT32 C: drive booting XP
> I have a lot of data in this drive so it is not possible to re-format.
Also
> I do'nt want to create two logical disks in single drive.
>
> Is this prevents data corruption for Postgres, is there some utility which
> can convert C: drive to NTFS ?
> Can Partition Magic help ?

XP at least on install I believe has the ability to convert to NTFS.

Have you tried just right clicking on your C: selecting properties
and then seeing if there is a convert option?

>
> Andrus
>
>
>
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>choose an index scan if your joining column's datatypes do not
>match
--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

