Re: [HACKERS] bgwriter changes

2004-12-15 Thread Simon Riggs
On Tue, 2004-12-14 at 13:30, Neil Conway wrote:
> In recent discussion[1] with Simon Riggs, there has been some talk of 
> making some changes to the bgwriter. To summarize the problem, the 
> bgwriter currently scans the entire T1+T2 buffer lists and returns a 
> list of all the currently dirty buffers. It then selects a subset of 
> that list (computed using bgwriter_percent and bgwriter_maxpages) to 
> flush to disk. Not only does this mean we can end up scanning a 
> significant portion of shared_buffers for every invocation of the 
> bgwriter, we also do the scan while holding the BufMgrLock, likely 
> hurting scalability.

Neil's summary is very clear, many thanks.

There has been many suggestions, patches and test results, so I have
attempted to summarise everything here, using Neil's post to give
structure to the other information:

> I think a fix for this in some fashion is warranted for 8.0. Possible 
> solutions:

I add 2 things to this structure
i) the name of the patch that implements that (authors initials)
ii) benchmark results published that run those

> (1) Special-case bgwriter_percent=100. The only reason we need to return 
> a list of all the dirty buffers is so that we can choose n% of them to 
> satisfy bgwriter_percent. That is obviously unnecessary if we have 
> bgwriter_percent=100. I think this change won't help most users, 
> *unless* we also change bgwriter_percent=100 in the default configuration.

100pct.patch (SR)

Test results to date:
1. Mark Kirkwood ([HACKERS] [Testperf-general] BufferSync and bgwriter)
pgbench 1xCPU 1xDisk shared_buffers=1
showed 8.0RC1 had regressed compared with 7.4.6, but patch improved
performance significantly against 8.0RC1

Discounted now by both Neil and myself, since the same idea has been
more generally implemented as ideas (2) and (3) below.

> (2) Remove bgwriter_percent. I have yet to hear anyone argue that 
> there's an actual need for bgwriter_percent in tuning bgwriter behavior, 
> and one less GUC var is a good thing, all else being equal. This is 
> effectively the same as #1 with the default changed, only less flexibility.

There are 2 patches published which do same thing:
- Partially implemented following Neil's suggestion: bg3.patch (SR)
- Fully implemented: bgwriter_rem_percent-1.patch (NC)
Patches have an identical effect on performance.

Test results to date:
1. Neil's testing was "inconclusive" for shared_buffers = 2500 on a
single cpu, single disk system (test used bgwriter_rem_percent-1.patch)
2. Mark Wong's OSDL tests published as test 211
analysis already posted on this thread; 
dbt-2 4 CPU, many disk, shared_buffers=6 (test used bg3.patch)
3% overall benefit, greatly reduced max transaction times
3. Mark Kirkwood's tests
pgbench 2xCPU 2xdisk, shared_buffers=1 (test used
bgwriter_rem_percent-1.patch)
Showed slight regression against RC1 - must be test variability because
the patch does less work and is very unlikely to cause a regression

> (3) Change the meaning of bgwriter_percent, per Simon's proposal. Make 
> it mean "the percentage of the buffer pool to scan, at most, to look for 
> dirty buffers". I don't think this is workable, at least not at this 
> point in the release cycle, because it means we might not smooth of 
> checkpoint load, one of the primary goals of the bgwriter (in this 
> proposal bgwriter would only ever consider writing out a small subset of 
> the total shared buffer cache: the least-recently-used n%, with 2% being 
> a suggested default). Some variant of this might be worth exploring for 
> 8.1 though.

Implemented as bg2.patch (SR)
Contains a small bug, easily fixed, which would not effect performance

Test results to date:
1. Mark Kirkwood's tests
pgbench 2xCPU 2xdisk, shared_buffers=1 (test used bg2.patch)
Showed improvement on RC1 and best option out of all three tests
(compared RC1, bg2.patch, bgwriter_rem_percent-1.patch), possibly
similar within bounds of test variability - but interesting enough to
investigate further.

Current situation seems to be:
- all test results indicate performance regressions in RC1 when
shared_buffers >= 1 and using multi-cpu/multi-disk systems
- option (2) has the most thoroughly confirmable test results and is
thought by all parties to be the simplest and most robust approach.
- some more test results would be useful to compare, to ensure that
applying the patch would be useful in all circumstances.

Approach (3) looks interesting and should be investigated for 8.1, since
it introduces a subtlely different algorithm that may have "interesting
flight characteristics" and is more of a risk to the 8.0 release.

Thanks very much to all performance testers. It's important work.

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] [Testperf-general] BufferSync and bgwriter

2004-12-15 Thread Jan Wieck
On 12/12/2004 5:08 PM, Simon Riggs wrote:
On Sun, 2004-12-12 at 05:46, Neil Conway wrote:
Simon Riggs wrote:
> If the bgwriter_percent = 100, then we should actually do the sensible
> thing and prepare the list that we need, i.e. limit
> StrategyDirtyBufferList to finding at most bgwriter_maxpages.
Is the plan to make bgwriter_percent = 100 the default setting?
Hmm...must confess that my only plan is:
i) discover dynamic behaviour of bgwriter
ii) fix any bugs or wierdness as quickly as possible
iii) try to find a way to set the bgwriter defaults
I'm worried that we're late in the day for changes, but I'm equally
worried that a) the bgwriter is very tuning sensitive b) we don't really
have much info on how to set the defaults in a meaningful way for the
majority of cases c) there are some issues that greatly reduce the
effectiveness of the bgwriter in many circumstances.
The 100pct.patch was my first attempt at getting something acceptable in
the next few days that gives sufficient room for the DBA to perform
tuning.
Doesn't cranking up the bgwriter_percent to 100 effectively make the 
entire shared memory a write-through cache? In other words, with 100% 
the bgwriter will allways write all dirty blocks out and it becomes 
unlikely to avoid an IO for subsequent modificaitons to the same data block.

Jan
On Sun, 2004-12-12 at 05:46, Neil Conway wrote:
I wonder if we even need to retain the bgwriter_percent GUC var. Is 
there actually a situation in which the combination of bgwriter_maxpages 
and bgwriter_delay does not give the DBA sufficient flexibility in 
tuning bgwriter behavior?
Yes, I do now think that only two GUCs are required to tune the
behaviour; but you make me think - which two? Right now, bgwriter_delay
is useless because the O(N) behaviour makes it impossible to set any
lower when you have a large shared_buffers. (I see that as a bug)
Your question has made me rethink the exact objective of the bgwriter's
actions: The way it is coded now the bgwriter looks for dirty blocks, no
matter where they are in the list. What we are bothered about is the
number of clean buffers at the LRU, which has a direct influence on the
probability that BufferAlloc() will need to call FlushBuffer(), since
StrategyGetBuffer() returns the first unpinned buffer, dirty or not.
After further thought, I would prefer a subtle change in behaviour so
that the bgwriter checks that clean blocks are available at the LRUs for
when buffer replacement occurs. With that slight change, I'd keep the
bgwriter_percent GUC but make it mean something different.
bgwriter_percent would be the % of shared_buffers that are searched
(from the LRU end) to see if they contain dirty buffers, which are then
written to disk.  That means the number of dirty blocks written to disk
is between 0 and the number of buffers searched, but we're not hugely
bothered what that number is... [This change to StrategyDirtyBufferList
resolves the unusability of the bgwriter with large shared_buffers]
Writing away dirty blocks towards the MRU end is more likely to be
wasted effort. If a block stays near the MRU then it will be dirty again
in the wink of an eye, so you gain nothing at checkpoint time by
cleaning it. Also, since it isn't near the LRU, cleaning it has no
effect on buffer replacement I/O. If a block is at the LRU, then it is
by definition the least likely to be reused, and is a candidate for
replacement anyway. So concentrating on the LRU, not the number of dirty
buffers seems to be the better thing to do.
That would then be a much simpler way of setting the defaults. With that
definition, we would set the defaults:
bgwriter_percent = 2 (according to my new suggestion here)
bgwriter_delay = 200
bgwriter_maxpages = -1 (i.e. mostly ignore it, but keep it for fine
tuning)
Thus, for the default shared_buffers=1000 the bgwriter would clear a
space of up to 20 blocks each cycle.
For a config with shared_buffers=6, the bgwriter default would clear
space for 600 blocks (max) each cycle - a reasonable setting.
Overall that would need very little specific tuning, because it would
scale upwards as you changed the shared_buffers higher.
So, that interpretation of bgwriter_percent gives these advantages:
- we bound the StrategyDirtyBufferList scan to a small % of the whole
list, rather than the whole list...so we could realistically set the
bgwriter_delay lower if required
- we can set a default that scales, so would not often need to change it
- the parameter is defined in terms of the thing we really care about:
sufficient clean blocks at the LRU of the buffer lists
- these changes are very isolated and actually minor - just a different
way of specifying which buffers the bgwriter will clean
Patch attached...again for discussion and to help understanding of this
proposal. Will submit to patches if we agree it seems like the best way
to allow the bgwriter defaults to be sensibly set.
[...and yes, everybody, I do know where we are in the release cycle]

-

Re: [HACKERS] [Testperf-general] BufferSync and bgwriter

2004-12-15 Thread Jan Wieck
On 12/12/2004 9:43 PM, Neil Conway wrote:
On Sun, 2004-12-12 at 22:08 +, Simon Riggs wrote:
> On Sun, 2004-12-12 at 05:46, Neil Conway wrote:
> Is the plan to make bgwriter_percent = 100 the default setting?
Hmm...must confess that my only plan is:
i) discover dynamic behaviour of bgwriter
ii) fix any bugs or wierdness as quickly as possible
iii) try to find a way to set the bgwriter defaults
I was just curious why you were bothering to special-case
bgwriter_percent = 100 if it's not going to be the default setting (in
which case I would be surprised if more than 1 in 10 users would take
advantage of the patch).
Right now, bgwriter_delay
is useless because the O(N) behaviour makes it impossible to set any
lower when you have a large shared_buffers.
BTW, I wouldn't be _too_ worried about O(N) behavior, except that we do
this scan while holding the BufMgrLock, which is a well known source of
contention. So reducing the time we hold that lock would be good.
Your question has made me rethink the exact objective of the bgwriter's
actions: The way it is coded now the bgwriter looks for dirty blocks, no
matter where they are in the list.
Not sure what you mean. StrategyDirtyBufferList() returns the specified
number of dirty buffers in order, starting with the T1/T2 LRUs and going
back to the MRUs of both lists. bgwriter_percent effectively ignores
some portion of the tail of that list, so we end up just flushing the
buffers closest to the L1/L2 LRUs. How is this different from what
you're describing?
bgwriter_percent would be the % of shared_buffers that are searched
(from the LRU end) to see if they contain dirty buffers, which are
then written to disk.
By definition, buffers closest to the LRU end of the lists are not
frequently accessed. If we only search the N% of the lists closest to
LRU, we will probably end up flushing just those pages to disk -- and
then not flushing anything else to disk in the subsequent bgwriter calls
because all the buffers close to the LRU will be non-dirty. That's okay
if all we're concerned about is avoiding write() by a real backend, but
we also want to smooth out checkpoint load, which I don't think this
approach would do well.
I suggest just getting rid of bgwriter_percent: AFAICS bgwriter_maxpages
is all the tuning we need, and I think "max # of pages to write" is a
simpler and more logical tuning knob than "% of the buffer pool to scan
looking for dirty buffers." So at each bufmgr invocation, we pick the at
most bgwriter_maxpages dirty pages from the pool, using the pages
closest to the LRUs of T1 and T2. I'd be happy to supply a patch to
implement that if you think it sounds okay.
I too don't think that this approach will retain the checkpoing smooting 
effect, the current implementation has.

The real problem is that the "cleaner" the buffer pool is, the longer 
the scan for dirty buffers will take because the dirty blocks tend to be 
at the very end of the scan order. The real solution for this would be 
not to scan the whole pool, but to maintain a separate chain of only 
dirty buffers in LRU order.

Jan
--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] production server down

2004-12-15 Thread Joe Conway
Tom Lane wrote:
My advice is to backup the $PGDATA tree (which you said was in
progress), then pg_resetxlog, then cross-check the hell out of the data
you see.  Only if you can detect some data problems can we guess at
something else to do ...
Before running pg_resetxlog, a couple of questions:
1. Since it appears that pg_control is suspect, should I force it to be
   rebuilt, and if so, how?
2. At the end of GuessControlValues is this comment:
  /*
   * XXX eventually, should try to grovel through old XLOG to develop
   * more accurate values for startupid, nextXID, and nextOID.
   */
   What would be involved in doing this, and do you think it would be
   worth it?
Thanks,
Joe
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] bgwriter changes

2004-12-15 Thread Jan Wieck
On 12/14/2004 2:40 PM, Tom Lane wrote:
"Zeugswetter Andreas DAZ SD" <[EMAIL PROTECTED]> writes:
Is it possible to do a patch that produces a dirty buffer list in LRU order
and stops early when eighter maxpages is reached or bgwriter_percent
pages are scanned ?
Only if you redefine the meaning of bgwriter_percent.  At present it's
defined by reference to the total number of dirty pages, and that can't
be known without collecting them all.
If it were, say, a percentage of the total length of the T1/T2 lists,
then we'd have some chance of stopping the scan early.
That definition is identical to a fixed maximum number of pages to write 
per call. And since that parameter exists too, it would be redundant.

The other way around would make sense. In order to avoid writing the 
busiest buffers at all (except for checkpoinging), the parameter should 
mean "don't scan the last x% of the queue at all".

Still, we need to avoid scanning over all the clean blocks of a large 
buffer pool, so there is need for a separate dirty-LRU.

Jan
--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] [INTERFACES] postgresql-7.4.5

2004-12-15 Thread Dave Cramer
That driver should work, however you might have something else 
misconfigured.
What error are you getting ?

You should post this to the jdbc list
Dave
ElayaRaja S wrote:
Hi,
   I am using postgresql-7.4.5. I nee to use the jdbc connection So i
downloaded 4 versions of driver( pg74.215.jdbc1.jar,
pg74.215.jdbc2.jar, pg74.215.jdbc2ee.jar, pg74.215.jdbc3.jar). I am
uanble to connect it. Please let me know which version of driver i
have to use for postgresql-7.4.5 ?
 

--
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] production server down

2004-12-15 Thread Tom Lane
Joe Conway <[EMAIL PROTECTED]> writes:
> I don't trust it at all. So does that imply that I should override next 
> transaction id and WAL starting address per the manpage?

Yes, override everything there's a switch for.  Also check that the
other values shown by pg_controldata look reasonable (the locale
settings are probably the only ones you might get burned on).

>> What if anything have you got in $PGDATA/pg_xlog?
>> 
> -rw---  1 postgres postgres 16777216 Dec 13 15:39 01650077

Um.  That's so far from the values shown in pg_control that it's not funny.

This is 7.4, right?  I have a crude xlog dump tool that I'll send you
off-list.  We should be able to identify the latest checkpoint in the
existing XLOG files, and that will give you something to work with.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] production server down

2004-12-15 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> > The server experienced a hang (as yet unexplained) yesterday and was 
> > restarted at 2004-12-13 16:38:49 according to syslog. I'm told by the 
> > network admin that there was a problem with the network card on restart, 
> > so the nfs mount most probably disappeared and then reappeared 
> > underneath a quiescent postgresql at some point between 2004-12-13 
> > 16:39:55 and 2004-12-14 15:36:20 (but much closer to the former than the 
> > latter).
> 
> I've always felt that running a database across NFS was a Bad Idea ;-)

Well not that I disagree with that sentiment, but NFS was specifically
designed to handle this particular scenario. *UNLESS* you use the "soft"
option. As popular as it is, this is precisely the scenario where it causes
problems.

(The "intr" option as well, but I don't think that would be relevant for
postgres).


-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: Re: [HACKERS] [Testperf-general] BufferSync and bgwriter

2004-12-15 Thread simon

Josh Berkus <[EMAIL PROTECTED]> wrote on 15.12.2004, 18:36:53:
> Hmmm, I've not seen this.  For example, with people who are having trouble 
> with checkpoint spikes on Linux, I've taken to recommending that they call 
> sync() (via cron) every 5-10 seconds (thanks, Bruce, for suggestion!).   
> Believe it or not, this does help smooth out the spikes and give better 
> overall performance in a many-small-writes situation.

Yes, but bgwriter needs to issue the writes first before the kernel
cache can be flushed, which is the activity I've been focusing on. If
the bgwriter isn't writing enough, flushing the cache is pointless. If
the bgwriter is writing too much, then thats a waste and likely causing
buffer list contention.

> Simon, one of the problems with the OSDL-DBT2 test is that it's too steady.   
> DBT2 gives a constant stream of small writes at a regular, predictable rate.  
>  
> This does not, in fact, match any real-world application I know.

Clearly, OSDL-DBT2 is not a real world test! That is its benefit, since
it is heavily instrumented and we are able to re-run it many times
without different parameter settings. The application is well known and
doesn't suffer that badly from factors that would allow certain effects
to be swamped. If it had too much randomness or variation, it would be
difficult to interpret. 

My goal has been to tune the server, not to derive marketing numbers.
What DBT-2 does reveal is where contention occurs within the PostgreSQL
server. If things are bad in a static, well known workload then they
will be much more erratic in the choppy waters of the real world.
Simulating reality isn't what any of us need to do - there's always a
system to look at and be confused by its peaks and troughs, user
complaints and hardware idiosyncracies.

DBT2 is just one workload amongst many you can choose as your "tuning
goal". The investigations on that have, IMHO, been genuinely useful in
discovering performance problems in the server. Mark's efforts to
improve the instrumentation of the tests will be useful on other
workloads also.

I'd encourage you to develop variations of DBT2 that can also be used to
tune the server, hopefully running on OSDL. I support open testing
methods as much as I support open development methods and projects.

DBT3 next...

Best Regards, Simon Riggs

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] production server down

2004-12-15 Thread Joe Conway
Tom Lane wrote:
Joe Conway <[EMAIL PROTECTED]> writes:
I don't trust it at all. So does that imply that I should override
next transaction id and WAL starting address per the manpage?
Yes, override everything there's a switch for.  Also check that the 
other values shown by pg_controldata look reasonable (the locale 
settings are probably the only ones you might get burned on).
OK
What if anything have you got in $PGDATA/pg_xlog?
-rw---  1 postgres postgres 16777216 Dec 13 15:39
01650077
Um.  That's so far from the values shown in pg_control that it's not
funny.
This is 7.4, right?
Correct.

I have a crude xlog dump tool that I'll send you off-list.  We should
be able to identify the latest checkpoint in the existing XLOG files,
and that will give you something to work with.
Thanks,
Joe
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] Identifying time of last stat reset via sql

2004-12-15 Thread Ed L.

I asked this on general, but didn't receive any responses.  Is it possible 
via SQL to identify the time of the last stat reset (or pg_stat_reset() 
call)?  This is what I'm lacking to be able to measure query activity 
volume over time via SQL.  Maybe a function similar to the fictitious 
pg_stat_get_last_reset_time() below?

select sum(n_tup_ins + n_tup_upd + n_tup_del)
/ (now() - pg_stat_get_last_reset_time()) as write_qps
from pg_stat_all_tables

If not, would this be considered worthy of a TODO item?  
Or is there another built-in approach designed for determining QPS, etc?

Ed



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] possible wierd boolean bug?

2004-12-15 Thread Merlin Moncure
> "Merlin Moncure" <[EMAIL PROTECTED]> writes:
> > I confirmed the problem on a linux server running beta3...so this
> > problem is quite reproducible by running the attached scripts on a
> > freshly loaded database.
> 
> The attached patch fixes the problem for me.
> 
>   regards, tom lane
>

Yup.  I just confirmed it from cvs.  btw, the 4 rows I was complaining
about was due to my erroneously reading the analyze output, so
everything works ok now.  Thanks!

Anyways, it would be nice to be able to use the sql row constructor to
do equality/comparison...wouldn't get caught writing such silly sql
statements :)

Merlin

 

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] race condition for drop schema cascade?

2004-12-15 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> I have seen this failure several times, but not consistently, on the 
> buildfarm member otter (Debian/MIPS) and possible on others, and am 
> wondering if it indicates a possible race condition on DROP SCHEMA CASCADE.

Hard to see what, considering that there's only one backend touching
that tablespace in the test.  I'd be inclined to wonder if there's
a filesystem-level problem on that platform.  What filesystem are you
running on anyway?

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] strange regression failure

2004-12-15 Thread Andrew Dunstan

Tom Lane wrote:
Andrew Dunstan <[EMAIL PROTECTED]> writes:
 

Has anyone got any idea on why I see things like this from time to time? 
It seems to be intermittent, which is odd.
   

I have a suspicion that the unexpected result occurs when the client
encoding is UTF8, because psql/mbprint.c's ucs_wcwidth() function goes
belly up on control characters (which it probably shouldn't; for that
matter, that entire file should be removed and replaced with our real
multibyte support...).
 


Well, based on your recent classification of the TODO list as items that 
are either very hard or that noone wants to do, perhaps it deserves an 
entry ;-)

What's not apparent is why the failure wouldn't be consistent.  You'd
think the buildfarm would always run the test with the same encoding
selection.  Any idea how that might not happen?
 

No. buildfarm itself does nothing to set the encoding for either the 
server or the client. I guess we should make sure to use C/SQL_ASCII as 
the locale/encoding settings when we initdb after the temp install. 
Right now any environment settings inherited by the script are passed 
through. I guess for the sake of completeness I should also set 
PGCLIENTENCODING.

But I too find the inconsistency a mystery. I have seen ity once on my 
test machine (dog) here: 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dog&dt=2004-11-30%2019:22:01

The time suggests that unlike my usual tests which are run from cron at 
6 minutes past the hour this one was probably run by hand from the 
command line for some reason, where I have LANG set to "en_US.UTF-8" - 
don't know if that makes a difference.

cheers
andrew

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] strange regression failure

2004-12-15 Thread Andrew Dunstan

Tom Lane wrote:
Andrew Dunstan <[EMAIL PROTECTED]> writes:
   

Has anyone got any idea on why I see things like this from time to time? 
It seems to be intermittent, which is odd.
 

 

I have a suspicion that the unexpected result occurs when the client
encoding is UTF8,
   

In fact, I was able to duplicate the failure after "initdb -E unicode".
So it would seem that the question is why the buildfarm environment
isn't controlling the locale setting.  You must sometimes be running
with a LANG or LC_ALL setting that selects a unicode-based locale.
 

Our emails just crossed. You got it. I will have buildfarm force SQL_ASCII.
cheers
andrew

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] production server down

2004-12-15 Thread Joe Conway
Greg Stark wrote:
Tom Lane <[EMAIL PROTECTED]> writes:
I've always felt that running a database across NFS was a Bad Idea ;-)
Well not that I disagree with that sentiment, but NFS was specifically
designed to handle this particular scenario. *UNLESS* you use the "soft"
option. As popular as it is, this is precisely the scenario where it causes
problems.
(The "intr" option as well, but I don't think that would be relevant for
postgres).
I checked; doesn't seem to be an issue:
# cat /etc/fstab
csddata7-vlan35:/vol/vol0/replica /replica nfs 
proto=tcp,suid,rw,vers=3,proto=tcp,timeo=600,retrans=2,hard,fg,rsize=8192,wsize=8192 
0 0

I'd be interested in any feedback on the settings. The connection 
between server and NetApp is on a private vlan on a separate network 
interface than the rest of the LAN. I'm not positive, but I think we 
have jumbo frames (9K) enabled on that interface ... looks like we do:

# ifconfig eth0
eth0  Link encap:Ethernet  HWaddr 00:09:6B:E6:33:B7
  [...]
  UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
Thanks,
Joe
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] strange regression failure

2004-12-15 Thread Tom Lane
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> Has anyone got any idea on why I see things like this from time to time? 
>> It seems to be intermittent, which is odd.

> I have a suspicion that the unexpected result occurs when the client
> encoding is UTF8,

In fact, I was able to duplicate the failure after "initdb -E unicode".
So it would seem that the question is why the buildfarm environment
isn't controlling the locale setting.  You must sometimes be running
with a LANG or LC_ALL setting that selects a unicode-based locale.

> because psql/mbprint.c's ucs_wcwidth() function goes
> belly up on control characters (which it probably shouldn't; for that
> matter, that entire file should be removed and replaced with our real
> multibyte support...).

I'm more than half tempted to do this, but I suppose it's too late in
the release cycle ...

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] strange regression failure

2004-12-15 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> On Wed, Dec 15, 2004 at 05:20:28PM -0500, Andrew Dunstan wrote:
>> Our emails just crossed. You got it. I will have buildfarm force SQL_ASCII.

> But then you will lose reports using other encodings ...

"make check" forces C locale anyway.  It's only the installed server
used by the contrib tests that could be affected.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] somebody working on: Prevent default re-use of sysids

2004-12-15 Thread Bruce Momjian
Matthias Schmidt wrote:
> Hi Tom,
> 
> after beeing offline because of a chrashed box, I able to mail again.
> I  would like to volunteer for the uptime() function. Is that OK?

Sure.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] possible wierd boolean bug?

2004-12-15 Thread Tom Lane
"Merlin Moncure" <[EMAIL PROTECTED]> writes:
> Anyways, it would be nice to be able to use the sql row constructor to
> do equality/comparison...wouldn't get caught writing such silly sql
> statements :)

You mean like this?

regression=# select row(1,2,3) = row(1,2,3);
 ?column?
--
 t
(1 row)

regression=# select row(1,2,3) = row(1,2,4);
 ?column?
--
 f
(1 row)

The semantics aren't right yet for non-equality comparisons, but it
works fine for = and != ...

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] getting 'order by' working with unicode locale? ICU?

2004-12-15 Thread Palle Girgensohn
Hi!
I'm using Postgresql on FreeBSD, and would like to get "order by" to work 
with unicode. The OS does have collation implemented for unicode (UTF-8) 
locales. Some freebsd people point me towards IBM:s ICU kit.

How much effort would be required to get postgresql to sort properly, 
mainly using the sv_SE.UTF-8 locale (so the problem is not *that* hard, I 
don't need to sort Chinese [yet] :). What needs to be done to get 
postgresql to use ICU (or some other working mechanism?)

Thanks,
Palle

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] strange regression failure

2004-12-15 Thread Andrew Dunstan

Tom Lane wrote:
Alvaro Herrera <[EMAIL PROTECTED]> writes:
 

On Wed, Dec 15, 2004 at 05:20:28PM -0500, Andrew Dunstan wrote:
   

Our emails just crossed. You got it. I will have buildfarm force SQL_ASCII.
 

 

But then you will lose reports using other encodings ...
   

"make check" forces C locale anyway.  It's only the installed server
used by the contrib tests that could be affected.
 

Yes, quite so. The tiny change has gone into buildfarm's CVS.
Alvaro does have a point, though - there are many things we don't test 
that possibly we should - including locale and encoding differences. A 
better testing regime for the PLs is another thing that springs to mind.

Also, currently buildfarm just runs postgres' own test suites. I'm not 
averse to supporting a more extensive test suite just for farm members, 
if people think that's a good idea.

just some blue sky ideas
cheers
andrew

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] strange regression failure

2004-12-15 Thread Neil Conway
On Wed, 2004-12-15 at 21:33 -0500, Andrew Dunstan wrote:
> Also, currently buildfarm just runs postgres' own test suites. I'm not 
> averse to supporting a more extensive test suite just for farm members, 
> if people think that's a good idea.

I think you'd get more mileage out of expanding the existing PG test
suites, or adding new ones -- and then running those additional tests in
the build farm. But I agree that there's a lot we're not testing that we
should.

-Neil



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] bgwriter changes

2004-12-15 Thread Zeugswetter Andreas DAZ SD

> > > and stops early when eighter maxpages is reached or bgwriter_percent
> > > pages are scanned ?
> > 
> > Only if you redefine the meaning of bgwriter_percent.  At present it's
> > defined by reference to the total number of dirty pages, and that can't
> > be known without collecting them all.
> > 
> > If it were, say, a percentage of the total length of the T1/T2 lists,
> > then we'd have some chance of stopping the scan early.
> 
> ...which was exactly what was proposed for option (3).

But the benchmark run was with bgwriter_percent 100. I wanted to point out,
that I think 100% is too much (writes hot pages multiple times between 
checkpoints).
In the benchmark, bgwriter obviously falls behind, the delay is too long. But 
if you
reduce the delay you will start to see what I mean.

Actually I think what is really needed is a max number of pages we want dirty 
during checkpoint. Since that would again require scanning all pages, the next 
best
definition would imho be stop at a percentage (or a number of pages short) of 
total T1/T2. 
Then you can still calculate a worst case IO for checkpoint (assume that all 
hot pages are dirty) 

Andreas

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: Re: [HACKERS] bgwriter changes

2004-12-15 Thread simon

Zeugswetter Andreas DAZ SD <[EMAIL PROTECTED]> wrote on
15.12.2004, 11:39:44:
> 
> > > > and stops early when eighter maxpages is reached or bgwriter_percent
> > > > pages are scanned ?
> > > 
> > > Only if you redefine the meaning of bgwriter_percent.  At present it's
> > > defined by reference to the total number of dirty pages, and that can't
> > > be known without collecting them all.
> > > 
> > > If it were, say, a percentage of the total length of the T1/T2 lists,
> > > then we'd have some chance of stopping the scan early.
> > 
> > ...which was exactly what was proposed for option (3).
> 
> But the benchmark run was with bgwriter_percent 100. 

Yes, it was for run 211, but the patch that was used effectively
disabled bgwriter_percent in favour of looking only at
bgwriter_maxpages.

The patch used was not exactly what was being discussed here. In that
patch, StrategyDirtyBufferList scans until it find bgwriter_maxpages
dirty pages, then stops. That means a varying number of buffers on the
list are scanned, starting from the LRU.

What is being suggested here was implemented for bg2.patch. The
algorithm in there was for StrategyDirtyBufferList to scan until it had
looked at the dirty/clean status of bgwriter_maxpages buffers. That
means a constant number of buffers on the list are scanned, starting
from the LRU. 

The two alternative algorithms are similar, but have these differences:
The former (option (2)) finds a constant number of dirty pages, though
has varying search time. The latter (option (3)) has constant search
time, yet finds a varying number of dirty pages. Both alternatives
avoid scanning the whole of the buffer list, as is the case in 8.0RC1,
allowing the bgwriter to act more frequently at lower cost.

There's some evidence that the second algorithm may be better, but may
have other characteristics or side-effects that we don't yet know. So
At this stage of the game, I'm happier not to progress option (3) any
further for r8.0, since option(2) is closest to the one that has been
through beta-testing.

Best Regards, Simon Riggs

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] bgwriter changes

2004-12-15 Thread Zeugswetter Andreas DAZ SD

> The two alternative algorithms are similar, but have these 
> differences:
> The former (option (2)) finds a constant number of dirty pages, though
> has varying search time.

This has the disadvantage of converging against 0 dirty pages.
A system that has less than maxpages dirty will write every page with 
every bgwriter run.

> The latter (option (3)) has constant search
> time, yet finds a varying number of dirty pages.

This might have the disadvantage of either leaving too much for the 
checkpoint or writing too many dirty pages in one run. Is writing a lot 
in one run actually a problem though ? Or does the bgwriter pause
periodically while writing the pages of one run ?
If this is expressed in pages it would naturally need to be more than the 
current maxpages (to accomodate for clean pages). The suggested 2% sounded 
way too low for me (that leaves 98% to the checkpoint).

Also I think we are doing too frequent checkpoints with bgwriter in 
place. Every 15-30 minutes should be sufficient, even for benchmarks.
We need a tuned bgwriter for this though.

Andreas

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] production server down

2004-12-15 Thread Alvaro Herrera
On Tue, Dec 14, 2004 at 09:22:42PM -0800, Joe Conway wrote:

> # pg_controldata /replica/pgdata

> Current log file ID:  0
> Next log file segment:1
> Latest checkpoint location:   0/9B0B8C
> Prior checkpoint location:0/9AA1B4
> Latest checkpoint's REDO location:0/9B0B8C
> Latest checkpoint's UNDO location:0/0
> Latest checkpoint's StartUpID:12
> Latest checkpoint's NextXID:  536
> Latest checkpoint's NextOID:  17142

Isn't it strange that these values are so close to the values found in a
just-initdb'd cluster?

-- 
Alvaro Herrera (<[EMAIL PROTECTED]>)
"Saca el libro que tu religión considere como el indicado para encontrar la
oración que traiga paz a tu alma. Luego rebootea el computador
y ve si funciona" (Carlos Duclós)

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: RE: Re: [HACKERS] bgwriter changes

2004-12-15 Thread simon

Zeugswetter Andreas DAZ SD <[EMAIL PROTECTED]> wrote on
15.12.2004, 15:33:16:
> 
> > The two alternative algorithms are similar, but have these 
> > differences:
> > The former (option (2)) finds a constant number of dirty pages, though
> > has varying search time.
> 
> This has the disadvantage of converging against 0 dirty pages.
> A system that has less than maxpages dirty will write every page with 
> every bgwriter run.

Yes, that is my issue with that algorithm it causes more contention
when there are less dirty pages.
 
> > The latter (option (3)) has constant search
> > time, yet finds a varying number of dirty pages.
> 
> This might have the disadvantage of either leaving too much for the 
> checkpoint or writing too many dirty pages in one run. Is writing a lot 
> in one run actually a problem though ? Or does the bgwriter pause
> periodically while writing the pages of one run ?
> If this is expressed in pages it would naturally need to be more than the 
> current maxpages (to accomodate for clean pages). The suggested 2% sounded 
> way too low for me (that leaves 98% to the checkpoint).

This remains to be seen. We have Mark Kirkwood's test results that show
that the algorithm may work better, but no large scale OSDL run as yet.

My view is that the 2% is misleading. The whole buffer list is like a
conveyor belt moving towards the LRU. It is my *conjecture* that
cleaning the LRU would be sufficient to clean the whole list
eventually. Blocks in the buffer list that always stay near the MRU
would be dirtied again quickly even if you did clean them, so if they
don't reach nearly to the LRU then there is less benefit in cleaning
them. (1%, 2% or 5% would need to be a tunable factor; 2% was the
suggested default)

If the bgwriter writes too often it would get in the way of other work,
so there is clearly an optimum setting for any workload.

> Also I think we are doing too frequent checkpoints with bgwriter in 
> place. Every 15-30 minutes should be sufficient, even for benchmarks.
> We need a tuned bgwriter for this though.

Well, yes, you're right. ...but the bug limiting us to 255 files
restricts us there for higher performance situations.

Best Regards, Simon Riggs

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[HACKERS] strange regression failure

2004-12-15 Thread Andrew Dunstan
Has anyone got any idea on why I see things like this from time to time? 
It seems to be intermittent, which is odd.

You can see it on occasional ContribCheck failures on buildfarm.
cheers
andrew
= pgsql.2428/contrib/tsearch2/regression.diffs 
===
*** ./expected/tsearch2.out Wed Dec 15 10:34:05 2004
--- ./results/tsearch2.out  Wed Dec 15 10:47:34 2004
***
*** 2090,2096 
  Upon a woman s face. E.  J.  Pratt  (1882 1964)
  ', to_tsquery('sea&thousand&years'));
 headline   
 
! 
---
   sea a thousand years,
  A thousand years to trace
  The granite features of this cliff
--- 2090,2096 
  Upon a woman s face. E.  J.  Pratt  (1882 1964)
  ', to_tsquery('sea&thousand&years'));
headline  
! 
   sea a thousand years,
  A thousand years to trace
  The granite features of this cliff
***
*** 2107,2113 
  Upon a woman s face. E.  J.  Pratt  (1882 1964)
  ', to_tsquery('granite&sea'));
   headline 

! 
--
   sea a thousand years,
  A thousand years to trace
  The granite features of this cliff
--- 2107,2113 
  Upon a woman s face. E.  J.  Pratt  (1882 1964)
  ', to_tsquery('granite&sea'));
 headline   
! --
   sea a thousand years,
  A thousand years to trace
  The granite features of this cliff
***
*** 2124,2130 
  Upon a woman s face. E.  J.  Pratt  (1882 1964)
  ', to_tsquery('sea'));
   headline 
 
! 
---
   sea a thousand years,
  A thousand years to trace
  The granite features of this cliff
--- 2124,2130 
  Upon a woman s face. E.  J.  Pratt  (1882 1964)
  ', to_tsquery('sea'));
 headline   
! --
   sea a thousand years,
  A thousand years to trace
  The granite features of this cliff
***
*** 2144,2150 
  ', 
  to_tsquery('sea&foo'), 'HighlightAll=true');

headline
   
! 
-
   
  
  
--- 2144,2150 
  ', 
  to_tsquery('sea&foo'), 'HighlightAll=true');
   headline 
! --
   
  
  

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] production server down

2004-12-15 Thread Tom Lane
Joe Conway <[EMAIL PROTECTED]> writes:
> Before running pg_resetxlog, a couple of questions:

> 1. Since it appears that pg_control is suspect, should I force it to be
> rebuilt, and if so, how?

pg_resetxlog will rebuild it in any case.  However it will re-use the
existing contents as much as it can (if you don't use any of the command
line options to override values).  Given Alvaro's observation that the
existing file looks suspiciously close to a freshly-initdb'd one, I
don't think you want to trust the existing contents.

> 2. At the end of GuessControlValues is this comment:
>/*
> * XXX eventually, should try to grovel through old XLOG to develop
> * more accurate values for startupid, nextXID, and nextOID.
> */
> What would be involved in doing this, and do you think it would be
> worth it?

What if anything have you got in $PGDATA/pg_xlog?

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] bgwriter changes

2004-12-15 Thread Tom Lane
Jan Wieck <[EMAIL PROTECTED]> writes:
> Still, we need to avoid scanning over all the clean blocks of a large 
> buffer pool, so there is need for a separate dirty-LRU.

That's not happening, unless you want to undo the cntxDirty stuff,
with unknown implications for performance and deadlock safety.  It's
definitely not happening in 8.0 ;-)

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] production server down

2004-12-15 Thread Joe Conway
Tom Lane wrote:
pg_resetxlog will rebuild it in any case.  However it will re-use the
existing contents as much as it can (if you don't use any of the command
line options to override values).  Given Alvaro's observation that the
existing file looks suspiciously close to a freshly-initdb'd one, I
don't think you want to trust the existing contents.
I don't trust it at all. So does that imply that I should override next 
transaction id and WAL starting address per the manpage?

What if anything have you got in $PGDATA/pg_xlog?
# pwd
/replica/pgdata/pg_xlog
# ll
total 688836
drwx--  2 postgres postgres32768 Dec 13 15:47 .
drwx--  6 postgres postgres 4096 Dec 14 17:45 ..
-rw---  1 postgres postgres 16777216 Dec 13 16:02 0165004E
-rw---  1 postgres postgres 16777216 Dec 13 06:42 0165004F
-rw---  1 postgres postgres 16777216 Dec 13 06:55 01650050
-rw---  1 postgres postgres 16777216 Dec 13 07:21 01650051
-rw---  1 postgres postgres 16777216 Dec 13 07:41 01650052
-rw---  1 postgres postgres 16777216 Dec 13 07:57 01650053
-rw---  1 postgres postgres 16777216 Dec 13 08:00 01650054
-rw---  1 postgres postgres 16777216 Dec 13 08:04 01650055
-rw---  1 postgres postgres 16777216 Dec 13 08:09 01650056
-rw---  1 postgres postgres 16777216 Dec 13 08:13 01650057
-rw---  1 postgres postgres 16777216 Dec 13 08:26 01650058
-rw---  1 postgres postgres 16777216 Dec 13 08:42 01650059
-rw---  1 postgres postgres 16777216 Dec 13 09:09 0165005A
-rw---  1 postgres postgres 16777216 Dec 13 09:23 0165005B
-rw---  1 postgres postgres 16777216 Dec 13 09:40 0165005C
-rw---  1 postgres postgres 16777216 Dec 13 09:51 0165005D
-rw---  1 postgres postgres 16777216 Dec 13 09:58 0165005E
-rw---  1 postgres postgres 16777216 Dec 13 10:03 0165005F
-rw---  1 postgres postgres 16777216 Dec 13 10:09 01650060
-rw---  1 postgres postgres 16777216 Dec 13 10:24 01650061
-rw---  1 postgres postgres 16777216 Dec 13 10:37 01650062
-rw---  1 postgres postgres 16777216 Dec 13 10:56 01650063
-rw---  1 postgres postgres 16777216 Dec 13 11:11 01650064
-rw---  1 postgres postgres 16777216 Dec 13 11:38 01650065
-rw---  1 postgres postgres 16777216 Dec 13 11:52 01650066
-rw---  1 postgres postgres 16777216 Dec 13 11:56 01650067
-rw---  1 postgres postgres 16777216 Dec 13 12:04 01650068
-rw---  1 postgres postgres 16777216 Dec 13 12:07 01650069
-rw---  1 postgres postgres 16777216 Dec 13 12:17 0165006A
-rw---  1 postgres postgres 16777216 Dec 13 12:29 0165006B
-rw---  1 postgres postgres 16777216 Dec 13 12:52 0165006C
-rw---  1 postgres postgres 16777216 Dec 13 13:15 0165006D
-rw---  1 postgres postgres 16777216 Dec 13 13:36 0165006E
-rw---  1 postgres postgres 16777216 Dec 13 13:51 0165006F
-rw---  1 postgres postgres 16777216 Dec 13 13:59 01650070
-rw---  1 postgres postgres 16777216 Dec 13 14:06 01650071
-rw---  1 postgres postgres 16777216 Dec 13 14:10 01650072
-rw---  1 postgres postgres 16777216 Dec 13 14:15 01650073
-rw---  1 postgres postgres 16777216 Dec 13 14:37 01650074
-rw---  1 postgres postgres 16777216 Dec 13 14:51 01650075
-rw---  1 postgres postgres 16777216 Dec 13 15:17 01650076
-rw---  1 postgres postgres 16777216 Dec 13 15:39 01650077
Joe
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] bgwriter changes

2004-12-15 Thread Jan Wieck
On 12/15/2004 12:10 PM, Tom Lane wrote:
Jan Wieck <[EMAIL PROTECTED]> writes:
Still, we need to avoid scanning over all the clean blocks of a large 
buffer pool, so there is need for a separate dirty-LRU.
That's not happening, unless you want to undo the cntxDirty stuff,
with unknown implications for performance and deadlock safety.  It's
definitely not happening in 8.0 ;-)
Sure not.
Jan
--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [HACKERS] [Testperf-general] BufferSync and bgwriter

2004-12-15 Thread Josh Berkus
Jan,

> I too don't think that this approach will retain the checkpoing smooting
> effect, the current implementation has.
>
> The real problem is that the "cleaner" the buffer pool is, the longer
> the scan for dirty buffers will take because the dirty blocks tend to be
> at the very end of the scan order. The real solution for this would be
> not to scan the whole pool, but to maintain a separate chain of only
> dirty buffers in LRU order.

Hmmm, I've not seen this.  For example, with people who are having trouble 
with checkpoint spikes on Linux, I've taken to recommending that they call 
sync() (via cron) every 5-10 seconds (thanks, Bruce, for suggestion!).   
Believe it or not, this does help smooth out the spikes and give better 
overall performance in a many-small-writes situation.

Simon, one of the problems with the OSDL-DBT2 test is that it's too steady.   
DBT2 gives a constant stream of small writes at a regular, predictable rate.   
This does not, in fact, match any real-world application I know.

To allow DBT2 to be used for real bgwriter benchmarking, Mark would need to 
change the following:

1) Randomize the timing of the commits, so that sometimes there is only 30 
writes/minute, and other times there is 300.   A timing pattern that would 
produce a "sine wave" with occasional random spikes would be best; in my 
experience, OLTP applications tend to have wave-like spikes and lulls.

2) Include a sprinkling of random or regular "large writes" which affect 
several tables and 1000's of rows.   For example, once per hour, change 
10,000 pending orders to "shipped", and archive 10,000 "old orders" to an 
archive table.

However, this would require "splitting" DBT2; there's the DBT2 which simulates 
the TPC-C test, and the DBT2 which will help us tune for real-world 
applications.   The two tests will not be the same.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [Testperf-general] BufferSync and bgwriter

2004-12-15 Thread Josh Berkus
Folks,

> To allow DBT2 to be used for real bgwriter benchmarking, Mark would need to
> change the following:
>
> 1) Randomize the timing of the commits, so that sometimes there is only 30
> writes/minute, and other times there is 300.   A timing pattern that would
> produce a "sine wave" with occasional random spikes would be best; in my
> experience, OLTP applications tend to have wave-like spikes and lulls.
>
> 2) Include a sprinkling of random or regular "large writes" which affect
> several tables and 1000's of rows.   For example, once per hour, change
> 10,000 pending orders to "shipped", and archive 10,000 "old orders" to an
> archive table.

Oh, also we need to:
3) Run the test for 3+ hours after scaling up, and turn on autovacuum.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [Testperf-general] BufferSync and bgwriter

2004-12-15 Thread Josh Berkus
Simon,

> Clearly, OSDL-DBT2 is not a real world test! That is its benefit, since
> it is heavily instrumented and we are able to re-run it many times
> without different parameter settings. The application is well known and
> doesn't suffer that badly from factors that would allow certain effects
> to be swamped. If it had too much randomness or variation, it would be
> difficult to interpret.

I don't think you followed me.   The issue is that for parameters designed to 
"smooth out spikes" like bgwriter and vacuum delay, it helps to have really 
bad spikes to begin with.   There's a possibility that the parameters (and 
calculations) that work well for for a "steady-state" OLTP application are 
actually bad for an application with much more erratic usage, just as a high 
sort_mem is good for DSS and bad for OLTP.

> Mark's efforts to
> improve the instrumentation of the tests will be useful on other
> workloads also.

Yep, it's been a lot of help.   Heck, this is the first time we've had 
parameters based on planned tests and not just anecdotes.  That's a huge step 
forward.

I'm just suggesting that we can improve the test still further specifically 
for testing things like bgwriter.

> I'd encourage you to develop variations of DBT2 that can also be used to
> tune the server, hopefully running on OSDL. I support open testing
> methods as much as I support open development methods and projects.

Yeah, I'll just have to do it in a different programming language ;-b

>
> DBT3 next...

Yes, I started setting up a 200GB DBT3 database on one of OSDL's machines.  
You're welcome to it, I don't see myself completeting those tests before the 
holidays.  Want login?

--Josh

-- 
__Aglio Database Solutions___
Josh BerkusConsultant
[EMAIL PROTECTED]www.agliodbs.com
Ph: 415-752-2500Fax: 415-752-2387
2166 Hayes Suite 200San Francisco, CA

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] possible wierd boolean bug?

2004-12-15 Thread Tom Lane
"Merlin Moncure" <[EMAIL PROTECTED]> writes:
> I confirmed the problem on a linux server running beta3...so this
> problem is quite reproducible by running the attached scripts on a
> freshly loaded database.

The attached patch fixes the problem for me.

regards, tom lane

*** src/backend/access/nbtree/nbtutils.c.orig   Sun Aug 29 01:06:40 2004
--- src/backend/access/nbtree/nbtutils.cWed Dec 15 14:00:59 2004
***
*** 224,234 
BTScanOpaque so = (BTScanOpaque) scan->opaque;
int numberOfKeys = scan->numberOfKeys;
int new_numberOfKeys;
ScanKey inkeys;
ScanKey outkeys;
ScanKey cur;
ScanKey xform[BTMaxStrategyNumber];
-   boolallEqualSoFar;
boolhasOtherTypeEqual;
Datum   test;
int i,
--- 224,234 
BTScanOpaque so = (BTScanOpaque) scan->opaque;
int numberOfKeys = scan->numberOfKeys;
int new_numberOfKeys;
+   int numberOfEqualCols;
ScanKey inkeys;
ScanKey outkeys;
ScanKey cur;
ScanKey xform[BTMaxStrategyNumber];
boolhasOtherTypeEqual;
Datum   test;
int i,
***
*** 278,284 
 * Otherwise, do the full set of pushups.
 */
new_numberOfKeys = 0;
!   allEqualSoFar = true;
  
/*
 * Initialize for processing of keys for attr 1.
--- 278,284 
 * Otherwise, do the full set of pushups.
 */
new_numberOfKeys = 0;
!   numberOfEqualCols = 0;
  
/*
 * Initialize for processing of keys for attr 1.
***
*** 321,327 
 */
if (i == numberOfKeys || cur->sk_attno != attno)
{
!   boolpriorAllEqualSoFar = allEqualSoFar;
  
/* check input keys are correctly ordered */
if (i < numberOfKeys && cur->sk_attno != attno + 1)
--- 321,327 
 */
if (i == numberOfKeys || cur->sk_attno != attno)
{
!   int priorNumberOfEqualCols = 
numberOfEqualCols;
  
/* check input keys are correctly ordered */
if (i < numberOfKeys && cur->sk_attno != attno + 1)
***
*** 355,368 
xform[BTLessEqualStrategyNumber - 1] = NULL;
xform[BTGreaterEqualStrategyNumber - 1] = NULL;
xform[BTGreaterStrategyNumber - 1] = NULL;
}
else
{
!   /*
!* If no "=" for this key, we're done with 
required keys
!*/
!   if (!hasOtherTypeEqual)
!   allEqualSoFar = false;
}
  
/* keep only one of <, <= */
--- 355,368 
xform[BTLessEqualStrategyNumber - 1] = NULL;
xform[BTGreaterEqualStrategyNumber - 1] = NULL;
xform[BTGreaterStrategyNumber - 1] = NULL;
+   /* track number of attrs for which we have "=" 
keys */
+   numberOfEqualCols++;
}
else
{
!   /* track number of attrs for which we have "=" 
keys */
!   if (hasOtherTypeEqual)
!   numberOfEqualCols++;
}
  
/* keep only one of <, <= */
***
*** 411,417 
 * If all attrs before this one had "=", include these 
keys
 * into the required-keys count.
 */
!   if (priorAllEqualSoFar)
so->numberOfRequiredKeys = new_numberOfKeys;
  
/*
--- 411,417 
 * If all attrs before this one had "=", include these 
keys
 * into the required-keys count.
 */
!   if (priorNumberOfEqualCols == attno - 1)
so->numberOfRequiredKeys = new_numberOfKeys;
  
/*
***
*** 468,475 
 * If unique index and we have equality keys for all columns, set
 * keys_are_unique flag for higher levels.
 */
!   if (allEqualSoFar && relation->rd_index->ind

Re: [HACKERS] somebody working on: Prevent default re-use of sysids for dropped users and groups?

2004-12-15 Thread Matthias Schmidt
Hi Tom,
after beeing offline because of a chrashed box, I able to mail again.
I  would like to volunteer for the uptime() function. Is that OK?
cheers,
Matthias
Am 13.12.2004 um 03:31 schrieb Bruce Momjian:
Matthias Schmidt wrote:
Am 07.12.2004 um 19:24 schrieb Tom Lane:
Matthias Schmidt <[EMAIL PROTECTED]> writes:
By the way: Do you have an idea about a small or
medium sized task from the TODO-List for a newbee, which gets me up 
to
speed?(!Win32)
A lot of the tasks listed under DATA TYPES are fairly self-contained
problems ... but what draws your interest?  I think most of the tasks
that are on TODO are there either because they're hard problems, or
because nobody was very interested in them.  Unfortunately they are 
not
marked as to which is which :-(.  Make a list of things you find
interesting, and we can tell you if they're very hard or already
claimed.

regards, tom lane

Hi Tom,
this is a list of things I'm interested in:
1) Allow administrators to safely terminate individual sessions
   Right now, SIGTERM will terminate a session, but it is treated as
   though the postmaster has paniced and shared memory might not be
   cleaned up properly.  A new signal is needed for safe termination.
This one seems hard.
2) Add a function that returns the 'uptime' of the postmaster
Seems like a good starting point.
3) Have SHOW ALL and pg_settings show descriptions for server-side
variables
Also a good one for you.
--
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 
19073


--
Matthias Schmidt
Viehtriftstr. 49
67346 Speyer
Tel.: +49 6232 4867
Fax.: +49 6232 640089
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] production server down

2004-12-15 Thread Joe Conway
Tom Lane wrote:
Yes, override everything there's a switch for.  Also check that the
other values shown by pg_controldata look reasonable (the locale
settings are probably the only ones you might get burned on).

What if anything have you got in $PGDATA/pg_xlog?
-rw---  1 postgres postgres 16777216 Dec 13 15:39 01650077
Um.  That's so far from the values shown in pg_control that it's not funny.
This is 7.4, right?  I have a crude xlog dump tool that I'll send you
off-list.  We should be able to identify the latest checkpoint in the
existing XLOG files, and that will give you something to work with.
Just wanted to close the loop for the sake of the list archives. With 
Tom's xlog dump tool I was able (with a bunch of his help off-list) to 
identify the needed parameters for pg_resetxlog. Running pg_resetxlog 
got us back a running database. We're now involved in checking the data.

Thank you to everyone for your help -- especially Tom!
Joe

---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] race condition for drop schema cascade?

2004-12-15 Thread Andrew Dunstan
I have seen this failure several times, but not consistently, on the 
buildfarm member otter (Debian/MIPS) and possible on others, and am 
wondering if it indicates a possible race condition on DROP SCHEMA CASCADE.

== pgsql.30167/src/test/regress/regression.diffs 
===
*** ./expected/tablespace.out   Sat Dec 11 13:05:32 2004
--- ./results/tablespace.outSat Dec 11 14:35:24 2004
***
*** 35,37 
--- 35,38 
  NOTICE:  drop cascades to table testschema.foo
  -- Should succeed
  DROP TABLESPACE testspace;
+ ERROR:  tablespace "testspace" is not empty
==
cheers
andrew

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] production server down

2004-12-15 Thread Michael Fuhr
On Wed, Dec 15, 2004 at 11:41:02AM -0800, Joe Conway wrote:

> Just wanted to close the loop for the sake of the list archives. With 
> Tom's xlog dump tool I was able (with a bunch of his help off-list) to 
> identify the needed parameters for pg_resetxlog. Running pg_resetxlog 
> got us back a running database. We're now involved in checking the data.

Any chance you could write up a summary of the thread: what caused
the problem, how you diagnosed it, how you fixed it, and how to
avoid it?  Might make a useful "lessons learned" document.

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] bgwriter changes

2004-12-15 Thread Mark Kirkwood

Simon Riggs wrote:
100pct.patch (SR)
Test results to date:
1. Mark Kirkwood ([HACKERS] [Testperf-general] BufferSync and bgwriter)
pgbench 1xCPU 1xDisk shared_buffers=1
showed 8.0RC1 had regressed compared with 7.4.6, but patch improved
performance significantly against 8.0RC1
 

It occurs to me that cranking up the number of transactions (say 
1000->10) and seeing if said regression persists would be 
interesting.  This would give the smoothing effect of the bgwriter (plus 
the ARC) a better chance to shine.

regards
Mark
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] strange regression failure

2004-12-15 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Has anyone got any idea on why I see things like this from time to time? 
> It seems to be intermittent, which is odd.

I have a suspicion that the unexpected result occurs when the client
encoding is UTF8, because psql/mbprint.c's ucs_wcwidth() function goes
belly up on control characters (which it probably shouldn't; for that
matter, that entire file should be removed and replaced with our real
multibyte support...).

What's not apparent is why the failure wouldn't be consistent.  You'd
think the buildfarm would always run the test with the same encoding
selection.  Any idea how that might not happen?

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] strange regression failure

2004-12-15 Thread Alvaro Herrera
On Wed, Dec 15, 2004 at 05:20:28PM -0500, Andrew Dunstan wrote:

> Our emails just crossed. You got it. I will have buildfarm force SQL_ASCII.

But then you will lose reports using other encodings ...

-- 
Alvaro Herrera (<[EMAIL PROTECTED]>)
"Cuando mañana llegue pelearemos segun lo que mañana exija" (Mowgli)

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] getting 'order by' working with unicode locale? ICU?

2004-12-15 Thread Tom Lane
Palle Girgensohn <[EMAIL PROTECTED]> writes:
> I'm using Postgresql on FreeBSD, and would like to get "order by" to work 
> with unicode.

What makes you think it doesn't?  Use the right locale and you're set.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] getting 'order by' working with unicode locale? ICU?

2004-12-15 Thread Palle Girgensohn
--On onsdag, december 15, 2004 23.21.13 -0500 Tom Lane <[EMAIL PROTECTED]> 
wrote:

Palle Girgensohn <[EMAIL PROTECTED]> writes:
I'm using Postgresql on FreeBSD, and would like to get "order by" to
work  with unicode.
What makes you think it doesn't?  Use the right locale and you're set.
Not on FreeBSD, since collation is not implemented in unicode locales. One 
way would be to implement it in the OS, of course...

/Palle
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org