Tom Lane wrote:
Not really: it only solves the problem *if you change the application*,
which is IMHO not acceptable. In particular, why should a non-threaded
app expect to have to change to deal with this issue? But we can't
safely build a thread-safe libpq.so for general use if it breaks
non-th
Tom Lane wrote:
Bruce Momjian <[EMAIL PROTECTED]> writes:
His idea of pthread_sigmask/send/sigpending/sigwait/restore-mask. Seems
we could also check errno for SIGPIPE rather than calling sigpending.
He has a concern about an application that already blocked SIGPIPE and
has a pending SI
Bruce Momjian wrote:
Comments? This seems like our only solution.
This would be a transparent solution. Another approach would be:
- Use the old 7.3 approach by default. This means perfect backward
compatibility for single-threaded apps and broken multithreaded apps.
- Add a new PQinitDB(int d
[EMAIL PROTECTED] wrote a few months ago:
PostgreSQL's behavior on these cases is poor. I don't think anyone who has
tried to use PG for this sort of thing will disagree, and yes it is
getting better. Does anyone else consider this to be a problem? If so, I'm
open for suggestions on what can be don
[EMAIL PROTECTED] wrote:
If the memset
bypasses the cache then the following access will cause a cache line
miss, which can be so slow that using the faster memset can result in a
net performance loss.
Could you suggest some structs to test? If I get your meaning, I would make a loop that
Marc Colosimo wrote:
Oops, I used the same setting as in the old hacking message (-O2, gcc
3.3). If I understand what you are saying, then it turns out yes, PG's
MemSet is faster for smaller blocksizes (see below, between 32 and
64). I just replaced the whole MemSet with memset and it is not ver
Josh Berkus wrote:
Gaetano,
I knew there was an evaluation of futex vs. spinlock,
and Josh Berkus told me on IRC that there was only a 20%
performance increase. Is that increase something to throw away?
Before we get totally off track here
I evaluated futexes strictly as an attempt to solve
[EMAIL PROTECTED] wrote:
Something to think about:
if you run PostgreSQL with fsync on, but you use the hardware write cache
on your disk drives, how likely are you to lose data? Obviously, this is a
fairly limited problem, as it only applies to power down (which you can
control) or power loss wher
[EMAIL PROTECTED] wrote:
Tom Lane wrote
NOT LOGGED options on CREATE INDEX and COPY, to allow users to take
advantage of the no logging optimization without turning off PITR system
wide. (Just as this is possible in Oracle and Teradata).
Isn't this in direct conflict with your opinion a
Gaetano Mendola wrote:
a1) If it exists, check that it is a 16MB file (the request can
arrive during the copy),
I think this will fail under windows: "copy" first sets the file size
and then transfers the data. I wouldn't rule out that some Unices use
the same implementation.
[EMAIL PROTECTED] wrote:
I have been considering a full sweep in my test lab off client time later on.
ext2, ext3, jfs, xfs, and ReiserFS, fsync on with fdatasync or open_sync,
and fsync off.
Before you start: double check that the disks are not lying:
At least the suse 2.4 kernel send cache flu
Tom Lane wrote:
[EMAIL PROTECTED] writes:
The improvements were REALLY astounding, and I would like to know if other
Linux users see this performance increase, I mean, it is almost 8~10 times
faster than using fsync.
Furthermore, it seems to also have the added benefit of reducing the I/O
storm
Andreas Pflug wrote:
Tom Lane wrote:
Do we have a TODO for allowing users to
force switching to a new WAL file segment?
Together with PITR, this might make sense?
Another idea:
Has anyone tried to put the WAL segment directory on a cluster
filesystem and use that for cold (perhaps even hot) failo
Christopher Browne wrote:
The "fix" for this problem is to rewrite all of your applications so
that they become conscious of which bits of memory they're using so
they can tune their own behaviour. This, of course, requires
discarding useful notions such as "virtual memory" that are _assumed_
by m
[EMAIL PROTECTED] wrote:
What is the recommended way to create mutex objects (CreateMutex) from
Win32 libraries? There must be a clean way like there is in pthreads.
A mutex is inherently a global object. CreateMutex(NULL, FALSE, NULL) will
return a handle to an unowned mutex.
That's not t
Bruce Momjian wrote:
The only downside to removal is that folks without symlinks (I believe
Win32 only) will lose that functionality with nothing to replace it.
However, I think the clarity of removing it is worth it. Also, I think
someone had a special way to do symlinks on Win32 and we should
Gregory Stark wrote:
This patch also looks relevant to Postgres for two reasons.
This part seems like it might expose some bugs that otherwise might have
remained hidden:
This affects I/O scheduling potentially quite significantly. It is no
longer the case that the kernel wi
Diego Montenegro wrote:
Hello all,
Can anyone point me to where in the code Postgres flushes all the
data to disk?
When XLogFlush is called, it only flushes the XLOG to disk, right? Does
all of the data get flushed at the same time as the log?
in src/backend/storage/smgr/md.c, mdsync():
[EMAIL PROTECTED] wrote:
Compare file sync methods with one 8k write:
(o_dsync unavailable)
open o_sync, write       6.270724
write, fdatasync        13.275225
write, fsync            13.359847
Odd. Which filesystem, which kernel? It seems fdatasync is broken and
Tom Lane wrote:
[EMAIL PROTECTED] writes:
I could certainly do some testing if you want to see how DBT-2 does.
Just tell me what to do. ;)
Just do some runs that are identical except for the wal_sync_method
setting. Note that this should not have any impact on SELECT
performance, only ins
Yusuf Goolamabbas wrote:
I sent this to Bruce but forgot to cc pgsql-hackers, The patches are
likely to go into 2.6.6. People interested in extremely safe fsync
writes should also follow the IDE barrier thread and the true fsync() in
Linux on IDE thread
Actually the most interesting part of the
Marty Scholes wrote:
2. Put them on an actual (or mirrored actual) spindle
Pros:
* Keeps WAL and data file I/O separate
Cons:
* All of the non array drives are still slower than the array
Are you sure this is a problem? The dbt-2 benchmarks from osdl run on an
8-way Intel computer with several ra
Bruce Momjian wrote:
Which basically shows one fsync, no O_SYNC's, and setting of the flag
only for klog reads.
Which sysklogd did you look at? The version from RedHat 9 contains this
block:
/*
* Crack a configuration file line
*/
void cfline(line, f)
char *line;
register str
Bruce Momjian wrote:
How can we test if libpq needs to call that? Seems that is an issue
whether we are threaded or not, no?
I think it's always an issue: in the non-threaded case, it's just not
fatal. At least some openssl init functions are protected with "if
(done) return; done = 1;", and
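
A guard of the form "if (done) return; done = 1;" is not safe when two threads enter it at the same time. As a minimal sketch of the usual POSIX alternative, the following uses pthread_once. `init_ssl_library` is only a placeholder that counts its calls; it does not call any OpenSSL function, and all names here are hypothetical.

```c
#include <pthread.h>

static pthread_once_t ssl_once = PTHREAD_ONCE_INIT;
static int ssl_init_calls = 0;   /* for demonstration only */

/* Placeholder for the real one-time library initialization. */
static void init_ssl_library(void)
{
    ssl_init_calls++;
}

/* May be called from any thread, any number of times; the
 * initialization routine runs exactly once. Returns 0 on success. */
int ensure_ssl_initialized(void)
{
    return pthread_once(&ssl_once, init_ssl_library);
}

/* Reports how often the placeholder initialization actually ran. */
int ssl_init_call_count(void)
{
    return ssl_init_calls;
}
```

This only helps if every user of the library goes through the same once-control, which is exactly the difficulty when other code in the application initializes the library on its own.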
Bruce Momjian wrote:
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches
I will try to apply it within the next 48 hours.
You are too fast: the patch was a proof of concept, not really tested
(actually quite buggy).
Attached
Bruce Momjian wrote:
What killed the idea of doing ssl or kerberos locking inside libpq was
that there was no way to be sure that outside code didn't also access
those routines.
A callback based implementation can handle that: libpq has a default
implementation for apps that do not use openssl or
zohn_ming wu wrote:
swap_free: Bad swap file entry 0004
Do you use ECC memory, is ECC enabled in the BIOS [and does it work -
some vendors lie about ECC support]?
I would bet that it's a soft memory error: means not used. One
bit differs, and the kernel complains about the invalid
Bruce Momjian wrote:
However, we really have two types of function tested.
The first, strerror, can be thread safe by using thread-local storage
_or_ by returning pointers to static strings. The other two function
tests require thread-local storage to be thread-safe.
You are completely ignori
Greg Stark wrote:
Manfred Spraul <[EMAIL PROTECTED]> writes:
That means
open();
write();
sync();
could succeed, but the data is not stored on disk, correct?
That would be true on any filesystem. Unless you throw an fsync() call in.
The checkpoint code uses sync() rig
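
As a minimal sketch of "throwing an fsync() call in", the following C function checks the result of every step, including close(). The name `write_durably` is hypothetical and is not taken from the PostgreSQL sources.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 only if the data was written, flushed
 * with fsync(), and the file was closed without an error. */
int write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    /* write() may write less than requested, so loop until done. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }

    /* Ask the kernel to push this file's data to stable storage. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }

    /* close() can report errors too, so its result is returned. */
    return close(fd);
}
```

Whether the data really reaches the platter still depends on the drive honoring the flush, which the sketch cannot verify.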
Bruce Momjian wrote:
Woh, as far as I know, any application should run fine with -lpthread,
threaded or not. What OS are you on? This is the first I have heard of
this problem.
Perhaps we should try to figure out how other packages handle
multithreaded/singlethreaded libraries? I'm looking a
Greg Stark wrote:
I do know that AFS returns quota failures on close. This was unusual enough
that when AFS was deployed at school unix tools failed left and right over
precisely this issue. Though it mostly just meant they returned the wrong exit
status.
That means
open();
write();
sync(
Tom Lane wrote:
Manfred Spraul <[EMAIL PROTECTED]> writes:
What are the chances for Win64 support? sizeof(unsigned long) remains 4,
sizeof(void*) is 8.
If you can tell me what type Datum should be (unsigned long long
maybe?), we could probably handle that.
Probably uintptr_t: That
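
A minimal sketch of the uintptr_t idea, assuming a C99 compiler that provides the type: an integer type wide enough to hold a pointer regardless of sizeof(unsigned long). The typedef and the two helper names are illustrative only and are not the definitions used in the PostgreSQL sources.

```c
#include <stdint.h>

/* Assumption for this sketch: Datum is an unsigned integer type that
 * can hold any object pointer. */
typedef uintptr_t Datum;

/* Fails at compile time if the type could not hold a pointer. */
_Static_assert(sizeof(Datum) >= sizeof(void *),
               "Datum must be able to hold a pointer");

/* Hypothetical helpers for the round trip pointer -> Datum -> pointer. */
static inline Datum datum_from_pointer(const void *p)
{
    return (Datum) p;
}

static inline void *pointer_from_datum(Datum d)
{
    return (void *) d;
}
```

On a platform where unsigned long is 4 bytes and pointers are 8 bytes, the static assertion would reject an unsigned long typedef but accept uintptr_t.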
Tom Lane wrote:
Claudio Natoli <[EMAIL PROTECTED]> writes:
Or, maybe we'll just use the tas() implementation that already exists for
__i386__/__x86_64__ in s_lock.h. How did I miss that?
Move along. Nothing to see here.
Actually, I was expecting you to complain that the s_lock.h coding is
Tom Lane wrote:
Wait a minute. I am *not* buying into any proposal that we need to
support ENABLE_THREAD_SAFETY on machines where libc is not thread-safe.
We have other things to do than adopt an open-ended commitment to work
around threading bugs on obsolete platforms. I don't believe that any
Tom Lane wrote:
Personally I find diff -u format completely unreadable :-(. Send
"diff -c" if you want useful commentary.
diff -c is attached. I've removed the signal changes, they are
unrelated. I'll resend them separately.
--
Manfred
Index: src/interfaces/libpq/libpq-fe.h
==
Tom Lane wrote:
Manfred Spraul <[EMAIL PROTECTED]> writes:
But what about kerberos: I'm a bit reluctant to add a fourth mutex: what
if kerberos calls gethostbyname or getpwuid internally?
Wouldn't help anyway, if some other part of the app also calls kerberos.
That's
From fe-secure.c:
/*
* Indicates whether the current thread is in send()
* For use by SIGPIPE signal handlers; they should
* ignore SIGPIPE when libpq is in send(). This means
* that the backend has died unexpectedly.
*/
pqbool
PQinSend(void)
{
#ifdef ENABLE_THREAD_SAFET
libpq needs additional changes for complete thread safety:
- openssl needs different initialization.
- kerberos is not thread safe.
- functions such as gethostbyname are not thread safe, and could be used
by kerberos. Right now protected with a libpq specific mutex.
- ditto for getpwuid and stderr
Bruce Momjian wrote:
[EMAIL PROTECTED] wrote:
Hi Manfred,
Just wanted to let you know I tried your patch-spinlock-i386 patch on
our STP (our automated test platform) 8-way systems and saw a 5.5%
improvement with Pentium III Xeons. If you want to see those results:
PostgreSQL 7.4.1:
htt
Jan Wieck wrote:
Moving the Cache Directory Block (cdb) on a hit to the MRU position of
the appropriate queue "is the bookkeeping" of this strategy. The whole
algorithm is based on it, and I don't see yet how to avoid that without
opening a huge can of worms that look like deadlocks. But I'll thin
Bruce Momjian wrote:
Anyone see an attack path here?
Should we have one lock per hash bucket rather than one for the entire
hash?
That's the simple part. The problem is the aging strategy: we need a
strategy that doesn't rely on a global list that's updated after every
lookup
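
The "simple part", one lock per hash bucket, can be sketched as follows. This is a toy chained hash table with hypothetical names, not the buffer manager's data structure, and it deliberately leaves out the aging strategy that the reply identifies as the real problem.

```c
#include <pthread.h>
#include <stddef.h>

#define NBUCKETS 64

/* Caller-allocated entry; the table only links it into a chain. */
struct entry {
    unsigned key;
    int value;
    struct entry *next;
};

struct bucket {
    pthread_mutex_t lock;   /* protects only this bucket's chain */
    struct entry *head;
};

static struct bucket table[NBUCKETS];

void table_init(void)
{
    for (size_t i = 0; i < NBUCKETS; i++) {
        pthread_mutex_init(&table[i].lock, NULL);
        table[i].head = NULL;
    }
}

/* Threads working on different buckets never contend on a lock. */
void table_insert(struct entry *e)
{
    struct bucket *b = &table[e->key % NBUCKETS];

    pthread_mutex_lock(&b->lock);
    e->next = b->head;
    b->head = e;
    pthread_mutex_unlock(&b->lock);
}

/* Returns 1 and stores the value if the key is found, else 0. */
int table_lookup(unsigned key, int *value_out)
{
    struct bucket *b = &table[key % NBUCKETS];
    int found = 0;

    pthread_mutex_lock(&b->lock);
    for (struct entry *e = b->head; e != NULL; e = e->next) {
        if (e->key == key) {
            *value_out = e->value;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&b->lock);
    return found;
}
```

A lookup here touches only its own bucket; any replacement policy that moves the found entry in a shared list would reintroduce a global lock.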
[EMAIL PROTECTED] wrote:
Hi Manfred,
I'm using unixware 7 but couldn't compile your source with native cc, I
had to compile it with gcc.
here are the results:
Thanks. The test app compares the time needed for three different short
loops: a loop with six empty function calls, a loop with six f
Josh Berkus wrote:
Initial debug logging of a test on one Xeon system demonstrating this issue
showed a very large number of unattributed semop() calls. We are still
following up on this.
Postgres has its own user space spinlock and semaphore implementation.
Both fall back to semop if ther
Bruce Momjian wrote:
write                   0.000360
write & fsync           0.001391
write, close & fsync    0.001308
open o_fsync, write     0.000924
That's 1 millisecond vs. 1.3 milliseconds. Neither value is realistic -
I guess the hw cache is on and the os doesn't issue cache flush command
Tom Lane wrote:
Greg Stark <[EMAIL PROTECTED]> writes:
Treating pointers as integers is technically nonportable but
realistically you would be pretty hard pressed to find any
architecture anyone runs postgres on where there isn't some integer
datatype that you can cast both directions from poin
Hi,
I've searched through libpq and looked for global or static variables as
indicators of non-threadsafe code. I found:
- Win32 and BeOS: there is a global "ioctlsocket_ret" variable, but it
seems to be a dummy variable that is always discarded.
- pg_krb4_init(): Are the kerberos libraries threa
Greg Stark wrote:
I'm assuming fsync syncs writes issued by other processes on the same file,
which isn't necessarily true though.
It was already pointed out that we can't rely on that assumption.
So the NetBSD and Sun developers I checked with both asserted fsync does in
fact guarante
Tom Lane wrote:
Manfred's idea is interesting but AFAICS completely unimplementable
in any portable fashion. You'd have to have hooks into the kernel.
I thought about outstanding operations from postgres - I don't know
enough about the buffer layer if it's possible to keep a counter of the
c
Jan Wieck wrote:
_Vacuum page delay_:
Tom Lane's napping during vacuums with another tuning option. I
replaced the usleep() call with a PG_DELAY(msec) macro in miscadmin.h,
which uses select(2) instead. That should address the possible
portability problems.
What about skipping the delay if
Tom Lane wrote:
Manfred Spraul <[EMAIL PROTECTED]> writes:
For multithreaded apps, this is not possible: sigaction is per process.
Thus the calling application must handle the SIGPIPE signals for libpq -
either by blocking or ignoring them. We are still discussing the exact
API. Prob
[EMAIL PROTECTED] wrote:
On 1 Nov, Tom Lane wrote:
Manfred Spraul <[EMAIL PROTECTED]> writes:
signal handlers are a process property, not a thread property - that
code is broken for multi-threaded apps.
Yeah, that's been mentioned before, but I don't see any way
Tom Lane wrote:
It strikes me that sigpipe handling will be a global affair in any
particular application --- it's unlikely that it would be correct for
some PG connections and wrong for others. So one possibility is to make
the control variable be global (static) and thus it could be set before
Neil Conway wrote:
The present Linux implementation doesn't do this, AFAICS -- all it does
is increase the readahead for this file:
AFAIK Linux uses a modified LRU that automatically puts pages that were
touched only once at a lower priority than frequently accessed pages.
Neil: what about ca
AgentM wrote:
That wouldn't offer a solution for people who use SIGPIPE for other
things during the lifetime of the program (after creating the
connection) and if a SIGPIPE handler is called due to the connection,
the handler won't be expecting the source, and polling signal for
state is essen
Mark Wong wrote:
On Sat, Nov 01, 2003 at 10:29:34PM +0100, Manfred Spraul wrote:
Mark Wong wrote:
Yeah, my dbt2 applications are multithreaded.
Do you need SIGPIPE delivery in your app? If no, could you try what
happens if you apply the attached patch to postgres, and perform
Tom Lane wrote:
Manfred Spraul <[EMAIL PROTECTED]> writes:
What about an option to skip the sigaction calls for apps that can
handle SIGPIPE?
If the app is ignoring SIGPIPE globally, then our calls will have no
effect anyway.
Wrong. From the opengroup manpage:
<<
SIG_IGN - i
Mark Wong wrote:
Yeah, my dbt2 applications are multithreaded.
Do you need SIGPIPE delivery in your app? If no, could you try what
happens if you apply the attached patch to postgres, and perform the
signal(SIGPIPE, SIG_IGN);
once in your dbt2 app?
--
Manfred
--- pgsql.orig/src/interfac
Tom Lane wrote:
A bigger objection is that we couldn't get libssl to use it (AFAIK).
The flag really needs to be settable on the socket (eg, via fcntl),
not per-send.
It's a per-send flag, it's not possible to force it on with a fcntl :-(
What about an option to skip the sigaction calls for apps
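
The snippet does not name the per-send flag. Assuming it is MSG_NOSIGNAL as found on Linux, a minimal sketch of its use looks like this; `send_quiet` is a hypothetical name.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, assuming the flag under discussion is
 * MSG_NOSIGNAL: a write to a broken connection fails with EPIPE and
 * no SIGPIPE is raised for this call. */
ssize_t send_quiet(int fd, const void *buf, size_t len)
{
    return send(fd, buf, len, MSG_NOSIGNAL);
}
```

Because the flag has to be passed on every send() call, code that writes to the socket through another library cannot be covered this way, which is the objection raised above.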
Tom Lane wrote:
Manfred Spraul <[EMAIL PROTECTED]> writes:
signal handlers are a process property, not a thread property - that
code is broken for multi-threaded apps.
Yeah, that's been mentioned before, but I don't see any way around it.
Do not handle SIGPIPE on multith
Tom Lane wrote:
Manfred Spraul <[EMAIL PROTECTED]> writes:
Is that really necessary?
Unfortunately, yes. libpq can't change the global setting of SIGPIPE
without breaking the surrounding application, but we don't want to crash
the app if the server connection has disappea
I've straced
$ pgbench -c 5 -s 6 -t 1000
total 157k syscalls, 70k of them are rt_sigaction(SIGPIPE):
1754 poll([{fd=3, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, -1) = 1
1754 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
1754 send(3, "\0\0\0%\0\3\0\0user\0postgres\0database\0t"..., 37,
Tom Lane wrote:
[EMAIL PROTECTED] writes:
7.4beta5 offers more throughput. One significant difference I see is in
the oprofile for the database. For the additional 7% increase in the
metric, there are about 32% less ticks in SearchCatCache.
Hmm. I have been profiling PG for some years n
[EMAIL PROTECTED] wrote:
Results from 7.4beta5
http://developer.osdl.org/markw/dbt2-pgsql/188/
- metric 1446.01
CPU: P4 / Xeon with 2 hyper-threads, speed 1497.51 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a
unit mask of 0x01 (count c
Greg Stark wrote:
Manfred Spraul <[EMAIL PROTECTED]> writes:
One problem for WAL is that O_DIRECT would disable the write cache -
each operation would block until the data arrived on disk, and that might block
other backends that try to access WALWriteLock.
Perhaps a dedicated backen
Tom Lane wrote:
Not for WAL --- we never read the WAL at all in normal operation. (If
it works for writes, then we would want to use it for writing WAL, but
that's not apparent from what Christopher quoted.)
At least under Linux, it works for writes. Oracle uses O_DIRECT to
access (both read and
Andrew Dunstan wrote:
I have wondered (somewhat fruitlessly) for several years about the
possibilities of special purpose lightweight file systems that could
relax some of the assumptions and checks used in general purpose file
systems. Such a thing might provide most of the benefits of a
"dat
Andrew Dunstan wrote:
Bruce Momjian wrote:
This seems to be a bug in gcc-3.3.1. -fstrict-aliasing is enabled by
-O2 or higher optimization in gcc 3.3.1.
According to the C standard, it's illegal to access data through a
pointer of the wrong type. The only exception is "char *".
This can be used
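
As a small illustration of the rule, the following sketch reads the bit pattern of a float without casting a float pointer to an integer pointer. It copies the bytes with memcpy instead, and it shows the character-type exception separately. The function names are hypothetical, and the sketch assumes a 32-bit float.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Well-defined: copy the object representation instead of reading the
 * float through a uint32_t pointer, which would break the aliasing
 * rule under -fstrict-aliasing. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Also well-defined: any object may be inspected through a pointer to
 * a character type. */
unsigned char first_byte(const void *obj)
{
    return *(const unsigned char *) obj;
}
```

With IEEE 754 single precision, float_bits(1.0f) yields 0x3f800000.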
scott.marlowe wrote:
OK, I've done some more testing on our IDE drive machine.
First, some background. The hard drives we're using are Seagate
drives, model number ST380023A. Firmware version is 3.33. The machine
they are in is running RH9. The setup string I'm feeding them on startup
righ
Peter Eisentraut wrote:
Tom Lane writes:
No. The real problem with 2PC in my mind is that its failure modes
occur *after* you have promised commit to one or more parties. In
multi-master, if you fail you know it before you have told the client
his data is committed.
I have a book here w
Tom Lane wrote:
Claudio Natoli <[EMAIL PROTECTED]> writes:
How are you dealing with the issue of wanting some static variables to
be per-thread and others not?
To be perfectly honest, I'm still trying to familiarize myself with the code
sufficiently well so that I can tell which varia
Tom Lane wrote:
Manfred Spraul <[EMAIL PROTECTED]> writes:
... Initially I tried to increase MAX_ALIGNOF to 16, but
the result didn't work:
You would need to do a full recompile and initdb to alter MAX_ALIGNOF.
I think I did that, but it still failed. 7.4cvs works, I
Tom Lane wrote:
Oh, pgbench ;-). Are you aware that you need a "scale factor" (-s)
larger than the number of clients to avoid unreasonable levels of
contention in pgbench?
No. What about adding a few reasonable examples to README? I've switched
to "pgbench -c 10 -s 11 -t 1000 test". Is that ok?
Tom Lane wrote:
AFAIK, semops are not done unless we actually have to yield the
processor, so saving a syscall or two in that path doesn't sound like a
big win. I'd be more interested in asking why you're seeing long series
of semops in the first place.
Virtually all semops yield the processor
I've noticed that postgres strace output contains long groups of
setitimer/semop/setitimer.
Just FYI: semtimedop is a special syscall that implements a semop with
a timeout. It was added just for the purpose of avoiding the setitimer
calls.
I know that it's supported by Solaris and recent Linux
Hi,
When analyzing the kernel profile from osdl dbt benchmarks, I noticed
that around 50% of the kernel time is spent in __copy_user_intel.
http://khack.osdl.org/stp/280060/profile/
This function is one of two functions that does the actual memory copy
from/to kernel space to/from user space.
U
Manfred Spraul wrote:
Is the Itanium tas implementation correct? I think it should be
xchg4.aqv instead of just xchg4 - as far as I know a normal atomic
exchange is not a memory barrier on Itanium. At least the Linux
kernel version contains "cmpxchg4.aqv".
Sorry for the noise,
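
For comparison, here is how a test-and-set with explicit acquire and release ordering can be written in portable C11. This is only a sketch with hypothetical names; it is not the s_lock.h code and says nothing about what the Itanium instruction itself guarantees.

```c
#include <stdatomic.h>

/* Hypothetical lock type for this sketch. */
typedef atomic_flag spin_flag_t;

/* Returns 0 if the lock was free and is now held by the caller,
 * 1 if it was already held. Acquire ordering keeps later loads and
 * stores from moving before the lock is taken. */
int tas_acquire(spin_flag_t *lock)
{
    return atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

/* Release ordering keeps earlier loads and stores from moving past
 * the point where the lock is dropped. */
void spin_release(spin_flag_t *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

The compiler then picks whatever instruction and barrier the target needs to provide the requested ordering.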
Bruce Momjian wrote:
Tom Lane wrote:
Bruce Momjian <[EMAIL PROTECTED]> writes:
He is uncomfortable with the port/*.h changes at this point, so it seems
I am going to have to add Itanium/Opteron tests to most of those files.
Why don't you try to put together a proposed patch of that
Jeroen Ruigrok/asmodai wrote:
-On [20030908 23:52], Peter Eisentraut ([EMAIL PROTECTED]) wrote:
Why would FreeBSD have a "library of thread-safe libc functions" (libc_r)
if the functions weren't thread-safe? I think the test is faulty.
A thread-safe library has a per-thread errno value (i
Another question:
Is it possible to apply patches to postgresql before a DBT-2 run, or is
only patching the kernel supported?
--
Manfred
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
[EMAIL PROTECTED] wrote:
http://developer.osdl.org/markw/44/
I threw together (kind of sloppily) a web page of the data I was
starting to collect for our DBT-2 workload (TPC-C derivative) on
PostgreSQL 7.3.4. Keep in mind not much database tuning has been done
yet. Feel free to ask any questions
Bruce Momjian wrote:
Shridhar Daithankar wrote:
Hi all,
Following is from Documentation/vm/overcommit-accounting
-
2 - (NEW) strict overcommit. The total address space commit
for the system is not permitted to exceed swap + a
configurable percentage (default is 50) of physical
Bruce Momjian wrote:
if test "$enable_debug" = yes && test "$ac_cv_prog_cc_g" = yes; then
CFLAGS="$CFLAGS -g"
fi
+
+ /* Compile AMD Opteron using gcc in 64-bit mode */
+ if test "$GCC" = yes; then
+ case $host in
+ ia64-*) CFLAGS="$CFLAGS -m64"
+LDFLAGS="$LDFLAGS -melf_x86_64"
Shridhar Daithankar wrote:
2) Native freeBSD threads
pthread.h in /usr/include and lc_r
Do you know if FreeBSD supports pthread_rwlock with
PTHREAD_PROCESS_SHARED? I'm trying to replace the LWLocks with
pthread_rwlocks.
What about other Unices?
--
Manfred
---(end o
83 matches