(SO_NOSIGPIPE) and the MSG_NOSIGNAL flag to send().
This change attempts to use these if they're available at compile-
and run-time. If not, we drop back to manipulating the signal mask as
before.
Signed-off-by: Jeremy Kerr
---
v4: roll into one patch, use macros
---
src/interfaces/lib
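The approach described above can be sketched roughly as follows. This is an illustrative sketch, not libpq's actual code: the function names and the fallback structure are assumptions based on the patch description.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Compile-time / run-time check for SO_NOSIGPIPE (BSD/macOS):
 * the setsockopt() can fail at run time, in which case callers
 * drop back to the old signal-mask approach. */
static int try_so_nosigpipe(int fd)
{
#ifdef SO_NOSIGPIPE
	int optval = 1;		/* must be initialised, as noted in review */

	return setsockopt(fd, SOL_SOCKET, SO_NOSIGPIPE,
					  &optval, sizeof(optval));
#else
	errno = ENOPROTOOPT;
	return -1;
#endif
}

/* Per-call suppression via MSG_NOSIGNAL (Linux); when the flag is
 * unavailable, the caller must mask SIGPIPE around the send(). */
static ssize_t send_no_sigpipe(int fd, const void *buf, size_t len)
{
#ifdef MSG_NOSIGNAL
	return send(fd, buf, len, MSG_NOSIGNAL);
#else
	return send(fd, buf, len, 0);	/* caller masks SIGPIPE here */
#endif
}
```

Either mechanism avoids a pair of pthread_sigmask() syscalls per send() on platforms that support it.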
Hi Greg,
Thanks for the benchmark app, thought I'd pitch in with some ppc
results:
> clz 1.530s
> lookup table 1.720s
> float hack 4.424s
> unrolled 5.280s
> normal 5.369s
POWER5+:
clz 2.046s
lookup table 2.18
Hi Tom,
> That code is not broken as it stands, and doesn't appear to really
> gain anything from the proposed change. Why should we risk any
> portability questions when the code isn't going to get either simpler
> or shorter?
OK, after attempting a macro version of this, we end up with:
#defi
Rather than testing single bits in a loop, change AllocSetFreeIndex to
use the __builtin_clz() function to calculate the chunk index.
This requires a new check for __builtin_clz in the configure script.
Results in a ~2% performance increase on sysbench on PowerPC.
Signed-off-by: Jeremy Kerr
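As a sketch of the idea (ALLOC_MINBITS and the index computation follow PostgreSQL's aset.c conventions, but the exact code here is illustrative, not the committed patch):

```c
/* Illustrative sketch of the __builtin_clz() technique. */
#define ALLOC_MINBITS 3		/* smallest chunk is 8 bytes, as in aset.c */

/* fls ("find last set"): 1-based position of the highest set bit.
 * __builtin_clz(0) is undefined, so x must be non-zero here. */
static inline int fls(unsigned int x)
{
	return 32 - __builtin_clz(x);
}

/* Chunk index: 0 for sizes up to 8 bytes, then one step per power of
 * two; this replaces the original single-bit shift-and-test loop. */
static int alloc_set_free_index(unsigned int size)
{
	if (size > (1 << ALLOC_MINBITS))
		return fls((size - 1) >> ALLOC_MINBITS);
	return 0;
}
```

The `size > (1 << ALLOC_MINBITS)` guard matters: for size <= 8 the argument to fls() would be 0, which is undefined for __builtin_clz (the issue Atsushi raises later in the thread).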
Tom,
> However, I think the whole patch is pretty useless. That code is not
> broken as it stands, and doesn't appear to really gain anything from
> the proposed change. Why should we risk any portability questions
> when the code isn't going to get either simpler or shorter?
This patch "clears
Hi Robert,
> That having been said, Jeremy, you probably want to take a look at
> those comments and I have a few responses to them as well.
OK, thanks for the heads-up.
> following comment:
> > Applied and built cleanly. Regress passes. Trying to hunt down ppc
> > box to see if performance enha
Hi Robert,
> Perhaps we should use macros.
I was trying to avoid macros, as this means we lose type- and syntax-
checking at the call-site, and end up with slightly messier code.
However, I understand that this is probably personal preference for me
:)
How about just 'static' functions? (ie,
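To illustrate the trade-off being discussed (both definitions here are hypothetical examples, not proposed code):

```c
/* Macro version: no type checking at the call site, and the argument
 * is evaluated more than once. */
#define FLS(x) ((x) ? 32 - __builtin_clz(x) : 0)

/* Static-function alternative: the compiler checks the argument type,
 * at the cost of relying on the compiler to inline it. */
static int fls_fn(unsigned int x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}
```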
Hi, Alvaro,
> Does this work in compilers other than GCC? I think we use some
> kludges to protect against them ... see pg_list.h for the canonical
> example.
As I understand it, we're not using static inlines in pg_list.h to
prevent multiple objects from exporting the same symbols if the
func
Move the shift-and-test logic into a separate fls() function, which
can use __builtin_clz() if it's available.
This requires a new check for __builtin_clz in the configure script.
Results in a ~2% performance increase on PowerPC.
Signed-off-by: Jeremy Kerr
---
v3: respin as context
e
can change the code to be implemented as static inlines.
Signed-off-by: Jeremy Kerr
---
src/interfaces/libpq/fe-secure.c | 88 ---
1 file changed, 55 insertions(+), 33 deletions(-)
*** a/src/interfaces/libpq/fe-secure.c
--- b/src/interfaces/libpq/fe-sec
(SO_NOSIGPIPE) and the MSG_NOSIGNAL flag to send().
This change attempts to use these if they're available at compile-
and run-time. If not, we drop back to manipulating the signal mask as
before.
Signed-off-by: Jeremy Kerr
---
src/interfaces/libpq/fe-connect.c | 40 ++
l like a
little more testing though, is there a machine that allows both
SO_NOSIGPIPE and MSG_NOSIGNAL?
v3: respin as context diffs
Again, comments most welcome,
Jeremy
---
Jeremy Kerr (2):
[libpq] rework sigpipe-handling macros
[libpq] Try to avoid manually masking SIGPIPEs on every send()
Stephen,
> If the updated function is always faster when the overall string is at
> least, say, 16 characters long,
But that's not the case - the cost of the function (and the speedup from
the previous version) depends on the number of spaces that there are at
the end.
For the new function to
Stephen,
> Is it just the size that matters, or is it when there are few spaces
> at the end?
It's the number of spaces at the end. If we knew this number, then we
wouldn't have to do any comparisons at all :)
Cheers,
Jeremy
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
Tom,
> > I've put together some data from a microbenchmark of the bcTruelen
> > function, patched and unpatched.
>
> Uh, where's the data?
If you're after the raw data for a run, I've put it up:
http://ozlabs.org/~jk/projects/db/data/bctruelen.csv
I've also packaged up the quick-and-dirty bench
Hi Stephen,
> What would be really useful would be "best case" and "worst case"
> scenarios.
I've put together some data from a microbenchmark of the bcTruelen
function, patched and unpatched.
As for best-case, if you have a long string of trailing spaces, we can
go through them at theoreticall
Robert,
> I'd still like to know the workload and exact numbers.
From up-thread:
http://ozlabs.org/~jk/projects/db/data/postgres.bcTruelen/
- or were there other details you were looking for?
Cheers,
Jeremy
number of compares where possible. Word-size compares
are performed on aligned sections of the string.
Results in a small performance increase; around 1-2% on my POWER6 test
box.
Signed-off-by: Jeremy Kerr
---
Resend: context diff this time
---
src/backend/utils/adt/varchar.c | 24
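The word-at-a-time scan described above might look roughly like this. This is a sketch only: `true_len` and its structure are illustrative, not PostgreSQL's actual bcTruelen.

```c
#include <string.h>
#include <stdint.h>

/* Count the non-space length of s (len bytes), trimming trailing
 * spaces one word at a time on the aligned middle section. */
static int true_len(const char *s, int len)
{
	const unsigned int wordsize = sizeof(unsigned int);
	unsigned int spaces;
	int i = len;

	/* a word filled with ' ', precomputed for the word compares */
	memset(&spaces, ' ', sizeof(spaces));

	/* byte-at-a-time until the tail is word-aligned */
	while (i > 0 &&
		   ((uintptr_t) (s + i) % wordsize || (unsigned int) i < wordsize) &&
		   s[i - 1] == ' ')
		i--;

	/* word-at-a-time over all-space words */
	while (i >= (int) wordsize &&
		   memcmp(s + i - wordsize, &spaces, wordsize) == 0)
		i -= wordsize;

	/* remaining trailing spaces within the last partial word */
	while (i > 0 && s[i - 1] == ' ')
		i--;

	return i;
}
```

As discussed in the thread, the win depends on the number of trailing spaces: long runs of spaces are consumed a word at a time, while short runs cost roughly the same as the byte loop.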
Hi all,
> Speaking of which, what about some performance numbers?
OK, benchmarks done:
http://ozlabs.org/~jk/projects/db/data/postgres.bcTruelen/
Summary: small increase in performance (~1-2% on my machine), at about
1.5 standard deviations from the mean. Profiles show a decent drop in
hits
Hi Tom,
> Speaking of which, what about some performance numbers? Like Heikki,
> I'm quite suspicious of whether there is any real-world gain to be
> had from this approach.
Will send numbers tomorrow, with the reworked patch.
Cheers,
Jeremy
Hi all,
> That's correct. To check the alignment you would have to look at the
> actual pointer. I would suggest using the existing macros to handle
> alignment. Hm, though the only one I see offhand which is relevant is
> the moderately silly PointerIsAligned(). Still it would make the code
> cle
Robert,
> This looks very non-portable to me.
Unsurprisingly, I'm new to postgres hacking and the large number of
supported platforms :)
I was considering something like:
unsigned int spaces;
const unsigned int wordsize = sizeof(unsigned int);
memset(&spaces, ' ', word
Signed-off-by: Jeremy Kerr
---
src/backend/utils/adt/varchar.c | 24 +---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 5f3c658..6889dff 100644
--- a/src/backend/utils/adt/varchar.c
Alvaro,
> Maybe bcTruelen could be optimized to step on one word at a time
> (perhaps by using XOR against a precomputed word filled with ' '),
> instead of one byte at a time ...
I have a patch for this, will send soon.
Regards,
Jeremy
(SO_NOSIGPIPE) and the MSG_NOSIGNAL flag to send().
This change attempts to use these if they're available at compile-
and run-time. If not, we drop back to manipulating the signal mask as
before.
Signed-off-by: Jeremy Kerr
---
v2: initialise optval
---
src/interfaces/libpq/fe-connect.c |
Marko,
> optval seems used without initialization.
Dang, I was checking for the SO_NOSIGPIPE flag, and didn't check for
optval. New patch coming...
Cheers,
Jeremy
(SO_NOSIGPIPE) and the MSG_NOSIGNAL flag to send().
This change attempts to use these if they're available at compile-
and run-time. If not, we drop back to manipulating the signal mask as
before.
Signed-off-by: Jeremy Kerr
---
src/interfaces/libpq/fe-connect.c | 39 +
e
can change the code to be implemented as static inlines.
Signed-off-by: Jeremy Kerr
---
src/interfaces/libpq/fe-secure.c | 88 ---
1 file changed, 55 insertions(+), 33 deletions(-)
diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-sec
l like a
little more testing though, is there a machine that allows both
SO_NOSIGPIPE and MSG_NOSIGNAL?
Again, comments most welcome,
Jeremy
---
Jeremy Kerr (2):
[libpq] rework sigpipe-handling macros
[libpq] Try to avoid manually masking SIGPIPEs on every send()
Hi Atsushi,
> In my benchmark program, it is a little different performance
> in fls implementation and inline asm implementation.
> However, the result of a pgbench is almost the same improvement.
>
> Here is the result of my benchmark.
Excellent, thank you for getting this extra set of numbers.
Move the shift-and-test logic into a separate fls() function, which
can use __builtin_clz() if it's available.
This requires a new check for __builtin_clz in the configure script.
Results in a ~2% performance increase on PowerPC.
Signed-off-by: Jeremy Kerr
---
v2: prevent
Hi Atsushi,
> If size <= 8, fls((size - 1) >> ALLOC_MINBITS) is fls(0).
> The result of fls(0) is undefined.
Yep, got caught out by this because my previous fls() supported zero.
> I think we have to never call fls(0) from AllocSetFreeIndex().
> My proposal code:
>
> if (size > (1 << ALLOC_
Move the shift-and-test logic into a separate fls() function, which
can use __builtin_clz() if it's available.
This requires a new check for __builtin_clz in the configure script.
Results in a ~2% performance increase on PowerPC.
Signed-off-by: Jeremy Kerr
---
configu
Hi Tom,
> The other thing I didn't like about the patch was the assumption that
> it's okay to have a "static inline" function in a header. You can
> get away with that in gcc but *not* in other compilers.
Gee, you user-space guys have it tough! :D
Point taken, will rework.
> Look at the exist
Florian,
> > +#if defined(__GNUC__) && \
> > + (defined(__ppc__) || defined(__powerpc__) || \
> > +defined(__ppc64__) || defined (__powerpc64__))
>
> If you require GCC anyway, you can use __builtin_clz instead.
> (It's been available since GCC 4.1 at least.)
Because now we have to test the
Add a utility header for simple bit operations - bitops.h.
At present, just contains the fls() (find last set bit) function.
Signed-off-by: Jeremy Kerr
---
v2: only use inline asm with gcc
---
src/include/utils/bitops.h | 53 +
1 file changed, 53
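A sketch of what such a header's fls() could look like; the cntlzw inline asm and the overall structure are assumptions based on the discussion in this thread, not the actual bitops.h.

```c
/* fls(): 1-based index of the highest set bit; this version defines
 * fls(0) == 0, unlike raw __builtin_clz(), which is undefined for 0. */
static inline int fls(unsigned int x)
{
#if defined(__GNUC__) && (defined(__ppc__) || defined(__powerpc__) || \
	defined(__ppc64__) || defined(__powerpc64__))
	int lz;

	/* cntlzw: count leading zeros word; cntlzw(0) == 32, so fls(0) == 0 */
	__asm__("cntlzw %0,%1" : "=r"(lz) : "r"(x));
	return 32 - lz;
#elif defined(__GNUC__)
	return x ? 32 - __builtin_clz(x) : 0;
#else
	/* portable shift-and-test fallback */
	int i = 0;

	while (x)
	{
		i++;
		x >>= 1;
	}
	return i;
#endif
}
```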
Results in a ~2% performance increase by using the powerpc fls()
implementation.
Signed-off-by: Jeremy Kerr
---
src/backend/utils/mmgr/aset.c |8 ++--
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index
Add a utility header for simple bit operations - bitops.h.
At present, just contains the fls() (find last set bit) function.
Signed-off-by: Jeremy Kerr
---
src/include/utils/bitops.h | 52 +
1 file changed, 52 insertions(+)
diff --git a/src
Hi,
> I made a faster version of AllocSetFreeIndex for x86 architecture.
Neat, I have a version for PowerPC too.
In order to prevent writing multiple copies of AllocSetFreeIndex, I
propose that we add a fls() function ("find last set"); this can be
defined in an architecture-independent manner
Tom,
> A feature that is exercised via setsockopt is probably fairly safe,
> since you can check for failure of the setsockopt call and then do
> it the old way. MSG_NOSIGNAL is a recv() flag, no?
It's a flag to send().
> The question is whether you could expect that the recv() would fail if
>
Tom,
> The consideration is that the application fails completely on server
> disconnect (because it gets SIGPIPE'd). This was long ago deemed
> unacceptable, and we aren't likely to change our opinion on that.
OK, understood. I'm guessing MSG_NOSIGNAL on the send() isn't portable
enough here?
this change I see the following performance improvement
during a sysbench OLTP run:
http://ozlabs.org/~jk/projects/db/data/sigpipe-perf.png
load: sysbench --test=oltp --oltp-read-only=on, connecting locally,
machine: POWER6, 64-way, 4.2GHz
Comments most welcome,
Jeremy
---
Jeremy Kerr (1):
If the connection isn't over SSL, there's no need to do the disable.
This effectively halves the number of syscalls performed by libpq when
SSL is not in use.
Signed-off-by: Jeremy Kerr
---
src/interfaces/libpq/fe-secure.c |7 +++
1 file changed, 3 insertions(+), 4 deletion