> Thanks! I can't think of anything else to worry about with regards to
> that version, so I have committed it.
>
Thanks, Robert. And thanks everyone for contributing to this patch.
--
Best regards,
Aleksander Alekseev
http://eax.me/
On Wed, Mar 23, 2016 at 5:49 AM, Aleksander Alekseev wrote:
> I still believe that optimizing 1% blindly without considering possible
> side effects this optimization can bring (other data alignment, some
> additional machine instructions - just to name a few) and having no
> way to measure these
> > I have a strong feeling that we are just wasting our time here.
>
> That is possible. However, I would like it if you would give me the
> benefit of the doubt and assume that, if I seem to be more cautious
> than you would be were you a committer, there might possibly be some
> good reasons f
On Mon, Mar 21, 2016 at 4:48 AM, Aleksander Alekseev wrote:
>> This is the point where I think I am missing something about patch.
>> As far as I can understand, it uses the same freelist index
>> (freelist_idx) for allocating and putting back the entry, so I think
>> the chance of increment in on
> This is the point where I think I am missing something about patch.
> As far as I can understand, it uses the same freelist index
> (freelist_idx) for allocating and putting back the entry, so I think
> the chance of increment in one list and decrement in another is there
> when the value of free
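The behaviour being discussed is easier to see in a small sketch (illustrative
only, not the committed code): the freelist index is derived from the entry's
hash value, so inserting and later deleting the same key normally touches the
same freelist, unless the element had to be borrowed from another freelist
because "ours" was empty.

    /* sketch; assumes a dynahash-style header and an IS_PARTITIONED() test */
    #define NUM_FREELISTS 32

    #define FREELIST_IDX(hctl, hashcode) \
        (IS_PARTITIONED(hctl) ? (hashcode) % NUM_FREELISTS : 0)

    /* hash_search_with_hash_value() would then use the same index on both
     * the HASH_ENTER and the HASH_REMOVE path for a given hash value. */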
On Mon, Mar 21, 2016 at 5:12 AM, Robert Haas wrote:
>
> On Sun, Mar 20, 2016 at 3:01 AM, Amit Kapila wrote:
> > On Sat, Mar 19, 2016 at 7:02 PM, Robert Haas wrote:
> >> On Sat, Mar 19, 2016 at 12:28 AM, Amit Kapila wrote:
> >> > Won't in theory, without patch as well nentries can overflow
On Sun, Mar 20, 2016 at 3:01 AM, Amit Kapila wrote:
> On Sat, Mar 19, 2016 at 7:02 PM, Robert Haas wrote:
>> On Sat, Mar 19, 2016 at 12:28 AM, Amit Kapila wrote:
>> > Won't in theory, without patch as well nentries can overflow after
>> > running
>> > for very long time? I think with patch i
On Sat, Mar 19, 2016 at 7:02 PM, Robert Haas wrote:
>
> On Sat, Mar 19, 2016 at 12:28 AM, Amit Kapila wrote:
> > Won't in theory, without patch as well nentries can overflow after running
> > for very long time? I think with patch it is more prone to overflow because
> > we start borrowing from
On Sat, Mar 19, 2016 at 12:28 AM, Amit Kapila wrote:
> Won't in theory, without patch as well nentries can overflow after running
> for very long time? I think with patch it is more prone to overflow because
> we start borrowing from other free lists as well.
Uh, I don't think so. Without the p
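A hedged sketch of how a per-freelist counter keeps the total entry count
consistent even though individual counters drift when elements are borrowed
(struct and function names below are illustrative, not taken from the patch;
assumes PostgreSQL's storage/s_lock.h and utils/hsearch.h):

    typedef struct FreeListSketch
    {
        slock_t      mutex;         /* protects nentries and freeList */
        long         nentries;      /* entries "charged" to this freelist */
        HASHELEMENT *freeList;      /* chain of free elements */
    } FreeListSketch;

    /* Total number of entries is the sum over all freelists.  A single
     * counter may grow past its "own" share because of borrowing, but the
     * sum stays exact, and overflowing a long is as (un)likely as before. */
    static long
    total_entries(const FreeListSketch *lists, int nlists)
    {
        long        sum = 0;
        int         i;

        for (i = 0; i < nlists; i++)
            sum += lists[i].nentries;
        return sum;
    }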
On Sat, Mar 19, 2016 at 1:41 AM, Robert Haas wrote:
>
> On Tue, Mar 1, 2016 at 9:43 AM, Aleksander Alekseev wrote:
> >
> > So answering your question - it turned out that we _can't_ reduce
> > NUM_FREELISTS this way.
>
> That's perplexing. I would have expected that with all of the mutexes
> p
On Tue, Mar 1, 2016 at 9:43 AM, Aleksander Alekseev wrote:
>> I am not sure, if this is exactly what has been suggested by Robert,
>> so it is not straightforward to see if his suggestion can allow us to
>> use NUM_FREELISTS as 8 rather than 32. I think instead of trying to
>> use FREELISTBUFF, w
On Thu, Mar 3, 2016 at 1:50 PM, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote:
> > Won't it always use the same freelist to remove and add the entry from
> > freelist as for both cases it will calculate the freelist_idx in same
> > way?
>
> No. If "our" freelist is empty when we try to re
> Won't it always use the same freelist to remove and add the entry from
> freelist as for both cases it will calculate the freelist_idx in same
> way?
No. If "our" freelist is empty when we try to remove an item from it, we
borrow an item from another freelist. Then this borrowed item will be
returne
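A minimal sketch of that borrowing path, reusing the illustrative
FreeListSketch layout from above (a simplification, not the patch itself):
the element may come from any freelist, but it is charged to, and later
freed back to, the caller's own freelist.

    static HASHELEMENT *
    get_entry_sketch(FreeListSketch *lists, int nlists, int my_idx)
    {
        int         i;

        for (i = 0; i < nlists; i++)
        {
            int          idx = (my_idx + i) % nlists;   /* try "our" list first */
            HASHELEMENT *elem;

            SpinLockAcquire(&lists[idx].mutex);
            elem = lists[idx].freeList;
            if (elem != NULL)
            {
                lists[idx].freeList = elem->link;
                SpinLockRelease(&lists[idx].mutex);

                /* charge the entry to the caller's list, not the donor's
                 * (a real implementation would fold the idx == my_idx case
                 * into a single critical section) */
                SpinLockAcquire(&lists[my_idx].mutex);
                lists[my_idx].nentries++;
                SpinLockRelease(&lists[my_idx].mutex);
                return elem;
            }
            SpinLockRelease(&lists[idx].mutex);
        }
        return NULL;            /* every freelist is empty: the table is full */
    }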
On Tue, Mar 1, 2016 at 8:13 PM, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote:
>
> Hello, Amit
>
> > I am not sure, if this is exactly what has been suggested by Robert,
> > so it is not straightforward to see if his suggestion can allow us to
> > use NUM_FREELISTS as 8 rather than 32. I
Hello, Amit
> I am not sure, if this is exactly what has been suggested by Robert,
> so it is not straightforward to see if his suggestion can allow us to
> use NUM_FREELISTS as 8 rather than 32. I think instead of trying to
> use FREELISTBUFF, why not do it as Robert has suggested and try with
>
On Fri, Feb 12, 2016 at 9:04 PM, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote:
> Hello, Robert
>
> > It also strikes me that it's probably quite likely that slock_t
> > mutex[NUM_FREELISTS] is a poor way to lay out this data in memory.
> > For example, on a system where slock_t is just o
Hello, Robert
> It also strikes me that it's probably quite likely that slock_t
> mutex[NUM_FREELISTS] is a poor way to lay out this data in memory.
> For example, on a system where slock_t is just one byte, most likely
> all of those mutexes are going to be in the same cache line, which
> means y
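One hedged way to address that layout concern (an illustration of the general
technique, not necessarily what the patch ended up doing) is to keep each
spinlock next to the data it protects and pad the per-freelist struct so that
two freelists never share a cache line:

    /* assumes PostgreSQL's storage/s_lock.h and utils/hsearch.h;
     * the 64-byte cache line size is an assumption */
    #define FREELIST_CACHELINE 64

    typedef struct
    {
        slock_t      mutex;         /* protects the fields below */
        long         nentries;
        HASHELEMENT *freeList;
    } FreeListData;

    typedef union
    {
        FreeListData data;
        char         pad[FREELIST_CACHELINE];  /* avoid false sharing */
    } PaddedFreeList;

    /* the hash header would then hold, for example:
     *     PaddedFreeList freeList[NUM_FREELISTS];
     */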
On Thu, Feb 11, 2016 at 3:26 PM, Robert Haas wrote:
> On Thu, Feb 11, 2016 at 9:53 AM, Robert Haas wrote:
>> The fact that InitLocks() doesn't do this has been discussed before
>> and there's no consensus on changing it. It is, at any rate, a
>> separate issue. I'll go through the rest of this
On Thu, Feb 11, 2016 at 9:53 AM, Robert Haas wrote:
> The fact that InitLocks() doesn't do this has been discussed before
> and there's no consensus on changing it. It is, at any rate, a
> separate issue. I'll go through the rest of this patch again now.
I did a little bit of cleanup on this pa
On Thu, Feb 11, 2016 at 9:11 AM, Aleksander Alekseev wrote:
>> Thanks, this looks way better. Some more comments:
>>
>> - I don't think there's any good reason for this patch to change the
>> calling convention for ShmemInitHash(). Maybe that's a good idea and
>> maybe it isn't, but it's a separ
Hello, Robert
> Thanks, this looks way better. Some more comments:
>
> - I don't think there's any good reason for this patch to change the
> calling convention for ShmemInitHash(). Maybe that's a good idea and
> maybe it isn't, but it's a separate issue from what this patch is
> doing, and if
On Wed, Feb 10, 2016 at 3:24 AM, Aleksander Alekseev wrote:
>> Basically, the burden for you to impose a new coding rule on everybody
>> who uses shared hash tables in the future is very high.
>
> I fixed the issue you described. The number of spinlocks doesn't depend on
> NUM_LOCK_PARTITIONS anymore a
Hello, Robert
> Basically, the burden for you to impose a new coding rule on everybody
> who uses shared hash tables in the future is very high.
I fixed the issue you described. The number of spinlocks doesn't depend on
NUM_LOCK_PARTITIONS anymore and can be specified for each hash table
on a callin
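To make "specified for each hash table on the calling side" concrete, here is
a purely hypothetical sketch of what such a calling convention could look
like; the num_freelists field does not exist in PostgreSQL's HASHCTL and is
shown only as an illustration:

    HASHCTL     info;

    MemSet(&info, 0, sizeof(info));
    info.keysize = keysize;                     /* caller-supplied sizes */
    info.entrysize = entrysize;
    info.num_partitions = NUM_LOCK_PARTITIONS;
    /* info.num_freelists = 32;    hypothetical field: how many
     *                             spinlock-protected freelists to use */

    htab = ShmemInitHash("some shared hash", init_size, max_size,
                         &info, HASH_ELEM | HASH_PARTITION);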
I've closed this as returned-with-feedback. Please resubmit once you
have found answers to the questions posed; from the submitted benchmark
numbers this looks very exciting, but it needs a bit more work. You
don't necessarily have to agree with everything Robert says, but you
need to have well r
On Mon, Feb 8, 2016 at 5:39 AM, Aleksander Alekseev wrote:
>> So: do we have clear evidence that we need 128 partitions here, or
>> might, say, 16 work just fine?
>
> Yes, this number of partitions was chosen based on this benchmark (see
> "spinlock array" column):
>
> http://www.postgresql.org/me
Hello, Robert.
> So: do we have clear evidence that we need 128 partitions here, or
> might, say, 16 work just fine?
Yes, this number of partitions was chosen based on this benchmark (see
"spinlock array" column):
http://www.postgresql.org/message-id/20151229184851.1bb7d1bd@fujitsu
In fact we c
On Thu, Jan 21, 2016 at 9:03 AM, Anastasia Lubennikova wrote:
> First of all, why not merge both patches into one? They aren't too big
> anyway.
So looking over this patch, it institutes a new coding rule that all
shared hash tables must have the same number of partitions, and that
number is hard
On Wed, Jan 27, 2016 at 5:15 PM, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote:
> Most likely HASHHDR.mutex is not the only bottleneck in your case, so my
> patch doesn't help much. Unfortunately I don't have access to any
> POWER8 server so I can't investigate this issue. I suggest using a
>
> This comment certainly requires some changes.
Fixed.
> BTW, could you explain why init_table_size was two times less than
> max_table_size?
I have no clue. My best guess is that it was a reasonable thing to do in
the past. Then somebody changed the code and now there is little reason
to use ini
22.01.2016 13:48, Aleksander Alekseev:
Then, this thread became too tangled. I think it's worth writing a
new message with the patch, the test script, some results and a brief
overview of how it really works. It will make further review
much easier.
Sure.
HASHHDR represents a hash table
> > This patch affects header files. Did you by any chance forget to
> > run `make clean` after applying it? As we discussed above, when you
> > change .h files, autotools doesn't rebuild the dependent .c files:
> >
>
> Yes, actually I always compile using "make clean; make -j20; make
> install". If y
On Fri, Jan 22, 2016 at 3:44 PM, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote:
> This patch affects header files. Did you by any chance forget to run
> `make clean` after applying it? As we discussed above, when you
> change .h files, autotools doesn't rebuild the dependent .c files:
>
Ye
Hi,
> First of all, why not merge both patches into one? They aren't too
> big anyway.
Agree.
> I think comments should be changed, to be more informative here.
> Add a comment here too.
> Maybe you should explain this magic number 7 in the comment above?
Done.
> Then, this thread became too
This patch affects header files. Did you by any chance forget to run
`make clean` after applying it? As we discussed above, when you
change .h files, autotools doesn't rebuild the dependent .c files:
http://www.postgresql.org/message-id/caf4au4y4mtapp2u3ugtbqd_z0chesjwfnavrgk4umfcwa4e...@mail.gmail.
30.12.2015 16:01, Aleksander Alekseev:
Here is a clean version of the patch. Step 1 is an optimization. Step 2
refactors this:
HTAB *
ShmemInitHash(const char *name, /* table string name for shmem index */
- long init_size, /* initial table size */
+ long init_size,
On Tue, Jan 12, 2016 at 1:50 PM, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote:
>
> increasing number of lock partitions (see columns "no locks", "lwlock"
> and "spinlock array"). Previously it couldn't help us (see "master"
> column) because of a bottleneck.
>
> If you have any other que
Andres Freund wrote:
> On 2016-01-11 21:16:42 -0300, Alvaro Herrera wrote:
> > Andres, Robert, are you still reviewing this patch?
>
> I don't think I've signed up to review - or actually did review - this patch.
Well, maybe not as such, but you replied to the thread several times,
which is why I asked.
On 2016-01-11 21:16:42 -0300, Alvaro Herrera wrote:
> Andres, Robert, are you still reviewing this patch?
I don't think I've signed up to review - or actually did review - this patch.
Greetings,
Andres Freund
Hello, Álvaro
> So you were saying some posts upthread that the new hash tables would
> "borrow" AN elements from the freelist then put AN elements back when
> too many got unused. Is that idea embodied in this patch?
Right. It didn't satisfy "use all memory" requirement anyway. It's
better to s
Andres, Robert, are you still reviewing this patch?
Aleksander Alekseev wrote:
> Here is a clean version of the patch. Step 1 is an optimization. Step 2
> refactors this:
>
> HTAB *
> ShmemInitHash(const char *name, /* table string name for shmem index */
> - long init_size, /* in
On Wed, Dec 30, 2015 at 1:10 PM, Oleg Bartunov wrote:
>
>
>
> On Wed, Dec 30, 2015 at 5:44 PM, Andres Freund wrote:
>>
>> On 2015-12-30 11:37:19 -0300, Alvaro Herrera wrote:
>> > Aleksander Alekseev wrote:
>> >
> > > Here is a funny thing - benchmark results I shared 22.12.2015 are wrong
> > >
On 2015-12-30 10:49:27 -0500, Tom Lane wrote:
> > On Wed, Dec 30, 2015 at 5:44 PM, Andres Freund wrote:
> >> I still maintain that --enable-depend should be on by default. We're
> >> absurdly optimizing towards saving a handful of cycles in scenarios
> >> which are usually bottlenecked by other th
[ A change as significant as this should not be debated in a thread with
a title that suggests it's of interest to only one or two people ]
> On Wed, Dec 30, 2015 at 5:44 PM, Andres Freund wrote:
>> I still maintain that --enable-depend should be on by default. We're
>> absurdly optimizing toward
On Wed, Dec 30, 2015 at 5:44 PM, Andres Freund wrote:
> On 2015-12-30 11:37:19 -0300, Alvaro Herrera wrote:
> > Aleksander Alekseev wrote:
> >
> > > Here is a funny thing - benchmark results I shared 22.12.2015 are wrong
> > > because I forgot to run `make clean` after changing lwlock.h (autotool
Agreed. --enable-depend turned on by default could have saved me a lot of
time. Now I'm aware of --enable-depend and `make clean`. Still,
other people will definitely make the same mistake over and over again.
On Wed, 30 Dec 2015 11:49:13 -0300
Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 20
Andres Freund wrote:
> On 2015-12-30 11:37:19 -0300, Alvaro Herrera wrote:
> > Aleksander Alekseev wrote:
> >
> > > Here is a funny thing - benchmark results I shared 22.12.2015 are wrong
> > > because I forgot to run `make clean` after changing lwlock.h (autotools
> > > doesn't rebuild project pr
On 2015-12-30 11:37:19 -0300, Alvaro Herrera wrote:
> Aleksander Alekseev wrote:
>
> > Here is a funny thing - benchmark results I shared 22.12.2015 are wrong
> > because I forgot to run `make clean` after changing lwlock.h (autotools
> > doesn't rebuild project properly after changing .h files).
Aleksander Alekseev wrote:
> Here is a funny thing - benchmark results I shared 22.12.2015 are wrong
> because I forgot to run `make clean` after changing lwlock.h (autotools
> doesn't rebuild project properly after changing .h files).
Running configure with --enable-depend should avoid this prob
Here is a clean version of the patch. Step 1 is an optimization. Step 2
refactors this:
HTAB *
ShmemInitHash(const char *name, /* table string name for shmem index */
- long init_size, /* initial table size */
+ long init_size, /* initial table size XXX ignored, refa
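For reference, the declaration being refactored reads approximately as
follows in src/backend/storage/ipc/shmem.c (paraphrased; the comments may
differ slightly from the actual source):

    HTAB *
    ShmemInitHash(const char *name,     /* table string name for shmem index */
                  long init_size,       /* initial table size */
                  long max_size,        /* max size of the table */
                  HASHCTL *infoP,       /* info about key and bucket size */
                  int hash_flags);      /* info about infoP */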
Good news, everyone!
I discovered that NUM_LOCK_PARTITIONS is a bottleneck for the last
patch. Long story short - NUM_LOCK_PARTITIONS = (1 << 7) gives the best
performance:
PN = 16, AN = 8, NUM_LOCK_PARTITIONS = (1 << 7): 4782.9
PN = 16, AN = 4, NUM_LOCK_PARTITIONS = (1 << 7): 4089.9
PN = 16, A
Here is another preliminary result I would like to share. As always, you
will find the corresponding patch attached. It is of work-in-progress
quality.
The idea was suggested by a colleague of mine, Aleksander Lebedev.
freeList is partitioned like in the "no lock" patch. When there are not
enough free items
On Tue, Dec 22, 2015 at 10:39 AM, Aleksander Alekseev wrote:
> Obviously after splitting a freelist into NUM_LOCK_PARTITIONS
> partitions (and assuming that all necessary locking/unlocking is done
> on calling side) tables can't have more than NUM_LOCK_PARTITIONS
> partitions because it would caus
> > Actually, I'd like to improve all partitioned hashes instead of
> > improving only one case.
>
> Yeah. I'm not sure that should be an LWLock rather than a spinlock,
> but we can benchmark it both ways.
I would like to share some preliminary results. I tested four
implementations:
- no loc
On Fri, Dec 18, 2015 at 8:46 AM, Teodor Sigaev wrote:
>> Oh, that's an interesting idea. I guess the problem is that if the
>> freelist is unshared, then users might get an error that the lock
>> table is full when some other partition still has elements remaining.
>
> Could we split one freelist
> This idea can improve the situation with ProcLock hash table, but I
> think IIUC what Andres is suggesting would reduce the contention
> around dynahash freelist and can be helpful in many more situations
> including BufMapping locks.
I agree. But as I understand it, the PostgreSQL community doesn't gen
> Could we split the one freelist in the hash into NUM_LOCK_PARTITIONS freelists?
> Each partition will have its own freelist, and if its freelist is empty
> then the partition should search for an entry in the freelists of other
> partitions. To prevent concurrent access we need to add one
> LWLock to the hash; each partiti
Oh, that's an interesting idea. I guess the problem is that if the
freelist is unshared, then users might get an error that the lock
table is full when some other partition still has elements remaining.
Could we split the one freelist in the hash into NUM_LOCK_PARTITIONS freelists?
Each partition will ha
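A hedged sketch of the data layout this suggestion implies (type and field
names are illustrative; the thread goes on to debate whether the per-freelist
lock should be an LWLock or a spinlock):

    /* assumes PostgreSQL's storage/lwlock.h and utils/hsearch.h */
    typedef struct
    {
        LWLock      *lock;          /* protects this partition's freelist */
        HASHELEMENT *freeList;      /* free elements owned by the partition */
    } PartitionFreeList;

    /* one per hash partition; when part[i].freeList is empty, the caller
     * scans the other partitions' freelists under their respective locks */
    PartitionFreeList partFreeLists[NUM_LOCK_PARTITIONS];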
On Fri, Dec 18, 2015 at 2:50 PM, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote:
>
> > This idea can improve the situation with ProcLock hash table, but I
> > think IIUC what Andres is suggesting would reduce the contention
> > around dynahash freelist and can be helpful in many more situat
> What's in pgbench.sql?
It's from first message of this thread:
http://www.postgresql.org/message-id/20151211170001.78ded9d7@fujitsu
On 2015-12-18 11:40:58 +0300, Aleksander Alekseev wrote:
> $ pgbench -j 64 -c 64 -f pgbench.sql -T 30 my_database
What's in pgbench.sql?
Greetings,
Andres Freund
> This idea can improve the situation with ProcLock hash table, but I
> think IIUC what Andres is suggesting would reduce the contention
> around dynahash freelist and can be helpful in many more situations
> including BufMapping locks.
I agree. But as I understand it, the PostgreSQL community doesn't gen
On Thu, Dec 17, 2015 at 9:33 PM, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote:
>
> > I'd really like to see it being replaced by a queuing lock
> > (i.e. lwlock) before we go there. And then maybe partition the
> > freelist, and make nentries an atomic.
>
> I believe I just implemented s
> Oops, 3.5-4 _times_ more TPS, i.e. 2206 TPS vs 546 TPS.
In fact these numbers are for a similar but slightly different
benchmark (same schema, without checks on child tables). Here are the exact
numbers for the benchmark described above.
Before:
$ pgbench -j 64 -c 64 -f pgbench.sql -T 30 my_datab
On Thu, Dec 17, 2015 at 8:44 PM, Andres Freund wrote:
>
> On 2015-12-17 09:47:57 -0500, Robert Haas wrote:
> > On Tue, Dec 15, 2015 at 7:25 AM, Andres Freund wrote:
> > > I'd consider using a LWLock instead of a spinlock here. I've seen this
> > > contended in a bunch of situations, and the queue
> Oh, that's an interesting idea. I guess the problem is that if the
> freelist is unshared, then users might get an error that the lock
> table is full when some other partition still has elements remaining.
True, but I don't believe it should be a real problem assuming we have
a strong hash fun
On Thu, Dec 17, 2015 at 11:03 AM, Aleksander Alekseev wrote:
>> I'd really like to see it being replaced by a queuing lock
>> (i.e. lwlock) before we go there. And then maybe partition the
>> freelist, and make nentries an atomic.
>
> I believe I just implemented something like this (see attachme
> I'd really like to see it being replaced by a queuing lock
> (i.e. lwlock) before we go there. And then maybe partition the
> freelist, and make nentries an atomic.
I believe I just implemented something like this (see attachment). The
idea is to partition the PROCLOCK hash table manually into NUM_
On 2015-12-17 09:47:57 -0500, Robert Haas wrote:
> On Tue, Dec 15, 2015 at 7:25 AM, Andres Freund wrote:
> > I'd consider using a LWLock instead of a spinlock here. I've seen this
> > contended in a bunch of situations, and the queued behaviour, combined
> > with directed wakeups on the OS level,
On Tue, Dec 15, 2015 at 7:25 AM, Andres Freund wrote:
> On 2015-12-11 17:00:01 +0300, Aleksander Alekseev wrote:
>> The problem is that code between LWLockAcquire (lock.c:881) and
>> LWLockRelease (lock.c:1020) can _sometimes_ run up to 3-5 ms. Using
>> old-good gettimeofday and logging method I m
On 2015-12-11 17:00:01 +0300, Aleksander Alekseev wrote:
> The problem is that code between LWLockAcquire (lock.c:881) and
> LWLockRelease (lock.c:1020) can _sometimes_ run up to 3-5 ms. Using
> old-good gettimeofday and logging method I managed to find a bottleneck:
>
> -- proclock = SetupLockInT
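The "old-good gettimeofday and logging" approach mentioned here is roughly
this kind of ad-hoc instrumentation (a hypothetical snippet, not part of any
patch; the 1 ms threshold is arbitrary):

    #include <sys/time.h>

    struct timeval  tv_before, tv_after;
    long            elapsed_us;

    gettimeofday(&tv_before, NULL);
    /* ... the code under suspicion, e.g. the SetupLockInTable() call ... */
    gettimeofday(&tv_after, NULL);

    elapsed_us = (tv_after.tv_sec - tv_before.tv_sec) * 1000000L +
                 (tv_after.tv_usec - tv_before.tv_usec);
    if (elapsed_us > 1000)
        elog(LOG, "slow section took %ld us", elapsed_us);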
Hello, Tom.
I was exploring this issue further and discovered something strange.
"PROCLOCK hash" and "LOCK hash" are hash tables in shared memory. All
memory for these tables is in fact pre-allocated. But for some reason
these two tables are created (lock.c:394) with init_size != max_size.
It ca
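For context, the creation of those tables in lock.c looks approximately like
this (paraphrased from memory of the 9.5-era source; details may differ), with
init_table_size deliberately set to half of max_table_size:

    max_table_size = NLOCKENTS();
    init_table_size = max_table_size / 2;

    MemSet(&info, 0, sizeof(info));
    info.keysize = sizeof(LOCKTAG);
    info.entrysize = sizeof(LOCK);
    info.num_partitions = NUM_LOCK_PARTITIONS;

    LockMethodLockHash = ShmemInitHash("LOCK hash",
                                       init_table_size,
                                       max_table_size,
                                       &info,
                                       HASH_ELEM | HASH_BLOBS | HASH_PARTITION);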
On 11 December 2015 at 16:14, Aleksander Alekseev wrote:
> I see your point, but I would like to clarify a few things.
>
> 1. Do we consider described measurement method good enough to conclude
> that sometimes PostgreSQL really spends 3 ms in a spinlock (like a RTT
> between two Internet hosts
Oops. s/approve or disapprove/confirm or deny/
On Fri, 11 Dec 2015 19:14:41 +0300
Aleksander Alekseev wrote:
> Hello, Tom
>
> I see your point, but I would like to clarify a few things.
>
> 1. Do we consider described measurement method good enough to conclude
> that sometimes PostgreSQL reall
Hello, Tom
I see your point, but I would like to clarify a few things.
1. Do we consider described measurement method good enough to conclude
that sometimes PostgreSQL really spends 3 ms in a spinlock (like a RTT
between two Internet hosts in the same city)? If not, what method
should be used to
Aleksander Alekseev writes:
> Turns out PostgreSQL can spend a lot of time waiting for a lock in this
> particular place, especially if you are running PostgreSQL on a 60-core
> server. Which obviously is a pretty bad sign.
> ...
> I managed to fix this behaviour by modifying choose_nelem_alloc
> pr
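For readers unfamiliar with dynahash: choose_nelem_alloc decides how many
elements element_alloc() grabs per allocation request, approximately as
follows (paraphrased, may not match the source exactly):

    static int
    choose_nelem_alloc(Size entrysize)
    {
        int         nelem_alloc;
        Size        elementSize;
        Size        allocSize;

        /* Each element has a HASHELEMENT header plus user data. */
        elementSize = MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(entrysize);

        /*
         * Choose nelem_alloc of at least 32, rounded up so that the
         * allocation request is a power of 2 or just less.
         */
        allocSize = 32 * 4;
        do
        {
            allocSize <<= 1;
            nelem_alloc = allocSize / elementSize;
        } while (nelem_alloc < 32);

        return nelem_alloc;
    }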
Hello all,
Consider following stacktrace:
(gdb) bt
#0 0x7f77c81fae87 in semop () syscall-template.S:81
#1 0x0063b721 in PGSemaphoreLock pg_sema.c:387
#2 0x00692e1b in LWLockAcquire lwlock.c:1026
#3 0x0068d4c5 in LockAcquireExtended lock.c:881
#4 0x0068dcc1