I'm closing this as returned-with-feedback; AFAICS even the last version
submitted is still in research stage. Please resubmit once you make
further progress.
Thanks,
--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Servic
On 11 January 2016 at 09:30, Tomas Vondra
wrote:
> Hi,
>
> On 01/10/2016 04:03 AM, Peter Geoghegan wrote:
>
>> On Sat, Jan 9, 2016 at 4:08 PM, Peter Geoghegan wrote:
>>
>> Also, are you aware of this?
>>
>>
>> http://www.nus.edu.sg/nurop/2010/Proceedings/SoC/NUROP_Congress_Cheng%20Bin.pdf
>>
>> It
Hi,
On 01/10/2016 05:11 AM, Peter Geoghegan wrote:
On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra
wrote:
Which means the "dim.r" column has 100 different values (0-99) with uniform
distribution. So e.g. "WHERE r < 15" matches 15%.
I think that the use of a uniform distribution to demonstrate
Hi,
On 01/10/2016 04:03 AM, Peter Geoghegan wrote:
On Sat, Jan 9, 2016 at 4:08 PM, Peter Geoghegan wrote:
Also, have you considered Hash join conditions with multiple
attributes as a special case? I'm thinking of cases like this:
Sorry, accidentally fat-fingered my enter key before I was fin
Hi,
On 01/10/2016 01:08 AM, Peter Geoghegan wrote:
On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra
wrote:
So, this seems to bring reasonable speedup, as long as the selectivity is
below 50%, and the data set is sufficiently large.
What about semijoins? Apparently they can use bloom filters
par
On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra
wrote:
> Which means the "dim.r" column has 100 different values (0-99) with uniform
> distribution. So e.g. "WHERE r < 15" matches 15%.
I think that the use of a uniform distribution to demonstrate this
patch is a bad idea, unless you want to have a
On Sat, Jan 9, 2016 at 4:08 PM, Peter Geoghegan wrote:
> Also, have you considered Hash join conditions with multiple
> attributes as a special case? I'm thinking of cases like this:
Sorry, accidentally fat-fingered my enter key before I was finished
drafting that mail. That example isn't useful,
On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra
wrote:
> So, this seems to bring reasonable speedup, as long as the selectivity is
> below 50%, and the data set is sufficiently large.
What about semijoins? Apparently they can use bloom filters
particularly effectively. Have you considered them as a
Hi,
attached is v2 of the patch, with a number of improvements:
0) This relies on the other hashjoin patches (delayed build of
buckets and batching), as it allows sizing the bloom filter.
1) enable_hashjoin_bloom GUC
This is mostly meant for debugging and testing, not for committing.
On 12/28/2015 11:52 AM, David Rowley wrote:
On 28 December 2015 at 23:44, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
On 12/28/2015 11:38 AM, David Rowley wrote:
If so, then a filter with all 1 bits set should be thrown away, as
it'll never help us, and the fi
On 28 December 2015 at 23:44, Tomas Vondra
wrote:
> On 12/28/2015 11:38 AM, David Rowley wrote:
>
>> If so, then a filter with all 1 bits set should be thrown away, as
>> it'll never help us, and the filter should generally become more
>> worthwhile as it contains a higher ratio of 0 bits vs 1
On 12/28/2015 11:38 AM, David Rowley wrote:
On 28 December 2015 at 23:23, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
On 12/28/2015 03:15 AM, David Rowley wrote:
Maybe it would be better to, once the filter is built, simply
count the
number of 1 bits a
On 28 December 2015 at 23:23, Tomas Vondra
wrote:
> On 12/28/2015 03:15 AM, David Rowley wrote:
>
>> Maybe it would be better to, once the filter is built, simply count the
>> number of 1 bits and only use the filter if there's less than
>> 1 bits compared to the size of the filter in bits. Th
On 12/28/2015 03:15 AM, David Rowley wrote:
On 18 December 2015 at 04:34, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
I think ultimately we'll need to measure the false positive rate, so
that we can use it to dynamically disable the bloom filter if it
gets inefficient.
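As a rough illustration of the idea quoted above (measuring the false
positive rate and switching the filter off when it stops paying for
itself), here is a minimal sketch in Python. It is not code from the patch:
the class name, the `min_samples` and `max_fp_rate` thresholds, the SHA-256
based hashing and the dict standing in for the hash table are all
assumptions made for the example.

```python
import hashlib


class AdaptiveBloomFilter:
    """Bloom filter that disables itself once it rejects too few probes.

    Hypothetical sketch: all names and thresholds are illustrative only.
    """

    def __init__(self, nbits=4096, nhashes=3, min_samples=100, max_fp_rate=0.5):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray((nbits + 7) // 8)
        self.min_samples = min_samples    # negatives to observe before judging
        self.max_fp_rate = max_fp_rate    # disable the filter above this rate
        self.enabled = True
        self.rejected = 0                 # probes the filter ruled out
        self.false_positives = 0          # probes that passed but had no match

    def _positions(self, key):
        # Double hashing: derive nhashes bit positions from one digest.
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def _might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

    def false_positive_rate(self):
        # Share of non-matching probes that the filter failed to reject.
        negatives = self.rejected + self.false_positives
        return self.false_positives / negatives if negatives else 0.0

    def _maybe_disable(self):
        negatives = self.rejected + self.false_positives
        if (negatives >= self.min_samples
                and self.false_positive_rate() > self.max_fp_rate):
            self.enabled = False

    def probe(self, key, table):
        """Look up key, consulting the filter first while it is enabled.

        Assumes the values stored in `table` are never None.
        """
        if not self.enabled:
            return table.get(key)
        if not self._might_contain(key):
            self.rejected += 1
            self._maybe_disable()
            return None
        row = table.get(key)
        if row is None:
            self.false_positives += 1
            self._maybe_disable()
        return row


def build(rows, **kwargs):
    """Build the hash table (a dict here) and its filter from (key, row) pairs."""
    table = dict(rows)
    bloom = AdaptiveBloomFilter(**kwargs)
    for key in table:
        bloom.add(key)
    return table, bloom
```

The counters kept here (`rejected`, `false_positives`) are also the kind of
numbers that could be surfaced in EXPLAIN ANALYZE output, though how the
patch would actually report them is not specified in this thread listing.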
On 18 December 2015 at 04:34, Tomas Vondra
wrote:
> I think ultimately we'll need to measure the false positive rate, so that
> we can use it to dynamically disable the bloom filter if it gets
> inefficient. Also maybe put some of that into EXPLAIN ANALYZE.
>
I'm not so convinced that will be a
On 12/24/2015 02:51 PM, Simon Riggs wrote:
On 17 December 2015 at 16:00, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
On 12/17/2015 11:44 AM, Simon Riggs wrote:
My understanding is that the bloom filter would be ineffective
in any of
these cases
On 17 December 2015 at 16:00, Tomas Vondra
wrote:
> On 12/17/2015 11:44 AM, Simon Riggs wrote:
>
>>
>> My understanding is that the bloom filter would be ineffective in any of
>> these cases
>> * Hash table is too small
>>
>
> Yes, although it depends what you mean by "too small".
>
> Essentially
Hello, Tomas.
Great idea!
Did you consider caching the bloom filter, or at least part(s) of it,
somehow? I think this way we could gain some more TPS. This of course
assumes that creating a bloom filter is really a bottleneck here,
which would be nice to investigate first.
Best regards,
Aleksander
Hi,
On 12/20/2015 05:46 AM, Oleg Bartunov wrote:
Tomas,
have you seen
http://www.postgresql.org/message-id/4b4dd67f.9010...@sigaev.ru
I have very limited internet connection (no graphics), so I may miss
something
I haven't seen that, but I don't really see how that's related - your
post is
Tomas,
have you seen
http://www.postgresql.org/message-id/4b4dd67f.9010...@sigaev.ru
I have very limited internet connection (no graphics), so I may miss
something
Oleg
On Wed, Dec 16, 2015 at 4:15 AM, Tomas Vondra
wrote:
> Hi,
>
> while working on the Hash Join improvements, I've been repeat
On 12/17/2015 11:44 AM, Simon Riggs wrote:
My understanding is that the bloom filter would be ineffective in any of
these cases
* Hash table is too small
Yes, although it depends what you mean by "too small".
Essentially if we can do with a single batch, then it's cheaper to do a
single look
Hi,
On 12/17/2015 10:50 AM, Shulgin, Oleksandr wrote:
On Tue, Dec 15, 2015 at 11:30 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
Attached is a spreadsheet with results for various work_mem
values, and also with a smaller data set (just 30M rows in the fact
table), which easily
On 15 December 2015 at 22:30, Tomas Vondra
wrote:
> 3) Currently the bloom filter is used whenever we do batching, but it
> should really be driven by selectivity too - it'd be good to (a)
> estimate the fraction of 'fact' tuples having a match in the hash
> table, and not to do bl
On Tue, Dec 15, 2015 at 11:30 PM, Tomas Vondra wrote:
>
> Attached is a spreadsheet with results for various work_mem values, and
> also with a smaller data set (just 30M rows in the fact table), which
> easily fits into memory. Yet it shows similar gains, shaving off ~40% in
> the best case, sug