Re: [HACKERS] Proposal: speeding up GIN build with parallel workers

2016-10-02 Thread Michael Paquier
On Wed, Sep 14, 2016 at 3:48 PM, Heikki Linnakangas wrote: > If we flushed the tree to a tape instead, then we could perhaps use the > machinery that Peter's parallel B-tree patch is adding to tuplesort.c, to > merge the tapes. I'm not sure if that works out, but I think it's worth some > experime

Re: [HACKERS] Proposal: speeding up GIN build with parallel workers

2016-09-13 Thread Heikki Linnakangas
On 01/17/2016 10:03 PM, Jeff Janes wrote: On Fri, Jan 15, 2016 at 3:29 PM, Peter Geoghegan wrote: On Fri, Jan 15, 2016 at 2:38 PM, Constantin S. Pan wrote: I have a draft implementation which divides the whole process between N parallel workers, see the patch attached. Instead of a full scan

Re: [HACKERS] Proposal: speeding up GIN build with parallel workers

2016-01-18 Thread Robert Haas
On Fri, Jan 15, 2016 at 5:38 PM, Constantin S. Pan wrote: > In its current state the implementation is just a proof of concept > and it has all the configuration hardcoded, but it already works as is, > though it does not speed up the build process more than 4 times on my > configuration (12 CPUs). Th

Re: [HACKERS] Proposal: speeding up GIN build with parallel workers

2016-01-17 Thread Peter Geoghegan
On Sun, Jan 17, 2016 at 12:03 PM, Jeff Janes wrote: > I think it would take a lot of changes to tuple sort to make this be an > almost-always win. > > In the general case each GIN key occurs in many tuples, and the > in-memory rbtree is good at compressing the tid list for each key to > maximize th

Re: [HACKERS] Proposal: speeding up GIN build with parallel workers

2016-01-17 Thread Jeff Janes
On Fri, Jan 15, 2016 at 3:29 PM, Peter Geoghegan wrote: > On Fri, Jan 15, 2016 at 2:38 PM, Constantin S. Pan wrote: >> I have a draft implementation which divides the whole process between >> N parallel workers, see the patch attached. Instead of a full scan of >> the relation, I give each worker

Re: [HACKERS] Proposal: speeding up GIN build with parallel workers

2016-01-17 Thread Constantin S. Pan
On Fri, 15 Jan 2016 15:29:51 -0800 Peter Geoghegan wrote: > On Fri, Jan 15, 2016 at 2:38 PM, Constantin S. Pan wrote: > Even without parallelism, wouldn't it be better if GIN indexes were > built using tuplesort? I know way way less about the gin am than the > nbtree am, but I imagine that a p

Re: [HACKERS] Proposal: speeding up GIN build with parallel workers

2016-01-15 Thread Peter Geoghegan
On Fri, Jan 15, 2016 at 2:38 PM, Constantin S. Pan wrote: > I have a draft implementation which divides the whole process between > N parallel workers, see the patch attached. Instead of a full scan of > the relation, I give each worker a range of blocks to read. I am currently working on a patch
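
The block-range split mentioned in the quoted message ("I give each worker a range of blocks to read") could look roughly like the sketch below. This is only an illustration, not the code from the attached patch: the function name `split_block_ranges`, the half-open range convention, and the even-split policy are all assumptions made here.

```python
# Hypothetical illustration only; not taken from the patch discussed in the thread.

def split_block_ranges(nblocks, nworkers):
    """Divide heap blocks 0..nblocks-1 into contiguous half-open ranges,
    one per worker, with sizes differing by at most one block."""
    if nblocks < 0:
        raise ValueError("nblocks must be non-negative")
    if nworkers <= 0:
        raise ValueError("nworkers must be positive")

    # Every worker gets `base` blocks; the first `extra` workers get one more.
    base, extra = divmod(nblocks, nworkers)

    ranges = []
    start = 0
    for worker in range(nworkers):
        length = base + (1 if worker < extra else 0)
        ranges.append((start, start + length))  # end is exclusive
        start += length
    return ranges


if __name__ == "__main__":
    for worker, (first, end) in enumerate(split_block_ranges(10, 3)):
        print(f"worker {worker}: blocks [{first}, {end})")
```

With more workers than blocks, the trailing workers simply receive empty ranges; a real implementation might instead hand out blocks dynamically, which the messages shown here do not specify.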

[HACKERS] Proposal: speeding up GIN build with parallel workers

2016-01-15 Thread Constantin S. Pan
Hi, Hackers. Building a GIN index can take a lot of time and eats 100% CPU, but we could easily make it use more than 100%, especially since we now have parallel workers in postgres. The process of building GIN looks like this: 1. Accumulate a batch of index records into an rbtree in m