On Thu, Aug 9, 2012 at 2:59 AM, Jesper Krogh <jes...@krogh.cc> wrote:
> Whether it is an implementation artifact or a result of this
> approach I don't know, but currently, when the GIN fastupdate
> code finally decides to "flush" the buffer, it stalls all
> other processes doing updates while it does so. If you only have
> one update process then this doesn't matter. But if you're trying to
> get user-interactive updates to flow in with batch updates from
> background processes, then you'd better kill off this feature,
> since you're guaranteed that the user-interactive process is
> either going to flush the buffer or wait on someone else doing
> it.
>
> I haven't done the benchmarking, but I'm actually fairly sure that
> fastupdate isn't faster overall if you bump concurrency up slightly
> and run on memory- or SSD-based backends, due to this cross-backend
> contention over the buffer.

Yeah, I've noticed that there are some things that are a little wonky
about GIN fastupdate. On the other hand, I believe that MySQL has
something along these lines called secondary index buffering, which
apparently does very good things for random I/O. I am not sure of the
details or the implementation, though.
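In the meantime, for anyone who is bitten by the flush stalls, the
pending list can already be disabled per index via the existing
reloption (the index, table, and column names here are invented, of
course):

    -- Build the index with no pending list at all:
    CREATE INDEX some_gin_idx ON some_table USING gin (some_col)
        WITH (fastupdate = off);

    -- Or turn it off on an existing index. Note that this does not
    -- flush entries already in the pending list; a VACUUM will:
    ALTER INDEX some_gin_idx SET (fastupdate = off);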
> A buffer that is backend-local, so you can use transactions to
> batch up changes, would get around this, but that may have another
> huge set of consequences I don't know of.
>
> ... based on my own real-world experience with this feature.

Well, the main thing to worry about is transactional consistency. If a
backend which has postponed doing the index inserts does an index scan
after the command counter has been bumped, it'll see inconsistent
results. We could avoid that by only using the optimization when some
set of sanity checks passes and doing the deferred inserts at the end
of the statement, or something like that.

The other tricky part is figuring out how to actually get a performance
improvement out of it. I think Simon's probably right that a lot of the
cost is in repeatedly walking the btree, looking up and
pinning/unpinning/locking/unlocking buffers along the way. Maybe we
could sort the data in index order, walk down to the first insertion
point, and then insert as many tuples in a row as precede the next key
already in the index. Then lather, rinse, repeat. If you're actually
just adding everything at the tail of the index, this ought to work
pretty well, but if the inserts are all over the place it seems like it
might not be any better, and could actually be a little worse.

Of course, it's probably premature to speculate too much until someone
actually codes something up and tests it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
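P.S. To make the visibility hazard above concrete, here's the kind of
sequence a backend-local buffer would have to handle correctly (table
and column names invented for illustration):

    BEGIN;
    INSERT INTO docs (body_tsv)
        VALUES (to_tsvector('english', 'hello world'));
    -- The command counter has been bumped, so this statement must see
    -- the row just inserted. If the GIN entries were still sitting in
    -- a backend-local buffer, an index scan here could miss it:
    SELECT count(*) FROM docs
        WHERE body_tsv @@ to_tsquery('english', 'hello');
    COMMIT;

So any deferred inserts would have to be flushed before the same
backend scans that index, which is why end-of-statement looks like the
latest safe point.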
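Also, the presorting idea is easy to approximate from plain SQL today,
for anyone who wants to measure whether it helps on their data: loading
a batch presorted in index-key order is, at the SQL level, roughly what
the proposed insert path would do internally (again, names invented):

    -- Bulk-load in index order, so that successive btree insertions
    -- land on the same or adjacent leaf pages:
    INSERT INTO target_table
    SELECT * FROM staging_table
    ORDER BY indexed_col;

That won't save the repeated tree descents, of course, but it should at
least give a rough idea of how much the improved buffer locality is
worth.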