On Fri, Sep 11, 2015 at 8:01 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Fri, Sep 11, 2015 at 9:21 PM, Robert Haas <robertmh...@gmail.com> wrote:
> >
> > On Fri, Sep 11, 2015 at 10:31 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > > Could you perhaps try to create a testcase where xids are accessed that
> > > > are so far apart on average that they're unlikely to be in memory? And
> > > > then test that across a number of client counts?
> > > >
> > >
> > > Now about the test, create a table with a large number of rows (say 11617457,
> > > I have tried to create larger, but it was taking too much time (more than a day))
> > > and have each row with a different transaction id. Now each transaction should
> > > update rows that are at least 1048576 (the number of transactions whose status can
> > > be held in 32 CLog buffers) apart, that way ideally each update will try to
> > > access a Clog page that is not in memory; however, as the value to update is
> > > selected randomly, that leads to every 100th access being a disk access.
> >
> > What about just running a regular pgbench test, but hacking the
> > XID-assignment code so that we increment the XID counter by 100 each
> > time instead of 1?
> >
>
> If I am not wrong, we need a difference of 1048576 transactions for each
> record to make each CLOG access a disk access, so if we increment the XID
> counter by 100, then probably every 10000th (or multiple of 10000)
> transaction would go for disk access.
>
> The number 1048576 is derived by the below calc:
> #define CLOG_XACTS_PER_BYTE 4
> #define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
>
> Transaction difference required for each transaction to go for disk access:
> CLOG_XACTS_PER_PAGE * num_clog_buffers.
>

That guarantees that every xid occupies its own 32-contiguous-page chunk of
clog. But clog pages are not pulled in and out in 32-page chunks, only in
one-page chunks. So you would only need a difference of 32,768 to get every
real transaction to live on its own clog page, which means every lookup of a
different real transaction would have to do a page replacement. (I think
your references to disk access here are misleading. Isn't the issue here the
contention on the lock that controls the page replacement, not the actual IO?)

I've attached a patch that allows you to set the guc "JJ_xid", which makes it
burn the given number of xids every time one new one is asked for. (The
patch introduces lots of other stuff as well, but I didn't feel like ripping
the irrelevant parts out--if you don't set any of the other gucs it
introduces from their defaults, they shouldn't cause you trouble.) I think
there are other tools around that do the same thing, but this is the one I
know about. It is easy to drive the system into wrap-around shutdown with
this, so lowering autovacuum_vacuum_cost_delay is a good idea.

Actually I haven't attached it, because then the commitfest app would list it
as the patch needing review; instead I've put it here:
https://drive.google.com/file/d/0Bzqrh1SO9FcERV9EUThtT3pacmM/view?usp=sharing
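For reference, the arithmetic behind the 32,768 and 1,048,576 figures above,
spelled out as a trivial standalone program (this assumes the default 8 kB
BLCKSZ and the current 32 clog buffers; 4 xacts per byte because each
transaction status takes 2 bits):

#include <stdio.h>

#define BLCKSZ              8192    /* default PostgreSQL block size */
#define CLOG_XACTS_PER_BYTE 4       /* 2 status bits per transaction */
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
#define NUM_CLOG_BUFFERS    32      /* current number of clog buffers */

int
main(void)
{
    /* xid gap needed to put each transaction on its own clog page */
    printf("xids per clog page: %d\n", CLOG_XACTS_PER_PAGE);           /* 32768 */

    /* xid gap needed to step outside the whole buffer pool (the 1048576 figure) */
    printf("xids per 32 pages:  %d\n",
           CLOG_XACTS_PER_PAGE * NUM_CLOG_BUFFERS);                    /* 1048576 */
    return 0;
}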
> I think reducing it to every 100th access for transaction status being a disk
> access is sufficient to prove that there is no regression with the patch for
> the scenario asked about by Andres, or do you think it is not?
>
> Now another possibility here could be that we try commenting out the fsync in
> the CLOG path to see how much it impacts the performance of this test and
> then of the pgbench test. I am not sure there will be any impact, because
> even if every 100th transaction goes to disk, that is still less than the WAL
> fsync which we have to perform for each transaction.
>

You mentioned that your clog is not on ssd, but surely at this scale of
hardware, the hdd the clog is on has a bbu in front of it, no?

But I thought Andres' concern was not about fsync, but about the fact that the
SLRU does linear scans (repeatedly) of the buffers while holding the control
lock? At some point, scanning more and more buffers under the lock is going
to cause more contention than scanning fewer buffers and just evicting a page
will.

Cheers,

Jeff
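P.S. To make that last point concrete, here is a toy standalone model of the
access pattern I mean. It is not the actual slru.c code, just the shape of it:
every lookup scans all the buffer slots while the control lock is held, so the
work done under the lock grows with the number of buffers even when the page
is already resident.

#include <pthread.h>

#define NUM_SLRU_BUFFERS 32     /* 32 today; the patch under discussion raises this */

/* stand-in for the clog control lock */
static pthread_mutex_t control_lock = PTHREAD_MUTEX_INITIALIZER;

static int slot_pageno[NUM_SLRU_BUFFERS];  /* which clog page each buffer slot holds */
static int slot_lru[NUM_SLRU_BUFFERS];     /* higher value = less recently used */

/* Return the slot holding pageno, evicting the least recently used slot on a miss. */
static int
lookup_page(int pageno)
{
    int slot;
    int victim = 0;

    pthread_mutex_lock(&control_lock);

    /* First linear scan: is the page already buffered? */
    for (slot = 0; slot < NUM_SLRU_BUFFERS; slot++)
    {
        if (slot_pageno[slot] == pageno)
        {
            slot_lru[slot] = 0;
            pthread_mutex_unlock(&control_lock);
            return slot;
        }
        slot_lru[slot]++;
    }

    /* Second linear scan: pick the least recently used slot to evict. */
    for (slot = 1; slot < NUM_SLRU_BUFFERS; slot++)
        if (slot_lru[slot] > slot_lru[victim])
            victim = slot;

    slot_pageno[victim] = pageno;   /* the real code would read the page from disk here */
    slot_lru[victim] = 0;

    pthread_mutex_unlock(&control_lock);
    return victim;
}

int
main(void)
{
    lookup_page(7);     /* miss: scans everything, then evicts a slot */
    lookup_page(7);     /* hit: but still scans under the lock */
    lookup_page(1000);  /* miss again */
    return 0;
}

Both scans happen with the lock held, so making the buffer pool bigger makes
the critical section longer; past some size that costs more than the
evictions it saves.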