Group commit was fixed in the InnoDB plugin. It performs as expected
when the binlog is disabled. It does not perform as I expect when the
binlog is enabled.

Is this a problem for PBXT?

The problems for InnoDB are:
1) commit is serialized on the binlog write/fsync
2) row locks are not released until the commit step of XA prepare/commit
3) per-table auto-inc locks are not released until the commit step of XA

I think that 2) and 3) can be fixed without significant changes. They
cause a lot of convoys today for high-throughput OLTP -- too many
connections needlessly wait on row locks and the per-table auto-inc
lock. Doing the binlog fsync one connection at a time also causes a
lot of convoys. This makes MySQL much slower than it should be for
some workloads even with battery-backed RAID write caches.
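
As a rough illustration of the convoy on a contended row lock, here is a
toy Python model, not MySQL code. The function name, its parameters and
the timings are all hypothetical. It assumes transactions queue on one
hot row and pass through it one after another, and it assumes that an
earlier release would happen at the end of the prepare step, which is
only one possible way to fix 2) and 3).

```python
def hot_row_convoy_ms(num_trx, work_ms, prepare_fsync_ms, binlog_fsync_ms,
                      release_at="commit"):
    """Total time for num_trx transactions to pass through one row lock.

    release_at="commit":  the lock is held through the prepare fsync and
                          the binlog write/fsync (today's behaviour as
                          described in this mail).
    release_at="prepare": hypothetical fix, the lock is dropped once the
                          prepare step has finished.
    """
    if release_at == "commit":
        hold_ms = work_ms + prepare_fsync_ms + binlog_fsync_ms
    elif release_at == "prepare":
        hold_ms = work_ms + prepare_fsync_ms
    else:
        raise ValueError("release_at must be 'commit' or 'prepare'")
    # Waiters are served strictly one at a time, so hold times add up.
    return num_trx * hold_ms
```

With made-up timings of 1 ms of work and 2 ms per fsync, ten waiters
take 50 ms when locks are held until commit and 30 ms when they are
released at prepare. The absolute numbers mean nothing; the point is
that the binlog fsync is added to every waiter's time in the queue.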

Problem 1) occurs because:
* there is no group commit for the binlog fsync
* InnoDB locks prepare_commit_mutex in the prepare step

Even if there were group commit for the binlog fsync, it would be
useless for InnoDB because prepare_commit_mutex is locked in the
prepare step and not unlocked until the commit step and the binlog
write/fsync is done between these two steps.
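
A toy Python model of the argument above, not MySQL code. The function
and parameter names are invented for illustration. It only counts binlog
fsyncs under the assumption that a group commit can cover at most the
transactions that are between prepare and commit at the same time.

```python
import math


def binlog_fsyncs(num_trx, max_in_flight, group_commit):
    """Count binlog fsyncs needed to commit num_trx transactions.

    max_in_flight: how many transactions may sit between the prepare and
                   commit steps at once. Holding prepare_commit_mutex
                   across that window forces this to 1.
    group_commit:  whether one fsync may cover every transaction that is
                   currently in that window.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    batch = max_in_flight if group_commit else 1
    return math.ceil(num_trx / batch)
```

For 100 transactions, binlog_fsyncs(100, 1, True) is 100, the same as
with no group commit at all, while binlog_fsyncs(100, 10, True) is 10.
Group commit for the binlog only pays off once more than one
transaction can be inside the window.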

There is a MySQL worklog for this (4007) that:
* doesn't intend to add group commit for the binlog fsync
* doesn't mention the problem of prepare_commit_mutex

I have started to work on this, but don't have any code to share yet.

Pseudo-code for commit with the InnoDB plugin when the binlog is enabled:

ha_commit_trans()
    * ht->prepare() == innobase_xa_prepare()
          o trx_prepare_for_mysql(trx)
                + force to disk the trx log buffer for all changes from this trx
                + fsync done here, group prepare may amortize that
          o lock prepare_commit_mutex
    * tc_log->log_xid(thd, xid)
          o writes SQL to binlog, XID to binlog, optionally fsync binlog
    * ha_commit_one_phase()
          o ht->commit() == innobase_commit()
                + innobase_commit_low()
                      # write commit record to trx log buffer, release locks from this trx
                      # for auto-commit statements, the per-table auto-inc lock is released here
                + unlock prepare_commit_mutex
                + trx_commit_complete_for_mysql()
                      # force to disk the trx log buffer including commit record for this trx
                      # fsync done here, group commit may amortize that
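
The same sequence as a small Python trace, to show which steps run while
prepare_commit_mutex is held. This is a sketch that mirrors the
pseudo-code above and is not the server code: the Python function names
and the trace strings are made up, and a threading.Lock stands in for
the real mutex.

```python
import threading

# Stand-in for InnoDB's prepare_commit_mutex.
prepare_commit_mutex = threading.Lock()


def xa_prepare(trx_id, trace):
    # ht->prepare(): flush this trx's log records, then take the mutex.
    trace.append((trx_id, "prepare: fsync trx log"))
    prepare_commit_mutex.acquire()
    trace.append((trx_id, "lock prepare_commit_mutex"))


def log_xid(trx_id, trace):
    # tc_log->log_xid(): SQL and XID go to the binlog, optional fsync.
    trace.append((trx_id, "binlog write/fsync"))


def commit(trx_id, trace):
    # ht->commit(): commit record, lock release, mutex release, log flush.
    trace.append((trx_id, "commit record written; row and auto-inc locks released"))
    prepare_commit_mutex.release()
    trace.append((trx_id, "unlock prepare_commit_mutex"))
    trace.append((trx_id, "commit: fsync trx log"))


def commit_with_binlog(trx_id, trace):
    xa_prepare(trx_id, trace)
    log_xid(trx_id, trace)
    commit(trx_id, trace)
    return trace


def steps_under_mutex(trace):
    # Steps recorded strictly between taking and dropping the mutex.
    steps = [step for _, step in trace]
    start = steps.index("lock prepare_commit_mutex")
    end = steps.index("unlock prepare_commit_mutex")
    return steps[start + 1:end]
```

Calling steps_under_mutex(commit_with_binlog(1, [])) returns the binlog
write/fsync step followed by the commit-record step, so in this model
the binlog fsync always runs with the mutex held.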

-- 
Mark Callaghan
mdcal...@gmail.com
