Pavel Ivanov <piva...@google.com> writes:

> I think to fix this bug we should stop using gtid_slave_pos as
> an indication of the current db state. We should make it possible to

Agree.

> change gtid_binlog_pos when there are no events in binlogs. And when

Ok. Actually, I think we should expose the real binlog state (what is stored in the Gtid_list event at the start of the binlog). So, something like a variable @@GLOBAL.gtid_binlog_state.

Example value: '0-1-100,0-2-101'

You would get an error if you try to set it while the binlog is not empty.

Would this be what you need?

> it kind of makes sense more than using gtid_slave_pos. But probably
> this will break the detection of slaves trying to connect using GTID
> before the start of binlogs...

I do not think it will break that (but we will see).

> 5. From a completely different area, but also a GTID-related bug. Take a
> database from a previous MySQL version (I've tested on a database from
> 5.1), start MariaDB on it, run mysql_upgrade and then try to set
> gtid_slave_pos to something. At this point I got the error "unable to
> load slave state from gtid_slave_pos table". This error was apparently
> remembered from MariaDB's startup, and reading of the gtid_slave_pos table
> wasn't retried after mysql_upgrade actually created it.

Ok, I will take a look. I think there is an existing bug report on that. IIRC there is some locking issue (the variable can be accessed from a place where table locks cannot be taken to read the gtid_slave_pos table), but I will see what can be done.

> 1. When the master doesn't have binlogs and gtid_slave_pos is ahead of the
> GTID that the slave tries to connect with, you give the error "The binlog on
> the master is missing the GTID ... requested by the slave (even though
> both a prior and a subsequent number does exist), and GTID strict mode
> is enabled". I find this error message very confusing: the presence of a
> subsequent GTID in such a situation is questionable, but there is certainly
> no prior GTID in the master's binlog.

Hm, this sounds like a bug. Do you have a testcase?
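To make the @@GLOBAL.gtid_binlog_state proposal concrete, here is a minimal Python model of how the variable might behave. It is a sketch under stated assumptions, not server code: the class `HypotheticalBinlog`, the helper `parse_gtid_list` and the exception `BinlogNotEmptyError` are invented for illustration. It assumes the value is a comma-separated list of domain-server-seq_no triples with at most one entry per (domain_id, server_id) pair, mirroring the Gtid_list event, and that setting it is rejected once the binlog contains events, as proposed above. Deriving gtid_binlog_pos by taking the highest seq_no per domain is also an assumption; it relies on sequence numbers increasing within a domain.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class BinlogNotEmptyError(Exception):
    """Hypothetical error: the state may only be set while the binlog is empty."""


def parse_gtid_list(value: str) -> dict[tuple[int, int], int]:
    """Parse 'domain-server-seq,...' into {(domain_id, server_id): seq_no}."""
    state: dict[tuple[int, int], int] = {}
    for item in (part.strip() for part in value.split(",")):
        if not item:
            continue  # tolerate an empty value such as ''
        domain_id, server_id, seq_no = (int(x) for x in item.split("-"))
        key = (domain_id, server_id)
        if key in state:
            raise ValueError(f"duplicate domain/server pair in {item!r}")
        state[key] = seq_no
    return state


@dataclass
class HypotheticalBinlog:
    # Events written to the binlog so far (just labels in this model).
    events: list[str] = field(default_factory=list)
    # Models the Gtid_list event at the start of the binlog:
    # last seq_no per (domain_id, server_id).
    gtid_list: dict[tuple[int, int], int] = field(default_factory=dict)

    def set_gtid_binlog_state(self, value: str) -> None:
        # Assumed rule from the proposal: only allowed while the binlog is empty.
        if self.events:
            raise BinlogNotEmptyError(
                "gtid_binlog_state can only be set while the binlog is empty")
        self.gtid_list = parse_gtid_list(value)

    def gtid_binlog_pos(self) -> str:
        # Assumption: within a domain, the highest seq_no is the most recent GTID.
        latest: dict[int, tuple[int, int]] = {}
        for (domain_id, server_id), seq_no in self.gtid_list.items():
            if domain_id not in latest or seq_no > latest[domain_id][1]:
                latest[domain_id] = (server_id, seq_no)
        return ",".join(
            f"{d}-{s}-{n}" for d, (s, n) in sorted(latest.items()))
```

With the example value '0-1-100,0-2-101', this model keeps two entries for domain 0 (one per server_id), while its gtid_binlog_pos() reports a single GTID for that domain, 0-2-101. Keeping the per-server entries is what lets the state be restored exactly, rather than inferred from gtid_slave_pos.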
But with @@GLOBAL.gtid_binlog_state implemented and set correctly, you will instead get the correct error message: that the position the slave requests to connect at has been purged from the master's binlog.

> 2. The error message "An attempt was made to binlog GTID ... which
> would create an out-of-order sequence number with existing GTID ...,
> and gtid strict mode is enabled" is confusing too, because it's issued
> not when the slave actually tries to write the event to the binlog.
> Apparently the error condition is checked when the slave considers
> executing the event that was just received from the master. And if this
> event contains changes only to tables matching the
> replicate-wild-ignore-table filter, then this event won't ever be
> binlogged on the slave in non-strict mode. So there's no "attempt to
> binlog" involved, and the error wording becomes hard to understand.

Right, I see. Thanks!

One problem here is that with non-transactional changes (DDL or MyISAM), we _do_ need to check this _before_ executing the event, because we cannot roll back once the event has been executed. But I agree, of course, that this is a bug. I will try to find a way to fix it. Maybe the check can be delayed until the first event that we are actually going to execute (not filter).

> 3. There's an error message "Specified GTID ... conflicts with the binary
> log which contains a more recent GTID .... If
> MASTER_GTID_POS=CURRENT_POS is used, the binlog position will override
> the new value of @@gtid_slave_pos". It looks like it's issued
> inconsistently. In the binlog I had an empty Gtid_list, then 0-1-26,
> 0-1-27, 0-1-28, 0-2-29 and 0-2-30. Both gtid_slave_pos and
> gtid_binlog_pos were set to '0-2-30'. In this situation I was able to set
> gtid_slave_pos to '0-1-29' successfully and got a "slave has diverged"
> error after START SLAVE. Then I was able to set gtid_slave_pos to
> '0-2-29' and got the error "Attempt was made to binlog out-of-order"
> after START SLAVE.
> I'd think that at least in strict mode MariaDB shouldn't allow setting
> gtid_slave_pos to a value that is clearly in the past.

Right, thanks, I will check. (I can understand why 0-1-29 did not give an error, though you are probably right that it should; but that 0-2-29 did not give an error is surprising.)

> 4. Now a real bug. Start three servers S1, S2 and S3 without binlogs.
> Set gtid_slave_pos to the same value on all of them. Connect S2 to
> replicate from S1. Execute a few transactions on S1. Perform a
> failover: make S1 replicate from S2. Now connect S3 to replicate
> from S2. At this point S3 should be able to replicate successfully,
> because it has the same db state as S2 had in the beginning (S3 has
> the same gtid_slave_pos as S2 had initially), and S2 has all the binlogs
> needed to move from the current position on S3 to the current position
> on S2. Yet S3 gets an error that the starting GTID doesn't exist in
> S2's binlogs.

This should also be fixed by setting @@GLOBAL.gtid_binlog_state.

 - Kristian.