Thanks for the response Ben this was exactly what I was looking for! I think will write a test to ensure this condition holds when we do Cassandra upgrades, and the linked test in CASSANDRA-6905 <https://issues.apache.org/jira/browse/CASSANDRA-6905> seems to be a nice starting point.
Best regards Morten V. Nielsen On Thu, Dec 20, 2018 at 12:36 PM Ben Slater <ben.sla...@instaclustr.com> wrote: > I don’t have any personal knowledge of the fix but out of interest I took a > look in Jira and it looks sounds to me like the behaviour was fixed here > (in 2.0.10): https://issues.apache.org/jira/browse/CASSANDRA-6905 > > --- > > > *Ben Slater* > *Chief Product Officer* > > > <https://www.facebook.com/instaclustr> <https://twitter.com/instaclustr> > <https://www.linkedin.com/company/instaclustr> > > Read our latest technical blog posts here > <https://www.instaclustr.com/blog/>. > > This email has been sent on behalf of Instaclustr Pty. Limited (Australia) > and Instaclustr Inc (USA). > > This email and any attachments may contain confidential and legally > privileged information. If you are not the intended recipient, do not copy > or disclose its content, but please reply to this email immediately and > highlight the error to the sender and then immediately delete the message. > > > On Thu, 20 Dec 2018 at 21:07, Morten Vejen Nielsen <mve...@gmail.com> > wrote: > > > Hi, > > > > (Moved from user mailing list to here) > > > > I have found a statement in the Datastax documentation regarding > CommitLog > > recovery that concerns me, namely: > > > > "*Restore stops when the first client-supplied timestamp is greater than > > the restore point timestamp. Because the order in which the database > > receives mutations does not strictly follow the timestamp order, this can > > leave some mutations unrecovered.*" > > > > From: > > > > > https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configLogArchive.html > > Which to me means that point in time restore really doesn't guarantee > point > > in time replay for the configured time. Since we expect to have mutations > > out of order in our setup. > > > > I conducted a few experiments on this myself by forcing my Cassandra > > instance to do CommitLog replay with changes ahead in time. But I was not > > able to reproduce this behavior. > > I used a fresh instance taken from the official Cassandra docker image to > > run the tests, so no changes to any configs was done other than setting > the > > restore_point_in_time as specified below. > > I did the experiment as follows: > > > > --edit /etc/cassandra/commitlog_archiving.properties, set > > *restore_point_in_time* to something in the near future (lets say 2 > > hours ahead of server-time) > > > > ssh into instance > > > > cqlsh > > create keyspace thezoo with replication = > > {'class':'SimpleStrategy','replication_factor':1}; > > use thezoo; > > create table animal (id int primary key, name varchar); > > insert into animal (id, name) values (1, 'Bear1');insert into animal > > (id, name) values (2, 'Bear2');insert into animal (id, name) values > > (3, 'Bear3');insert into animal (id, name) values (4, 'Bear4');insert > > into animal (id, name) values (5, 'Bear5');insert into animal (id, > > name) values (6, 'Bear6');insert into animal (id, name) values (7, > > 'Bear7');insert into animal (id, name) values (8, 'Bear8');insert into > > animal (id, name) values (9, 'Bear9');insert into animal (id, name) > > values (10, 'Bear10'); > > select id,name,writetime(name) from animal; > > --Add some to timestamp, and use this as future_timestamp, must be > > ahead of what was defined in commitlog config file > > insert into animal (id, name) values (11, 'DuckFromFuture') using > > timestamp <future_timestamp> > > insert into animal (id, name) values (12, 'Bird1');insert into animal > > (id, name) values (13, 'Bird2');insert into animal (id, name) values > > (14, 'Bird3');insert into animal (id, name) values (15, > > 'Bird4');insert into animal (id, name) values (16, 'Bird5');insert > > into animal (id, name) values (17, 'Bird6');insert into animal (id, > > name) values (18, 'Bird7');insert into animal (id, name) values (19, > > 'Bird8');insert into animal (id, name) values (20, 'Bird9');insert > > into animal (id, name) values (21, 'Bird10'); > > > > --Now I simply forced the power off the machine held the power button > > down. And restarted > > > > --During startup verify that commitlog replay has been done in log > > > > ssh into instance and enter cqlsh > > > > cqlsh:thezoo> select * from animal; > > > > --Which shows all the bears and birds have been replayed but not the > duck! > > > > I also did some digging in the Cassandra source code, and made the > > following findings: > > > > I think the code that skips mutations ahead of time is in > CommitLogReplayer > > class: > > See lines: 194-195 (at the time of writing) > > if (commitLogReplayer.pointInTimeExceeded(mutation)) > > return; > > This code is triggerred from CommitLogReader, where readSection seems to > be > > responsible for reading the commit logs, this is wrapped in a while loop, > > that just reads the file until EOF. > > See: > > while (statusTracker.shouldContinue() && reader.getFilePointer() < end > && > > !reader.isEOF()) > > This method is called file by file from CommitLog.recover to recover all > > commitlog segment files. > > And just a note statusTracker.shouldContinue will fail if > > statusTracker.requestTermination(); is called but I dont see this being > the > > case for the pointInTimeExceeded case. > > > > I am a bit concerned if this is some hidden feature in Cassandra, and as > > such we might have to revise our backup strategies, if this is the case. > > However as far as I can see the Datastax documentation on this is simply > > wrong unfortunately the official documentation on this just seems to be > > work in progress. > > The fact that it doesn't do this is in fact a positive result for me as I > > would also expect the point-in-time to guarantee that all mutatations up > > until this point-in-time is in fact recovered. > > > > Can anyone confirm if this is just the documentation that is wrong or > maybe > > if I did something wrong in my experiments. > > > > (For reference I also conducted some experiments with larger amount of > data > > where the recovery went through multiple commitlog files, but I got the > > same results, namely that it recovered ALL records before > > restore_point_in_time.) > > > > Best regards > > Morten V. Nielsen > > > -- //Morten Vejen Nielsen - mvejen