Sure! I will move further questions to JIRA.

Thanks

On Thu, Nov 3, 2016 at 4:34 AM, Benjamin Lerer <benjamin.le...@datastax.com> wrote:
> I really have to look at the code of the different versions first to answer
> your questions.
> I will put as much information as I can in the ticket, but feel free to
> ask more questions in the ticket if you need to.
>
> Benjamin
>
> On Wed, Nov 2, 2016 at 10:58 PM, Bhaskar Muppana <mgvbhas...@gmail.com> wrote:
>
> > Thanks Benjamin! I am also hoping to ask a couple of questions on the
> > differences in this code base between 2.0, 2.1 and 3.0. I understand 2.0
> > is closed from the community's point of view, but we are still using it in
> > some cases, so we need to fix it internally.
> >
> > With my limited understanding of the code, it looks to me like there is no
> > protection against short reads in 2.0. In 3.0, however, we seem to have a
> > short read transformation. At least from the comments, it feels like that
> > code should have taken care of cases like this. So this must be a bug in
> > the 3.0 code, whereas in 2.0 the complete feature for short read
> > protection is missing. Is my understanding right? Can you please shed more
> > light on this?
> >
> > I have created a JIRA as you asked:
> > https://issues.apache.org/jira/browse/CASSANDRA-12872
> >
> > This is my first JIRA, so I don't know the conventions for setting
> > priority. As it's a data inconsistency issue, I believe this is critical.
> > Feel free to change the priority according to community conventions.
> >
> > Thanks,
> > Bhaskar
> >
> > On Wed, Nov 2, 2016 at 1:26 AM, Benjamin Lerer <benjamin.le...@datastax.com> wrote:
> >
> > > Hi Bhaskar,
> > >
> > > Thanks for reporting that problem. It is a nice catch :-)
> > >
> > > Could you open a JIRA ticket with all the information that you provided?
> > >
> > > I will try to fix that problem.
> > >
> > > Benjamin
> > >
> > > On Wed, Nov 2, 2016 at 12:00 AM, Bhaskar Muppana <mgvbhas...@gmail.com> wrote:
> > >
> > > > Hi Guys,
> > > >
> > > > We are seeing an issue where a small number of columns go missing when
> > > > we do paging/limit reads. We get this even on a single-DC cluster when
> > > > both reads and writes are happening with QUORUM. I have attached the
> > > > ccm-based script which reproduces the problem.
> > > >
> > > > * Keyspace RF - 2
> > > > * Table (id int, course text, marks int, primary key(id, course))
> > > > * replicas for partition key 1 - r1, r2 and r3
> > > > * insert (1, '1', 1), (1, '2', 2), (1, '3', 3), (1, '4', 4), (1, '5', 5) -
> > > >   succeeded on all 3 replicas
> > > > * insert (1, '6', 6) succeeded on r1 and r3, failed on r2
> > > > * delete (1, '2'), (1, '3'), (1, '4'), (1, '5') succeeded on r1 and r2,
> > > >   failed on r3
> > > > * insert (1, '7', 7) succeeded on r1 and r2, failed on r3
> > > >
> > > > Local data on the 3 nodes now looks like this:
> > > >
> > > > r1: (1, '1', 1), tombstone(2-5 records), (1, '6', 6), (1, '7', 7)
> > > > r2: (1, '1', 1), tombstone(2-5 records), (1, '7', 7)
> > > > r3: (1, '1', 1), (1, '2', 2), (1, '3', 3), (1, '4', 4), (1, '5', 5), (1, '6', 6)
> > > >
> > > > If we do a paging read with page_size 2, and it gets data from r2 and
> > > > r3, then it will return only (1, '1', 1) and (1, '7', 7), skipping
> > > > record 6. The same problem occurs if the query does no paging but has
> > > > its limit set to 2 records.
> > > >
> > > > Resolution code for reads works the same for paging queries and normal
> > > > queries. The coordinator shouldn't respond to the client with
> > > > records/columns for which it doesn't have complete visibility on all
> > > > required replicas (in this case 2 replicas).
> > > > In the above case, it is sending record (1, '7', 7) back to the client,
> > > > but its visibility on r3 is limited to records up to (1, '2', 2), and
> > > > it is relying on r2's data alone to assume (1, '6', 6) doesn't exist,
> > > > which is wrong. At the end of the resolution, all it can conclusively
> > > > say anything about is (1, '1', 1), which exists, and (1, '2', 2), which
> > > > is deleted.
> > > >
> > > > Ideally we should have a different resolution implementation for
> > > > paging/limit queries.
> > > >
> > > > We could reproduce this on 2.0.17, 2.1.16 and 3.0.9.
> > > >
> > > > It seems that in 3.0.9 we have a ShortReadProtection transformation on
> > > > list queries. I assume that is to protect against cases like the one
> > > > above. But we can reproduce the issue in 3.0.9 as well.
> > > >
> > > > Thanks,
> > > > Bhaskar
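
The scenario in the original report can be replayed with a small toy model. The sketch below is plain Python, not Cassandra code: the cell layout, the timestamps and every function name (replica_read, merge, naive_resolve, conclusive_rows) are assumptions made only for illustration. It reads from r2 and r3 with a limit of 2, merges the two responses the naive way, and then shows which rows the coordinator can actually vouch for.

```python
# Toy model of the reported scenario. Not Cassandra code: the cell layout,
# timestamps and function names are assumptions made for illustration.
LIVE, DEAD = "live", "tombstone"

# Clustering key -> (state, write timestamp), as each replica holds it locally.
r2 = {"1": (LIVE, 1), "2": (DEAD, 3), "3": (DEAD, 3), "4": (DEAD, 3),
      "5": (DEAD, 3), "7": (LIVE, 4)}
r3 = {"1": (LIVE, 1), "2": (LIVE, 1), "3": (LIVE, 1), "4": (LIVE, 1),
      "5": (LIVE, 1), "6": (LIVE, 2)}


def replica_read(rows, limit):
    """Scan in clustering order, stopping once `limit` live rows are found."""
    out, live = {}, 0
    for key in sorted(rows):
        out[key] = rows[key]
        if rows[key][0] == LIVE:
            live += 1
            if live == limit:
                break
    return out


def merge(responses):
    """Keep the newest cell per key (last write wins)."""
    merged = {}
    for resp in responses:
        for key, cell in resp.items():
            if key not in merged or cell[1] > merged[key][1]:
                merged[key] = cell
    return merged


def naive_resolve(responses, limit):
    """Merge and return the first `limit` live keys, trusting every response."""
    merged = merge(responses)
    return [k for k in sorted(merged) if merged[k][0] == LIVE][:limit]


def conclusive_rows(responses, limit):
    """Return only live keys that every limit-truncated response has covered."""
    merged = merge(responses)
    live = [k for k in sorted(merged) if merged[k][0] == LIVE]
    # A response that stopped at the limit says nothing beyond its last key.
    truncated = [max(r) for r in responses
                 if sum(c[0] == LIVE for c in r.values()) >= limit]
    if not truncated:
        return live[:limit]
    horizon = min(truncated)
    return [k for k in live if k <= horizon][:limit]


responses = [replica_read(r2, 2), replica_read(r3, 2)]
print(naive_resolve(responses, 2))    # ['1', '7']: record 6 is skipped
print(conclusive_rows(responses, 2))  # ['1']: more data must be fetched
```

In this model r3 stops at key '2', so nothing past '2' is covered by both replicas. The coordinator would have to ask r3 for more rows beyond that key before it could fill the page of 2.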