> But this is the first time I've tried to use the wide-row support, which
> makes me a little suspicious. The wide-row support is not very well
> documented, so maybe I'm doing something wrong there in ignorance.

This was the area I was thinking about.
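
One way to look for a pattern in the differences between runs, as a rough sketch only: the snippet below compares the output of two runs and reports every row that appears in just one of them or whose count differs, and marks whether the row is large enough that it would have been read in more than one page. The `RunDiff` class, the tab-separated `rowKey<TAB>count` line format, and the `pageSize` threshold of 1000 are assumptions made for illustration; they are not taken from your job or from Cassandra's settings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class RunDiff {

    /** Parses lines of the assumed form "rowKey<TAB>columnCount" into a sorted map. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            String[] parts = line.split("\t");
            counts.put(parts[0], Long.parseLong(parts[1].trim()));
        }
        return counts;
    }

    /**
     * Returns one report line per row that differs between run A and run B.
     * pageSize is a hypothetical threshold: rows with more columns than this
     * are labelled "paged", smaller rows "unpaged".
     */
    static List<String> diff(Map<String, Long> a, Map<String, Long> b, long pageSize) {
        List<String> report = new ArrayList<>();
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        for (String key : keys) {
            Long ca = a.get(key);
            Long cb = b.get(key);
            if (ca != null && ca.equals(cb)) {
                continue; // identical in both runs
            }
            long max = Math.max(ca == null ? 0L : ca, cb == null ? 0L : cb);
            String kind = ca == null ? "only-in-B"
                        : cb == null ? "only-in-A"
                        : "count-differs";
            report.add(key + " " + kind + " A=" + ca + " B=" + cb
                    + (max > pageSize ? " paged" : " unpaged"));
        }
        return report;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for two job output files.
        Map<String, Long> runA = parse(Arrays.asList("k1\t10", "k2\t5000", "k3\t7"));
        Map<String, Long> runB = parse(Arrays.asList("k1\t10", "k2\t4999", "k4\t3"));
        for (String line : diff(runA, runB, 1000)) {
            System.out.println(line);
        }
    }
}
```

If nearly all of the differing rows come out as "paged", that points at the wide-row paging; if small rows differ as well, the problem is more likely somewhere else.
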
Can you drill in and see a pattern? Are the differences in rows that would be paged by wide rows? Could it be an off-by-one error in the wide-row paging?

It all sounds strange, so I would make sure that what your job is outputting matches what it is reading from C*. Maybe add some logging in there.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/01/2013, at 1:24 AM, Brian Jeltema <brian.jelt...@digitalenvoy.net> wrote:

> Sorry if this is a duplicate - I was having mailer problems last night:
>
>> Assuming there were no further writes, running repair or using CL all should
>> have fixed it.
>>
>> Can you describe the inconsistency between runs?
>
> Sure. The job output is generated by a single reducer and consists of a list
> of key/value pairs where the key is the row key of the original table, and
> the value is the total count of all columns in the row. Each run produces a
> file with a different size, and running a diff against various output file
> pairs displays rows that only appear in one file, or rows with the same key
> but different counts.
>
> What seems particularly hard to explain is the behavior after setting CL to
> ALL, where the results eventually become reproducible (making it hard to
> place the blame on my trivial mapper/reducer implementations) but only after
> about half a dozen runs. And once reaching this state, setting CL to QUORUM
> results in additional inconsistent results.
>
> I can say with certainty that there were no other writes. I'm the sole
> developer working with the CF in question. I haven't seen behavior like this
> before, though I don't have a tremendous amount of experience. But this is
> the first time I've tried to use the wide-row support, which makes me a
> little suspicious. The wide-row support is not very well documented, so
> maybe I'm doing something wrong there in ignorance.
>
> Brian
>
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 8/01/2013, at 2:16 AM, Brian Jeltema <brian.jelt...@digitalenvoy.net>
>> wrote:
>>
>>> I need some help understanding unexpected behavior I saw in some recent
>>> experiments with Cassandra 1.1.5 and Hadoop 1.0.3:
>>>
>>> I've written a small map/reduce job that simply counts the number of
>>> columns in each row of a static CF (call it Foo) and generates a list of
>>> every row and column count. A relatively small fraction of the rows have
>>> a large number of columns; worst case is approximately 36 million. So
>>> when I set up the job, I used wide-row support:
>>>
>>>     ConfigHelper.setInputColumnFamily(job.getConfiguration(), "fooKS", "Foo", WIDE_ROWS); // where WIDE_ROWS == true
>>>
>>> When I ran this job using the default CL (1) I noticed that the results
>>> varied from run to run, which I attributed to inconsistent replicas,
>>> since Foo was generated with CL == 1 and the RF == 3.
>>>
>>> So I ran repair for that CF on every node. The cassandra log on every
>>> node contains lines similar to:
>>>
>>>     INFO [AntiEntropyStage:1] 2013-01-05 20:38:48,605 AntiEntropyService.java (line 778) [repair #e4a1d7f0-579d-11e2-0000-d64e0a75e6df] Foo is fully synced
>>>
>>> However, repeated runs were still inconsistent. Then I set CL to ALL,
>>> which I presumed would always result in identical output, but repeated
>>> runs initially continued to be inconsistent. However, I noticed that the
>>> results seemed to be converging, and after several runs (somewhere
>>> between 4 and 6) I finally was producing identical results on every run.
>>> Then I set CL to QUORUM, and again generated inconsistent results.
>>>
>>> Does this behavior make sense?
>>>
>>> Brian
>>
>