I remembered something like that, so I had a look at RangeSliceResponseResolver.resolve() in 0.6.12, and it looks like it schedules the repairs:
protected Row getReduced()
{
    ColumnFamily resolved = ReadResponseResolver.resolveSuperset(versions);
    ReadResponseResolver.maybeScheduleRepairs(resolved, table, key, versions, versionSources);
    versions.clear();
    versionSources.clear();
    return new Row(key, resolved);
}

Is that right?

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 May 2011, at 00:48, Jonathan Ellis wrote:

> range_slices respects consistencylevel, but only single-row reads and
> multiget do the *repair* part of RR.
>
> On Sat, May 7, 2011 at 1:44 AM, aaron morton <aa...@thelastpickle.com> wrote:
>> get_range_slices() does read repair if enabled (checked
>> DoConsistencyChecksBoolean in the config, it's on by default), so you should
>> be getting good reads. If you want belt-and-braces, run nodetool repair first.
>>
>> Hope that helps.
>>
>> On 7 May 2011, at 11:46, Jeremy Hanna wrote:
>>
>>> Great! I just wanted to make sure you were getting the information you
>>> needed.
>>>
>>> On May 6, 2011, at 6:42 PM, Henrik Schröder wrote:
>>>
>>>> Well, I already completed the migration program. Using get_range_slices I
>>>> could migrate a few thousand rows per second, which means that migrating
>>>> all of our data would take a few minutes, and we'll end up with pristine
>>>> datafiles for the new cluster. Problem solved!
>>>>
>>>> I'll see if I can create datafiles in 0.6 that are uncleanable in 0.7 so
>>>> that you all can repeat this and hopefully fix it.
>>>>
>>>> /Henrik Schröder
>>>>
>>>> On Sat, May 7, 2011 at 00:35, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
>>>> If you're able, go into the #cassandra channel on freenode (IRC) and talk
>>>> to driftx or jbellis or aaron_morton about your problem. It could be that
>>>> you don't have to do all of this based on a conversation there.
>>>>
>>>> On May 6, 2011, at 5:04 AM, Henrik Schröder wrote:
>>>>
>>>>> I'll see if I can make some example broken files this weekend.
>>>>>
>>>>> /Henrik Schröder
>>>>>
>>>>> On Fri, May 6, 2011 at 02:10, aaron morton <aa...@thelastpickle.com> wrote:
>>>>> The difficulty is the different thrift clients between 0.6 and 0.7.
>>>>>
>>>>> If you want to roll your own solution I would consider:
>>>>> - write an app to talk to 0.6 and pull out the data using keys from the
>>>>> other system (so you can check referential integrity while you are at
>>>>> it). Dump the data to a flat file.
>>>>> - write an app to talk to 0.7 to load the data back in.
>>>>>
>>>>> I've not given up digging on your migration problem; having to manually
>>>>> dump and reload when you've done nothing wrong is not the best solution.
>>>>> I'll try to find some time this weekend to test with:
>>>>>
>>>>> - 0.6 server, random partitioner, standard CFs, byte columns
>>>>> - load with python or the cli on osx or ubuntu (don't have a windows
>>>>> machine any more)
>>>>> - migrate and see what's going on.
>>>>>
>>>>> If you can spare some sample data to load, please send it over in the
>>>>> user group or to my email address.
>>>>>
>>>>> Cheers
>>>>>
>>>>> -----------------
>>>>> Aaron Morton
>>>>> Freelance Cassandra Developer
>>>>> @aaronmorton
>>>>> http://www.thelastpickle.com
>>>>>
>>>>> On 6 May 2011, at 05:52, Henrik Schröder wrote:
>>>>>
>>>>>> We can't do a straight upgrade from 0.6.13 to 0.7.5 because we have rows
>>>>>> stored that have unicode keys, and Cassandra 0.7.5 thinks those rows in
>>>>>> the sstables are corrupt, and it seems impossible to clean it up without
>>>>>> losing data.
>>>>>>
>>>>>> However, we can still read all rows perfectly via thrift, so we are now
>>>>>> looking at building a simple tool that will copy all rows from our 0.6.3
>>>>>> cluster to a parallel 0.7.5 cluster. Our question now is how to do that
>>>>>> and ensure that we actually get all rows migrated. It's a pretty small
>>>>>> cluster: 3 machines, a single keyspace, a single columnfamily, ~2
>>>>>> million rows, a few GB of data, and a replication factor of 3.
>>>>>>
>>>>>> So what's the best way? Call get_range_slices and move through the
>>>>>> entire token space? We also have all row keys in a secondary system;
>>>>>> would it be better to use that and make calls to get_multi or
>>>>>> get_multi_slices instead? Are we correct in assuming that if we use
>>>>>> the consistencylevel ALL we'll get all rows?
>>>>>>
>>>>>> /Henrik Schröder
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
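
For reference, a minimal sketch of the get_range_slices paging loop discussed in the thread, assuming the 0.6 Thrift API (String row keys, byte[] slice bounds, keyspace passed on each call). "MyKeyspace", "MyColumnFamily", and writeToNewCluster() are placeholders, not names from the thread:

import java.util.List;
import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class RangeMigrator
{
    private static final int PAGE_SIZE = 100;

    public static void main(String[] args) throws Exception
    {
        TTransport transport = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        // Fetch every column of every row; for very wide rows you would
        // page columns as well rather than asking for them all at once.
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, Integer.MAX_VALUE));
        ColumnParent parent = new ColumnParent("MyColumnFamily"); // placeholder CF name

        String startKey = "";
        while (true)
        {
            // An empty end_key means "to the end of the token space".
            KeyRange range = new KeyRange(PAGE_SIZE);
            range.setStart_key(startKey);
            range.setEnd_key("");

            // ConsistencyLevel.ALL: every replica must answer, so a stale
            // replica cannot cause a row to be silently missed.
            List<KeySlice> page = client.get_range_slices(
                "MyKeyspace", parent, predicate, range, ConsistencyLevel.ALL);

            for (KeySlice row : page)
            {
                // Each page after the first starts with a repeat of the
                // previous page's last row; skip the duplicate.
                if (!row.getKey().equals(startKey))
                    writeToNewCluster(row); // placeholder 0.7 loader
            }

            if (page.size() < PAGE_SIZE)
                break; // short page: we have walked the whole token space
            startKey = page.get(page.size() - 1).getKey();
        }
        transport.close();
    }

    private static void writeToNewCluster(KeySlice row)
    {
        // placeholder: e.g. batch_mutate against the 0.7 cluster
    }
}

With RandomPartitioner the pages come back in token order rather than key order, but the restart-from-last-key pattern still visits every row exactly once.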