get_range_slices respects the requested ConsistencyLevel, but only single-row reads and multigets do the *repair* part of read repair (RR).
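
(For concreteness, a rough sketch of the full-scan approach discussed
below: paging every row out of a 0.6 cluster with get_range_slices at
ConsistencyLevel.ALL, using the Python bindings generated from the 0.6
Thrift interface. The host, keyspace, column family, page sizes, and
the dump_row() helper are illustrative placeholders, not details from
this thread.)

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra
    from cassandra.ttypes import (ColumnParent, ConsistencyLevel, KeyRange,
                                  SlicePredicate, SliceRange)

    PAGE = 1000

    def dump_row(key, columns):
        """Placeholder: write the row to a flat file or the 0.7 cluster."""
        pass

    # 0.6 defaults to the buffered (not framed) Thrift transport.
    socket = TSocket.TSocket('old-node', 9160)
    transport = TTransport.TBufferedTransport(socket)
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    parent = ColumnParent(column_family='MyCF')
    # An open-ended slice pulls every column of each row.
    predicate = SlicePredicate(slice_range=SliceRange(start='', finish='',
                                                      count=10000))
    start_key = ''
    while True:
        key_range = KeyRange(start_key=start_key, end_key='', count=PAGE)
        # In 0.6 the keyspace is passed on every call.
        slices = client.get_range_slices('MyKeyspace', parent, predicate,
                                         key_range, ConsistencyLevel.ALL)
        for key_slice in slices:
            if start_key and key_slice.key == start_key:
                continue  # each page repeats the last key of the previous one
            dump_row(key_slice.key, key_slice.columns)
        if len(slices) < PAGE:
            break  # a short page means the whole token space has been walked
        start_key = slices[-1].key
    transport.close()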
On Sat, May 7, 2011 at 1:44 AM, aaron morton <aa...@thelastpickle.com> wrote:
> get_range_slices() does read repair if enabled (check
> DoConsistencyChecksBoolean in the config; it's on by default), so you
> should be getting good reads. If you want belt-and-braces, run nodetool
> repair first.
>
> Hope that helps.
>
>
> On 7 May 2011, at 11:46, Jeremy Hanna wrote:
>
>> Great! I just wanted to make sure you were getting the information you
>> needed.
>>
>> On May 6, 2011, at 6:42 PM, Henrik Schröder wrote:
>>
>>> Well, I already completed the migration program. Using get_range_slices
>>> I could migrate a few thousand rows per second, which means that
>>> migrating all of our data would take a few minutes, and we'll end up
>>> with pristine datafiles for the new cluster. Problem solved!
>>>
>>> I'll see if I can create datafiles in 0.6 that are uncleanable in 0.7
>>> so that you all can repeat this and hopefully fix it.
>>>
>>>
>>> /Henrik Schröder
>>>
>>> On Sat, May 7, 2011 at 00:35, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
>>> If you're able, go into the #cassandra channel on freenode (IRC) and
>>> talk to driftx or jbellis or aaron_morton about your problem. Based on
>>> a conversation there, it could be that you don't have to do all of this.
>>>
>>> On May 6, 2011, at 5:04 AM, Henrik Schröder wrote:
>>>
>>>> I'll see if I can make some example broken files this weekend.
>>>>
>>>>
>>>> /Henrik Schröder
>>>>
>>>> On Fri, May 6, 2011 at 02:10, aaron morton <aa...@thelastpickle.com> wrote:
>>>> The difficulty is the different Thrift clients between 0.6 and 0.7.
>>>>
>>>> If you want to roll your own solution, I would consider:
>>>> - write an app to talk to 0.6 and pull out the data using keys from
>>>> the other system (so you can check referential integrity while you
>>>> are at it). Dump the data to a flat file.
>>>> - write an app to talk to 0.7 to load the data back in.
>>>>
>>>> I've not given up digging on your migration problem; having to
>>>> manually dump and reload when you've done nothing wrong is not the
>>>> best solution. I'll try to find some time this weekend to test with:
>>>>
>>>> - 0.6 server, random partitioner, standard CFs, byte columns
>>>> - load with python or the cli on OS X or Ubuntu (don't have a Windows
>>>> machine any more)
>>>> - migrate and see what's going on.
>>>>
>>>> If you can spare some sample data to load, please send it over on the
>>>> user group or to my email address.
>>>>
>>>> Cheers
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 6 May 2011, at 05:52, Henrik Schröder wrote:
>>>>
>>>>> We can't do a straight upgrade from 0.6.13 to 0.7.5 because we have
>>>>> rows stored that have unicode keys, and Cassandra 0.7.5 thinks those
>>>>> rows in the sstables are corrupt, and it seems impossible to clean
>>>>> it up without losing data.
>>>>>
>>>>> However, we can still read all rows perfectly via Thrift, so we are
>>>>> now looking at building a simple tool that will copy all rows from
>>>>> our 0.6.13 cluster to a parallel 0.7.5 cluster. Our question now is
>>>>> how to do that and ensure that we actually get all rows migrated.
>>>>> It's a pretty small cluster: 3 machines, a single keyspace, a single
>>>>> columnfamily, ~2 million rows, a few GB of data, and a replication
>>>>> factor of 3.
>>>>>
>>>>> So what's the best way? Call get_range_slices and move through the
>>>>> entire token space? We also have all row keys in a secondary system;
>>>>> would it be better to use that and make calls to multiget or
>>>>> multiget_slice instead? Are we correct in assuming that if we use
>>>>> consistency level ALL we'll get all rows?
>>>>>
>>>>>
>>>>> /Henrik Schröder
>>>>
>>>>
>>>
>>>
>>
>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
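
(And a matching sketch of the keyed alternative Henrik asks about:
batched reads with multiget_slice at ConsistencyLevel.ALL, driven by
the key list from the secondary system. It reuses `client`, `parent`,
`predicate`, and dump_row() from the sketch near the top of this
message; `all_keys` and the batch size are placeholders. At CL.ALL the
read fails outright, rather than silently missing rows, if any replica
is unreachable.)

    def copy_known_keys(client, all_keys, batch_size=100):
        for i in range(0, len(all_keys), batch_size):
            batch = all_keys[i:i + batch_size]
            # Returns a dict of key -> [ColumnOrSuperColumn, ...];
            # in 0.6 the keyspace is again an explicit argument.
            rows = client.multiget_slice('MyKeyspace', batch, parent,
                                         predicate, ConsistencyLevel.ALL)
            for key, columns in rows.items():
                dump_row(key, columns)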