get_range_slices respects the ConsistencyLevel, but only single-row reads and
multiget do the *repair* part of read repair.
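
For illustration, a minimal repair-triggering read against 0.6 via the
Thrift-generated Python bindings -- the package layout and the keyspace/CF
names here are assumptions, not anyone's actual schema:

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra   # Thrift-generated 0.6 bindings (assumed layout)
    from cassandra.ttypes import (ColumnParent, SlicePredicate, SliceRange,
                                  ConsistencyLevel)

    transport = TTransport.TBufferedTransport(TSocket.TSocket('localhost', 9160))
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    parent = ColumnParent(column_family='Standard1')            # assumed CF name
    predicate = SlicePredicate(slice_range=SliceRange('', '', False, 1000))

    # multiget_slice is a multiget-style read, so at CL.ALL the *repair*
    # part of read repair fires for any replica that disagrees.
    rows = client.multiget_slice('Keyspace1', ['key1', 'key2'],
                                 parent, predicate, ConsistencyLevel.ALL)
    transport.close()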

On Sat, May 7, 2011 at 1:44 AM, aaron morton <aa...@thelastpickle.com> wrote:
> get_range_slices() does read repair if enabled (controlled by
> DoConsistencyChecksBoolean in the config; it's on by default), so you should
> be getting good reads. If you want belt-and-braces, run nodetool repair first.
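>
> Something like the following, for example (treat this as a sketch -- the
> exact nodetool flag syntax differs a little between 0.6 and 0.7):
>
>     bin/nodetool -host <node_address> repair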
>
> Hope that helps.
>
>
> On 7 May 2011, at 11:46, Jeremy Hanna wrote:
>
>> Great!  I just wanted to make sure you were getting the information you 
>> needed.
>>
>> On May 6, 2011, at 6:42 PM, Henrik Schröder wrote:
>>
>>> Well, I already completed the migration program. Using get_range_slices I 
>>> could migrate a few thousand rows per second, which means that migrating 
>>> all of our data would take a few minutes, and we'll end up with pristine 
>>> datafiles for the new cluster. Problem solved!
>>>
>>> I'll see if I can create datafiles in 0.6 that are uncleanable in 0.7 so 
>>> that you all can repeat this and hopefully fix it.
>>>
>>>
>>> /Henrik Schröder
>>>
>>> On Sat, May 7, 2011 at 00:35, Jeremy Hanna <jeremy.hanna1...@gmail.com> 
>>> wrote:
>>> If you're able, go into the #cassandra channel on freenode (IRC) and talk 
>>> to driftx or jbellis or aaron_morton about your problem.  It could be that 
>>> you don't have to do all of this based on a conversation there.
>>>
>>> On May 6, 2011, at 5:04 AM, Henrik Schröder wrote:
>>>
>>>> I'll see if I can make some example broken files this weekend.
>>>>
>>>>
>>>> /Henrik Schröder
>>>>
>>>> On Fri, May 6, 2011 at 02:10, aaron morton <aa...@thelastpickle.com> wrote:
>>>> The difficulty is the different thrift clients between 0.6 and 0.7.
>>>>
>>>> If you want to roll your own solution I would consider:
>>>> - write an app to talk to 0.6 and pull out the data using keys from the
>>>> other system (so you can check referential integrity while you are at
>>>> it). Dump the data to a flat file.
>>>> - write an app to talk to 0.7 to load the data back in; a rough sketch of
>>>> both halves follows below.
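>>>>
>>>> As a rough sketch of both halves (the package layout, names, and the
>>>> keys_from_other_system() helper are placeholders, not a tested tool):
>>>>
>>>>     # dump_06.py -- pull rows out of 0.6 by key, write JSON lines
>>>>     import base64, json
>>>>     from thrift.transport import TSocket, TTransport
>>>>     from thrift.protocol import TBinaryProtocol
>>>>     from cassandra import Cassandra   # Thrift-generated 0.6 bindings
>>>>     from cassandra.ttypes import (ColumnParent, SlicePredicate,
>>>>                                   SliceRange, ConsistencyLevel)
>>>>
>>>>     transport = TTransport.TBufferedTransport(TSocket.TSocket('old-node', 9160))
>>>>     client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
>>>>     transport.open()
>>>>     parent = ColumnParent(column_family='Standard1')
>>>>     predicate = SlicePredicate(slice_range=SliceRange('', '', False, 10000))
>>>>
>>>>     out = open('dump.jsonl', 'w')
>>>>     for key in keys_from_other_system():   # hypothetical helper
>>>>         # 0.6 API: the keyspace is the first argument of every call
>>>>         cols = client.get_slice('Keyspace1', key, parent, predicate,
>>>>                                 ConsistencyLevel.ALL)
>>>>         out.write(json.dumps({
>>>>             'key': base64.b64encode(key),
>>>>             'cols': [(base64.b64encode(c.column.name),
>>>>                       base64.b64encode(c.column.value),
>>>>                       c.column.timestamp) for c in cols]}) + '\n')
>>>>     out.close()
>>>>
>>>>     # load_07.py -- the matching loader; assumes a separate connection
>>>>     # to the 0.7 cluster (note 0.7 defaults to the framed transport)
>>>>     from cassandra.ttypes import Column
>>>>     client.set_keyspace('Keyspace1')   # 0.7 sets the keyspace once
>>>>     for line in open('dump.jsonl'):
>>>>         row = json.loads(line)
>>>>         for name, value, ts in row['cols']:
>>>>             client.insert(base64.b64decode(row['key']), parent,
>>>>                           Column(name=base64.b64decode(name),
>>>>                                  value=base64.b64decode(value),
>>>>                                  timestamp=ts),
>>>>                           ConsistencyLevel.QUORUM)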
>>>>
>>>> I've not given up digging on your migration problem; having to manually
>>>> dump and reload when you've done nothing wrong is not the best solution.
>>>> I'll try to find some time this weekend to test with:
>>>>
>>>> - 0.6 server, random partitioner, standard CFs, byte columns
>>>> - load with python or the cli on osx or ubuntu (don't have a Windows
>>>> machine any more)
>>>> - migrate and see what's going on.
>>>>
>>>> If you can spare some sample data to load please send it over in the user 
>>>> group or my email address.
>>>>
>>>> Cheers
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 6 May 2011, at 05:52, Henrik Schröder wrote:
>>>>
>>>>> We can't do a straight upgrade from 0.6.13 to 0.7.5 because we have rows
>>>>> stored under unicode keys; Cassandra 0.7.5 considers those rows in the
>>>>> sstables corrupt, and it seems impossible to clean them up without losing
>>>>> data.
>>>>>
>>>>> However, we can still read all rows perfectly via thrift, so we are now
>>>>> looking at building a simple tool that will copy all rows from our 0.6.13
>>>>> cluster to a parallel 0.7.5 cluster. Our question now is how to do that
>>>>> while ensuring we actually get all rows migrated. It's a pretty small
>>>>> cluster: 3 machines, a single keyspace, a single column family, ~2 million
>>>>> rows, a few GB of data, and a replication factor of 3.
>>>>>
>>>>> So what's the best way? Call get_range_slices and move through the entire
>>>>> token space? We also have all row keys in a secondary system; would it be
>>>>> better to use that and make calls to multiget or multiget_slice instead?
>>>>> Are we correct in assuming that if we read at ConsistencyLevel ALL we'll
>>>>> get all rows?
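>>>>>
>>>>> Roughly what we have in mind with get_range_slices -- a sketch against the
>>>>> 0.6 Thrift API, with the keyspace/CF names and the migrate() helper as
>>>>> placeholders:
>>>>>
>>>>>     from cassandra.ttypes import (KeyRange, ColumnParent, SlicePredicate,
>>>>>                                   SliceRange, ConsistencyLevel)
>>>>>
>>>>>     parent = ColumnParent(column_family='Standard1')
>>>>>     predicate = SlicePredicate(slice_range=SliceRange('', '', False, 10000))
>>>>>     start, batch = '', 1000
>>>>>     while True:
>>>>>         krange = KeyRange(start_key=start, end_key='', count=batch)
>>>>>         slices = client.get_range_slices('Keyspace1', parent, predicate,
>>>>>                                          krange, ConsistencyLevel.ALL)
>>>>>         for ks in slices:
>>>>>             if start != '' and ks.key == start:
>>>>>                 continue              # first key repeats the previous page
>>>>>             migrate(ks.key, ks.columns)   # hypothetical: write to 0.7
>>>>>         if len(slices) < batch:
>>>>>             break                     # walked the whole ring
>>>>>         start = slices[-1].key        # resume from the last key seen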
>>>>>
>>>>>
>>>>> /Henrik Schröder
>>>>
>>>>
>>>
>>>
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
