If you're able, go into the #cassandra channel on freenode (IRC) and talk to 
driftx, jbellis, or aaron_morton about your problem. A conversation there 
might save you from having to do all of this.

On May 6, 2011, at 5:04 AM, Henrik Schröder wrote:

> I'll see if I can make some example broken files this weekend.
> 
> 
> /Henrik Schröder
> 
> On Fri, May 6, 2011 at 02:10, aaron morton <aa...@thelastpickle.com> wrote:
> The difficulty is that the Thrift API differs between 0.6 and 0.7, so the 
> two need different clients.
> 
> If you want to roll your own solution I would consider:
> - write an app to talk to 0.6 and pull out the data using keys from the other 
> system (so you can check referential integrity while you are at it), and dump 
> the data to a flat file (see the sketch below).
> - write an app to talk to 0.7 to load the data back in.
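> 
> A rough, untested sketch of the dump side (Python 2, using the bindings 
> generated from Cassandra 0.6's Thrift interface; the host, keyspace, CF, 
> and file names are placeholders):
> 
>     # dump_06.py - pull rows out of 0.6 by key, append them to a flat file
>     import pickle
>     from thrift.transport import TSocket, TTransport
>     from thrift.protocol import TBinaryProtocol
>     from cassandra import Cassandra
>     from cassandra.ttypes import (ColumnParent, SlicePredicate, SliceRange,
>                                   ConsistencyLevel)
> 
>     transport = TTransport.TBufferedTransport(TSocket.TSocket('host1', 9160))
>     transport.open()
>     client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
> 
>     parent = ColumnParent(column_family='MyCF')
>     # assumes rows have fewer than 10000 columns each
>     predicate = SlicePredicate(slice_range=SliceRange(
>         start='', finish='', reversed=False, count=10000))
> 
>     keys = [line.rstrip('\n') for line in open('keys.txt')]  # the other system
>     out = open('dump.pickle', 'wb')
>     for i in range(0, len(keys), 100):
>         # 0.6 takes the keyspace on every call; at ALL a row is only missed
>         # if it is missing from every replica
>         rows = client.multiget_slice('MyKeyspace', keys[i:i + 100], parent,
>                                      predicate, ConsistencyLevel.ALL)
>         for key, coscs in rows.items():
>             columns = [(c.column.name, c.column.value, c.column.timestamp)
>                        for c in coscs]
>             pickle.dump((key, columns), out)
>     out.close()
>     transport.close()
> 
> The load side is the mirror image against 0.7: open a framed transport (0.7 
> defaults to framed where 0.6 was unframed), call set_keyspace() once since 
> the keyspace is no longer a per-call argument, then read the tuples back and 
> write each row with insert() or batch_mutate().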
> 
> I've not given up digging on your migration problem; having to manually dump 
> and reload when you've done nothing wrong is not the best solution. I'll try 
> to find some time this weekend to test with:
> 
> - 0.6 server, random partitioner, standard CFs, byte columns
> - load with python or the cli on osx or ubuntu (I don't have a Windows 
> machine any more); a loading sketch is below
> - migrate and see what's going on.
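> 
> For the loading step, a few lines of Python against the same generated 0.6 
> bindings should do (a sketch, untested; note the non-ASCII key, since that 
> is where the migration seems to break):
> 
>     # insert a couple of test rows into 0.6, one with a non-ASCII key
>     import time
>     from thrift.transport import TSocket, TTransport
>     from thrift.protocol import TBinaryProtocol
>     from cassandra import Cassandra
>     from cassandra.ttypes import ColumnPath, ConsistencyLevel
> 
>     transport = TTransport.TBufferedTransport(TSocket.TSocket('localhost', 9160))
>     transport.open()
>     client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
> 
>     path = ColumnPath(column_family='MyCF', column='col1')
>     now = int(time.time() * 1000000)  # microseconds, by client convention
>     for key in ['plain-ascii', u'Schr\xf6der'.encode('utf-8')]:
>         client.insert('MyKeyspace', key, path, 'some bytes', now,
>                       ConsistencyLevel.ONE)
>     transport.close()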
> 
> If you can spare some sample data to load, please send it over via the user 
> group or to my email address.
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 6 May 2011, at 05:52, Henrik Schröder wrote:
> 
> > We can't do a straight upgrade from 0.6.13 to 0.7.5 because we have rows 
> > stored with Unicode keys: Cassandra 0.7.5 considers those rows in the 
> > sstables corrupt, and it seems impossible to clean them up without losing 
> > data.
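> >
> > The Unicode part matters because a key has a different byte representation 
> > under each encoding; if a client wrote keys under one encoding and the 
> > upgrade validates them as another, they look corrupt. Plain Python shows 
> > the mismatch (an illustration of our suspicion, not a confirmed cause):
> >
> >     # the same Unicode key under two encodings
> >     key = u'Schr\xf6der'
> >     print repr(key.encode('utf-8'))    # 'Schr\xc3\xb6der' - two bytes for o-umlaut
> >     print repr(key.encode('latin-1'))  # 'Schr\xf6der'     - one byte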
> >
> > However, we can still read all rows perfectly via Thrift, so we are now 
> > looking at building a simple tool that copies all rows from our 0.6.13 
> > cluster to a parallel 0.7.5 cluster. The question is how to do that while 
> > ensuring that every row actually gets migrated. It's a pretty small 
> > cluster: 3 machines, a single keyspace, a single column family, ~2 million 
> > rows, a few GB of data, and a replication factor of 3.
> >
> > So what's the best way? Call get_range_slices and move through the entire 
> > token space (a rough sketch of that approach is below)? We also have all 
> > row keys in a secondary system; would it be better to use that and make 
> > calls to multiget or multiget_slice instead? Are we correct in assuming 
> > that with ConsistencyLevel ALL we'll get all rows?
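> >
> > For reference, the range-scan version we have in mind looks roughly like 
> > this (untested, against the generated 0.6 Thrift bindings; with the random 
> > partitioner rows come back in token order, and each page begins with the 
> > last key of the previous one, which has to be skipped):
> >
> >     # walk every row in the cluster with get_range_slices at ALL
> >     from thrift.transport import TSocket, TTransport
> >     from thrift.protocol import TBinaryProtocol
> >     from cassandra import Cassandra
> >     from cassandra.ttypes import (ColumnParent, SlicePredicate, SliceRange,
> >                                   KeyRange, ConsistencyLevel)
> >
> >     transport = TTransport.TBufferedTransport(TSocket.TSocket('host1', 9160))
> >     transport.open()
> >     client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
> >
> >     parent = ColumnParent(column_family='MyCF')
> >     predicate = SlicePredicate(slice_range=SliceRange(
> >         start='', finish='', reversed=False, count=10000))
> >
> >     PAGE = 1000
> >     start = ''
> >     while True:
> >         krange = KeyRange(start_key=start, end_key='', count=PAGE)
> >         page = client.get_range_slices('MyKeyspace', parent, predicate,
> >                                        krange, ConsistencyLevel.ALL)
> >         for ks in page:
> >             if ks.key == start:
> >                 continue  # pages overlap by one row at the boundary
> >             print ks.key, len(ks.columns)  # placeholder: dump the row here
> >         if len(page) < PAGE:
> >             break  # fewer rows than requested means we are done
> >         start = page[-1].key
> >     transport.close()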
> >
> >
> > /Henrik Schröder
> 
> 
