Well, I've already completed the migration program. Using get_range_slices I
could migrate a few thousand rows per second, which means that migrating all
of our data will take only a few minutes, and we'll end up with pristine
datafiles for the new cluster. Problem solved!
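
For anyone curious, the core of it is just a paging loop over
get_range_slices. Roughly like this, as a sketch (0.6 Thrift from Python;
the keyspace/CF names, the page size, and the migrate_row helper are
placeholders, not our actual code):

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra
    from cassandra.ttypes import (ColumnParent, SlicePredicate, SliceRange,
                                  KeyRange, ConsistencyLevel)

    KEYSPACE, CF, PAGE = 'MyKeyspace', 'MyColumnFamily', 1000  # placeholders

    sock = TSocket.TSocket('localhost', 9160)
    transport = TTransport.TBufferedTransport(sock)  # 0.6 defaults to unframed
    transport.open()
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))

    parent = ColumnParent(column_family=CF)
    predicate = SlicePredicate(
        slice_range=SliceRange(start='', finish='', count=10000))

    start_key = ''
    while True:
        krange = KeyRange(start_key=start_key, end_key='', count=PAGE)
        page = client.get_range_slices(KEYSPACE, parent, predicate, krange,
                                       ConsistencyLevel.ALL)
        for row in page:
            if row.key != start_key:       # start_key is inclusive, skip repeat
                migrate_row(row.key, row.columns)  # placeholder: writes to 0.7
        if len(page) < PAGE:
            break                          # ran off the end of the ring
        start_key = page[-1].key           # resume from the last key seen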

I'll see if I can create datafiles in 0.6 that are uncleanable in 0.7 so
that you all can repeat this and hopefully fix it.


/Henrik Schröder

On Sat, May 7, 2011 at 00:35, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:

> If you're able, go into the #cassandra channel on freenode (IRC) and talk
> to driftx, jbellis, or aaron_morton about your problem. A conversation there
> might mean you don't have to do all of this.
>
> On May 6, 2011, at 5:04 AM, Henrik Schröder wrote:
>
> > I'll see if I can make some example broken files this weekend.
> >
> >
> > /Henrik Schröder
> >
> > On Fri, May 6, 2011 at 02:10, aaron morton <aa...@thelastpickle.com>
> wrote:
> > The difficulty is the different thrift clients between 0.6 and 0.7.
> >
> > If you want to roll your own solution I would consider:
> > - write an app to talk to 0.6 and pull out the data, using keys from the
> other system (so you can check referential integrity while you are at it).
> Dump the data to a flat file.
> > - write an app to talk to 0.7 to load the data back in (rough sketch
> below).
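> >
> > Something like this minimal sketch for the load side, assuming the dump
> > step wrote one JSON object per line with base64-encoded keys, names and
> > values (the keyspace/CF names, file name, and encoding are illustrative,
> > not anything you have to use):
> >
> >     import base64, json
> >     from thrift.transport import TSocket, TTransport
> >     from thrift.protocol import TBinaryProtocol
> >     from cassandra import Cassandra
> >     from cassandra.ttypes import ColumnParent, Column, ConsistencyLevel
> >
> >     KEYSPACE, CF = 'MyKeyspace', 'MyColumnFamily'    # placeholders
> >
> >     sock = TSocket.TSocket('localhost', 9160)
> >     transport = TTransport.TFramedTransport(sock)    # 0.7 defaults to framed
> >     transport.open()
> >     client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
> >     client.set_keyspace(KEYSPACE)                    # new in the 0.7 API
> >
> >     parent = ColumnParent(column_family=CF)
> >     for line in open('dump.jsonl'):
> >         row = json.loads(line)
> >         key = base64.b64decode(row['key'])           # 0.7 keys are raw bytes
> >         for col in row['columns']:
> >             client.insert(key, parent,
> >                           Column(name=base64.b64decode(col['name']),
> >                                  value=base64.b64decode(col['value']),
> >                                  timestamp=col['timestamp']),  # keep originals
> >                           ConsistencyLevel.QUORUM)
> >
> > Going through a flat file also sidesteps having to load the 0.6 and 0.7
> > Thrift bindings into the same process.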
> >
> > I've not given up digging on your migration problem; having to manually
> dump and reload when you've done nothing wrong is not a good solution. I'll
> try to find some time this weekend to test with:
> >
> > - 0.6 server, random partitioner, standard CFs, byte columns
> > - load with python or the cli on OS X or Ubuntu (don't have a Windows
> machine any more)
> > - migrate and see what's going on.
> >
> > If you can spare some sample data to load, please send it over via the
> user group or to my email address.
> >
> > Cheers
> >
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Developer
> > @aaronmorton
> > http://www.thelastpickle.com
> >
> > On 6 May 2011, at 05:52, Henrik Schröder wrote:
> >
> > > We can't do a straight upgrade from 0.6.13 to 0.7.5 because we have
> rows stored with unicode keys; Cassandra 0.7.5 thinks those rows in the
> sstables are corrupt, and it seems impossible to clean them up without
> losing data.
> > >
> > > However, we can still read all rows perfectly via thrift, so we are now
> looking at building a simple tool that will copy all rows from our 0.6.13
> cluster to a parallel 0.7.5 cluster. Our question is now how to do that and
> ensure that we actually get all rows migrated. It's a pretty small cluster:
> 3 machines, a single keyspace, a single columnfamily, ~2 million rows, a few
> GB of data, and a replication factor of 3.
> > >
> > > So what's the best way? Call get_range_slices and move through the
> entire token space? We also have all row keys in a secondary system; would
> it be better to use that and make calls to multiget or multiget_slice
> instead? Are we correct in assuming that if we use ConsistencyLevel.ALL
> we'll get all rows?
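> > >
> > > For the key-list route, something like this rough sketch is what we
> have in mind (0.6 Thrift from Python; the batch size and the
> load_keys_from_secondary_system/copy_row helpers are placeholders):
> > >
> > >     from thrift.transport import TSocket, TTransport
> > >     from thrift.protocol import TBinaryProtocol
> > >     from cassandra import Cassandra
> > >     from cassandra.ttypes import (ColumnParent, SlicePredicate,
> > >                                   SliceRange, ConsistencyLevel)
> > >
> > >     KEYSPACE, CF = 'MyKeyspace', 'MyColumnFamily'  # placeholders
> > >
> > >     sock = TSocket.TSocket('localhost', 9160)
> > >     transport = TTransport.TBufferedTransport(sock)
> > >     transport.open()
> > >     client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
> > >
> > >     parent = ColumnParent(column_family=CF)
> > >     predicate = SlicePredicate(
> > >         slice_range=SliceRange(start='', finish='', count=10000))
> > >
> > >     all_keys = load_keys_from_secondary_system()   # placeholder helper
> > >     for i in range(0, len(all_keys), 100):
> > >         batch = all_keys[i:i + 100]
> > >         rows = client.multiget_slice(KEYSPACE, batch, parent, predicate,
> > >                                      ConsistencyLevel.ALL)
> > >         for key in batch:
> > >             columns = rows.get(key, [])
> > >             if not columns:            # in the key list, not in Cassandra
> > >                 print('missing row: %r' % key)
> > >             else:
> > >                 copy_row(key, columns)  # placeholder: writes to 0.7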
> > >
> > >
> > > /Henrik Schröder
> >
> >
>
>
