>>>>> "Eric" == Eric Czech <e...@nextbigsound.com> writes:

    Eric> We're exploring a data processing procedure where we snapshot
    Eric> our production cluster data and move that data to a new
    Eric> cluster for analysis but I'm having some strange issues where
    Eric> the analysis cluster is still somehow aware of the production
    Eric> cluster (i.e. the production cluster ring is trying to include
    Eric> nodes from the other cluster with the same token).

Are you using the same cluster name for both clusters? If so, I would
suggest you don't.

    Eric> The seed addresses in cassandra.yaml definitely prohibit this
    Eric> type of intersection between the two clusters so I'm guessing
    Eric> that it has something to do with the information in the system
    Eric> sstables.

I'm sure you will get a more knowledgeable answer from people who have
been doing this for a while, but I have to ask: are you copying over the
LocationInfo* SSTables from the snapshot to the analysis cluster?

The LocationInfo CF can record the endpoints in your production cluster.
From the little I've read of the code (StorageService.java and
SystemTable.java) it is possible (likely?) that endpoints from your
production cluster will get added to your analysis cluster's Gossiper on
startup. If you are using the same cluster name, well, there you have
it.....

    Eric> Is there anyway to duplicate raw sstables in an effort to
    Eric> "copy" a cluster such that the copied cluster has a different
    Eric> name?  I know this usually results in a "saved cluster name X
    Eric> != Y" sort of error but it looks like we need to find some
    Eric> sort of way to do this logical separation.

Copying the raw tables and ignoring/deleting the
data/system/LocationInfo* files has worked for me. But I have to add the
disclaimer that I'm definitely a Cassandra newbie!
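
For what it's worth, here is a rough sketch of that copy step in Python.
It assumes the snapshot has already been laid out as a normal data
directory (one subdirectory per keyspace, including "system"); the
function name and the paths in the usage note are made up for
illustration, and the destination directory must not exist yet. It
copies everything except files matching LocationInfo* inside the system
directory.

```python
import fnmatch
import os
import shutil


def copy_data_without_location_info(src_data_dir, dst_data_dir):
    """Copy a data directory tree, skipping system/LocationInfo* files.

    Hypothetical helper: assumes src_data_dir contains one subdirectory
    per keyspace, including "system". dst_data_dir must not exist yet.
    """
    system_dir = os.path.join(os.path.abspath(src_data_dir), "system")

    def ignore(directory, names):
        # Only filter inside the system keyspace directory; everything
        # else is copied as-is.
        if os.path.abspath(directory) != system_dir:
            return []
        return [n for n in names if fnmatch.fnmatch(n, "LocationInfo*")]

    shutil.copytree(src_data_dir, dst_data_dir, ignore=ignore)
```

You would call it with something like
copy_data_without_location_info("/path/to/snapshot/data",
"/path/to/analysis/data") while the analysis node is stopped, and set
the new cluster name in cassandra.yaml before starting the node.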

Cheers!
Shyamal
