FWIW, I'm working on migrating a large amount of data out of Oracle into my
test cluster. The data has been warehoused as CSV files on Amazon S3; having
it staged there lets me avoid putting extra load on the production service
while doing many repeated tests. I then parse the data with Python's csv
module and, as Jonathan says, use threads to batch-upload it into Cassandra.
A notable point: since the data is relatively sparse (i.e. many zeros for
integers, empty strings for strings, etc.), I keep a dictionary of default
values and don't write those values to Cassandra at all -- they can be
reconstructed as needed when reading back.
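A minimal sketch of the default-dictionary idea, with hypothetical column
names and default values (the real schema and defaults aren't shown in the
original post) and an inline CSV standing in for the files on S3:

```python
import csv
import io

# Hypothetical per-column defaults; values matching these are skipped
# on write and reconstructed from this dictionary when reading back.
DEFAULTS = {"count": "0", "note": ""}

def sparse_row(row):
    """Return only the columns whose values differ from their defaults."""
    return {col: val for col, val in row.items()
            if DEFAULTS.get(col) != val}

# Small inline CSV standing in for the warehoused S3 data.
sample = io.StringIO("id,count,note\n1,0,\n2,5,hello\n")
rows = [sparse_row(r) for r in csv.DictReader(sample)]
# rows: [{'id': '1'}, {'id': '2', 'count': '5', 'note': 'hello'}]
```

Only the filtered dictionaries would then be handed to the threaded
batch-upload step.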

Also, make sure you wrap Cassandra writes in exception handling. When load is
high, you might get timeouts at the TSocket level, etc.
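One way to handle that is a small retry wrapper around each write. The
exception class and write function here are stand-ins (the actual Thrift
transport exception type isn't named in the post), so treat this as a
sketch of the pattern rather than the exact client code:

```python
import time

class TSocketTimeout(Exception):
    """Stand-in for the transport-level timeout raised under heavy load."""

def write_with_retry(write_fn, retries=3, delay=0.0):
    """Call write_fn, retrying on timeout; re-raise after the last attempt."""
    for attempt in range(retries):
        try:
            return write_fn()
        except TSocketTimeout:
            if attempt == retries - 1:
                raise
            time.sleep(delay)  # back off briefly before retrying

# Demo: a write that times out twice, then succeeds on the third try.
attempts = {"n": 0}
def flaky_write():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TSocketTimeout()
    return "ok"

result = write_with_retry(flaky_write)
```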

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Moving-data-tp5992669p5993443.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.
