Hi Zhongxing, I am also interested in your table size. I am trying to dump tens of millions of records from C* using a MapReduce-related API such as CqlInputFormat. You mentioned the Java driver. Could you suggest any API you used? Thanks.
On Tue, Jan 27, 2015 at 5:33 PM, Xu Zhongxing <xu_zhong_x...@163.com> wrote:
> Both the Java driver's "select * from table" and Spark's sc.cassandraTable() work well.
> I use both of them frequently.
>
> At 2015-01-28 04:06:20, "Mohammed Guller" <moham...@glassbeam.com> wrote:
>
> Hi -
>
> Over the last few weeks, I have seen several emails on this mailing list from people trying to extract all data from C*, so that they can import that data into other analytical tools that provide much richer analytics functionality than C*. Extracting all data from C* is a full-table scan, which is not the ideal use case for C*. However, people don't have much choice if they want to do ad-hoc analytics on the data in C*. Unfortunately, I don't think C* comes with any built-in tools that make this task easy for a large dataset. Please correct me if I am wrong. Cqlsh has a COPY TO command, but it doesn't really work if you have a large amount of data in C*.
>
> I am aware of a couple of approaches for extracting all data from a table in C*:
> 1) Iterate through all the C* partitions (physical rows) using the Java Driver and CQL.
> 2) Extract the data directly from SSTable files.
>
> Either approach can be used with Hadoop or Spark to speed up the extraction process.
>
> I wanted to do a quick survey and find out how many people on this mailing list have successfully used approach #1 or #2 for extracting large datasets (terabytes) from C*. Also, if you have used some other techniques, it would be great if you could share your approach with the group.
>
> Mohammed

--
Regards,
Shenghua (Daniel) Wan
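For what it's worth, the core trick behind approach #1 (and what CqlInputFormat and the Spark connector do under the hood to parallelize a full scan) is splitting the Murmur3 token ring into contiguous sub-ranges and scanning each one with a range-restricted query like "SELECT * FROM t WHERE token(pk) > ? AND token(pk) <= ?". Here is a minimal, self-contained sketch of that splitting step; the class and method names are my own, not from any driver:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/**
 * Splits the full Murmur3 token ring (-2^63 .. 2^63-1) into n contiguous
 * sub-ranges. Each sub-range [start, end] can then be scanned independently
 * (e.g. one worker per range) with a query of the form:
 *   SELECT * FROM t WHERE token(pk) > start AND token(pk) <= end
 * This is only a sketch of the idea; real input formats also align splits
 * with the cluster's actual token ownership for data locality.
 */
public class TokenRangeSplitter {
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    public static List<long[]> split(int n) {
        // BigInteger avoids overflow: the ring spans 2^64 - 1 tokens.
        BigInteger step = MAX.subtract(MIN).divide(BigInteger.valueOf(n));
        List<long[]> ranges = new ArrayList<>();
        BigInteger start = MIN;
        for (int i = 0; i < n; i++) {
            // Last range absorbs the division remainder so the ring is covered.
            BigInteger end = (i == n - 1) ? MAX : start.add(step);
            ranges.add(new long[] { start.longValue(), end.longValue() });
            start = end;  // ranges are contiguous: next starts where this ends
        }
        return ranges;
    }
}
```

With the ranges in hand, each worker would page through its own range (e.g. via the Java driver with a fetch size set), which keeps any single coordinator from having to stream the whole table.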