Cool. What about performance? E.g., roughly how many records, and how long did it take?
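For reference, here is a rough, untested sketch of how one might time such a full-table read with the Java driver (2.x-era API); the contact point, keyspace/table name, and fetch size are placeholders, not anything taken from this thread:

    import com.datastax.driver.core.{Cluster, SimpleStatement}
    import scala.collection.JavaConverters._

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()   // placeholder contact point
    val session = cluster.connect()

    val stmt = new SimpleStatement("select * from ks.table")               // placeholder keyspace/table
    stmt.setFetchSize(1000)                 // rows per page; the driver pages through the table lazily

    val start = System.nanoTime()
    var count = 0L
    for (row <- session.execute(stmt).asScala) {
      count += 1                            // or write the row out to the target system here
    }
    val secs = (System.nanoTime() - start) / 1e9
    println(f"$count%d rows in $secs%.1f s (~${count / secs}%.0f rows/s)")

    cluster.close()

With paging enabled like this, the driver streams the table one page at a time instead of trying to hold the whole result in memory, so the measured rate should be fairly representative of a full scan.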
On Tue, Jan 27, 2015 at 10:16 PM, Xu Zhongxing <xu_zhong_x...@163.com> wrote:

> For the Java driver, there is no special API actually, just:
>
>     ResultSet rs = session.execute("select * from ...");
>     for (Row r : rs) {
>         ...
>     }
>
> For Spark, the code skeleton is:
>
>     val rdd = sc.cassandraTable("ks", "table")
>
> and then you call the various standard Spark APIs to process the table in parallel.
>
> I have not used CqlInputFormat.
>
> At 2015-01-28 13:38:20, "Shenghua(Daniel) Wan" <wansheng...@gmail.com> wrote:
>
> Hi, Zhongxing,
> I am also interested in your table size. I am trying to dump tens of millions of records from C* using the map-reduce-related APIs such as CqlInputFormat.
> You mentioned the Java driver. Could you suggest which API you used?
> Thanks.
>
> On Tue, Jan 27, 2015 at 5:33 PM, Xu Zhongxing <xu_zhong_x...@163.com> wrote:
>
>> Both the Java driver "select * from table" and Spark sc.cassandraTable() work well.
>> I use both of them frequently.
>>
>> At 2015-01-28 04:06:20, "Mohammed Guller" <moham...@glassbeam.com> wrote:
>>
>> Hi –
>>
>> Over the last few weeks, I have seen several emails on this mailing list from people trying to extract all data from C*, so that they can import that data into other analytical tools that provide much richer analytics functionality than C*. Extracting all data from C* is a full-table scan, which is not the ideal use case for C*. However, people don't have much choice if they want to do ad-hoc analytics on the data in C*. Unfortunately, I don't think C* comes with any built-in tools that make this task easy for a large dataset. Please correct me if I am wrong. Cqlsh has a COPY TO command, but it doesn't really work if you have a large amount of data in C*.
>>
>> I am aware of a couple of approaches for extracting all data from a table in C*:
>>
>> 1) Iterate through all the C* partitions (physical rows) using the Java driver and CQL.
>>
>> 2) Extract the data directly from the SSTable files.
>>
>> Either approach can be used with Hadoop or Spark to speed up the extraction process.
>>
>> I wanted to do a quick survey and find out how many people on this mailing list have successfully used approach #1 or #2 for extracting large datasets (terabytes) from C*. Also, if you have used some other technique, it would be great if you could share your approach with the group.
>>
>> Mohammed
>
>
> --
> Regards,
> Shenghua (Daniel) Wan

--
Regards,
Shenghua (Daniel) Wan
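P.S. For anyone who wants to try the Spark route discussed above, here is a slightly fuller, untested sketch built on the spark-cassandra-connector; the connection host, keyspace/table, column name, and output path are all placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._              // adds cassandraTable() to SparkContext

    val conf = new SparkConf()
      .setAppName("cassandra-full-scan")
      .set("spark.cassandra.connection.host", "127.0.0.1")   // placeholder C* host
    val sc = new SparkContext(conf)

    // The connector splits the table by token range, so the scan runs in parallel across executors
    val rdd = sc.cassandraTable("ks", "table")               // placeholder keyspace/table

    println(rdd.count())                                     // e.g. total row count
    rdd.map(_.getString("id"))                               // placeholder column
       .saveAsTextFile("hdfs:///tmp/table_dump")             // placeholder output path

    sc.stop()

Throughput will depend mainly on the number of executors and the connector's input split settings, so it is worth measuring on a smaller table first.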