Re: Re: full-table scan - extracting all data from C*

Shenghua(Daniel) Wan Tue, 27 Jan 2015 22:44:51 -0800

Cool. What about performance? For example, how many records did you read,
and how long did it take?
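
One way to put numbers on this question is to count rows while iterating and
time the loop. The following is a minimal sketch, not a benchmark: the class
name ScanStats is hypothetical, and a plain Iterable stands in for the
driver's ResultSet so the example runs without a cluster.

```java
import java.util.Arrays;

// Hypothetical helper (not from this thread): counts rows and measures elapsed time.
class ScanStats {
    final long rows;
    final long elapsedNanos;

    ScanStats(long rows, long elapsedNanos) {
        this.rows = rows;
        this.elapsedNanos = elapsedNanos;
    }

    // Rows per second; returns 0.0 if the clock did not advance.
    double rowsPerSecond() {
        return elapsedNanos == 0 ? 0.0 : rows / (elapsedNanos / 1e9);
    }

    // Iterates over every element once, counting as it goes.
    static <T> ScanStats measure(Iterable<T> source) {
        long start = System.nanoTime();
        long count = 0;
        for (T ignored : source) {
            count++;
        }
        return new ScanStats(count, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Assumption: a small in-memory list stands in for a real result set.
        ScanStats stats = measure(Arrays.asList("a", "b", "c"));
        System.out.println(stats.rows + " rows in " + stats.elapsedNanos + " ns");
    }
}
```

With the real driver, the same loop would run over the ResultSet returned by
session.execute(...), so the timing would include network round trips.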

On Tue, Jan 27, 2015 at 10:16 PM, Xu Zhongxing <xu_zhong_x...@163.com>
wrote:


> For the Java driver, there is actually no special API, just
>
> ResultSet rs = session.execute("select * from ...");
> for (Row r : rs) {
>    ...
> }
>
> For Spark, the code skeleton is:
>
> val rdd = sc.cassandraTable("ks", "table")
>
> then call the standard Spark APIs to process the table in parallel.
>
> I have not used CqlInputFormat.
>
> At 2015-01-28 13:38:20, "Shenghua(Daniel) Wan" <wansheng...@gmail.com>
> wrote:
>
> Hi, Zhongxing,
> I am also interested in your table size. I am trying to dump tens of
> millions of records from C* using a map-reduce related API like
> CqlInputFormat. You mentioned the Java driver. Could you suggest which API
> you used?
> Thanks.
>
> On Tue, Jan 27, 2015 at 5:33 PM, Xu Zhongxing <xu_zhong_x...@163.com>
> wrote:
>
>> Both the Java driver's "select * from table" and Spark's
>> sc.cassandraTable() work well.
>> I use both of them frequently.
>>
>> At 2015-01-28 04:06:20, "Mohammed Guller" <moham...@glassbeam.com> wrote:
>>
>>  Hi –
>>
>>
>>
>> Over the last few weeks, I have seen several emails on this mailing list
>> from people trying to extract all data from C*, so that they can import
>> that data into other analytical tools that provide much richer analytics
>> functionality than C*. Extracting all data from C* is a full-table scan,
>> which is not the ideal use case for C*. However, people don’t have much
>> choice if they want to do ad-hoc analytics on the data in C*.
>> Unfortunately, I don’t think C* comes with any built-in tools that make
>> this task easy for a large dataset. Please correct me if I am wrong. Cqlsh
>> has a COPY TO command, but it doesn’t really work if you have a large
>> amount of data in C*.
>>
>>
>>
>> I am aware of a couple of approaches for extracting all data from a table
>> in C*:
>>
>> 1) Iterate through all the C* partitions (physical rows) using the
>> Java Driver and CQL.
>>
>> 2) Extract the data directly from SSTable files.
>>
>>
>>
>> Either approach can be used with Hadoop or Spark to speed up the
>> extraction process.
>>
>>
>>
>> I wanted to do a quick survey and find out how many people on this
>> mailing list have successfully used approach #1 or #2 for extracting large
>> datasets (terabytes) from C*. Also, if you have used some other techniques,
>> it would be great if you could share your approach with the group.
>>
>>
>>
>> Mohammed
>>
>>
>>
>>
>
>
> --
>
> Regards,
> Shenghua (Daniel) Wan
>
>
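
Approach 1 in Mohammed's message quoted above (iterating over all partitions
with the Java driver and CQL) is commonly parallelized by cutting the token
ring into sub-ranges and issuing one range query per slice, which is roughly
how Hadoop or Spark workers can divide the scan. The following is a minimal
sketch, assuming the default Murmur3Partitioner (tokens are signed 64-bit
values). The class name TokenRangeSplitter, the keyspace and table names, and
the partition key column "id" are hypothetical and not from this thread. The
code only builds CQL strings; running them with session.execute(...) is left
out.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the Murmur3 token range into contiguous slices.
class TokenRangeSplitter {
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    // Returns n ranges {start, end}, each meaning start < token <= end,
    // that together cover everything from Long.MIN_VALUE to Long.MAX_VALUE.
    static List<long[]> split(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        BigInteger width = MAX.subtract(MIN); // BigInteger avoids long overflow
        BigInteger count = BigInteger.valueOf(n);
        List<long[]> ranges = new ArrayList<>();
        BigInteger start = MIN;
        for (int i = 1; i <= n; i++) {
            BigInteger end = (i == n)
                    ? MAX
                    : MIN.add(width.multiply(BigInteger.valueOf(i)).divide(count));
            ranges.add(new long[] { start.longValueExact(), end.longValueExact() });
            start = end;
        }
        return ranges;
    }

    // Builds the range query for one slice. Real code would more likely use a
    // prepared statement with bound token values instead of concatenation.
    static String query(String keyspace, String table, String partitionKey, long[] range) {
        return "SELECT * FROM " + keyspace + "." + table
                + " WHERE token(" + partitionKey + ") > " + range[0]
                + " AND token(" + partitionKey + ") <= " + range[1];
    }

    public static void main(String[] args) {
        // Assumption: keyspace "ks", table "my_table", partition key "id".
        for (long[] range : split(4)) {
            System.out.println(query("ks", "my_table", "id", range));
        }
    }
}
```

Each query string can then be handed to a separate thread or task. More
slices mean smaller result sets per query, at the cost of more round trips.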


-- 

Regards,
Shenghua (Daniel) Wan
