Use Spark to distribute the copy job across the cluster and speed up the
migration. The Spark connector does automatic paging in the background
through the Java driver.
On Oct 22, 2015 at 11:03, "qihuang.zheng" <qihuang.zh...@fraudmetrix.cn>
wrote:

> I tried using the Java driver with an *auto-paging query (setFetchSize)*
> instead of the token function, since Cassandra already has this feature.
> Reference:
> http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
>
> But when I tried it in a test environment, reading only 1 million records
> and inserting them into the 3 tables, it was too slow.
> After running for 20 minutes, exceptions such as NoHostAvailableException
> occurred, and of course the data sync did not complete.
> Our production environment has nearly 25 billion records, so this is
> unacceptable for our case. Are there other ways?
>
> ------------------------------
> Thanks & Regards,
> qihuang.zheng
>
>  Original Message
> *From:* Jeff Jirsa<jeff.ji...@crowdstrike.com>
> *To:* user@cassandra.apache.org<user@cassandra.apache.org>
> *Sent:* Thursday, October 22, 2015, 13:52
> *Subject:* Re: C* Table Changed and Data Migration with new primary key
>
> Because the data format has changed, you’ll need to read it out and write
> it back in again.
>
> This means using either a driver (java, python, c++, etc), or something
> like spark.
>
> In either case, split up the token range so you can parallelize it for
> significant speed improvements.
>
>
>
> From: "qihuang.zheng"
> Reply-To: "user@cassandra.apache.org"
> Date: Wednesday, October 21, 2015 at 6:18 PM
> To: user
> Subject: C* Table Changed and Data Migration with new primary key
>
> Hi All:
>
>   We have a table defined with a single partition key and several
> clustering keys:
> CREATE TABLE test1 (
>   attribute text,
>   partner text,
>   app text,
>   "timestamp" bigint,
>   event text,
>   PRIMARY KEY ((attribute), partner, app, "timestamp")
> )
> Now we want to split the original test1 table into 3 tables like this:
> test_global :  PRIMARY KEY ((attribute), "timestamp")
> test_partner:  PRIMARY KEY ((attribute, partner), "timestamp")
> test_app:      PRIMARY KEY ((attribute, partner, app), "timestamp")
>
> We are splitting the original table because querying *global data* by
> timestamp descending, like this:
> select * from test1 where attribute=? order by timestamp desc
> is not supported in Cassandra, since an ORDER BY clause has to use the
> clustering keys in their declared order.
> But a query like this:
> select * from test1 where attribute=? order by partner desc,app desc,
> timestamp desc
> does not return the global data ordered by timestamp descending.
> After splitting the table, we can run the global query correctly:
> select * from test_global where attribute=? order by timestamp desc.
>
> Now we have a *data migration* problem.
> As far as I know, *sstableloader* is the easiest way, but it can't handle
> a different table name. (Am I right?)
> The *COPY* command in cqlsh doesn't fit our situation because our data is
> too large (10 nodes, 400 GB of data per node).
> I also tried the Java API, querying the original table and then inserting
> into the 3 split tables, but it seems too slow.
>
> Is there any solution for a quick data migration?
> Thanks!!
>
> PS: Cassandra version: 2.0.15
>
>
>
> ------------------------------
> Thanks & Regards,
> qihuang.zheng
>

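As a rough illustration of the "split up the token range" advice in the thread, here is a minimal Python sketch that cuts the ring into contiguous sub-ranges, builds one SELECT per sub-range, and fans each source row out into one INSERT per target table. It is a sketch under assumptions, not the posters' actual code: it assumes the cluster uses Murmur3Partitioner (token range -2^63 to 2^63-1), that each of the three new tables keeps all five columns of test1, and the helper names (split_token_ring, range_query, fan_out) are hypothetical. No driver calls are made; each query would still have to be executed through a driver session or a Spark job.

```python
from typing import Dict, List, Tuple

# Assumption: Murmur3Partitioner, whose tokens span [-2**63, 2**63 - 1].
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1

# Assumption: every target table keeps all five columns of test1.
COLUMNS = ("attribute", "partner", "app", "timestamp", "event")
TARGET_TABLES = ("test_global", "test_partner", "test_app")
INSERT_TEMPLATE = (
    'INSERT INTO {table} (attribute, partner, app, "timestamp", event) '
    "VALUES (?, ?, ?, ?, ?)"
)


def split_token_ring(num_splits: int) -> List[Tuple[int, int]]:
    """Return contiguous (start, end) sub-ranges covering the whole ring."""
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(start: int, end: int) -> str:
    """Build the CQL that reads one sub-range of the source table."""
    # The first range includes its lower bound so no token is skipped;
    # later ranges exclude it because the previous range already covers it.
    lower_op = ">=" if start == MIN_TOKEN else ">"
    return (
        'SELECT attribute, partner, app, "timestamp", event FROM test1 '
        f"WHERE token(attribute) {lower_op} {start} "
        f"AND token(attribute) <= {end}"
    )


def fan_out(row: Dict[str, object]) -> List[Tuple[str, tuple]]:
    """Turn one source row into one (statement, parameters) pair per table."""
    params = tuple(row[c] for c in COLUMNS)
    return [(INSERT_TEMPLATE.format(table=t), params) for t in TARGET_TABLES]


if __name__ == "__main__":
    # Each query could be handed to a separate worker thread or Spark task.
    for start, end in split_token_ring(4):
        print(range_query(start, end))
```

Each sub-range is independent, so the queries can run in parallel, and a failed range can be retried on its own without restarting the whole copy. One thing to check before migrating: Cassandra inserts are upserts, so two test1 rows that share the same attribute and timestamp but differ in partner or app would collapse into a single row in test_global.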