Re: Any Bulk Load on Large Data Set Advice?

2016-11-17 Thread Jeff Jirsa
certain parts of our workflow, and it works well. From: Joe Olson Reply-To: "user@cassandra.apache.org" Date: Thursday, November 17, 2016 at 5:58 AM To: "user@cassandra.apache.org" Subject: Any Bulk Load on Large Data Set Advice? I received a grant to do some anal

Re: Any Bulk Load on Large Data Set Advice?

2016-11-17 Thread Ben Bromhead
example to generate the > SSTables, but I have not executed it on the entire data set yet. > > Any advice on how to execute the bulk load under this configuration? Can > I generate the SSTables in parallel? Once generated, can I write the > SSTables to all nodes simultaneously? Should I be

Re: Any Bulk Load on Large Data Set Advice?

2016-11-17 Thread Jonathan Haddad
uilt and tested a bulk loader following this example in GitHub: > https://github.com/yukim/cassandra-bulkload-example to generate the > SSTables, but I have not executed it on the entire data set yet. > > Any advice on how to execute the bulk load under this configuration? Can > I gene

Any Bulk Load on Large Data Set Advice?

2016-11-17 Thread Joe Olson
" I built and tested a bulk loader following this example in GitHub: https://github.com/yukim/cassandra-bulkload-example to generate the SSTables, but I have not executed it on the entire data set yet. Any advice on how to execute the bulk load under this configuration? Can I generate the

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Philip Thompson
Splitting the batches by partition key and inserting them with a TokenAware policy is already possible with existing driver code, though you will have to split the batches yourself. On Fri, Dec 5, 2014 at 3:12 PM, Dong Dai wrote: > Err, am i misunderstanding something? > I thought Tyler is going
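Splitting the batches yourself, as Philip describes, amounts to grouping the statements by partition key so that each batch touches a single partition. Below is a minimal stdlib-only sketch. It assumes a hypothetical `PendingRow` type that carries its partition key and a hypothetical `maxPerBatch` cap. It only builds the groups; sending each group as an UNLOGGED batch through a token-aware driver policy is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row type: only the partition key matters for grouping.
final class PendingRow {
    final String partitionKey;
    final String value;

    PendingRow(String partitionKey, String value) {
        this.partitionKey = partitionKey;
        this.value = value;
    }
}

final class BatchSplitter {
    /**
     * Groups rows by partition key, then cuts each group into chunks of at most
     * maxPerBatch rows. Each inner list is meant to become one single-partition
     * UNLOGGED batch. Keys keep first-seen order; rows keep input order within a key.
     */
    static List<List<PendingRow>> splitByPartition(List<PendingRow> rows, int maxPerBatch) {
        if (maxPerBatch <= 0) {
            throw new IllegalArgumentException("maxPerBatch must be positive");
        }
        Map<String, List<PendingRow>> byKey = new LinkedHashMap<>();
        for (PendingRow r : rows) {
            byKey.computeIfAbsent(r.partitionKey, k -> new ArrayList<>()).add(r);
        }
        List<List<PendingRow>> batches = new ArrayList<>();
        for (List<PendingRow> group : byKey.values()) {
            for (int i = 0; i < group.size(); i += maxPerBatch) {
                int end = Math.min(i + maxPerBatch, group.size());
                batches.add(new ArrayList<>(group.subList(i, end)));
            }
        }
        return batches;
    }
}
```

The cap keeps a single hot partition from producing one oversized batch. Its value is an assumption to tune, not a figure from this thread.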

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Dong Dai
Err, am I misunderstanding something? I thought Tyler was going to add some code to split unlogged batches and make the batch insertion token aware. Is it already done? Otherwise, I can do it too. thanks, - Dong > On Dec 5, 2014, at 2:06 PM, Philip Thompson > wrote: > > What progress are you try

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Philip Thompson
What progress are you trying to be aware of? All of the features Tyler discussed are implemented and can be used. On Fri, Dec 5, 2014 at 2:41 PM, Dong Dai wrote: > > On Dec 5, 2014, at 11:23 AM, Tyler Hobbs wrote: > > > On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai wrote: > >> Sounds great! By the

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Dong Dai
> On Dec 5, 2014, at 11:23 AM, Tyler Hobbs wrote: > > > On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai > wrote: > Sounds great! By the way, will you create a ticket for this, so we can follow > the updates? > > What would the ticket be for? (I might have missed somethi

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Tyler Hobbs
On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai wrote: > Sounds great! By the way, will you create a ticket for this, so we can > follow the updates? What would the ticket be for? (I might have missed something in the conversation.) -- Tyler Hobbs DataStax

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Dong Dai
> On Dec 4, 2014, at 1:46 PM, Tyler Hobbs wrote: > > > On Thu, Dec 4, 2014 at 11:50 AM, Dong Dai > wrote: > As we already did what coordinators do in client side, why don’t we do one > step more: > break the UNLOGGED batch statements into several small batch statem

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Shane Hansen
I'd be really interested to know what sort of performance or load improvements you see by doing client side partitioning. Please post back some results if you've tried that strategy. On Thu, Dec 4, 2014 at 11:46 AM, Tyler Hobbs wrote: > > On Thu, Dec 4, 2014 at 11:50 AM, Dong Dai wrote: > >> As

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Tyler Hobbs
On Thu, Dec 4, 2014 at 11:50 AM, Dong Dai wrote: > As we already did what coordinators do in client side, why don’t we do one > step more: > break the UNLOGGED batch statements into several small batch statements, > each of which contains > the statements with the same partition key. And send the

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Dong Dai
> On Dec 4, 2014, at 11:37 AM, Tyler Hobbs wrote: > > > On Wed, Dec 3, 2014 at 11:02 PM, Dong Dai > wrote: > > 1) except I am using TokenAwarePolicy, the async insert also can not be sent > to > the right coordinator. > > Yes. Of course, TokenAwarePolicy can w

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-04 Thread Tyler Hobbs
On Wed, Dec 3, 2014 at 11:02 PM, Dong Dai wrote: > > 1) except I am using TokenAwarePolicy, the async insert also can not be > sent to > the right coordinator. > Yes. Of course, TokenAwarePolicy can wrap any other policy. > > 2) the TokenAwarePolicy actually is doing the job that coordinators
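A token-aware policy makes one routing decision before deferring to the policy it wraps: it sends a write straight to a node that owns the partition instead of to an arbitrary coordinator. That decision can be illustrated with a toy ring. This is a minimal sketch with several assumptions. `TokenRing` is a hypothetical class, the tokens are made-up integers, and `String.hashCode()` stands in for the partitioner (Cassandra's default Murmur3Partitioner hashes differently). It finds only the primary owner and ignores replication. It follows the ring rule that a node owns the range from the previous node's token (exclusive) up to its own token (inclusive), wrapping around at the end.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy ring mapping node tokens to node names; not the real driver or partitioner.
final class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    // Stand-in partitioner for illustration only; Murmur3Partitioner differs.
    static long tokenFor(String partitionKey) {
        return partitionKey.hashCode();
    }

    /** Primary owner: first node whose token is >= the key's token, wrapping to the lowest token. */
    String ownerOf(long token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    String ownerOfKey(String partitionKey) {
        return ownerOf(tokenFor(partitionKey));
    }
}
```

With a replication factor above 1 under SimpleStrategy, the next nodes clockwise also hold replicas. A real token-aware policy would pick among the live replicas rather than only the primary owner.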

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-03 Thread Dong Dai
ing to my knowledge, BATCH statement packs several >>>> mutations into one RPC to save time. Similarly, Bulk Loader also pack >> all >>>> the mutations as a SSTable file and (I think) may be able to save lot of >>>> time too. >>>> >>>

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-01 Thread Robert Coli
On Mon, Dec 1, 2014 at 12:10 PM, Dong Dai wrote: > I guess you mean that BulkLoader is done by streaming whole SSTable to > remote servers, so it is faster? > Well, it's not exactly "whole SSTable" but yes, that's the sort of statement I'm making. [1] > The documentation says that all the rows

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-01 Thread Dong Dai
Thanks Rob, I guess you mean that BulkLoader is done by streaming whole SSTable to remote servers, so it is faster? The documentation says that all the rows in the SSTable will be inserted into the new cluster conforming to the replication strategy of that cluster. This gives me a feeling tha

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-01 Thread Robert Coli
On Sun, Nov 30, 2014 at 8:44 PM, Dong Dai wrote: > The question is can I expect a better performance using the BulkLoader > this way comparing with using Batch insert? > You just asked if writing once (via streaming) is likely to be significantly more efficient than writing twice (once to the co

Performance Difference between Batch Insert and Bulk Load

2014-11-30 Thread Dong Dai
Hi, all, I have a performance question about batch insert and bulk load. According to the documentation, to import a large volume of data into Cassandra, Batch Insert and Bulk Load are both options. Using batch insert is pretty straightforward, but there has not been an ‘official’ way

Bulk Load Hadoop to Cassandra

2014-11-05 Thread Vijay Kadel
Hi, I intend to bulk load data from HDFS to Cassandra using a map-only program which uses the BulkOutputFormat class. Please advise which versions of Cassandra and Hadoop support such a use case. I am using Hadoop 2.2.0 and Cassandra 2.0.6, and I am getting the following error: Error

Re: Bulk load in cassandra

2014-08-27 Thread Robert Coli
on Cassandra cluster or you can say bulk load. How can I achieve that. > Please help me out. > http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra or CQLsh "COPY" but beware that COPY is capable of timing out in the current implementation. =Rob

Re: Bulk load in cassandra

2014-08-27 Thread Mark Reddy
installed Cassandra on one node successfully using CLI I am able to add > a table to the keyspace as well as retrieve the data from the table. My > query is if I have text file on my local file system and I want to load on > Cassandra cluster or you can say bulk load. How can I achieve that

Re: Bulk load in cassandra

2014-08-27 Thread baskar.duraikannu
Please try the COPY command via the CQL shell if it is a delimited file. Regards, Baskar Duraikannu -----Original Message----- From: Malay Nilabh Date: Wed, 27 Aug 2014 17:43:21 To: user@cassandra.apache.org Reply-To: user@cassandra.apache.org Subject: Bulk load in cassandra Hi I installed Cassandra

Re: Bulk load in cassandra

2014-08-27 Thread Umang Shah
able to add > a table to the keyspace as well as retrieve the data from the table. My > query is if I have text file on my local file system and I want to load on > Cassandra cluster or you can say bulk load. How can I achieve that. Please > help me out. > > > > Regards >

Bulk load in cassandra

2014-08-27 Thread Malay Nilabh
Hi, I installed Cassandra on one node successfully. Using the CLI, I am able to add a table to the keyspace as well as retrieve the data from the table. My question: if I have a text file on my local file system and I want to load it into the Cassandra cluster (a bulk load), how can I achieve that

Re: how to get column family details dynamically in cassandra bulk load program

2013-05-09 Thread aaron morton
The schema is available over the various interfaces, check with the client you are using to see it exposes the information. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 8/05/2013, at 1:36 AM, chandana.tumm...@wip

how to get column family details dynamically in cassandra bulk load program

2013-05-07 Thread chandana.tummala
Dear All, I am using the Cassandra bulk load program from www.datastax.com/dev/blog/bulk-loading. In it, for each CSV entry we give the column name and validation class. Is there any way to get the column names and validation class directly from the database by giving just the keyspace and column family name

Re: bulk load problem

2012-07-09 Thread Pushpalanka Jayawardhana
directory as a parameter for sstableloader > > bin/sstableloader -d 192.168.100.1 /data/ssTable/tpch/tpch/cf0 > > Yuki > > On Tuesday, June 26, 2012 at 7:07 PM, James Pirz wrote: > > Dear all, > > I am trying to use "sstableloader" in cassandra 1.1.1, to bulk

Re: bulk load problem

2012-07-09 Thread Yuki Morishita
ableloader -d 192.168.100.1 /data/ssTable/tpch/tpch/cf0 Yuki On Tuesday, June 26, 2012 at 7:07 PM, James Pirz wrote: > Dear all, > > I am trying to use "sstableloader" in cassandra 1.1.1, to bulk load some data > into a single node cluster. > I am running the following
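Yuki's corrected command points sstableloader at the column family directory rather than at its parent. The following is a minimal sketch of that naming convention. It assumes, as the corrected command suggests, that the last path component is taken as the column family name and its parent as the keyspace name. `LoaderTarget` is a hypothetical helper, not part of Cassandra.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: shows which names a <keyspace>/<column family> directory layout implies.
final class LoaderTarget {
    final String keyspace;
    final String columnFamily;

    private LoaderTarget(String keyspace, String columnFamily) {
        this.keyspace = keyspace;
        this.columnFamily = columnFamily;
    }

    static LoaderTarget fromDirectory(String dir) {
        Path p = Paths.get(dir).normalize();
        Path cf = p.getFileName();
        Path parent = p.getParent();
        // Need at least two named components: .../<keyspace>/<column family>
        if (cf == null || parent == null || parent.getFileName() == null) {
            throw new IllegalArgumentException("expected .../<keyspace>/<column family>: " + dir);
        }
        return new LoaderTarget(parent.getFileName().toString(), cf.toString());
    }
}
```

Under that assumption, the original argument `/data/ssTable/tpch/tpch/` would resolve to keyspace `tpch` and column family `tpch`, which may be why pointing at `cf0` was needed.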

Re: bulk load problem

2012-07-09 Thread Brian Jeltema
I couldn't get the same-host sstableloader to work either. But it's easier to use the JMX bulk-load hook that's built into Cassandra anyway. The following is what I implemented to do this: import java.io.IOException; import java.util.HashMap; import java.util.Map; import javax
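Brian's own code is cut off in this snippet. Below is a minimal sketch of the same idea using only the standard `javax.management` client API. It assumes three things: the StorageService MBean name `org.apache.cassandra.db:type=StorageService`, a `bulkLoad(String)` operation on that MBean, and the default JMX port 7199. Check these against the `StorageServiceMBean` of your Cassandra version. `JmxBulkLoad` is a hypothetical class name.

```java
import java.io.IOException;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class JmxBulkLoad {
    // Assumed default Cassandra JMX port; adjust if JMX_PORT was changed.
    static final int DEFAULT_JMX_PORT = 7199;

    // Standard RMI-based JMX service URL for a remote host and port.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Invokes the (assumed) StorageService bulkLoad(directory) operation on one node. */
    static void bulkLoad(String host, int port, String directory) throws IOException, JMException {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName storageService = new ObjectName("org.apache.cassandra.db:type=StorageService");
            // Generic invoke: no Cassandra classes needed on the client classpath.
            mbs.invoke(storageService, "bulkLoad",
                    new Object[] {directory},
                    new String[] {String.class.getName()});
        }
    }
}
```

For the parallel case in the "bulk load glitch" thread, each node's call could run on its own thread, for example via an `ExecutorService`. Per Brandon's reply there, a load interrupted after the failure detector trips has to be restarted.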

Re: bulk load problem

2012-07-09 Thread Pushpalanka Jayawardhana
On 27/06/2012, at 12:07 PM, James Pirz wrote: > > Dear all, > > I am trying to use "sstableloader" in cassandra 1.1.1, to bulk load some > data into a single node cluster. > I am running the following command: > > bin/sstableloader -d 192.168.100.1 /data/ssTable/t

Re: bulk load glitch

2012-07-02 Thread Brandon Williams
On Mon, Jul 2, 2012 at 10:35 AM, Brian Jeltema wrote: > I can't tell whether the bulk load process recovered from the transient dead > node, or whether I need to start over. > > Does anybody know? You need to start over if the failure detector tripped, but it will retry a few

Re: bulk load problem

2012-07-02 Thread aaron morton
Do you have the full stack ? It will include a cause. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 27/06/2012, at 12:07 PM, James Pirz wrote: > Dear all, > > I am trying to use "sstableloader" in cassandra 1.1.1

bulk load glitch

2012-07-02 Thread Brian Jeltema
I'm attempting to perform a bulk load by calling the jmx:bulkLoad method on several nodes in parallel. In a Cassandra log file I see a few occurrences of the following: INFO [GossipTasks:1] 2012-07-02 10:12:33,626 Gossiper.java (line 759) InetAddress /10.4.0.3 is now dead. ERROR [Gossip

Re: bulk load problem

2012-06-27 Thread James Pirz
ue, 26 Jun 2012 17:07:49 -0700 from James Pirz : > > Dear all, > > I am trying to use "sstableloader" in cassandra 1.1.1, to bulk load some > data into a single node cluster. > I am running the following command: > > bin/sstableloader -d 192.168.100.1 /data/ssTabl

Re: bulk load problem

2012-06-27 Thread Nury Redjepow
What are your yaml settings for the rpc and listen addresses on the destination node? Nury Tue, 26 Jun 2012 17:07:49 -0700 from James Pirz : Dear all, I am trying to use "sstableloader" in cassandra 1.1.1, to bulk load some data into a single node cluster. I am running the following com

bulk load problem

2012-06-26 Thread James Pirz
Dear all, I am trying to use "sstableloader" in cassandra 1.1.1, to bulk load some data into a single node cluster. I am running the following command: bin/sstableloader -d 192.168.100.1 /data/ssTable/tpch/tpch/ from "another" node (other than the node on which cassandra is

RE: bulk load

2011-06-30 Thread Priyanka
I have been working with Cassandra for the last 4 weeks and am trying to load a large amount of data. I am trying to use the bulk loading technique but am not clear on the process. Could someone explain the process for the bulk load? Also, is the new bulk loading utility discussed in the previous posts available

Re: bulk load

2011-06-22 Thread Jeremy Hanna
This ticket's outcome replaces what BMT was supposed to do: https://issues.apache.org/jira/browse/CASSANDRA-1278 0.8.1 is being voted on now and will hopefully be out in the next day or two. You can try it out with the 0.8-branch if you want - looking near the bottom of the comments on the ticke

RE: bulk load

2011-06-22 Thread Stephen Pope
Awesome, thanks! -Original Message- From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com] Sent: Wednesday, June 22, 2011 3:08 PM To: user@cassandra.apache.org Subject: Re: bulk load This ticket's outcome replaces what BMT was supposed to do: https://issues.apache.org/jira/b

bulk load

2011-06-22 Thread Stephen Pope
According to the README.txt in examples/bmt, BinaryMemtable is being deprecated. What's the recommended way to do bulk loading? Cheers, Steve