Ingesting Large Number of files

2015-11-17 Thread Tushar Agrawal
We receive a periodic bulk load (twice a month) in the form of delimited data files: about 10K files with an average size of 50 MB. Each record becomes a row in a Cassandra table. What is the fastest way to ingest this data into Cassandra? Thank you, Tushar
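A rough sizing of the load described above may help frame the question. The sketch below is plain arithmetic on the figures in the post. The 8-hour load window is a hypothetical value chosen for illustration and is not stated in the thread, and the function name is likewise hypothetical.

```python
def bulk_load_estimate(num_files: int, avg_file_mb: float, window_hours: float) -> tuple[float, float]:
    """Return (total MB per load, MB/s needed to finish within the window)."""
    total_mb = num_files * avg_file_mb          # raw delimited-text volume per load
    window_seconds = window_hours * 3600        # load window converted to seconds
    return total_mb, total_mb / window_seconds  # sustained ingest rate required

# 10K files x 50 MB each, loaded within a hypothetical 8-hour window
total_mb, rate = bulk_load_estimate(10_000, 50, 8)
print(f"{total_mb:,.0f} MB per load (~{total_mb / 1000:.0f} GB), ~{rate:.1f} MB/s sustained")
# prints: 500,000 MB per load (~500 GB), ~17.4 MB/s sustained
```

At two loads a month this is roughly 1 TB of input per month. The size on disk in Cassandra will differ, since replication factor, per-row storage overhead and compression all change it.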

Re: compression cpu overhead

2015-11-03 Thread Tushar Agrawal
For writes, the overhead is negligible. For reads, it makes a significant difference with high-TPS, low-latency workloads: you could see up to 3x higher CPU usage with LZ4 compared with no compression. The exact figure will vary with the hardware configuration. Thanks, Tushar > On Nov 3, 2015, at 5:51 PM, D

Re: Find partition row of Compacted partition maximum bytes

2015-10-26 Thread Tushar Agrawal
nodetool toppartitions shows the most active partitions, not the largest ones. I am trying to do the same thing. I was able to narrow down the largest partition by looking at the warning in system.log. Given that I have the key, how can I see the entire data for that key? Thanks, Tushar > On Oct 26, 2015, at 4:21 AM, DuyHai Doan w

Re: Is HEAP_NEWSIZE configuration is no more useful from cassandra 2.1 ?

2015-10-04 Thread Tushar Agrawal
If you are using the CMS garbage collector, you still have to set HEAP_NEWSIZE. With G1GC (the newly recommended GC), the heap is divided into regions and G1 sizes the young generation dynamically to meet its pause-time target, so HEAP_NEWSIZE should not be set. On Sun, Oct 4, 2015 at 5:30 PM, Kiran mk wrote: > Is HEAP_NEWSIZE configuration is no more useful from cassandra 2.1 ? > > Best Regar
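For CMS, a worked example of how a default HEAP_NEWSIZE can be derived may help. The Python sketch below mirrors, as I understand it, the calculate_heap_sizes heuristic in the stock cassandra-env.sh of that era: MAX_HEAP_SIZE = max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)) and HEAP_NEWSIZE = min(100 MB × cores, 1/4 of MAX_HEAP_SIZE). The function name is hypothetical; check the cassandra-env.sh shipped with your version before relying on these numbers.

```python
def default_heap_sizes(system_memory_mb: int, cpu_cores: int) -> tuple[int, int]:
    """Return (MAX_HEAP_SIZE, HEAP_NEWSIZE) in MB, following the stock
    cassandra-env.sh heuristic (assumed; verify against your version)."""
    cores = max(cpu_cores, 1)                    # treat fewer than 1 reported core as 1
    half = min(system_memory_mb // 2, 1024)      # 1/2 RAM, capped at 1 GB
    quarter = min(system_memory_mb // 4, 8192)   # 1/4 RAM, capped at 8 GB
    max_heap = max(half, quarter)                # take the larger of the two
    # Young generation: ~100 MB per core, but never more than 1/4 of the heap
    heap_newsize = min(100 * cores, max_heap // 4)
    return max_heap, heap_newsize

for ram_mb, cores in [(16_384, 4), (32_768, 8)]:
    heap, new = default_heap_sizes(ram_mb, cores)
    print(f'RAM={ram_mb}MB cores={cores}: MAX_HEAP_SIZE="{heap}M" HEAP_NEWSIZE="{new}M"')
# prints:
# RAM=16384MB cores=4: MAX_HEAP_SIZE="4096M" HEAP_NEWSIZE="400M"
# RAM=32768MB cores=8: MAX_HEAP_SIZE="8192M" HEAP_NEWSIZE="800M"
```

The per-core cap reflects the CMS trade-off: a larger young generation means longer young-GC pauses, while a smaller one means more frequent collections. With G1GC you would set only MAX_HEAP_SIZE and leave HEAP_NEWSIZE out.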

Re: Cassandra certification

2015-10-01 Thread Tushar Agrawal
Check this out: Get Trained, Get Certified, Get Better Paid https://www.linkedin.com/pulse/get-trained-certified-better-paid-tushar-agrawal Thanks, Tushar > On Oct 1, 2015, at 9:00 AM, Fernandez Gomara, Ruben (CCI-Atlanta) > wrote: > > Hi, > Did anybody too