Read the CSV file using a Java app and then insert the rows using the Cassandra Java driver, with multiple parallel input streams.
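Here's a minimal sketch of that approach, assuming the DataStax Java driver (2.x-era API) and a hypothetical demo.rows table with text columns; the contact point, keyspace, and column list are placeholders to adapt. It uses one reader thread feeding a worker pool, caps in-flight asynchronous inserts with a semaphore, and you get truly parallel input streams by running several instances over chunks of the file:

import com.datastax.driver.core.*;

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class CsvLoader {
    private static final int MAX_IN_FLIGHT = 512; // cap on outstanding async writes
    private static final int THREADS = 8;         // parse/submit worker threads

    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo");
        // Prepare once, bind per row; with 100+ columns, generate the column
        // list from the CSV header instead of hard-coding it like this.
        PreparedStatement insert = session.prepare(
            "INSERT INTO rows (id, col1, col2) VALUES (?, ?, ?)");

        Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);

        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get(args[0]), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                final String row = line;
                inFlight.acquire(); // throttle: wait for a free slot
                pool.execute(() -> {
                    // Naive split; use a real CSV parser for quoted fields.
                    String[] f = row.split(",");
                    ResultSetFuture fut =
                        session.executeAsync(insert.bind(f[0], f[1], f[2]));
                    // Release the slot when the write completes (error
                    // handling and retries omitted for brevity).
                    fut.addListener(inFlight::release, Runnable::run);
                });
            }
        }
        pool.shutdown();
        inFlight.acquire(MAX_IN_FLIGHT); // block until all writes have drained
        session.close();
        cluster.close();
    }
}

The semaphore is what keeps a fast reader from overwhelming the cluster; tune MAX_IN_FLIGHT and THREADS against the rates you measure in the proof of concept described below.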
Oh, and make sure to provision your cluster with enough nodes to handle your desired ingestion and query rates. Do a proof of concept with a six-node cluster at RF=2 to see what ingestion and query rates you can get for a fraction of your data, then scale from there. A 12-node cluster with RF=3 would be more realistic, though. RF=2 is not for production: it doesn't permit any node failures, while RF=3 permits quorum operations with a single node down (see the quorum arithmetic sketched at the end of this message). But RF=2 at least lets you test the more realistic scenario of coordinator nodes and inter-node traffic. And if your total row count does manage to fit on one machine (or three nodes with RF=3), at least make sure you have enough CPU cores and I/O bandwidth to handle your desired ingestion and query rates.

-- Jack Krupansky

From: Akshay Ballarpure
Sent: Friday, July 25, 2014 5:26 AM
To: user@cassandra.apache.org
Subject: read huge data from CSV and write into Cassandra

How do I read data from a large CSV file, with 100+ columns and millions of rows, and insert it into Cassandra every minute?

Thanks & Regards
Akshay Ghanshyam Ballarpure
Tata Consultancy Services
Cell: 9985084075
Mailto: akshay.ballarp...@tcs.com
Website: http://www.tcs.com
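For reference, the quorum arithmetic behind the RF=2 vs. RF=3 point above is quorum = floor(RF/2) + 1; a trivial snippet to confirm it:

public class QuorumMath {
    public static void main(String[] args) {
        for (int rf : new int[] {2, 3}) {
            int quorum = rf / 2 + 1;      // replicas that must respond for QUORUM
            int tolerated = rf - quorum;  // node failures a QUORUM operation survives
            System.out.printf("RF=%d: quorum=%d, tolerated failures=%d%n",
                rf, quorum, tolerated);
        }
    }
}

This prints RF=2: quorum=2, tolerated failures=0 and RF=3: quorum=2, tolerated failures=1, which is why RF=3 is the usual production floor.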