Hi Akhil,

Thank you for the pointers. Below is how we are saving data to Cassandra:
javaFunctions(rddToSave)
    .writerBuilder(datapipelineKeyspace, datapipelineOutputTable, mapToRow(Sample.class))
    .saveToCassandra();

The data we are saving at this stage is ~200 million rows. How do we control
the application threads in Spark so that we do not exceed "rpc_max_threads"?
We are running with the default value of this property in cassandra.yaml. I
have already set these two properties for the Spark-Cassandra connector:

spark.cassandra.output.batch.size.rows=1
spark.cassandra.output.concurrent.writes=1

Thanks
- Ankur

On Sun, Jan 11, 2015 at 10:16 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> I see, can you paste the piece of code? It's probably because you are
> exceeding the number of connections specified in the property
> rpc_max_threads. Make sure you close all the connections properly.
>
> Thanks
> Best Regards
>
> On Mon, Jan 12, 2015 at 7:45 AM, Ankur Srivastava <
> ankur.srivast...@gmail.com> wrote:
>
>> Hi Akhil, thank you for your response.
>>
>> Actually we are first reading from Cassandra and then writing back after
>> doing some processing. All the reader stages succeed with no errors, and
>> many writer stages also succeed, but many fail as well.
>>
>> Thanks
>> Ankur
>>
>> On Sat, Jan 10, 2015 at 10:15 PM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> Just make sure you are not connecting to the old RPC port (9160); the
>>> new binary port is running on 9042.
>>>
>>> What is your rpc_address listed in cassandra.yaml? Also make sure you
>>> have start_native_transport: true in the yaml file.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Sat, Jan 10, 2015 at 8:44 AM, Ankur Srivastava <
>>> ankur.srivast...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are currently using Spark to join data in Cassandra and then write
>>>> the results back into Cassandra. While the reads happen without any
>>>> error, during the writes we see many exceptions like the one below.
>>>> Our environment details are:
>>>>
>>>> - Spark v 1.1.0
>>>> - spark-cassandra-connector-java_2.10 v 1.1.0
>>>>
>>>> We are using the below settings for the writer:
>>>>
>>>> spark.cassandra.output.batch.size.rows=1
>>>> spark.cassandra.output.concurrent.writes=1
>>>>
>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All
>>>> host(s) tried for query failed (tried: [] - use getErrors() for details)
>>>>     at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>     at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> Thanks
>>>> Ankur
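---

For readers landing on this thread later, here is a minimal, self-contained
sketch of the write path being discussed, against the
spark-cassandra-connector-java_2.10 1.1.0 API. The connection host, keyspace
and table names, and the Sample bean are placeholders (the real classes were
never shared on the thread); the two output properties are the ones quoted
above.

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CassandraWriteSketch {

    // Stand-in for the real bean; a serializable JavaBean whose property
    // names match the target table's column names (here: id, value).
    public static class Sample implements Serializable {
        private Integer id;
        private String value;

        public Sample() {}
        public Sample(Integer id, String value) { this.id = id; this.value = value; }

        public Integer getId() { return id; }
        public void setId(Integer id) { this.id = id; }
        public String getValue() { return value; }
        public void setValue(String value) { this.value = value; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("cassandra-write-sketch")
            // Placeholder contact point; use your cluster's address.
            .set("spark.cassandra.connection.host", "127.0.0.1")
            // The throttling settings quoted in the thread: one row per
            // batch, one batch in flight per task.
            .set("spark.cassandra.output.batch.size.rows", "1")
            .set("spark.cassandra.output.concurrent.writes", "1");

        JavaSparkContext sc = new JavaSparkContext(conf);

        // Tiny stand-in for the ~200M-row RDD from the thread.
        JavaRDD<Sample> rddToSave =
            sc.parallelize(Arrays.asList(new Sample(1, "a"), new Sample(2, "b")));

        // The call from the thread, with the terminating saveToCassandra()
        // that actually triggers the write job. Keyspace and table names
        // here are placeholders.
        javaFunctions(rddToSave)
            .writerBuilder("datapipeline", "output_table", mapToRow(Sample.class))
            .saveToCassandra();

        sc.stop();
    }
}

As a rough rule of thumb (an assumption to verify against your own cluster),
peak in-flight writes are about: total executor cores x
spark.cassandra.output.concurrent.writes, since each concurrently running
task keeps up to that many batches in flight. Note also that rpc_max_threads
sizes the Thrift RPC server on port 9160, while the Java driver used by the
connector speaks the native protocol on port 9042, whose thread pool is
sized by native_transport_max_threads in cassandra.yaml.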