What is this Input Size in Spark Application Detail UI?

2015-09-25 Thread Chirag Dewan
Hi All, I was wondering what the Input Size in the Application UI means. For my 3-node Cassandra cluster with a 3-node Spark cluster, this size is 32GB. For my 15-node Cassandra cluster with a 15-node Spark cluster, this size reaches 172GB, though the data in both clusters is about the same volume. C

Output files of saveAsText are getting stuck in temporary directory

2015-09-04 Thread Chirag Dewan
Hi, I have a 2-node Spark cluster and I am trying to read data from a Cassandra cluster and save the data as a CSV file. Here is my code: JavaRDD mapPair = cachedRdd.map(new Function() { /** *

RE: Output files of saveAsText are getting stuck in temporary directory

2015-09-04 Thread Chirag Dewan
Yes. The driver has stopped successfully. The shutdown succeeded without any errors in the logs. I am using Spark 1.4.1 with Cassandra 2.0.14. Chirag -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Friday, September 04, 2015 3:23 PM To: Chirag Dewan Cc: user

RE: Output files of saveAsText are getting stuck in temporary directory

2015-09-07 Thread Chirag Dewan
Hi, any idea about this? I am still facing this issue. Thanks, Chirag -Original Message- From: Chirag Dewan [mailto:chirag.de...@ericsson.com] Sent: Friday, September 04, 2015 3:26 PM To: Sean Owen Cc: user@spark.apache.org Subject: RE: Output files of saveAsText are getting stuck in

Cassandra row count grouped by multiple columns

2015-09-10 Thread Chirag Dewan
Hi, I am using Spark 1.2.0 with Cassandra 2.0.14. I need a count of rows for each unique combination of multiple columns. I have a column family with 3 columns, i.e. a, b, c, and for each distinct combination of values a1, b1, c1 I want the row count. For example: A1,B1,C1 A2,B2,C2 A3,B3,C2 A1,B1,C1 The output
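The grouping described in this question can be sketched in plain Java, without Spark or Cassandra, to show the intended result. This is only an illustration under assumptions: the class name `GroupedRowCount` and the method `countByColumns` are hypothetical, and each row is modelled as a list of three strings. In a Spark job the same idea is commonly expressed by keying each row on the three column values and summing a count of 1 per key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class GroupedRowCount {

    // Hypothetical helper: counts how many rows share each distinct
    // (a, b, c) combination. A row is modelled as a List<String> so that
    // equals/hashCode compare all three column values together.
    static Map<List<String>, Long> countByColumns(List<List<String>> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Sample rows taken from the question.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("A1", "B1", "C1"),
                Arrays.asList("A2", "B2", "C2"),
                Arrays.asList("A3", "B3", "C2"),
                Arrays.asList("A1", "B1", "C1"));

        // Print one line per distinct combination; map iteration order is not guaranteed.
        countByColumns(rows).forEach((key, count) ->
                System.out.println(String.join(",", key) + " -> " + count));
    }
}
```

Using the whole row as the grouping key keeps the sketch short; with more columns than the ones being grouped, the key would be built from only the selected columns.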

Why is 1 executor overworked and other sit idle?

2015-09-22 Thread Chirag Dewan
Hi, I am using Spark to access around 300m rows in Cassandra. My job is pretty simple, as I am just mapping each row into CSV format and saving it as a text file. public String call(CassandraRow row) throws Excepti

RE: Why is 1 executor overworked and other sit idle?

2015-09-22 Thread Chirag Dewan
, 2015 5:39 AM To: Ted Yu Cc: User; Chirag Dewan Subject: Re: Why is 1 executor overworked and other sit idle? If there's only one partition, by definition it will only be handled by one executor. Repartition to divide the work up. Note that this will also result in multiple output files, ho

Exception during SaveAstextFile Stage

2015-09-24 Thread Chirag Dewan
Hi, I have 2 stages in my job: map and save as text file. During the save-as-text-file stage I am getting an exception: 15/09/24 15:38:16 WARN AkkaUtils: Error sending message in 1 attempts java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] at scala.concurrent.impl.