Re: ClassCastException when using saveAsTextFile

2014-06-04 Thread Kanwaldeep
Hi Niko, I'm having a similar problem running Spark on a standalone cluster. Any suggestions on how to fix this? The error occurs when using PairRDDFunctions.saveAsHadoopDataset. java.lang.ClassCastException (java.lang.ClassCastException: cannot assign instance of org.apache.spark.rdd.P
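For reference, a minimal sketch of one commonly suggested fix, assuming the exception comes from the application jar not reaching the executors on the standalone cluster (the master URL and jar path below are hypothetical placeholders, not values from this job):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A minimal sketch: ship the assembled application jar to the executors
// explicitly, so the driver and workers deserialize tasks with the same
// classes. Master URL and jar path are hypothetical.
val conf = new SparkConf()
  .setMaster("spark://master-host:7077")
  .setAppName("SaveAsHadoopDatasetJob")
  .setJars(Seq("/path/to/app-assembly.jar"))
val sc = new SparkContext(conf)
```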

Re: Writing data to HBase using Spark

2014-06-09 Thread Kanwaldeep
Please see sample code attached at https://issues.apache.org/jira/browse/SPARK-944.
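The attachment itself lives on the JIRA, but the general shape of such code is roughly the following: a minimal sketch of writing a pair RDD to HBase via saveAsHadoopDataset (not the exact attachment; the table name, column family, and data are made up):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair RDD implicits (Spark 1.x)

// A minimal sketch of the usual saveAsHadoopDataset pattern for HBase.
// Table "my_table" and column family "cf" are hypothetical.
def writeToHBase(sc: SparkContext): Unit = {
  val jobConf = new JobConf(HBaseConfiguration.create())
  jobConf.setOutputFormat(classOf[TableOutputFormat])
  jobConf.set(TableOutputFormat.OUTPUT_TABLE, "my_table")

  val rdd = sc.parallelize(Seq(("row1", "value1"), ("row2", "value2")))
  rdd.map { case (key, value) =>
    val put = new Put(Bytes.toBytes(key))
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
    (new ImmutableBytesWritable(Bytes.toBytes(key)), put)
  }.saveAsHadoopDataset(jobConf)
}
```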

NullPointerException on reading checkpoint files

2014-06-09 Thread Kanwaldeep
I've been running into the following error when creating the streaming context from checkpoint files. This error occurs quite often when I stop and restart the job. Any suggestions? 14-06-09 23:47:59 WARN CheckpointReader:91 - Error reading checkpoint from file file:/Users/kanwaldeep.dang/git/c
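For context, the recovery follows the standard getOrCreate pattern; a minimal sketch, with the checkpoint directory and batch interval as placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical checkpoint directory.
val checkpointDir = "/tmp/streaming-checkpoint"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("CheckpointedJob")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... set up DStreams here ...
  ssc
}

// On restart this reads the checkpoint files; the NullPointerException
// reported above is thrown while deserializing them.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```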

Re: Writing data to HBase using Spark

2014-06-12 Thread Kanwaldeep
We are running the code on a Spark Standalone cluster and are not seeing this issue; the new data is being saved to HBase. Can you check whether you are getting any errors and whether reading from Kafka actually stops?

Kafka Streaming - Error Could not compute split

2014-06-22 Thread Kanwaldeep
We are using Spark 1.0.0 deployed on a Spark Standalone cluster, and I'm getting the following exception. With the previous version I saw this error occur along with OutOfMemory errors, which I'm not seeing with Spark 1.0. Any suggestions? Job aborted due to stage failure: Task 3748.0:20 failed 4 ti

Spark Streaming : Could not compute split, block not found

2014-08-01 Thread Kanwaldeep
We are using Spark 1.0 for Spark Streaming on a Spark Standalone cluster and are seeing the following error. Job aborted due to stage failure: Task 3475.0:15 failed 4 times, most recent failure: Exception failure in TID 216394 on host hslave33102.sjc9.service-now.com: java.lang.Exception: Could

Re: Spark Streaming : Could not compute split, block not found

2014-08-01 Thread Kanwaldeep
We are using Spark 1.0. I'm using DStream operations such as map, filter, and reduceByKeyAndWindow, and doing a foreach operation on the DStream.
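A minimal sketch of what that kind of pipeline looks like (the source, key function, and window durations here are made up, not the actual job):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair DStream implicits (Spark 1.x)

// Hypothetical pipeline: map, filter, reduceByKeyAndWindow, foreachRDD.
def buildPipeline(ssc: StreamingContext): Unit = {
  val lines = ssc.socketTextStream("localhost", 9999) // placeholder source
  lines
    .map(line => (line.split(",")(0), 1L))            // key by first field
    .filter { case (key, _) => key.nonEmpty }
    .reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10))
    .foreachRDD(rdd => rdd.collect().foreach(println)) // placeholder sink
}
```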

Re: Spark Streaming : Could not compute split, block not found

2014-08-01 Thread Kanwaldeep
All the operations are done using the DStream. I do read an RDD into memory, which is collected and converted into a map used for lookups as part of the DStream operations. This RDD is loaded only once, and the resulting map is then used on the streamed data. Do you mean non-streaming jobs on
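A minimal sketch of that pattern (the file path and record shapes are hypothetical): the reference RDD is collected once into a driver-side map, which is then captured in the closures of the DStream operations.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.streaming.dstream.DStream

// Collect a reference RDD into a map once, then use it for lookups
// inside DStream operations. Path and record shapes are made up.
def enrich(sc: SparkContext, events: DStream[(String, Long)]): DStream[(String, Long)] = {
  val lookup: Map[String, String] = sc
    .textFile("/data/reference.csv")
    .map { line => val f = line.split(","); (f(0), f(1)) }
    .collect()
    .toMap

  // The map is serialized with the closure and shipped to each task.
  events.map { case (key, count) =>
    (lookup.getOrElse(key, "unknown"), count)
  }
}
```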

Re: Spark Streaming : Could not compute split, block not found

2014-08-01 Thread Kanwaldeep
Not at all. Don't have any such code.

Re: Spark Streaming : Could not compute split, block not found

2014-08-01 Thread Kanwaldeep
Here is the log file: streaming.gz. There are quite a few AskTimeouts happening for about 2 minutes, followed by "block not found" errors. Thanks, Kanwal

Re: Problem with HBase external table on freshly created EMR cluster

2014-03-14 Thread Kanwaldeep
I'm getting the same error when writing data to an HBase cluster using Spark Streaming. Any suggestions on how to fix this? 2014-03-14 23:10:33,832 ERROR o.a.s.s.scheduler.JobScheduler - Error running job streaming job 139486383 ms.0 org.apache.spark.SparkExceptio

Re: Problem with HBase external table on freshly created EMR cluster

2014-03-21 Thread Kanwaldeep
Seems like this could be a version mismatch between the HBase version deployed and the jars being used. Here are the details on the versions we have set up: we are running CDH 4.6.0 (which includes Hadoop 2.0.0), and Spark was compiled against that version. Below is the environment variable

Using ProtoBuf 2.5 for messages with Spark Streaming

2014-03-27 Thread Kanwaldeep
We are using Protocol Buffers 2.5 to send messages to Spark Streaming 0.9 with a Kafka stream setup. I have Protocol Buffers 2.5 as part of the uber jar deployed on each of the Spark worker nodes. The messages are compiled using 2.5, but at runtime they are being deserialized by 2.4.1, as I'm getting the
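For reference, a common first step for this kind of conflict is to force a single protobuf-java version onto the application classpath; a minimal sketch, assuming an sbt build (if the 2.4.1 copy is bundled or shaded inside another jar, relocating the 2.5 classes in the uber jar is the usual alternative):

```scala
// build.sbt -- a minimal sketch, assuming an sbt build. This pins
// protobuf-java so a transitively pulled-in 2.4.1 cannot win over the
// 2.5.0 the messages were compiled against.
dependencyOverrides += "com.google.protobuf" % "protobuf-java" % "2.5.0"
```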

Re: Using ProtoBuf 2.5 for messages with Spark Streaming

2014-04-01 Thread Kanwaldeep
Yes, I'm using Akka as well. But if that were the problem, I should have been facing this issue in my local setup too; I'm only running into this error when using the Spark standalone cluster. I will try out your suggestion and let you know. Thanks, Kanwal

Re: Using ProtoBuf 2.5 for messages with Spark Streaming

2014-04-01 Thread Kanwaldeep
I've removed the dependency on Akka in a separate project but am still running into the same error. In the POM dependency hierarchy I do see 2.4.1 (shaded) and 2.5.0 being included. If there were a conflict with a project dependency, I would think I should be getting the same error in my local setup as well.

Re: Using ProtoBuf 2.5 for messages with Spark Streaming

2014-04-09 Thread Kanwaldeep
Any update on this? We are still facing this issue.

KafkaInputDStream Stops reading new messages

2014-04-09 Thread Kanwaldeep
The Spark Streaming job was running on two worker nodes, and then there was an error on one of the nodes. The Spark job showed as running, but no progress was being made and no new messages were being processed. Based on the driver log files, I see the following errors. I would expect the stream reading would b