unsubscribe

2017-04-13 Thread tian zhang

Re: Spark streaming checkpoint against s3

2015-10-15 Thread Tian Zhang
So as long as the jar is kept on s3 and available across different runs, the s3 checkpoint works.

Re: Spark streaming checkpoint against s3

2015-10-14 Thread Tian Zhang
It looks like the reconstruction of the SparkContext from checkpoint data is trying to look for the jar file of the previous failed runs. It cannot find the jar files, as our jar files are on local machines and were cleaned up after each failed run.

Spark streaming checkpoint against s3

2015-10-14 Thread Tian Zhang
Hi, I am trying to set the Spark Streaming checkpoint to s3; here is basically what I did: val checkpointDir = "s3://myBucket/checkpoint" val ssc = StreamingContext.getOrCreate(checkpointDir, () => getStreamingContext(sparkJobName,
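For context, a minimal sketch of the getOrCreate pattern this post refers to, with a hypothetical createContext factory standing in for the post's getStreamingContext helper; the app name, batch interval and DStream graph are placeholders, not the original application's:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "s3://myBucket/checkpoint"

    // Builds a fresh context on the first run; on restart, getOrCreate ignores this
    // factory and reconstructs the context from the checkpoint data instead.
    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("myStreamingJob")
      val ssc = new StreamingContext(conf, Seconds(30))
      ssc.checkpoint(checkpointDir)
      // ... define the DStream graph here ...
      ssc
    }

    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()

As the follow-up messages in this thread note, the reconstructed context also expects the application jar at the path recorded in the checkpoint, so the jar needs to stay at a stable location (e.g. on s3) across runs.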

Re: updateStateByKey and stack overflow

2015-10-13 Thread Tian Zhang
It turns out that our HDFS checkpoint failed, but Spark Streaming kept running and built up a long lineage ...

Re: "Too many open files" exception on reduceByKey

2015-10-11 Thread Tian Zhang
It turns out that Mesos can override the OS ulimit -n setting, so we have increased the Mesos slave ulimit -n setting.

updateStateByKey and stack overflow

2015-10-10 Thread Tian Zhang
Hi, I am following the Spark Streaming stateful application example and wrote a simple counting application with updateStateByKey. val keyStateStream = actRegBatchCountStream.updateStateByKey(update, new HashPartitioner(ssc.sparkContext.defaultParallelism), true, initKeyStateRDD) This runs for
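For comparison, a minimal stateful-count sketch using the simpler (Seq[V], Option[S]) => Option[S] form of updateStateByKey; ssc, the checkpoint path and the batchCounts stream are placeholders, not the original application's:

    import org.apache.spark.streaming.StreamingContext._
    import org.apache.spark.streaming.dstream.DStream

    // updateStateByKey requires checkpointing; if the checkpoint directory is unreliable
    // (e.g. a failing HDFS, as in the reply above), the lineage keeps growing and can
    // eventually overflow the stack.
    ssc.checkpoint("hdfs:///checkpoints/counting-app")

    // Fold the new per-batch counts for a key into its running total.
    val update = (newCounts: Seq[Long], state: Option[Long]) =>
      Some(newCounts.sum + state.getOrElse(0L))

    // batchCounts: DStream[(String, Long)] of per-batch counts per key (assumed).
    val totals: DStream[(String, Long)] = batchCounts.updateStateByKey(update)
    totals.print()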

updateStateByKey and Partitioner

2015-10-09 Thread Tian Zhang
Hi, I am following the Spark Streaming stateful application example to write a stateful application, and here is the critical line of code: val keyStateStream = actRegBatchCountStream.updateStateByKey(update, new HashPartitioner(ssc.sparkContext.defaultParallelism), true, initKeyStateRDD) I n
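The four-argument overload used here takes the iterator-based update function; a rough sketch of what that function might look like, with hypothetical String keys and Long state (the original post's types are not shown):

    import org.apache.spark.HashPartitioner
    import org.apache.spark.streaming.dstream.DStream

    // Iterator form expected by the
    // (updateFunc, partitioner, rememberPartitioner, initialRDD) overload:
    // for each key, fold the new per-batch counts into the running total.
    val update = (iter: Iterator[(String, Seq[Long], Option[Long])]) =>
      iter.map { case (key, newCounts, state) =>
        (key, newCounts.sum + state.getOrElse(0L))
      }

    // actRegBatchCountStream: DStream[(String, Long)] and initKeyStateRDD: RDD[(String, Long)]
    // are taken as given from the post above.
    val keyStateStream: DStream[(String, Long)] =
      actRegBatchCountStream.updateStateByKey(
        update,
        new HashPartitioner(ssc.sparkContext.defaultParallelism),
        true,              // rememberPartitioner
        initKeyStateRDD)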

Re: "Too many open files" exception on reduceByKey

2015-10-09 Thread tian zhang
On Thu, Oct 8, 2015 at 3:22 PM, Tian Zhang wrote: I hit this issue with a Spark 1.3.0 stateful application (using updateStateByKey) on Mesos. It will fail after running fine for about 24 hours. The error stack trace is below; I checked ulimit -n and we have very large numbers set on the machines.

Re: "Too many open files" exception on reduceByKey

2015-10-08 Thread Tian Zhang
I hit this issue with a Spark 1.3.0 stateful application (using updateStateByKey) on Mesos. It will fail after running fine for about 24 hours. The error stack trace is below; I checked ulimit -n and we have very large numbers set on the machines. What else can be wrong? 15/09/27 18:45:11 W

how to pass configuration properties from driver to executor?

2015-04-30 Thread Tian Zhang
Hi, We have a scenario as below and would like your suggestion. We have an app.conf file with propX=A as the default, built into the fat jar file that is provided to spark-submit. We have an env.conf file with propX=B that we would like spark-submit to take as input, to overwrite the default and propagate it to both
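One possible wiring, sketched under the assumption that the application reads these files with Typesafe Config; the file and property names come from the post, everything else is illustrative:

    // Ship the override file with the job, e.g.:
    //   spark-submit --files env.conf --class com.example.MyApp myapp-fat.jar
    // --files places env.conf in each executor's working directory.
    import java.io.File
    import com.typesafe.config.ConfigFactory

    val defaults = ConfigFactory.parseResources("app.conf")    // bundled in the fat jar
    val envFile  = new File("env.conf")                        // shipped via --files, if present
    val config =
      if (envFile.exists()) ConfigFactory.parseFile(envFile).withFallback(defaults)
      else defaults

    val propX = config.getString("propX")   // "B" when env.conf is shipped, otherwise "A"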

Re: Lifecycle of RDD in spark-streaming

2014-11-26 Thread tian zhang
I have found this paper, which seems to answer most of the questions about RDD lifetime: https://www.cs.berkeley.edu/~matei/papers/2012/hotcloud_spark_streaming.pdf Tian On Tuesday, November 25, 2014 4:02 AM, Mukesh Jha wrote: Hey Experts, I wanted to understand in detail about the lifecycle

2 spark streaming questions

2014-11-23 Thread tian zhang
Hi, Dear Spark Streaming Developers and Users, We are prototyping using Spark Streaming and hit the following 2 issues that I would like to seek your expertise on. 1) We have a Spark Streaming application in Scala that reads data from Kafka into a DStream, does some processing and outputs a transfor
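A rough sketch of the overall shape such an application often has, with placeholder processing and output logic (none of the names below are from the original post):

    import org.apache.spark.streaming.dstream.DStream

    // kafkaStream: DStream[(String, String)] of (key, message) pairs read from Kafka
    // via KafkaUtils.createStream is assumed to exist already.
    val counts: DStream[(String, Long)] =
      kafkaStream
        .map { case (_, message) => (message, 1L) }   // some processing
        .reduceByKey(_ + _)

    // Write each transformed batch out; foreachRDD is the usual output hook.
    counts.foreachRDD { rdd =>
      rdd.take(10).foreach(println)   // placeholder for a real sink
    }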

Re: spark streaming and the spark shell

2014-11-19 Thread Tian Zhang
I am hitting the same issue, i.e., after running for some time, if the Spark Streaming job loses or times out the Kafka connection, it will just start to return empty RDDs ... Is there a timeline for when this issue will be fixed so that I can plan accordingly? Thanks. Tian

Re: spark 1.1.0/yarn hang

2014-10-22 Thread Tian Zhang
We have narrowed this hanging issue down to the Calliope package that we used to create RDDs by reading a Cassandra table. The Calliope native RDD interface seems to hang, and I have decided to switch to the Calliope cql3 RDD interface.

spark 1.1.0 RDD and Calliope 1.1.0-CTP-U2-H2

2014-10-21 Thread Tian Zhang
Hi, I am using the latest Calliope library from tuplejump.com to create an RDD for a Cassandra table. I am on a 3-node Spark 1.1.0 cluster with YARN. My Cassandra table is defined as below, and I have about 2000 rows of data inserted. CREATE TABLE top_shows ( program_id varchar, view_minute timestamp, vi

spark 1.1.0/yarn hang

2014-10-14 Thread tian zhang
Hi, I have a Spark 1.1.0 on YARN installation. I am using spark-submit to run a simple application. From the console output, I have 769 partitions, and after task 768 in stage 0 (count) finished, it hangs. I used jstack to dump the stack trace and it shows it is waiting ... Any suggestion what might go

Re: Spark Streaming : Could not compute split, block not found

2014-10-09 Thread Tian Zhang
I have figured out why I am getting this error: we have a lot of data in Kafka, and the DStream from Kafka used MEMORY_ONLY_SER, so once memory ran low, Spark started to discard data that was needed later ... Once I changed to MEMORY_AND_DISK_SER, the error was gone. Tian
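For illustration, the receiver-based Kafka input with an explicit storage level, which is the change described above; ssc is the surrounding StreamingContext, and the ZooKeeper quorum, group id and topic map are placeholders:

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.kafka.KafkaUtils

    // With MEMORY_AND_DISK_SER, received blocks spill to disk under memory pressure
    // instead of being dropped and later reported as "block not found".
    val kafkaStream = KafkaUtils.createStream(
      ssc,
      "zk1:2181,zk2:2181,zk3:2181",   // zkQuorum
      "my-consumer-group",            // groupId
      Map("myTopic" -> 1),            // topic -> number of receiver threads
      StorageLevel.MEMORY_AND_DISK_SER)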

Re: Spark Streaming : Could not compute split, block not found

2014-10-07 Thread Tian Zhang
Hi, we are using Spark 1.1.0 streaming and we are hitting this same issue. Basically, from the job output I saw the following things happen in sequence: 948 14/10/07 18:09:59 INFO storage.BlockManagerInfo: Added input-0-1412705397200 in memory on ip-10-4-62-85.ec2.internal:59230 (size: 5.3 MB, fr

Re: [ANN] SparkSQL support for Cassandra with Calliope

2014-10-06 Thread tian zhang
1.1.0-CTP-U2-H2. Let us know how your testing goes. Regards, Rohit, Founder & CEO, Tuplejump, Inc., www.tuplejump.com, The Data Engineering Platform. On Sat, Oct 4, 2014 at 3:49 AM, tian zhang wrote: Hi, Rohit, Thank you for sharing this good news.

Re: [ANN] SparkSQL support for Cassandra with Calliope

2014-10-03 Thread tian zhang
Hi, Rohit, Thank you for sharing this good news. I have a relevant issue that I would like to ask your help with. I am using Spark 1.1.0 and I have a Spark application using "com.tuplejump" % "calliope-core_2.10" % "1.1.0-CTP-U2". At runtime there are the following errors that seem to indicate that calli

Spark 1.1.0 (w/ hadoop 2.4) versus aws-java-sdk-1.7.2.jar

2014-09-19 Thread tian zhang
Hi, Spark experts, I have the following issue when using the AWS Java SDK in my Spark application. I narrowed it down to the following steps to reproduce the problem: 1) I have Spark 1.1.0 with Hadoop 2.4 installed on a 3-node cluster. 2) From the master node, I did the following steps: spark-shell --