So as long as the jar is kept on S3 and available across different runs, the S3 checkpoint works.
It looks like the reconstruction of the SparkContext from checkpoint data tries to look up the jar files of previous failed runs. It cannot find them, since our jar files live on local machines and were cleaned up after each failed run.
Hi, I am trying to set the spark streaming checkpoint to S3; here is basically what I did:
val checkpointDir = "s3://myBucket/checkpoint"
val ssc = StreamingContext.getOrCreate(checkpointDir,
  () => getStreamingContext(sparkJobName, ...))
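For context, a minimal sketch of what such a getStreamingContext factory might look like (the batch interval and the parameter list are illustrative assumptions, not the exact code):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative factory; batch interval and parameters are assumptions.
def getStreamingContext(appName: String, checkpointDir: String): StreamingContext = {
  val conf = new SparkConf().setAppName(appName)
  val ssc = new StreamingContext(conf, Seconds(10))
  // The checkpoint directory must be set inside the factory so that
  // getOrCreate can write its metadata to S3 on the first run and
  // recover the context from it on subsequent runs.
  ssc.checkpoint(checkpointDir)
  ssc
}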
It turns out that our HDFS checkpoint had failed, but spark streaming kept running and building up a long lineage ...
It turns out that Mesos can override the OS ulimit -n setting, so we have increased the ulimit -n setting on the Mesos slaves.
Hi, I am following the spark streaming stateful application example and wrote a simple counting application with updateStateByKey.
val keyStateStream = actRegBatchCountStream.updateStateByKey(update,
  new HashPartitioner(ssc.sparkContext.defaultParallelism), true, initKeyStateRDD)
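For reference, a minimal sketch of the two pieces this line assumes, the update function and the initial-state RDD; the names mirror the line above, but the key/value types (String keys, Long counts) and the counting logic are illustrative assumptions:

import org.apache.spark.rdd.RDD

// Illustrative running-count update: fold the batch's new counts for a key
// into the previously stored total (None means the key has no state yet).
def update(newCounts: Seq[Long], state: Option[Long]): Option[Long] =
  Some(state.getOrElse(0L) + newCounts.sum)

// Illustrative initial state, e.g. totals carried over from an earlier run.
val initKeyStateRDD: RDD[(String, Long)] =
  ssc.sparkContext.parallelize(Seq(("someKey", 0L)))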
This runs for
Hi, I am following the spark streaming stateful application example to write a stateful application, and here is the critical line of code.
val keyStateStream = actRegBatchCountStream.updateStateByKey(update,
  new HashPartitioner(ssc.sparkContext.defaultParallelism), true, initKeyStateRDD)
I n
I hit this issue with a spark 1.3.0 stateful application (using updateStateByKey) on mesos. It fails after running fine for about 24 hours.
The error stack trace is below. I checked ulimit -n, and we have very large numbers set on the machines.
What else can be wrong?
15/09/27 18:45:11 W
Hi,
We have a scenario as below and would like your suggestion.
We have an app.conf file with propX=A as the default, built into the fat jar file that is provided to spark-submit.
We have an env.conf file with propX=B that we would like spark-submit to take as input, to overwrite the default and propagate to both
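Assuming "both" here means both the driver and the executors, one common sketch (using Typesafe Config, and assuming env.conf is shipped with spark-submit --files so it lands in the working directory; file and property names follow the description above) would be:

import java.io.File
import com.typesafe.config.{Config, ConfigFactory}

// env.conf (propX=B) is assumed to be shipped alongside the job, e.g. via
// spark-submit --files env.conf, so it appears in the working directory.
// app.conf (propX=A) is assumed to be on the classpath inside the fat jar.
val overrides: Config = ConfigFactory.parseFile(new File("env.conf"))
val conf: Config = overrides.withFallback(ConfigFactory.parseResources("app.conf")).resolve()

val propX = conf.getString("propX")   // "B" when env.conf is present, "A" otherwise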
I have found this paper, which seems to answer most of the questions about the lifecycle: https://www.cs.berkeley.edu/~matei/papers/2012/hotcloud_spark_streaming.pdf
Tian
On Tuesday, November 25, 2014 4:02 AM, Mukesh Jha
wrote:
Hey Experts,
I wanted to understand in detail about the lifecycle
Hi, Dear Spark Streaming Developers and Users,
We are prototyping with spark streaming and hit the following two issues on which I would like to seek your expertise.
1) We have a spark streaming application in scala that reads data from Kafka into a DStream, does some processing, and outputs a transfor
I am hitting the same issue, i.e., after running for some time, if the spark streaming job loses the kafka connection or it times out, it will just start to return empty RDDs ...
Is there a timeline for when this issue will be fixed, so that I can plan accordingly?
Thanks.
Tian
We have narrowed this hanging issue down to the calliope package that we used to create an RDD by reading a cassandra table.
The calliope native RDD interface seems to hang, so I have decided to switch to the calliope cql3 RDD interface.
Hi, I am using the latest calliope library from tuplejump.com to create an RDD for a cassandra table.
I am on a 3-node spark 1.1.0 cluster with yarn.
My cassandra table is defined as below, and I have about 2000 rows of data inserted.
CREATE TABLE top_shows (
program_id varchar,
view_minute timestamp,
vi
Hi, I have a spark 1.1.0 yarn installation. I am using spark-submit to run a simple application.
From the console output, I have 769 partitions, and after task 768 in stage 0 (count) finished, it hangs. I used jstack to dump the stack, and it shows it is waiting ...
Any suggestion on what might go wrong?
I have figured out why I am getting this error:
We have a lot of data in kafka, and the DStream from Kafka used MEMORY_ONLY_SER, so once memory ran low, spark started to discard data that was needed later ...
Once I changed to MEMORY_AND_DISK_SER, the error was gone.
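For anyone hitting the same thing, the change amounts to passing an explicit storage level when creating the Kafka stream; a minimal sketch against the receiver-based KafkaUtils API (the ZooKeeper quorum, consumer group, and topic map are placeholders):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils

// Placeholders for the real ZooKeeper quorum, consumer group, and topic map.
val kafkaStream = KafkaUtils.createStream(
  ssc,
  "zk-host:2181",
  "my-consumer-group",
  Map("my-topic" -> 1),
  StorageLevel.MEMORY_AND_DISK_SER)   // spill blocks to disk instead of dropping them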
Tian
Hi, we are using spark 1.1.0 streaming and we are hitting this same issue.
Basically, from the job output I saw the following things happen in sequence.
948 14/10/07 18:09:59 INFO storage.BlockManagerInfo: Added
input-0-1412705397200 in memory on ip-10-4-62-85.ec2.internal:59230 (size:
5.3 MB, fr
-CTP-U2-H2
Let us know how your testing goes.
Regards,
Rohit
Founder & CEO, Tuplejump, Inc.
www.tuplejump.com | The Data Engineering Platform
On Sat, Oct 4, 2014 at 3:49 AM, tian zhang wrote:
> Hi, Rohit,
> Thank you for sharing this good news.
Hi, Rohit,
Thank you for sharing this good news.
I have a related issue on which I would like to ask for your help.
I am using spark 1.1.0, and I have a spark application using
"com.tuplejump" % "calliope-core_2.10" % "1.1.0-CTP-U2",
At runtime there are the following errors, which seem to indicate that calli
Hi, Spark experts,
I have the following issue when using the aws java sdk in my spark application.
Here are the steps I narrowed it down to in order to reproduce the problem:
1) I have Spark 1.1.0 with hadoop 2.4 installed on a 3-node cluster
2) from the master node, I did the following steps.
spark-shell --