Some background on what we're trying to do: we have four Kinesis receivers with varying amounts of data coming through them. Ultimately we work on a unioned stream that receives about 11 MB/second of data, using a batch size of 5 seconds.
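For context, our ingestion looks roughly like this (a sketch, not our exact code; the app name, stream name, endpoint, and region are placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kinesis.KinesisUtils
    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

    val conf = new SparkConf().setAppName("kinesis-aggregator")
    val ssc = new StreamingContext(conf, Seconds(5)) // 5-second batches

    // Four receivers reading the same stream; blocks are replicated
    // (the _2 storage level) so losing one executor should not lose
    // not-yet-computed blocks.
    val receivers = (1 to 4).map { _ =>
      KinesisUtils.createStream(
        ssc, "kinesis-aggregator", "our-stream",
        "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
        InitialPositionInStream.LATEST,
        Seconds(5),                        // Kinesis checkpoint interval
        StorageLevel.MEMORY_AND_DISK_SER_2
      )
    }
    val unioned = ssc.union(receivers)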
We create four distinct DStreams from this data, each with a different aggregation computation (various combinations of map/flatMap/reduceByKeyAndWindow, finishing by serializing the records to JSON strings and writing them to S3). We want to compute over 30-minute windows to get a better compression rate for the aggregates (there are a lot of repeated keys across this time frame, and we combine them all using reduceByKeyAndWindow). But even when trying 5-minute windows, we run into "Could not compute split, block —— not found" errors.

This is being run on a YARN cluster, and it seems like the executors are getting killed even though they should have plenty of memory.

Also, it seems like no computation actually takes place until the end of the window duration. This seems inefficient when there is a lot of data that you know will be needed for the computation. Is there any good way around this?

Here are some of the configuration settings we are using for Spark:

spark.executor.memory=26000M
spark.executor.cores=4
spark.executor.instances=5
spark.driver.cores=4
spark.driver.memory=24000M
spark.default.parallelism=128
spark.streaming.blockInterval=100ms
spark.streaming.receiver.maxRate=20000
spark.akka.timeout=300
spark.storage.memoryFraction=0.6
spark.rdd.compress=true
spark.executor.instances=16
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max=2047m

Is this the correct way to do this, and how can we debug further to figure out this issue?
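One thing we have been considering, to frame the question: the incremental form of reduceByKeyAndWindow that takes an inverse reduce function, so that each batch only folds in newly arrived data and subtracts data falling out of the window, rather than re-reducing the entire window at once. A rough sketch of what we mean, assuming the aggregate is invertible (e.g. a sum); parseRecords and the checkpoint path are hypothetical placeholders:

    import org.apache.spark.streaming.Minutes
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical parser: turn a raw Kinesis record into (key, count) pairs.
    def parseRecords(bytes: Array[Byte]): Seq[(String, Long)] =
      new String(bytes, "UTF-8").split("\n").map(line => (line, 1L)).toSeq

    ssc.checkpoint("s3://our-bucket/checkpoints") // required by the inverse form

    val counts: DStream[(String, Long)] = unioned
      .flatMap(parseRecords)
      .reduceByKeyAndWindow(
        (a: Long, b: Long) => a + b,  // fold new data into the window
        (a: Long, b: Long) => a - b,  // subtract data leaving the window
        Minutes(30),                  // window length
        Minutes(5)                    // slide interval
      )

    counts.foreachRDD { rdd =>
      // serialize to JSON strings and write to S3 here (elided in this sketch)
    }

Would this be the right way to spread the windowed computation across batches, instead of having it all happen at the end of the window?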