Here are some examples and details of the scenarios. The KafkaRDD is the most
error-prone to polling timeouts and concurrent modification errors.
*Using KafkaRDD* - This takes a list of channels and processes them in
parallel using the KafkaRDD directly. They all use the same consumer group
('st
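To make this concrete, here is a rough sketch of the pattern, assuming it
runs in spark-shell (so `sc` already exists) with the
spark-streaming-kafka-0-10 package on the classpath. The broker address,
group id, channel names, and offsets are placeholders, not the real values:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies, OffsetRange}
    import scala.collection.JavaConverters._

    // Every channel shares the same consumer group, as described above.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "shared-group"
    ).asJava

    val channels = Seq("channel-a", "channel-b", "channel-c")

    // One KafkaRDD per channel, driven in parallel from the driver.
    channels.par.foreach { channel =>
      // Offsets are made up; the real job computes a range per partition.
      val ranges = Array(OffsetRange(channel, 0, 0L, 1000L))
      val rdd = KafkaUtils.createRDD[String, String](
        sc, kafkaParams, ranges, LocationStrategies.PreferConsistent)
      rdd.map(_.value).count()  // stand-in for the real per-channel processing
    }

It is this parallel use of a single group id that seems to trigger the
concurrent modification errors mentioned above.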
I am seeing these issues on Spark 2.0.1 and Kafka 0.10. I have two jobs,
one that uses a Kafka stream and one that uses just the KafkaRDD.
With the KafkaRDD, I continually get the "Failed to get records .. after
polling" error. I have adjusted the poll timeout with
`spark.streaming.kafka.consumer.poll.ms`.
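For reference, this is roughly how I set it; the 10000 ms value below is
just one of the values I tried, not a recommendation:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("kafka-rdd-job")                             // placeholder app name
      .set("spark.streaming.kafka.consumer.poll.ms", "10000")  // one of several values tried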
We were using Spark 1.6, but we are now on 2.0.1. Both versions show the
same issue.
I dove deep into the Spark code and have identified that the extra Java
options are /not/ added to the executor processes. At this point, I
believe you have to use spark-defaults.conf to set any values that will b
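In other words, something along these lines in spark-defaults.conf seems
to be required; the GC flags here are only placeholders for whatever
options you actually need:

    spark.driver.extraJavaOptions    -XX:+UseConcMarkSweepGC -verbose:gc
    spark.executor.extraJavaOptions  -XX:+UseConcMarkSweepGC -verbose:gc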
I am getting excessive memory leak warnings when running multiple mappings
and aggregations on Datasets. Is there anything I should be looking for to
resolve this, or is this a known issue?
WARN [Executor task launch worker-0]
org.apache.spark.memory.TaskMemoryManager - leak 16.3 MB memory f
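For context, the shape of the job is roughly this; the case class, column
names, and input path are made up, and the real pipeline has more stages:

    import org.apache.spark.sql.SparkSession

    case class Event(id: Long, category: String, amount: Double)

    val spark = SparkSession.builder.appName("dataset-agg").getOrCreate()
    import spark.implicits._

    val events = spark.read.parquet("/path/to/events").as[Event]  // placeholder path

    // A few chained maps and aggregations like these are enough to produce
    // the TaskMemoryManager leak warnings for us.
    val totals = events
      .map(e => e.copy(amount = e.amount * 1.1))
      .groupByKey(_.category)
      .mapGroups((category, rows) => (category, rows.map(_.amount).sum))

    totals.show()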
I am trying to submit a job to Spark running on a Mesos cluster. We need to
pass custom Java options to the driver and executors for configuration, but
the driver task never includes the options. Here is an example submit:
GC_OPTS="-XX:+UseConcMarkSweepGC
-verbose:gc -XX:+PrintGCTimeStam
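The full command is longer; a minimal sketch of the kind of submit I mean,
passing $GC_OPTS through --conf, with the master URL, main class, and jar
as placeholders:

    spark-submit \
      --master mesos://mesos-master:5050 \
      --conf "spark.driver.extraJavaOptions=$GC_OPTS" \
      --conf "spark.executor.extraJavaOptions=$GC_OPTS" \
      --class com.example.MyJob \
      my-job.jar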