Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-04 Thread vonnagy
Here are some examples and details of the scenarios. The KafkaRDD is the most error prone to polling timeouts and concurrentm modification errors. *Using KafkaRDD* - This takes a list of channels and processes them in parallel using the KafkaRDD directly. they all use the same consumer group ('st

Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-04 Thread vonnagy
I am getting the issues using Spark 2.0.1 and Kafka 0.10. I have two jobs, one that uses a Kafka stream and one that uses just the KafkaRDD. With the KafkaRDD, I continually get the "Failed to get records .. after polling". I have adjusted the polling with `spark.streaming.kafka.consumer.poll.ms`

Re: Submit job with driver options in Mesos Cluster mode

2016-10-27 Thread vonnagy
We were using 1.6, but now we are on 2.0.1. Both versions show the same issue. I dove deep into the Spark code and have identified that the extra java options are /not/ added to the process on the executors. At this point, I believe you have to use spark-defaults.conf to set any values that will b

Memory leak warnings in Spark 2.0.1

2016-10-12 Thread vonnagy
I am getting excessive memory leak warnings when running multiple mapping and aggregations and using DataSets. Is there anything I should be looking for to resolve this or is this a known issue? WARN [Executor task launch worker-0] org.apache.spark.memory.TaskMemoryManager - leak 16.3 MB memory f

Submit job with driver options in Mesos Cluster mode

2016-10-06 Thread vonnagy
I am trying to submit a job to spark running in a Mesos cluster. We need to pass custom java options to the driver and executor for configuration, but the driver task never includes the options. Here is an example submit. GC_OPTS="-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCTimeStam