Nitin,
I am getting similar issues using Spark 2.0.1 and Kafka 0.10. I have two
jobs, one that uses a Kafka stream and one that uses just the KafkaRDD.
With the KafkaRDD, I continually get the "Failed to get records" error. I have
adjusted the polling with `spark.streaming.kafka.consumer.poll.ms` a
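For reference, this is roughly how I am raising the poll timeout (a sketch; the app name and the value are just examples):

import org.apache.spark.SparkConf

// Raise the consumer poll timeout used when fetching records for a KafkaRDD
// partition; 10 seconds here is an illustrative value, not a recommendation.
val conf = new SparkConf()
  .setAppName("kafka-rdd-job")
  .set("spark.streaming.kafka.consumer.poll.ms", "10000")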
I am getting excessive memory-leak warnings when running multiple mappings and
aggregations on Datasets. Is there anything I should be looking for
to resolve this, or is this a known issue?
WARN [Executor task launch worker-0]
org.apache.spark.memory.TaskMemoryManager - leak 16.3 MB memory f
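The shape of the job is roughly the following (a sketch; the schema, path, and operations are made up, just to show the kind of map/aggregate chain that produces the warnings):

import org.apache.spark.sql.SparkSession

case class Event(user: String, value: Double)  // illustrative schema

val spark = SparkSession.builder().appName("ds-aggregations").getOrCreate()
import spark.implicits._

// A couple of map and aggregation stages over a Dataset.
val events  = spark.read.parquet("/data/events").as[Event]   // illustrative path
val doubled = events.map(e => e.copy(value = e.value * 2))
val perUser = doubled.groupByKey(_.user).count()
perUser.show()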
I am trying to submit a job to Spark running on a Mesos cluster. We need to
pass custom Java options to the driver and executor for configuration, but
the driver task never includes the options. Here is an example submit.
GC_OPTS="-XX:+UseConcMarkSweepGC
-verbose:gc -XX:+PrintGCTimeStamps
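To confirm what the driver actually received, I check the JVM arguments from inside the driver, roughly like this (a diagnostic sketch; `sc` is the active SparkContext):

import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// The arguments the driver JVM was actually started with: the GC options should
// show up here if spark.driver.extraJavaOptions made it onto the Mesos driver task.
val jvmArgs = ManagementFactory.getRuntimeMXBean.getInputArguments.asScala
println(jvmArgs.mkString(" "))

// What Spark itself thinks was configured.
println(sc.getConf.getOption("spark.driver.extraJavaOptions"))
println(sc.getConf.getOption("spark.executor.extraJavaOptions"))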
I have upgraded to Spark 2.0 and am experimenting with Kafka 0.10.0. I
have a stream from which I extract the data, and I would like to update the Kafka
offsets as each partition is handled. With Spark 1.6, or with Spark 2.0 and Kafka
0.8.2, I was able to update the offsets, but now there seems to be no way to do
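For context, the pattern that worked with the 0.8.2 direct stream looked roughly like this (a sketch; `stream` is the direct stream, and `processRecord`/`saveOffset` stand in for whatever processing and offset store is used):

import org.apache.spark.TaskContext
import org.apache.spark.streaming.kafka.HasOffsetRanges

// Grab the offset ranges for the batch on the driver, handle each partition,
// then persist that partition's untilOffset once its records are done.
stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.foreachPartition { records =>
    val range = offsetRanges(TaskContext.get.partitionId)
    records.foreach(processRecord)
    saveOffset(range.topic, range.partition, range.untilOffset)
  }
}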
When writing a DataFrame into partitioned parquet files, the partition
columns are removed from the data.
For example:
df.write.mode(SaveMode.Append).partitionBy("year", "month", "day",
"hour").parquet(somePath)
This creates a directory structure like:
events
|-> 2016
    |-> 1
        |-> 15
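For what it is worth, reading the whole tree back shows where the columns went (a minimal sketch; `somePath` is the same root as in the write above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("read-back").getOrCreate()

// The year/month/day/hour columns reappear in the schema, but only because Spark
// infers them from the directory names; the parquet files themselves no longer
// contain those columns.
spark.read.parquet(somePath).printSchema()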
Hi Ted,
Your fix addresses the issue for me. Thanks again for your help; I saw
the PR you submitted against master.
Ivan
If I have a streaming job (Spark 1.5.1) and attempt to stop the stream after
the first batch, the system locks up and never completes. The pseudo code
below shows that the stream is stopped after the batch-complete notification
fires. I have traced the lockup to the call `listener.stop()` in Job
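The pattern boils down to something like this (a minimal sketch; `ssc` is the running StreamingContext):

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Stop the stream once the first batch has completed; the stop() call made from
// this callback is the one that never returns.
ssc.addStreamingListener(new StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    ssc.stop(stopSparkContext = false, stopGracefully = true)
  }
})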
I have a streaming application that reads from Kafka (direct stream) and then
writes parquet files. It is a pretty simple app that gets a Kafka direct
stream (8 partitions), calls `stream.foreachRDD`, and stores the data to
parquet using a DataFrame. Batch intervals are set to 10 seconds. During
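The core of the app looks roughly like this (a sketch assuming the Kafka 0.10 integration; brokers, topic, group id, and output path are illustrative, and the record handling is reduced to storing just the message value):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.KafkaUtils

val spark = SparkSession.builder().appName("kafka-to-parquet").getOrCreate()
import spark.implicits._

// 10 second batches over an 8 partition topic, each batch appended as parquet.
val ssc = new StreamingContext(spark.sparkContext, Seconds(10))
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "kafka-to-parquet",
  "auto.offset.reset" -> "latest")
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

stream.foreachRDD { rdd =>
  rdd.map(_.value).toDF("value")
    .write.mode(SaveMode.Append).parquet("/data/events.parquet")
}

ssc.start()
ssc.awaitTermination()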