Nitin,
I am getting similar issues using Spark 2.0.1 and Kafka 0.10. I have two
jobs, one that uses a Kafka stream and one that uses just the KafkaRDD.
With the KafkaRDD, I continually get the "Failed to get records" error. I have
adjusted the polling with `spark.streaming.kafka.consumer.poll.ms` a
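For reference, this is roughly how I am raising the poll timeout (a sketch; the app name and the value are just examples):

import org.apache.spark.SparkConf

// Raise the consumer poll timeout used when fetching records for a KafkaRDD
// partition; 10 seconds here is an illustrative value, not a recommendation.
val conf = new SparkConf()
  .setAppName("kafka-rdd-job")
  .set("spark.streaming.kafka.consumer.poll.ms", "10000")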
I am getting excessive memory-leak warnings when running multiple mappings and
aggregations on Datasets. Is there anything I should be looking for
to resolve this, or is this a known issue?
WARN [Executor task launch worker-0]
org.apache.spark.memory.TaskMemoryManager - leak 16.3 MB memory f
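The shape of the job is roughly the following (a sketch; the schema, path, and operations are made up, just to show the kind of map/aggregate chain that produces the warnings):

import org.apache.spark.sql.SparkSession

case class Event(user: String, value: Double)  // illustrative schema

val spark = SparkSession.builder().appName("ds-aggregations").getOrCreate()
import spark.implicits._

// A couple of map and aggregation stages over a Dataset.
val events  = spark.read.parquet("/data/events").as[Event]   // illustrative path
val doubled = events.map(e => e.copy(value = e.value * 2))
val perUser = doubled.groupByKey(_.user).count()
perUser.show()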
I am trying to submit a job to Spark running on a Mesos cluster. We need to
pass custom Java options to the driver and executor for configuration, but
the driver task never includes the options. Here is an example submit.
GC_OPTS="-XX:+UseConcMarkSweepGC
-verbose:gc -XX:+PrintGCTimeStamps
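To confirm what the driver actually received, I check the JVM arguments from inside the driver, roughly like this (a diagnostic sketch; `sc` is the active SparkContext):

import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// The arguments the driver JVM was actually started with: the GC options should
// show up here if spark.driver.extraJavaOptions made it onto the Mesos driver task.
val jvmArgs = ManagementFactory.getRuntimeMXBean.getInputArguments.asScala
println(jvmArgs.mkString(" "))

// What Spark itself thinks was configured.
println(sc.getConf.getOption("spark.driver.extraJavaOptions"))
println(sc.getConf.getOption("spark.executor.extraJavaOptions"))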
I have upgraded to Spark 2.0 and am experimenting with Kafka 0.10.0. I
have a stream from which I extract the data, and I would like to update the Kafka
offsets as each partition is handled. With Spark 1.6, or with Spark 2.0 and Kafka
0.8.2, I was able to update the offsets, but now there seems to be no way to do
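For context, the pattern that worked with the 0.8.2 direct stream looked roughly like this (a sketch; `stream` is the direct stream, and `processRecord`/`saveOffset` stand in for whatever processing and offset store is used):

import org.apache.spark.TaskContext
import org.apache.spark.streaming.kafka.HasOffsetRanges

// Grab the offset ranges for the batch on the driver, handle each partition,
// then persist that partition's untilOffset once its records are done.
stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.foreachPartition { records =>
    val range = offsetRanges(TaskContext.get.partitionId)
    records.foreach(processRecord)
    saveOffset(range.topic, range.partition, range.untilOffset)
  }
}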
When writing a DataFrame into partitioned parquet files, the partition
columns are removed from the data.
For example:
df.write.mode(SaveMode.Append).partitionBy("year", "month", "day",
"hour").parquet(somePath)
This creates a directory structure like:
events
|-> 2016
    |-> 1
        |-> 15
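For what it is worth, reading the whole tree back shows where the columns went (a minimal sketch; `somePath` is the same root as in the write above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("read-back").getOrCreate()

// The year/month/day/hour columns reappear in the schema, but only because Spark
// infers them from the directory names; the parquet files themselves no longer
// contain those columns.
spark.read.parquet(somePath).printSchema()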
Hi Ted,
Your fix addresses the issue for me. Thanks again for your help; I saw
the PR you submitted against master.
Ivan
If I have a streaming job (Spark 1.5.1) and attempt to stop the stream after
the first batch, the system locks up and never completes. The pseudo code
below shows that the stream is stopped after the batch-complete notification
fires. I have traced the lockup to the call `listener.stop()` in Job
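The pattern boils down to something like this (a minimal sketch; `ssc` is the running StreamingContext):

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Stop the stream once the first batch has completed; the stop() call made from
// this callback is the one that never returns.
ssc.addStreamingListener(new StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    ssc.stop(stopSparkContext = false, stopGracefully = true)
  }
})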
I have a streaming application that reads from Kafka (direct stream) and then
writes parquet files. It is a pretty simple app that gets a Kafka direct
stream (8 partitions), calls `stream.foreachRDD`, and stores the data to
parquet using a DataFrame. Batch intervals are set to 10 seconds. During
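The core of the app looks roughly like this (a sketch assuming the Kafka 0.10 integration; brokers, topic, group id, and output path are illustrative, and the record handling is reduced to storing just the message value):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.KafkaUtils

val spark = SparkSession.builder().appName("kafka-to-parquet").getOrCreate()
import spark.implicits._

// 10 second batches over an 8 partition topic, each batch appended as parquet.
val ssc = new StreamingContext(spark.sparkContext, Seconds(10))
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "kafka-to-parquet",
  "auto.offset.reset" -> "latest")
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

stream.foreachRDD { rdd =>
  rdd.map(_.value).toDF("value")
    .write.mode(SaveMode.Append).parquet("/data/events.parquet")
}

ssc.start()
ssc.awaitTermination()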