Latest Release of Receiver based Kafka Consumer for Spark Streaming.

2016-08-25 Thread Dibyendu Bhattacharya
Hi, I have released the latest version of the Receiver based Kafka Consumer for Spark Streaming. The Receiver is compatible with Kafka versions 0.8.x, 0.9.x, and 0.10.x, and with all Spark versions. Available at Spark Packages: https://spark-packages.org/package/dibbhatt/kafka-spark-consumer Also on GitHub: https://g

Re: Which committers care about Kafka?

2014-12-19 Thread Dibyendu Bhattacharya
Hi, Thanks to Jerry for mentioning the Kafka Spout for Trident. Storm Trident achieves the exactly-once guarantee by processing tuples in batches and assigning the same transaction-id to a given batch. The replay of a given batch with a transaction-id will have the exact same set of tuples and re
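The transactional-batch idea can be sketched independently of Storm: a state store remembers the transaction-id of the last batch it applied, and an update runs only when a batch carries a new transaction-id, so replaying a batch with the same id (and, by Trident's contract, the same tuples) is idempotent. A minimal sketch, with all names illustrative rather than Storm/Trident APIs:

```python
# Illustrative sketch (not Storm/Trident code): exactly-once counting via
# batch transaction-ids. Trident assigns strictly increasing txids, and a
# replayed batch carries the same txid with the exact same tuples, so
# applying it a second time must be a no-op.
class TransactionalCount:
    def __init__(self):
        self.count = 0
        self.last_txid = None  # txid of the last batch applied to the state

    def apply_batch(self, txid, tuples):
        if txid == self.last_txid:
            return  # replay of an already-applied batch: skip
        self.count += len(tuples)
        self.last_txid = txid

state = TransactionalCount()
state.apply_batch(1, ["a", "b", "c"])
state.apply_batch(1, ["a", "b", "c"])  # replay with the same txid: ignored
state.apply_batch(2, ["d"])
print(state.count)  # 4, not 7: the replayed batch was not double-counted
```

The same pattern generalizes to any state update, as long as the state and the last-applied txid are stored atomically together.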

Spark Streaming with Tachyon : Some findings

2015-05-07 Thread Dibyendu Bhattacharya
Dear All, I have been playing with Spark Streaming on Tachyon as the OFF_HEAP block store. The primary reason for evaluating Tachyon is to find out whether Tachyon can solve the Spark BlockNotFoundException. With the traditional MEMORY_ONLY StorageLevel, when blocks are evicted, jobs failed due to block not fo
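The failure mode being described can be modeled abstractly: a memory-only block store drops evicted blocks outright, so a later read fails, while a store backed by an external (off-heap) tier keeps evicted blocks readable. A toy model of that difference (not Spark code; class names are illustrative only):

```python
# Toy model of why an external (off-heap) block store avoids lost blocks.
class BlockNotFoundException(Exception):
    pass

class MemoryOnlyStore:
    """Evicts the oldest block once capacity is exceeded; evicted data is gone."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}  # dicts preserve insertion order in Python 3.7+

    def put(self, block_id, data):
        if len(self.blocks) >= self.capacity:
            oldest = next(iter(self.blocks))
            del self.blocks[oldest]  # the evicted block is simply dropped
        self.blocks[block_id] = data

    def get(self, block_id):
        if block_id not in self.blocks:
            raise BlockNotFoundException(block_id)
        return self.blocks[block_id]

class ExternalBackedStore(MemoryOnlyStore):
    """Same eviction policy, but evicted blocks fall back to an external tier."""
    def __init__(self, capacity):
        super().__init__(capacity)
        self.external = {}

    def put(self, block_id, data):
        super().put(block_id, data)
        self.external[block_id] = data  # also persisted off-heap

    def get(self, block_id):
        try:
            return super().get(block_id)
        except BlockNotFoundException:
            return self.external[block_id]

mem = MemoryOnlyStore(capacity=1)
mem.put("b0", "x"); mem.put("b1", "y")  # b0 evicted: mem.get("b0") now raises
ext = ExternalBackedStore(capacity=1)
ext.put("b0", "x"); ext.put("b1", "y")
print(ext.get("b0"))  # 'x' -- still readable from the external tier
```

In the real system the external tier is Tachyon and the eviction policy is Spark's, but the shape of the problem is the same: whether an evicted block is recoverable decides whether downstream jobs fail.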

Re: Spark Streaming with Tachyon : Some findings

2015-05-08 Thread Dibyendu Bhattacharya
-Dtachyon.worker.hierarchystore.level1.dirs.path=/mnt/tachyon -Dtachyon.worker.hierarchystore.level1.dirs.quota=50GB -Dtachyon.worker.allocate.strategy=MAX_FREE -Dtachyon.worker.evict.strategy=LRU Regards, Dibyendu On Thu, May 7, 2015 at 1:46 PM, Dibyendu Bhattacharya < dibyendu.bhatt

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-05-21 Thread Dibyendu Bhattacharya
m interface, is returning zero. > > On Mon, May 11, 2015 at 4:38 AM, Dibyendu Bhattacharya < > dibyendu.bhattach...@gmail.com> wrote: > >> Just to follow up this thread further . >> >> I was doing some fault tolerant testing of Spark Streaming with Tachyon >>

Re: Spark Streaming with Tachyon : Data Loss on Receiver Failure due to WAL error

2015-09-25 Thread Dibyendu Bhattacharya
ne go about configuring spark streaming to use tachyon as its > place for storing checkpoints? Also, can one do this with tachyon running > on a completely different node than where spark processes are running? > > Thanks > Nikunj > > > On Thu, May 21, 2015 at 8:35 PM, Dib

Contributing Receiver based Low Level Kafka Consumer from Spark-Packages to Apache Spark Project

2015-10-14 Thread Dibyendu Bhattacharya
Hi, I have raised a JIRA (https://issues.apache.org/jira/browse/SPARK-11045) to track the discussion, but I am also mailing the dev group for your opinion. Some discussion has already happened in the JIRA, and I would love to hear what others think. You can comment directly against the JIRA if you wish. This ka

Low Level Kafka Consumer for Spark

2014-08-02 Thread Dibyendu Bhattacharya
Hi, I have implemented a Low Level Kafka Consumer for Spark Streaming using the Kafka Simple Consumer API. This API gives better control over Kafka offset management and recovery from failures. As the present Spark KafkaUtils uses the High Level Kafka Consumer API, I wanted to have better control
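The kind of offset control described here can be illustrated outside Spark and Kafka: the consumer checkpoints the next offset to read per topic-partition itself, and after a failure a fresh consumer resumes from that checkpoint instead of relying on the High Level consumer's automatic commits. A minimal sketch with an in-memory checkpoint store (all names hypothetical, not this package's actual API):

```python
# Hypothetical sketch of receiver-side offset management: read a batch from
# a partition, checkpoint the next offset after the batch is processed, and
# recover from the checkpoint after a failure.
class OffsetCheckpoint:
    def __init__(self):
        self._store = {}  # (topic, partition) -> next offset to read

    def commit(self, topic, partition, next_offset):
        self._store[(topic, partition)] = next_offset

    def restore(self, topic, partition):
        return self._store.get((topic, partition), 0)  # start from the beginning

def consume(log, checkpoint, topic, partition, batch_size):
    """Read one batch from `log` starting at the checkpointed offset."""
    start = checkpoint.restore(topic, partition)
    batch = log[start:start + batch_size]
    checkpoint.commit(topic, partition, start + len(batch))
    return batch

log = ["m0", "m1", "m2", "m3", "m4"]  # stand-in for one partition's log
cp = OffsetCheckpoint()
print(consume(log, cp, "t", 0, 2))  # ['m0', 'm1']
# Simulate a crash: any consumer holding `cp` resumes from the stored offset.
print(consume(log, cp, "t", 0, 2))  # ['m2', 'm3']
```

In a real deployment the checkpoint store would be durable (e.g. ZooKeeper or an external store) so that recovery survives process failure, which is exactly the control the Simple Consumer API leaves to the application.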

Re: Low Level Kafka Consumer for Spark

2014-08-05 Thread Dibyendu Bhattacharya
l.com >> +1 (206) 849-4108 >> >> >> On Sun, Aug 3, 2014 at 8:59 PM, Patrick Wendell >> wrote: >> >>> I'll let TD chime on on this one, but I'm guessing this would be a >>> welcome addition. It's great to see community effort on adding new &

Re: Low Level Kafka Consumer for Spark

2014-08-05 Thread Dibyendu Bhattacharya
ers. I’m not sure what is your thought? > > Thanks > Jerry > > From: Dibyendu Bhattacharya [mailto:dibyendu.bhattach...@gmail.com] > Sent: Tuesday, August 05, 2014 5:15 PM > To: Jonathan Hodges; dev@spark.apache.org > Cc: user > Subject: Re: Low Level Kafka Consumer

Re: Low Level Kafka Consumer for Spark

2014-08-24 Thread Dibyendu Bhattacharya
understand the details, but I want to do it really soon. In particular, I > want to understand the improvements, over the existing Kafka receiver. > > And its fantastic to see such contributions from the community. :) > > TD > > > On Tue, Aug 5, 2014 at 8:38 AM, Dibyendu Bhattach

Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-12 Thread Dibyendu Bhattacharya
oolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > > > > > -- > Nan Zhu > > On Thursday, September 11, 2014 at 10:42 AM, Nan Zhu wrote: > &g

Re: All-time stream re-processing

2014-09-24 Thread Dibyendu Bhattacharya
So you have a single Kafka topic with a very high retention period (which decides the storage capacity of a given Kafka topic), and you want to process all historical data first using Camus and then start the streaming process? The challenge is that Camus and Spark are two different consumers for Ka
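One way to reason about such a handoff is by offset: the historical batch job records the next offset per partition after draining the log, and the streaming consumer starts from exactly that offset, so no message is skipped or processed twice. A small sketch of the handoff, with offsets modeled as list positions (illustrative only; neither Camus's nor this package's actual API):

```python
# Illustrative handoff between a historical batch pass and a streaming
# consumer over the same partition log. The batch job drains everything up
# to the current end and hands the next offset to the streaming consumer.
def batch_pass(log):
    """Process all historical messages; return (processed, next_offset)."""
    return list(log), len(log)

def streaming_pass(log, start_offset):
    """Process only messages at or after start_offset."""
    return log[start_offset:]

log = ["h0", "h1", "h2"]           # historical data already in the topic
historical, next_offset = batch_pass(log)
log += ["s0", "s1"]                # new messages arrive after the batch pass
fresh = streaming_pass(log, next_offset)
print(historical + fresh)  # ['h0', 'h1', 'h2', 's0', 's1'] -- no gap, no overlap
```

The hard part in practice, as the reply notes, is that two different consumer implementations must agree on this offset boundary, which requires the batch job to publish the offsets it stopped at in a form the streaming consumer can read.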