Re: correct and fast way to stop streaming application

2015-10-27 Thread Krot Viacheslav
>> database. That would probably mean using a real database instead of
>> zookeeper though.
>>
>> On Tue, Oct 27, 2015 at 4:13 AM, Krot Viacheslav <krot.vyaches...@gmail.com> wrote:
>>
>>> Any ideas? This is so important because we use kafka direc
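[A minimal sketch of the approach mentioned here (keeping offsets in a real database rather than ZooKeeper). `stream` is assumed to be the DStream returned by KafkaUtils.createDirectStream, the JDBC URL is a placeholder, and the stream_offsets table is hypothetical; only the offset bookkeeping is shown, with the batch's own writes left as a comment:]

    import java.sql.DriverManager
    import org.apache.spark.streaming.dstream.DStream
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

    // Assumed to come from KafkaUtils.createDirectStream.
    val stream: DStream[(String, String)] = ???

    stream.foreachRDD { rdd =>
      // offsetRanges describes exactly which Kafka offsets this batch covers.
      val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      // ... compute the batch's results here and stage them for the same transaction ...

      // Hypothetical table: stream_offsets(topic, kafka_partition, until_offset).
      val conn = DriverManager.getConnection("jdbc:postgresql://db-host/streaming")
      try {
        conn.setAutoCommit(false)
        val stmt = conn.prepareStatement(
          "UPDATE stream_offsets SET until_offset = ? WHERE topic = ? AND kafka_partition = ?")
        offsetRanges.foreach { range =>
          stmt.setLong(1, range.untilOffset)
          stmt.setString(2, range.topic)
          stmt.setInt(3, range.partition)
          stmt.executeUpdate()
        }
        // Results and offsets committed together, so a restart resumes from
        // exactly what was persisted.
        conn.commit()
      } finally {
        conn.close()
      }
    }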

Re: correct and fast way to stop streaming application

2015-10-27 Thread Krot Viacheslav
zookeeper. This way I lose data in offsets 10-20. How should this be handled correctly?

On Mon, 26 Oct 2015 at 18:37, varun sharma wrote:

> +1, wanted to do same.
>
> On Mon, Oct 26, 2015 at 8:58 PM, Krot Viacheslav <krot.vyaches...@gmail.com> wrote:
>
>> Hi all,
>>

correct and fast way to stop streaming application

2015-10-26 Thread Krot Viacheslav
Hi all, I wonder what is the correct way to stop a streaming application if some job fails? What I have now:

    val ssc = new StreamingContext
    ssc.start()
    try {
      ssc.awaitTermination()
    } catch {
      case e => ssc.stop(stopSparkContext = true, stopGracefully = false)
    }

It works but one problem
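[A slightly fuller sketch of the same pattern; the socket source, output operation, app name, and 10-second batch interval are placeholders, not from the thread:]

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("stop-on-failure-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder source and output so the context has something to run.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc.start()
    try {
      // awaitTermination() rethrows an exception that terminated the streams,
      // so a failed batch ends up in the catch block below.
      ssc.awaitTermination()
    } catch {
      case e: Exception =>
        // Stop immediately; a graceful stop would first drain queued batches.
        ssc.stop(stopSparkContext = true, stopGracefully = false)
        throw e
    }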

stop streaming context on job failure

2015-06-16 Thread Krot Viacheslav
Hi all, Is there a way to stop the streaming context when some batch processing fails? I want to set a reasonable retry count, say 10, and if it still fails, stop the context completely. Is that possible?
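[A small sketch of one way this could be bounded, assuming the retries in question are per-task attempts; spark.task.maxFailures caps those, and the stop itself would follow the try/catch pattern shown in the 2015-10-26 message above:]

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Cap task retries at 10; once a task has failed that many times the whole
    // batch job fails, awaitTermination() rethrows the error on the driver, and
    // the driver can then call ssc.stop(...).
    val conf = new SparkConf()
      .setAppName("bounded-retries-sketch")
      .set("spark.task.maxFailures", "10")
    val ssc = new StreamingContext(conf, Seconds(5))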

Re: updateStateByKey and kafka direct approach without shuffle

2015-06-02 Thread Krot Viacheslav
> create a single instance of a partitioner in advance, and all it
> needs to know is the number of partitions (which is just the count of all
> the kafka topic/partitions).
>
> On Tue, Jun 2, 2015 at 12:40 PM, Krot Viacheslav <krot.vyaches...@gmail.com> wrote:
>
>> Cody,
>
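[A hedged sketch of what such a partitioner might look like; the class name is invented here, and it assumes, per the rest of the thread, that each key already carries the Spark partition id taken from the offset range index, so the only thing the instance needs up front is the partition count:]

    import org.apache.spark.Partitioner

    // Hypothetical partitioner: built once, before the streams are wired up,
    // knowing only how many Kafka topic/partitions (= Spark partitions) exist.
    class KafkaAlignedPartitioner(override val numPartitions: Int) extends Partitioner {
      override def getPartition(key: Any): Int = key match {
        // Assumes keys of the form (sparkPartitionId, realKey).
        case (partitionId: Int, _) => partitionId
        case _                     => 0
      }
    }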

Re: updateStateByKey and kafka direct approach without shuffle

2015-06-02 Thread Krot Viacheslav
> index into the offset range array matches the (spark) partition id. That will
> also tell you what the value of numPartitions should be.
>
> On Tue, Jun 2, 2015 at 11:50 AM, Krot Viacheslav <krot.vyaches...@gmail.com> wrote:
>
>> Hi all,
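[A sketch of that correspondence, assuming `stream` is the DStream returned by KafkaUtils.createDirectStream; the key/value types and the (topic, partition) keying are illustrative:]

    import org.apache.spark.streaming.dstream.DStream
    import org.apache.spark.streaming.kafka.HasOffsetRanges

    // Assumed to come from KafkaUtils.createDirectStream.
    val stream: DStream[(String, String)] = ???

    val keyedByTopicPartition = stream.transform { rdd =>
      // On an RDD taken straight from the direct stream, offsetRanges(i)
      // describes exactly the data sitting in Spark partition i, and
      // offsetRanges.length is the numPartitions a matching partitioner needs.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.mapPartitionsWithIndex { (idx, iter) =>
        val range = offsetRanges(idx) // index == Spark partition id
        iter.map { case (_, value) => ((range.topic, range.partition), value) }
      }
    }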

updateStateByKey and kafka direct approach without shuffle

2015-06-02 Thread Krot Viacheslav
Hi all, In my streaming job I'm using the kafka direct approach and want to maintain state with updateStateByKey. My PairRDD has the message's topic name + partition id as a key. So, I assume that updateStateByKey could work within the same partition as the KafkaRDD and not lead to shuffles. Actually this i
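[A minimal sketch of the setup described in this question; keyedStream is assumed to already be keyed by (topic, partition), the count-per-key state is illustrative, and HashPartitioner(12) with a made-up partition count is only there to show the updateStateByKey overload that takes an explicit partitioner. The replies above are about substituting a partitioner that actually lines up with the KafkaRDD:]

    import org.apache.spark.HashPartitioner
    import org.apache.spark.streaming.dstream.DStream

    // Assumed input: records already keyed by (topic name, kafka partition id).
    val keyedStream: DStream[((String, Int), String)] = ???

    // Illustrative state: a running count of records per key.
    val updateFunc: (Seq[String], Option[Long]) => Option[Long] =
      (values, state) => Some(state.getOrElse(0L) + values.size)

    // updateStateByKey accepts an explicit Partitioner; unless that partitioner
    // places each (topic, partition) key on the very Spark partition the
    // KafkaRDD produced it on, the new batch is still shuffled into the state.
    val stateStream = keyedStream.updateStateByKey(updateFunc, new HashPartitioner(12))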