Re: Spark Streaming: Custom Receiver OOM consistently

2017-05-23 Thread Manish Malhotra
hes receiver and worker nodes should discard the old data? On Mon, May 22, 2017 at 5:20 PM, Manish Malhotra <manish.malhotra.w...@gmail.com> wrote: > thanks Alonso, > > Sorry, but there are some security reservations. > > But

Re: Spark Streaming: Custom Receiver OOM consistently

2017-05-22 Thread Manish Malhotra
about.me/alonso.isidoro.roman 2017-05-20 7:54 GMT+02:00 Manish Malhotra: > Hello,

Spark Streaming: Custom Receiver OOM consistently

2017-05-19 Thread Manish Malhotra
Hello, I have implemented a Java-based custom receiver, which consumes from a messaging system, say JMS. Once a message is received, I call store(object) ... I'm storing a Spark Row object. It runs for around 8 hrs and then goes OOM, and the OOM is happening in the receiver nodes. I also tried to run multiple receiver

Re: [Spark Streamiing] Streaming job failing consistently after 1h

2017-05-19 Thread Manish Malhotra
I'm also facing the same problem. I have implemented a Java-based custom receiver, which consumes from a messaging system, say JMS. Once a message is received, I call store(object) ... I'm storing a Spark Row object. It runs for around 8 hrs and then goes OOM, and the OOM is happening in the receiver nodes. I also tried t

Re: RDD getPartitions() size and HashPartitioner numPartitions

2016-12-04 Thread Manish Malhotra
It's a pretty nice question! I'll try to understand the problem and see if I can help further. When you say CustomRDD, I believe you will be using it in the transformation stage, once the data is read from a source like HDFS, Cassandra, or Kafka. Now the RDD.getPartitions() should return the pa

Re: What benefits do we really get out of colocation?

2016-12-03 Thread Manish Malhotra
Thanks for sharing the numbers as well! Nowadays even the network can have very high throughput and might outperform the disk, but as Sean mentioned, data on the network will have other dependencies like network hops, for example if it's across racks, which can have a switch in between. But yes people are discuss

Re: Spark Streaming: question on sticky session across batches ?

2016-11-15 Thread Manish Malhotra
ition where edge data are. // maropu On Tue, Nov 15, 2016 at 5:19 AM, Manish Malhotra <manish.malhotra.w...@gmail.com> wrote: > sending again. > any help is appreciated! > > thanks in advance. > > On Thu, Nov 10, 2016 at 8:42 AM, Man

Re: Spark Streaming: question on sticky session across batches ?

2016-11-14 Thread Manish Malhotra
Sending again. Any help is appreciated! Thanks in advance. On Thu, Nov 10, 2016 at 8:42 AM, Manish Malhotra <manish.malhotra.w...@gmail.com> wrote: > Hello Spark Devs/Users, > > I'm trying to solve a use case with Spark Streaming 1.6.2 where for every batch (say 2 mins

Spark Streaming: question on sticky session across batches ?

2016-11-10 Thread Manish Malhotra
Hello Spark Devs/Users, I'm trying to solve a use case with Spark Streaming 1.6.2 where, for every batch (say 2 mins), data needs to go to the same reducer node after grouping by key. The underlying storage is Cassandra and not HDFS. This is a map-reduce job, where I am also trying to use the partitio
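
The question above turns on whether a key lands in the same place from one batch to the next. As a rough illustration only, here is a plain-Java sketch of hash-style partitioning: a key's partition index is a non-negative modulo of its hashCode(), which to my understanding is how Spark's HashPartitioner assigns partitions. The class name, the partitionFor helper, the sample keys, and the partition count of 8 are all hypothetical and not taken from the thread. Note the limit of what this shows: a stable partition index does not by itself guarantee that the partition is scheduled on the same executor node in every batch.

```java
public class StickyPartitionSketch {

    // Hypothetical helper (not a Spark API): maps a key to a partition
    // index in [0, numPartitions). Null keys go to partition 0; all other
    // keys use a non-negative modulo of hashCode().
    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        if (key == null) {
            return 0;
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 8; // assumed value, for illustration only

        // Two "batches" containing the same keys in a different order.
        String[] batch1 = {"user-1", "user-2", "user-3"};
        String[] batch2 = {"user-3", "user-1", "user-2"};

        for (String key : batch1) {
            System.out.println("batch 1: " + key + " -> partition "
                    + partitionFor(key, numPartitions));
        }
        for (String key : batch2) {
            System.out.println("batch 2: " + key + " -> partition "
                    + partitionFor(key, numPartitions));
        }
    }
}
```

Running main prints the partition index for each key in both batches; the index for a given key is the same in each batch because it depends only on the key's hash and the partition count, so it changes only if the partition count changes.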