I think you would have to persist the events somewhere if you don't want to miss them; I don't see any other option. Either in MQTT itself, if it supports that, or by routing them through Kafka.
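If you went the Kafka route, the consuming side could look roughly like this. The broker address, topic, and output path are invented, and it assumes some small bridge already republishes every MQTT message to a Kafka topic:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MqttViaKafka {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("mqtt-via-kafka"), Seconds(10))

    // Kafka retains the messages, so events published while this job
    // is down for a redeploy are still there when it comes back.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val events = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mqtt-events"))

    // Dump the payloads to HDFS, as in your current job. To resume
    // exactly where the old jar stopped, you would persist the consumed
    // offsets yourself (each batch RDD can be cast to HasOffsetRanges),
    // because checkpoints do not survive an application code upgrade.
    events.map(_._2).saveAsTextFiles("hdfs:///data/mqtt/events")

    ssc.start()
    ssc.awaitTermination()
  }
}

Then the upgrade becomes: kill the job, deploy the new jar, start it again, and let it continue from the retained Kafka log.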
There is a WriteAheadLog in Spark, but you would have to decouple the MQTT reading and the processing into two separate jobs, so that you could upgrade the processing job while the reading job stays stable (unchanged) across versions. That is problematic, though, because there is no easy way to share DStreams between jobs - you would have to develop your own facility for it.

Alternatively, the reading job could save the MQTT events in their rawest form into files - to limit how often its code needs to change - and the processing job would then run Spark Streaming on top of those files. I think this is inefficient and can get quite complex if you want to make it reliable. (I put a very rough sketch of this two-job split below the quoted message.)

Basically, either MQTT supports persistence (which I don't know), or Kafka exists for exactly this use case.

Another option, I think, would be to place observable streams with backpressure between MQTT and Spark Streaming; then you could perform the upgrade for as long as the buffers take to fill up.

I'm sorry this is not well thought out on my side - it is just a brainstorm - but it might lead you somewhere.

Regards,
Petr

On Mon, Sep 21, 2015 at 10:09 AM, Jeetendra Gangele <gangele...@gmail.com> wrote:
> Hi All,
>
> I have a Spark Streaming application with a 10 ms batch interval which is
> reading an MQTT channel and dumping the data from MQTT to HDFS.
>
> So suppose I have to deploy a new application jar (with changes in the
> Spark Streaming application): what is the best way to deploy it? Currently
> I do it as below:
>
> 1. kill the running streaming app using yarn application -kill ID
> 2. start the application again
>
> The problem with the above approach is that since we are not persisting
> the events in MQTT, we will miss the events for the period of the deploy.
>
> How do I handle this case?
>
> regards
> Jeetendra
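P.S. To make the two-job split above a bit more concrete, here is a very rough sketch. The broker URL, topic, and paths are invented, and MQTTUtils is the receiver from the spark-streaming-mqtt module - note that this receiver is not reliable by itself, which is part of why I say it is hard to make this robust:

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.mqtt.MQTTUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Job 1: kept stable across deploys; it only dumps raw events to files.
object MqttReader {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("mqtt-reader"), Seconds(10))
    val raw = MQTTUtils.createStream(ssc, "tcp://broker:1883", "events")
    raw.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        // Write the batch to a temp dir first, then rename the finished
        // part files into the watched dir so the processor never sees
        // half-written data.
        val tmp = new Path(s"/staging/tmp/batch-${time.milliseconds}")
        rdd.saveAsTextFile(tmp.toString)
        val fs = FileSystem.get(rdd.sparkContext.hadoopConfiguration)
        fs.listStatus(tmp)
          .filter(_.getPath.getName.startsWith("part-"))
          .foreach { f =>
            fs.rename(f.getPath, new Path(
              s"/staging/mqtt/events/${time.milliseconds}-${f.getPath.getName}"))
          }
      }
    }
    ssc.start()
    ssc.awaitTermination()
  }
}

// Job 2: the one you actually redeploy; it only ever reads the files.
object MqttProcessor {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("mqtt-processor"), Seconds(10))
    // Caveat: textFileStream only picks up files that appear after this
    // job has started, so batches written during the redeploy window
    // still need extra handling - exactly the complexity I mean above.
    val lines = ssc.textFileStream("/staging/mqtt/events")
    lines.count().print() // real processing would go here
    ssc.start()
    ssc.awaitTermination()
  }
}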