I think you would have to persist the events somewhere if you don't want to
miss them; I don't see any other option. Either in MQTT, if it supports
persistence, or by routing them through Kafka.

There is a WriteAheadLog in Spark, but you would have to decouple the MQTT
reading and the processing into two separate jobs so that you could upgrade
the processing one, assuming the reading one stays stable (without changes)
across versions. It is problematic, though, because there is no easy way to
share DStreams between jobs - you would have to develop your own facility
for it.

Alternatively, the reading job could save the MQTT events in their rawest
form into files - to limit the need to change that code - and then the
processing job would work on top of them using Spark Streaming's file-based
input. I think this is inefficient, though, and it can get quite complex if
you want to make it reliable.
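To make the idea concrete, here is a minimal sketch of that file-based
handoff pattern in plain Python (not the Spark or MQTT API - the function
names and file layout are just illustrative). A "reader" appends raw
payloads to a new file per micro-batch in a spool directory, and a separate
"processor" picks up whole, finished files, much like Spark Streaming's
file-based sources do:

```python
import itertools
import os

# Per-process sequence number so each batch gets a unique, ordered filename.
_seq = itertools.count()

def spool_events(spool_dir, events):
    """Reader job: dump one micro-batch of raw events into a new file."""
    os.makedirs(spool_dir, exist_ok=True)
    path = os.path.join(spool_dir, "batch-%010d.log" % next(_seq))
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        for e in events:
            f.write(e + "\n")
    # Atomic rename so the processor never sees a half-written file.
    os.rename(tmp, path)
    return path

def process_new_files(spool_dir, seen):
    """Processor job: read only files not processed yet (like a file-based
    streaming source scanning the directory each batch interval)."""
    events = []
    for name in sorted(os.listdir(spool_dir)):
        if name.endswith(".log") and name not in seen:
            with open(os.path.join(spool_dir, name)) as f:
                events.extend(line.rstrip("\n") for line in f)
            seen.add(name)
    return events
```

The point of the split is that you can stop and redeploy the processor
while the reader keeps spooling; nothing is lost, only delayed. Making it
reliable (tracking `seen` across restarts, cleaning up old files) is where
the extra complexity comes in.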

Basically, either MQTT supports persistence (which I don't know) or there
is Kafka for these use cases.

Another option, I think, would be to place observable streams with
backpressure between MQTT and Spark Streaming, so that you could perform
the upgrade before the buffers fill up.
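The buffering idea boils down to something like the following sketch, again
in plain Python rather than any Rx or Spark API: while the processor is
down for a redeploy, a bounded queue absorbs events, and once it is full
the receiver blocks (backpressure) instead of dropping data. The names are
hypothetical:

```python
import queue

# The upgrade window is roughly the time it takes to fill this buffer.
buffer = queue.Queue(maxsize=1000)

def on_mqtt_message(payload):
    # Receiver side: blocks when the buffer is full -> backpressure
    # propagates to the broker instead of events being lost.
    buffer.put(payload)

def drain(batch_size):
    # Processor side: after (re)starting, pull whatever accumulated.
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(buffer.get_nowait())
        except queue.Empty:
            break
    return batch
```

Note this only works if the buffer lives outside the process you are
redeploying; otherwise it dies with the job, which is why Kafka (a durable
buffer) is usually the cleaner answer.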

I'm sorry this is not well thought out on my side - it is just a
brainstorm, but it might lead you somewhere.

Regards,
Petr

On Mon, Sep 21, 2015 at 10:09 AM, Jeetendra Gangele <gangele...@gmail.com>
wrote:

> Hi All,
>
> I have a Spark Streaming application with a 10 ms batch interval which is
> reading an MQTT channel and dumping the data from MQTT to HDFS.
>
> So suppose I have to deploy a new application jar (with changes in the
> Spark Streaming application). What is the best way to deploy it? Currently
> I am doing it as below:
>
> 1. killing the running streaming app using yarn application -kill ID
> 2. then starting the application again
>
> The problem with the above approach is that, since we are not persisting
> the events in MQTT, we will miss events for the duration of the deploy.
>
> How do I handle this case?
>
> regards
> jeeetndra
>
