Hello,

Answers to the questions inline:
> 1. Are there any limitations in terms of implementations, functionalities or performance if we want to run streaming on Beam with the Spark runner vs streaming on Spark Streaming directly?

At this moment the Spark runner does not support some parts of the Beam model in streaming mode, e.g. side inputs and the state/timer API. Comparing this with pure Spark Streaming is not easy given the semantic differences of Beam.

> 2. Spark features like checkpointing, kafka offset management, how are they supported in Apache Beam? Do we need to do some extra work for them?

Checkpointing is supported, and Kafka offset management (if I understand what you mean) is handled by the KafkaIO connector plus the runner, so this should be OK.

> 3. with spark 2.x structured streaming, if we want to switch across different modes like from micro-batching to continuous streaming mode, how it can be done while using Beam?

To do this the Spark runner would need to translate the Beam pipeline using the Structured Streaming API, which is not the case today: it uses the RDD-based API. We expect to tackle this in the not-so-distant future. However, even if we did, Spark's continuous mode is quite limited at this moment in time because it does not support aggregation functions:
https://spark.apache.org/docs/2.3.0/structured-streaming-programming-guide.html#continuous-processing

Don't hesitate to give Beam and the Spark runner a try, and refer back to us if you have questions or find any issues.

Regards,
Ismaël

On Tue, May 15, 2018 at 2:22 PM chandan prakash <chandanbaran...@gmail.com> wrote:
> Also,
> 3. with spark 2.x structured streaming, if we want to switch across different modes like from micro-batching to continuous streaming mode, how it can be done while using Beam?
> These are some of the initial questions which I am not able to understand currently.
> Regards,
> Chandan
>
> On Tue, May 15, 2018 at 5:45 PM, chandan prakash <chandanbaran...@gmail.com> wrote:
>> Hi Everyone,
>> I have just started exploring and understanding Apache Beam for a new project in my firm.
>> In particular, we have to decide whether to implement our product over Spark Streaming (as Spark batch is already in our ecosystem) or to use Beam over the Spark runner to have future liberty of changing the underlying runner.
>> A couple of questions, after going through the Beam docs and examples:
>> 1. Are there any limitations in terms of implementations, functionalities or performance if we want to run streaming on Beam with the Spark runner vs streaming on Spark Streaming directly?
>> 2. Spark features like checkpointing, kafka offset management, how are they supported in Apache Beam? Do we need to do some extra work for them?
>> Any answer or link to a related discussion will be really appreciated.
>> Thanks in advance.
>> Regards,
>> --
>> Chandan Prakash
>
> --
> Chandan Prakash
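P.S. In case a concrete starting point helps with the checkpointing question: a Beam pipeline bundled as a jar is submitted to Spark like any other Spark application, with the runner and checkpoint directory chosen via Beam pipeline options. This is only a sketch; the class, jar, master, and HDFS path below are hypothetical placeholders, while `--runner=SparkRunner` and `--checkpointDir` are the actual Spark runner options.

```shell
# Hypothetical submission of a bundled Beam pipeline jar to Spark.
# org.example.MyBeamPipeline, my-pipeline-bundled.jar, and the HDFS path
# are placeholder names; --runner and --checkpointDir are Beam
# SparkPipelineOptions, parsed by the pipeline itself, not by Spark.
spark-submit \
  --class org.example.MyBeamPipeline \
  --master yarn \
  my-pipeline-bundled.jar \
  --runner=SparkRunner \
  --checkpointDir=hdfs:///beam/checkpoints
```

With this setup the Spark runner persists its checkpoint state under the given directory, and KafkaIO's offsets are tracked as part of the pipeline state rather than managed by hand.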