I'm using Spark DirectKafkaInputDStream, so I don't have direct access to
the consumer. There is no option to push a message filter down so that
unwanted messages are dropped before the InputDStream is created. Hence I
have to decode everything into an RDD and then transform it into a new,
filtered RDD. I have been thinking about how to avoid creating multiple
RDDs. My options are:
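For context, a rough sketch of what I do today (the broker address, topic
name and the isWellFormed check are just placeholders for my actual config
and validation logic):

  import kafka.serializer.StringDecoder
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  val conf = new SparkConf().setAppName("filter-after-decode")
  val ssc  = new StreamingContext(conf, Seconds(10))

  val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")  // placeholder
  val stream = KafkaUtils.createDirectStream[
    String, String, StringDecoder, StringDecoder](ssc, kafkaParams, Set("events"))

  // The whole batch is decoded first; malformed records are only dropped
  // afterwards, in a separate transformation that produces another RDD.
  def isWellFormed(value: String): Boolean = value.nonEmpty  // placeholder check
  val valid = stream.map { case (_, value) => value }.filter(isWellFormed)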

1) Push the filtering logic down into a custom Kafka decoder, if skipping
messages in a decoder is possible at all; a rough sketch of what I mean
follows the list. I assume that throwing an exception from the decoder
would stop topic consumption rather than skip the message. I just need
confirmation that a Kafka Decoder doesn't support skipping messages, and
that there is no way to push a filter down into Kafka itself so that
messages are dropped before they make their way to the consumer.

2) Intercept Spark's private code and rewrite it as a custom version if
there would be any performance benefit, but most probably it isn't worth
it.

3) Live with creating multiple RDDs. I don't know yet whether it will have
any significant performance impact.
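
Regarding 1), the best I can imagine is a "tolerant" decoder along these
lines (the empty-payload check stands in for my real format validation),
but it can only mark bad messages, not skip them, so dropping the Nones is
still a separate transformation afterwards:

  import kafka.serializer.Decoder
  import kafka.utils.VerifiableProperties

  // Maps malformed payloads to None instead of throwing. The decoder is
  // called once per message and must return a value, so it cannot skip.
  class TolerantDecoder(props: VerifiableProperties = null)
      extends Decoder[Option[String]] {
    override def fromBytes(bytes: Array[Byte]): Option[String] =
      try {
        val s = new String(bytes, "UTF-8")
        if (s.nonEmpty) Some(s) else None   // placeholder validity check
      } catch {
        case _: Exception => None
      }
  }

  // Downstream the stream becomes a DStream[(K, Option[String])], so the
  // Nones still have to be removed in another step:
  //   stream.flatMap { case (_, maybe) => maybe }

which is more or less back to option 3, just with decoding failures handled
earlier.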

I'm just collecting my options.

Cheers.




On Wed, Aug 19, 2015 at 11:17 AM, Sharninder <sharnin...@gmail.com> wrote:

> What do you mean by malformed messages? Consuming messages and what to do
> with them is the application's logic. Consume them, if they're not relevant
> pick up the next message.
>
> On Wed, Aug 19, 2015 at 1:10 PM, Petr Novak <oss.mli...@gmail.com> wrote:
>
> > Hi all,
> > ... by returning null?
> >
> > Many thanks,
> > Petr
> >
>
>
>
> --
> --
> Sharninder
>
