I'm using Spark's DirectKafkaInputDStream, so I don't have direct access to the consumer. There is no option to push a message filter down so that malformed messages are dropped before the InputDStream is created. Hence I have to decode everything into an RDD and then transform it into a new RDD. I've been thinking about how to avoid creating multiple RDDs. My options are:
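For context, this is roughly what my current pipeline looks like. It is only a minimal sketch against the Kafka 0.8 direct API in spark-streaming-kafka (Spark 1.3/1.4); Event and parseRecord are placeholders for my real decoding logic, and broker/topic names are made up:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.util.Try

// Placeholder record type and parser; the parser throws on malformed input,
// which is exactly the case I want to drop.
case class Event(id: Long, payload: String)
def parseRecord(s: String): Event = {
  val Array(id, payload) = s.split(",", 2)
  Event(id.toLong, payload)
}

val conf = new SparkConf().setAppName("kafka-filter")
val ssc = new StreamingContext(conf, Seconds(5))
val kafkaParams = Map("metadata.broker.list" -> "broker:9092")

// The direct stream only gives me raw (key, value) pairs; decoding and
// filtering happen afterwards in an extra transformation.
val raw = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events"))

val parsed = raw.flatMap { case (_, value) =>
  Try(parseRecord(value)).toOption   // None for malformed messages, so they are dropped here
}

As far as I understand, the flatMap is a narrow transformation pipelined into the same stage, so I hope the overhead of the extra RDD is limited, but I haven't measured it yet.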
1) Push the filtering logic down into a custom Kafka decoder, if it is possible to skip messages in a decoder at all. I assume that throwing from the decoder would stop topic consumption, so I just need confirmation that a Kafka Decoder doesn't support skipping messages. I also assume there is no way to push a filter down into Kafka itself so that messages are filtered before they make their way to the consumer. (A rough sketch of such a decoder is at the end of this mail.)

2) Fork the Spark-private code and rewrite it into a custom version, if there would be any performance benefit, but most probably it isn't worth it.

3) Live with creating multiple RDDs. I don't know yet whether it will have any significant performance impact.

I'm just collecting my options. Cheers.

On Wed, Aug 19, 2015 at 11:17 AM, Sharninder <sharnin...@gmail.com> wrote:
> What do you mean by malformed messages? Consuming messages and what to do
> with them is the application's logic. Consume them, if they're not relevant
> pick up the next message.
>
> On Wed, Aug 19, 2015 at 1:10 PM, Petr Novak <oss.mli...@gmail.com> wrote:
>
> > Hi all,
> > ... by returning null?
> >
> > Many thanks,
> > Petr
> >
>
>
> --
> --
> Sharninder
>
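P.S. For option 1, this is roughly the decoder I had in mind. Again only a sketch: it reuses the hypothetical Event, parseRecord, ssc and kafkaParams from the snippet above and assumes the Kafka 0.8 Decoder API, which createDirectStream instantiates through a constructor taking VerifiableProperties. As far as I can tell, fromBytes has to return a value for every message, so the decoder can only mark bad records and the actual drop still happens in a later flatMap:

import kafka.serializer.Decoder
import kafka.utils.VerifiableProperties
import scala.util.Try

// Returns None instead of throwing, so a bad message does not stop consumption.
class TolerantEventDecoder(props: VerifiableProperties = null)
    extends Decoder[Option[Event]] {
  override def fromBytes(bytes: Array[Byte]): Option[Event] =
    Try(parseRecord(new String(bytes, "UTF-8"))).toOption
}

// Usage: the stream carries Option[Event] and still needs one more step to drop the Nones.
val stream = KafkaUtils.createDirectStream[String, Option[Event], StringDecoder, TolerantEventDecoder](
  ssc, kafkaParams, Set("events"))
val events = stream.flatMap { case (_, v) => v }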