Data loss - Compacted topic behind reduce function

2018-11-27 Thread Nitay Kufert
Hey everyone, we are running Kafka Streams ver 2.1.0 (Scala). The specific use case I am talking about can be simplified into the following stream: *given:* an inputStream of type [String, BigDecimal] (for example: ("1234", 120.67)) inputStream.map(createUniqueKey).groupByKey.reduce(_ + _).toStream.t
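A minimal Scala sketch of the topology the preview describes, for orientation. Topic names and the body of createUniqueKey are assumptions, and Double stands in for BigDecimal, which has no built-in serde:

    import org.apache.kafka.streams.scala.ImplicitConversions._
    import org.apache.kafka.streams.scala.Serdes._
    import org.apache.kafka.streams.scala.StreamsBuilder
    import org.apache.kafka.streams.scala.kstream.KStream

    val builder = new StreamsBuilder()
    // Topic name is a placeholder.
    val inputStream: KStream[String, Double] = builder.stream[String, Double]("input-topic")

    // Hypothetical re-keying logic; the thread does not show the real function.
    def createUniqueKey(key: String, value: Double): (String, Double) =
      (s"$key-unique", value)

    inputStream
      .map(createUniqueKey)  // re-key each record
      .groupByKey            // repartition by the new key
      .reduce(_ + _)         // sum per key; state is backed by a compacted changelog topic
      .toStream
      .to("output-topic")

The compacted changelog topic behind the reduce is the piece the subject line points at: compaction keeps only the latest value per key, so that is where any suspected data loss would surface.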

Re: UNKNOWN_PRODUCER_ID error when running Streams WordCount demo with processing.guarantee set to EXACTLY_ONCE

2018-11-27 Thread Nitay Kufert
Thank you very much Matthias! For now, I have implemented a "de-dup" mechanism inside my stream application and disabled the "exactly_once" semantics - so, for now, our production is clean of annoying logs. I'll follow up on the latest developments around this issue On Fri, Nov 23, 2018 at 8:26 AM Matth
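A sketch of the configuration change being described, i.e. falling back from EXACTLY_ONCE to the default at-least-once guarantee. The application id and broker address are placeholders:

    import java.util.Properties
    import org.apache.kafka.streams.StreamsConfig

    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-demo")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    // Left out (the default is at-least-once): the setting that produced
    // the UNKNOWN_PRODUCER_ID warnings in the demo.
    // props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE)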

Re: The limit on the number of consumers in a group.

2018-11-27 Thread Matthias J. Sax
Rebalancing is expected to take longer for larger groups, but it should work nevertheless. I would recommend digging into the logs: does a single rebalance "hang", or do you get multiple rebalances triggered one after another? -Matthias On 10/23/18 12:16 AM, Dominic Kim wrote: > Dear all. > > Is
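Not from the thread itself, but for context: the consumer settings that most commonly influence cascading rebalances in large groups. The values below are illustrative only:

    import java.util.Properties
    import org.apache.kafka.clients.consumer.ConsumerConfig

    val props = new Properties()
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "large-consumer-group")
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    // A longer session timeout gives slow members more time to heartbeat
    // before the coordinator evicts them and triggers another rebalance.
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000")
    // Exceeding max.poll.interval.ms between poll() calls also forces a
    // member out of the group and starts a new rebalance.
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000")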

PySpark Streaming and Kafka Security - Compatibility Issue.

2018-11-27 Thread Ramaswamy, Muthuraman
Hi All, I am using PySpark Direct Streaming to connect to a remote Kafka broker secured with Kerberos authentication. The KafkaUtils.createDirectStream Python call gives me the following error: 18/11/27 18:20:05 WARN VerifiableProperties: Property sasl.mechanism is not valid 18/
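The VerifiableProperties warning suggests the 0.8-era consumer is in use, which predates SASL support; the newer consumer accepts these properties. The thread is about PySpark, but a sketch of the equivalent setup with the Scala spark-streaming-kafka-0-10 integration (which passes params straight to the new consumer) looks roughly like this; broker, topic, group id, and service name are placeholders, and SASL_PLAINTEXT is an assumption about the cluster's listener:

    import org.apache.kafka.clients.consumer.ConsumerConfig
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka010._
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("secured-direct-stream")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "broker:9092",
      ConsumerConfig.GROUP_ID_CONFIG -> "secured-stream-group",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      // SASL/Kerberos settings the old 0.8 consumer rejects as "not valid":
      "security.protocol" -> "SASL_PLAINTEXT",
      "sasl.mechanism" -> "GSSAPI",
      "sasl.kerberos.service.name" -> "kafka"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("secured-topic"), kafkaParams)
    )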