[
https://issues.apache.org/jira/browse/KAFKA-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708417#comment-16708417
]
Dmitry Buykin edited comment on KAFKA-7695 at 12/4/18 9:19 AM:
---------------------------------------------------------------
[~vvcephei] I agree with your comments. It would be better to throw an
exception when loading StreamsConfig. Right now KafkaStreams inherits these
settings from the Consumer configuration but breaks their contract.
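A minimal sketch of the kind of fail-fast check meant here, assuming plain
java.util.Properties as input. `AssignorOverrideCheck` and `validate` are
hypothetical names used only for illustration; this is not how StreamsConfig
currently behaves.

```java
import java.util.Properties;

/** Hypothetical helper, not part of Kafka: rejects an unsupported override early. */
final class AssignorOverrideCheck {
    // String value of ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG.
    static final String ASSIGNMENT_STRATEGY_KEY = "partition.assignment.strategy";

    private AssignorOverrideCheck() {
    }

    /**
     * Throws at configuration time if the user tries to set the assignor,
     * instead of letting the application crash later inside a consumer.
     */
    static void validate(Properties userProps) {
        if (userProps.containsKey(ASSIGNMENT_STRATEGY_KEY)) {
            throw new IllegalArgumentException(
                "'" + ASSIGNMENT_STRATEGY_KEY + "' cannot be overridden (got: "
                    + userProps.get(ASSIGNMENT_STRATEGY_KEY) + ")");
        }
    }
}
```

With a check like this, the misconfiguration would surface as one clear error
at startup rather than as a crash in the global consumer.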
{quote}[~zirx], it sounds like the feature you want to request is some way to
cut down on intermediate `(left, null)` join results. Is that right?
{quote}
Yes, that's right. I ran into a "duplicalypse" when using KStreams because
there are many duplicates in the source streams, plus these (left, null) join
results, which double the number of events on each left join. In some cases I
ended up with several million duplicates from a thousand duplicates in the
source events. I worked around this by implementing a Deduplication
Transformer, but it doesn't help in some cases and is pretty slow.
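For illustration, the core of such a deduplication step could look roughly
like the sketch below. It is not my actual Transformer: it uses a plain
in-memory map instead of a Kafka Streams state store, and the class name
`WindowedDeduplicator`, the method `firstSeen` and the retention parameter are
all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Kafka Streams: drops event IDs seen recently. */
final class WindowedDeduplicator {
    private final long retentionMs;
    // Event ID -> timestamp at which the ID was first seen.
    private final Map<String, Long> seen = new HashMap<>();

    WindowedDeduplicator(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    /**
     * Returns true if the event should be forwarded, false if the same ID
     * was already seen within the retention window.
     */
    boolean firstSeen(String eventId, long timestampMs) {
        // Forget IDs older than the retention window. This scans the whole
        // map on every call, which is simple but slow for large windows.
        seen.values().removeIf(firstTs -> timestampMs - firstTs > retentionMs);
        if (seen.containsKey(eventId)) {
            return false;
        }
        seen.put(eventId, timestampMs);
        return true;
    }

    /** Number of IDs currently remembered. */
    int size() {
        return seen.size();
    }
}
```

A real Transformer would call something like `firstSeen` for every record and
forward only when it returns true. The full scan on each call hints at why
this approach gets slow with many keys.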
Other options that could help:
1) [~mjsax] mentioned that 2.1 will support aligning streams by timestamps.
2) Dynamic distribution of slow/fast topics between StreamThreads, with
respect to local state storage, to align processing speed between topics.
3) Disabling seqnum in RocksDBWindowStore(retainDuplicates=true) for some left
joins to compact duplicates in RocksDB using buffering defaults.
4) Specifying a poll size per topic to throttle consumption from topics.
5) Quotas per topic (not supported) per client to throttle throughput per topic
on brokers.
> Cannot override StreamsPartitionAssignor in configuration
> ----------------------------------------------------------
>
> Key: KAFKA-7695
> URL: https://issues.apache.org/jira/browse/KAFKA-7695
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.0.0, 2.0.1
> Reporter: Dmitry Buykin
> Priority: Major
> Labels: configuration
>
> Cannot override StreamsPartitionAssignor by changing the property
> partition.assignment.strategy in KStreams 2.0.1 because the streams
> crash inside KafkaClientSupplier.getGlobalConsumer. This GlobalConsumer
> works only with RangeAssignor, which is configured by default.
> This can be reproduced by setting
> `props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
> StreamsPartitionAssignor.class.getName());`
> To me it looks like a bug.
> I opened a discussion here:
> https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1543395977453700
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)