[jira] [Commented] (KAFKA-8037) KTable restore may load bad data

Almog Gavra (Jira) Wed, 22 Jul 2020 16:28:14 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163135#comment-17163135
 ]


Almog Gavra commented on KAFKA-8037:
------------------------------------

> That said...in my (admittedly anecdotal) experience, the creation of extra 
> topics and extra load on the brokers, etc. is a major pain point for users of 
> Streams. I'm pretty sure I've seen it quoted in a "why we decided against 
> Kafka Streams" type article. Compare this with the problem of asymmetric 
> serdes, for which we have received exactly zero complaints as far as I am 
> aware.

And now for a 180, I think this is pretty convincing; I just hope we can figure 
out a way that doesn't cause the problems we've run into. I see one way to do that: 
when this optimization is enabled, we should have a byte pass-through into the 
state store during normal operation. That guarantees that if there are any 
bugs, at least they happen both in normal operation and in recovery. It also 
guarantees that we don't run into side-effects with serializers (still possible 
for deserializers, but I'm not aware of any deserializers that have side 
effects).
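
To make the pass-through idea concrete, here is a minimal sketch. It is a toy 
model and not Streams code: the class, the method names and the deliberately 
asymmetric serde are all hypothetical, and it assumes records are still 
deserialized for downstream processing. It only shows that a serde round trip 
can leave different bytes in the store than a byte-copy restore does, while a 
pass-through cannot.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Toy model only, not Kafka Streams code; every name here is hypothetical. */
final class PassThroughSketch {

    /** Stand-in deserializer: parses a decimal integer from UTF-8 bytes. */
    static int deserialize(byte[] bytes) {
        return Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));
    }

    /** Stand-in serializer, deliberately asymmetric: it zero-pads to four digits. */
    static byte[] serialize(int value) {
        return String.format(Locale.ROOT, "%04d", value).getBytes(StandardCharsets.UTF_8);
    }

    /** Normal operation as modeled here: deserialize, then re-serialize into the store. */
    static Map<String, byte[]> processWithRoundTrip(Map<String, byte[]> topic) {
        Map<String, byte[]> store = new LinkedHashMap<>();
        topic.forEach((key, value) -> store.put(key, serialize(deserialize(value))));
        return store;
    }

    /** Proposed normal operation: the source bytes go into the store untouched. */
    static Map<String, byte[]> processWithPassThrough(Map<String, byte[]> topic) {
        Map<String, byte[]> store = new LinkedHashMap<>();
        topic.forEach((key, value) -> {
            deserialize(value);    // assumption: still needed for downstream processing
            store.put(key, value); // no serializer involved on the way into the store
        });
        return store;
    }

    /** Restore with the optimization enabled: plain bytes are copied from the topic. */
    static Map<String, byte[]> restore(Map<String, byte[]> topic) {
        return new LinkedHashMap<>(topic);
    }

    /** True if both stores hold the same keys with byte-identical values. */
    static boolean sameBytes(Map<String, byte[]> a, Map<String, byte[]> b) {
        if (!a.keySet().equals(b.keySet())) {
            return false;
        }
        for (String key : a.keySet()) {
            if (!Arrays.equals(a.get(key), b.get(key))) {
                return false;
            }
        }
        return true;
    }
}
```

With a topic holding the single value "7", the round-trip store ends up with 
"0007" while restore copies "7", so the two stores diverge; the pass-through 
store matches the restored one byte for byte.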

> KTable restore may load bad data
> --------------------------------
>
>                 Key: KAFKA-8037
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8037
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Minor
>              Labels: pull-request-available
>
> If an input topic contains bad data, users can specify a 
> `deserialization.exception.handler` to drop corrupted records on read. 
> However, this mechanism may be bypassed on restore. Assume a 
> `builder.table()` call reads and drops a corrupted record. If the table state 
> is lost and restored from the changelog topic, the corrupted record may be 
> copied into the store, because on restore plain bytes are copied.
> If the KTable is used in a join, an internal `store.get()` call to look up the 
> record would fail with a deserialization exception if the value part cannot 
> be deserialized.
> GlobalKTables are affected, too (cf. KAFKA-7663 that may allow a fix for 
> GlobalKTable case). It's unclear to me at the moment how this issue could be 
> addressed for KTables though.
> Note that user state stores are not affected, because they always have a 
> dedicated changelog topic (and don't reuse an input topic) and thus the 
> corrupted record would not be written into the changelog.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
