[ 
https://issues.apache.org/jira/browse/KAFKA-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166602#comment-17166602
 ] 

Guozhang Wang commented on KAFKA-8037:
--------------------------------------

Per the interaction: I'm actually suggesting that the source-topic reuse would 
NOT be part of the topology optimization moving forward and are ONLY be 
controlled by the per-store APIs, and hence it would not be related to the 
`topology.optimization` any more.

Per the serde: the current rule is, if `Consumed` is specified only, use that 
for both source topic and state store, and similarly if `Materialized` is 
specified only, use that for both source topic and state store; if both are 
specified, `Consumed` serde will override `Materialized` serde. So today we are 
guaranteed that only one serde will be used. My previous comment was not very 
clear, what I meant is that for IQ we just use the only serde, whatever it is, 
to deserialize the bytes (note they are just raw bytes from source topics 
directly).

> KTable restore may load bad data
> --------------------------------
>
>                 Key: KAFKA-8037
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8037
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Minor
>              Labels: pull-request-available
>
> If an input topic contains bad data, users can specify a 
> `deserialization.exception.handler` to drop corrupted records on read. 
> However, this mechanism may be by-passed on restore. Assume a 
> `builder.table()` call reads and drops a corrupted record. If the table state 
> is lost and restored from the changelog topic, the corrupted record may be 
> copied into the store, because on restore plain bytes are copied.
> If the KTable is used in a join, an internal `store.get()` call to lookup the 
> record would fail with a deserialization exception if the value part cannot 
> be deserialized.
> GlobalKTables are affected, too (cf. KAFKA-7663 that may allow a fix for 
> GlobalKTable case). It's unclear to me atm, how this issue could be addressed 
> for KTables though.
> Note, that user state stores are not affected, because they always have a 
> dedicated changelog topic (and don't reuse an input topic) and thus the 
> corrupted record would not be written into the changelog.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to