[
https://issues.apache.org/jira/browse/SPARK-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15559911#comment-15559911
]
Cody Koeninger commented on SPARK-17815:
----------------------------------------
The WAL cannot be the only source of truth, because it can be corrupted in a
situation where the downstream results and offsets are not. The downstream
offsets, by contrast, can't be corrupted without also affecting the results;
that's the whole point of transactions. Even if you ignore the fact that the
WAL can be corrupted, you still have to be careful about aligning the
boundaries of the WAL with the boundaries of the downstream store.
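
To make the transactional point concrete, here is a minimal sketch, assuming
a SQLite database stands in for the downstream store. The table names and the
helpers open_store and write_batch are hypothetical, not part of Spark or
Kafka. Results and the ending offset are written in one transaction, so a
failure leaves both untouched, and a batch whose start does not line up with
the stored offset is rejected.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (hypothetical) downstream store and create its tables."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS results (value TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offsets ("
        "topic TEXT, part INTEGER, until_offset INTEGER, "
        "PRIMARY KEY (topic, part))"
    )
    return conn


def write_batch(conn, topic, partition, from_offset, until_offset, records):
    """Store a batch's results and its ending offset in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT until_offset FROM offsets WHERE topic = ? AND part = ?",
            (topic, partition),
        ).fetchone()
        stored = row[0] if row else 0  # assumption: a new partition starts at 0
        # Boundary check: the batch must start exactly where the store left off.
        if stored != from_offset:
            raise ValueError(
                f"batch starts at {from_offset}, but the store is at {stored}"
            )
        conn.executemany(
            "INSERT INTO results (value) VALUES (?)", [(r,) for r in records]
        )
        conn.execute(
            "INSERT OR REPLACE INTO offsets (topic, part, until_offset) "
            "VALUES (?, ?, ?)",
            (topic, partition, until_offset),
        )
```

If any insert in the batch fails, neither the partial results nor the new
offset become visible, which is the property a separate WAL does not give you
by itself.
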
The Kafka commit log can't be dismissed as merely for metric collection
either. A newly constructed Kafka consumer will use it, in preference to
auto.offset.reset, as its starting point.
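
The precedence described above can be modelled roughly as follows. This is a
simplified sketch, not the Kafka client's actual code: the function
starting_offset and its parameters are hypothetical, and it ignores the case
where a committed offset is out of range.

```python
from typing import Optional


def starting_offset(
    committed: Optional[int],
    auto_offset_reset: str,
    earliest: int,
    latest: int,
) -> int:
    """Model where a new consumer in a group starts reading a partition."""
    # A committed offset for the group wins over the reset policy.
    if committed is not None:
        return committed
    # Only when nothing has been committed does auto.offset.reset apply.
    if auto_offset_reset == "earliest":
        return earliest
    if auto_offset_reset == "latest":
        return latest
    # Models the "none" policy, where a missing offset is an error.
    raise ValueError("no committed offset and no usable reset policy")
```

So once anything commits offsets under a group.id, those offsets stop being
purely informational: any consumer later created with that group.id will
start from them.
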
I'm not saying these issues are unsolvable, but you can't just hand-wave them
away, and they are confusing to end users. There was already confusion with
only two stores: ZK and the DStream checkpoint.
> Report committed offsets
> ------------------------
>
> Key: SPARK-17815
> URL: https://issues.apache.org/jira/browse/SPARK-17815
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Michael Armbrust
>
> Since we manage our own offsets, we have turned off auto-commit. However,
> this means that external tools are not able to report on how far behind a
> given streaming job is. When the user manually gives us a group.id, we
> should report back to it.