[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553476#comment-15553476
]
Cody Koeninger commented on SPARK-15406:
----------------------------------------
You cannot have reliable delivery semantics to a downstream data store (i.e.
what people usually care about when they say "exactly once") without either
idempotent writes or transactional writes. The Structured Streaming API as it
exists today provides no way to specify offsets on startup, and no batched way
to access offsets for insertion into a data store, which means in practical
terms that exactly-once depends on idempotence. Idempotence is not always an
option.
The existing DStream API allows me to get reliable delivery of arbitrary
aggregations to a partitioned, scalable downstream data store. The Structured
Streaming wrapper around the DStream (which honestly is what it currently is)
does not allow that.
I understand that you want to split the interface from the implementation, but
I have not yet heard any concrete ideas on how to make the implementation
meaningfully different from DStreams when it comes to Kafka (which is pretty
clearly the primary use case).
> Structured streaming support for consuming from Kafka
> -----------------------------------------------------
>
> Key: SPARK-15406
> URL: https://issues.apache.org/jira/browse/SPARK-15406
> Project: Spark
> Issue Type: New Feature
> Reporter: Cody Koeninger
>
> This is the parent JIRA to track all the work for building a Kafka source
> for Structured Streaming. Here is the design doc for an initial version of
> the Kafka Source.
> https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit?usp=sharing
> ================== Old description =========================
> Structured Streaming doesn't have support for Kafka yet. I personally feel
> like time-based indexing would make for a much better interface, but it's
> been pushed back to Kafka 0.10.1
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index