[
https://issues.apache.org/jira/browse/SPARK-18386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653388#comment-15653388
]
Ofir Manor commented on SPARK-18386:
------------------------------------
This would be really useful for me right now!
We have many bounded Kafka topics, for example, metrics of the last couple of
months, web clicks of the last 7 days etc. It would be great to be able to just
query them with Spark (I'd use "earliest" starting offsets in my case).
There is also an interesting interaction between structured streaming and
regular queries, where each streaming batch recomputes the regular queries it
depends on. This already works with the file sources, and I'd like to use it
with the Kafka source as well in some cases.
It would also be great if the external API were as close as possible to the
current one ({{spark.readStream.format("kafka").option(...)}}), with the same
options etc., maybe just with {{spark.read.kafka...}}?
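To make the "same options" idea concrete, here is a rough Python sketch of what sharing one option map between the streaming and a batch reader could look like. It assumes a batch reader would accept the same {{format("kafka")}} and {{option(...)}} calls as the streaming one, which is the request here rather than something confirmed in this thread. The helper names {{kafka_options}} and {{read_kafka}} are hypothetical, as are any broker address and topic name passed to them.

```python
# Hypothetical sketch: one option map reused by both the streaming reader
# and an assumed batch reader. Nothing here imports Spark; `spark` is
# whatever session object the caller passes in.

def kafka_options(servers, topic, starting_offsets="earliest"):
    """Build the Kafka source options once so both read paths share them."""
    return {
        "kafka.bootstrap.servers": servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def read_kafka(spark, options, streaming=False):
    """Apply the same options to spark.readStream or (assumed) spark.read."""
    reader = spark.readStream if streaming else spark.read
    reader = reader.format("kafka")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()
```

With this shape, switching a query between streaming and bounded batch mode would only mean flipping the {{streaming}} flag, while the option map stays untouched.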
> Batch mode SQL source for Kafka
> -------------------------------
>
> Key: SPARK-18386
> URL: https://issues.apache.org/jira/browse/SPARK-18386
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Cody Koeninger
>
> An SQL equivalent to the DStream KafkaUtils.createRDD would be useful for
> querying over a defined batch of offsets.
> The possibility of Kafka 0.10.1 time indexing (e.g. a batch from timestamp X
> to timestamp Y) should be taken into account, even if not available in the
> initial implementation.