[jira] [Commented] (SPARK-18386) Batch mode SQL source for Kafka

Ofir Manor (JIRA) Thu, 10 Nov 2016 00:13:28 -0800

    [ https://issues.apache.org/jira/browse/SPARK-18386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653388#comment-15653388 ]


Ofir Manor commented on SPARK-18386:
------------------------------------

This would be really useful for me right now!
We have many bounded Kafka topics, such as metrics from the last couple of 
months or web clicks from the last 7 days. It would be great to be able to just 
query them with Spark (in my case I'd use "earliest" starting offsets).
There is also an interesting interaction between Structured Streaming and 
regular (batch) queries, where each streaming batch recomputes the regular 
queries it depends on. This already works with the file sources, and in some 
cases I'd like to use it with the Kafka source as well.
It would also be great if the external API were as close as possible to the 
current one ({{spark.readStream.format("kafka").option(...)}}), with the same 
options etc., maybe just {{spark.read.kafka...}}?



> Batch mode SQL source for Kafka
> -------------------------------
>
>                 Key: SPARK-18386
>                 URL: https://issues.apache.org/jira/browse/SPARK-18386
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Cody Koeninger
>
> An SQL equivalent to the DStream KafkaUtils.createRDD would be useful for 
> querying over a defined batch of offsets.
> The possibility of Kafka 0.10.1 time indexing (e.g. a batch from timestamp X 
> to timestamp Y) should be taken into account, even if not available in the 
> initial implementation.
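To make "a defined batch of offsets" concrete: per topic partition it is a half-open range [from, until), much like the {{OffsetRange}} (fromOffset/untilOffset) that {{KafkaUtils.createRDD}} takes. With the Kafka 0.10.1 time index, a window from timestamp X to timestamp Y can be turned into such ranges by looking up, per partition, the earliest offset whose timestamp is >= X and >= Y (the semantics of {{KafkaConsumer.offsetsForTimes}}). The pure-Python sketch below models only that bookkeeping; it is not Spark or Kafka code. {{OffsetRange}} here is a hypothetical stand-in, {{offset_for_time}} and {{batch_for_time_window}} are hypothetical helpers, the toy data is made up, and it assumes timestamps never decrease within a partition.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical stand-in: half-open range [from_offset, until_offset) of one partition."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int

    def count(self) -> int:
        # Number of records covered by the half-open range.
        return self.until_offset - self.from_offset


def offset_for_time(index: List[Tuple[int, int]], ts: int, end_offset: int) -> int:
    """Earliest offset whose timestamp is >= ts, or end_offset if there is none.

    `index` holds (timestamp, offset) pairs sorted by timestamp; this sketch
    assumes timestamps never decrease as offsets grow within a partition.
    """
    timestamps = [t for t, _ in index]
    i = bisect_left(timestamps, ts)
    return index[i][1] if i < len(index) else end_offset


def batch_for_time_window(
    topic: str,
    indexes: Dict[int, List[Tuple[int, int]]],
    end_offsets: Dict[int, int],
    start_ts: int,
    end_ts: int,
) -> List[OffsetRange]:
    """Turn a timestamp window [start_ts, end_ts) into one OffsetRange per partition."""
    ranges = []
    for partition in sorted(indexes):
        index, end = indexes[partition], end_offsets[partition]
        ranges.append(
            OffsetRange(
                topic,
                partition,
                offset_for_time(index, start_ts, end),
                offset_for_time(index, end_ts, end),
            )
        )
    return ranges


# Toy data: partition -> (timestamp, offset) pairs, plus each partition's end offset.
indexes = {0: [(100, 0), (200, 1), (300, 2), (400, 3)], 1: [(150, 0), (250, 1), (350, 2)]}
end_offsets = {0: 4, 1: 3}
for r in batch_for_time_window("clicks", indexes, end_offsets, 200, 350):
    print(r.partition, r.from_offset, r.until_offset, r.count())
# prints "0 1 3 2" then "1 1 2 1"
```

Keeping the end of the window exclusive means consecutive windows (X..Y, then Y..Z) never read the same record twice, and an "earliest" start corresponds to a timestamp older than anything in the partition.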



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

