[ 
https://issues.apache.org/jira/browse/SPARK-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553662#comment-15553662
 ] 

Ofir Manor commented on SPARK-17815:
------------------------------------

I think this is a good idea. There is a minor confusion here, though: as far 
as I understand, setting group.id is explicitly blocked (it is even 
documented...). So the description might need rephrasing.
1. I think auto-commit should be off, and the driver should manually commit 
Kafka offsets after it successfully commits a batch to HDFS (when a batch is 
over), so monitoring will work. I think that should happen unconditionally, 
unless there are concrete performance / overhead concerns (committing offsets 
to Kafka too frequently?).
2. Regarding manually setting group.id - that would be great. If there is a 
concern that users might mess up (reuse the group.id by mistake), at least 
allow setting a prefix for it (and provide a way to get the actual group.id).
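
To make the two points above concrete, here is a rough sketch of what I have 
in mind. It is not Spark code and it does not use a Kafka client: 
make_group_id, BatchCommitter, write_batch and commit_offsets are all 
hypothetical names, and the "prefix plus random suffix" scheme is only one 
possible way to build the group.id. In a real implementation commit_offsets 
would presumably wrap something like the Java consumer's commitSync(offsets), 
and write_batch would be the commit of the batch to HDFS.

```python
import uuid
from typing import Callable, Dict, List

# Maps "topic-partition" to the next offset to read (illustrative shape only).
Offsets = Dict[str, int]


def make_group_id(prefix: str) -> str:
    """Build a unique group.id from a user-supplied prefix (hypothetical scheme)."""
    return f"{prefix}-{uuid.uuid4().hex}"


class BatchCommitter:
    """Commits offsets to Kafka only after the batch reached the sink."""

    def __init__(
        self,
        group_id: str,
        write_batch: Callable[[List[str]], None],
        commit_offsets: Callable[[str, Offsets], None],
    ) -> None:
        # The actual group.id stays readable so monitoring tools can be pointed at it.
        self.group_id = group_id
        self._write_batch = write_batch
        self._commit_offsets = commit_offsets

    def run_batch(self, records: List[str], end_offsets: Offsets) -> bool:
        try:
            self._write_batch(records)
        except Exception:
            # Sink write failed: leave the offsets in Kafka untouched.
            return False
        # Sink write succeeded: report progress under our group.id.
        self._commit_offsets(self.group_id, end_offsets)
        return True
```

The offsets reported this way are for monitoring only. The job would still 
recover from its own offset log, so a failed or delayed commit to Kafka should 
not affect correctness.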

> Report committed offsets
> ------------------------
>
>                 Key: SPARK-17815
>                 URL: https://issues.apache.org/jira/browse/SPARK-17815
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
>
> Since we manage our own offsets, we have turned off auto-commit.  However, 
> this means that external tools are not able to report on how far behind a 
> given streaming job is.  When the user manually gives us a group.id, we 
> should report back to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
