Re: Issue with offset management using Spark on Dataproc

2019-04-30 Shixiong(Ryan) Zhu
I recommend you use Structured Streaming, as it includes a patch that works around this issue: https://issues.apache.org/jira/browse/SPARK-26267

Best Regards,
Ryan

On Tue, Apr 30, 2019 at 3:34 PM Shixiong(Ryan) Zhu wrote:
> There is a known issue that Kafka may return a wrong offset even if the
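A minimal sketch of the kind of guard such a workaround can apply, assuming it amounts to refetching when Kafka reports offsets lower than ones the job has already observed. This is not the code from the SPARK-26267 patch; `fetch_latest_with_retry`, the `fetch_latest` callable, and the retry limit are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical representation for this sketch: (topic, partition) -> offset
Offsets = Dict[Tuple[str, int], int]


def fetch_latest_with_retry(
    fetch_latest: Callable[[], Offsets],
    known: Offsets,
    max_attempts: int = 3,
) -> Offsets:
    """Refetch the latest offsets while any partition appears to move backwards.

    `fetch_latest` stands in for whatever call asks Kafka for end offsets;
    `known` holds offsets the job has already seen for those partitions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    regressed = set()
    for _ in range(max_attempts):
        latest = fetch_latest()
        # A "latest" offset lower than one already seen is suspicious:
        # trusting it would make the job re-read records it already processed.
        regressed = {
            tp for tp, off in latest.items() if tp in known and off < known[tp]
        }
        if not regressed:
            return latest
    raise RuntimeError(
        f"offsets still regressed after {max_attempts} attempts: {sorted(regressed)}"
    )
```

The design choice here is to treat a backwards-moving offset as a transient bad answer and ask again, failing loudly rather than silently reprocessing data if it persists.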

2019-04-30 Shixiong(Ryan) Zhu
There is a known issue where Kafka may return a wrong offset even when no reset has happened: https://issues.apache.org/jira/browse/KAFKA-7703

Best Regards,
Ryan

On Tue, Apr 30, 2019 at 10:41 AM Austin Weaver wrote:
> @deng - There was a short erroneous period where 2 streams were reading
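To make the consequence concrete: if the consumer reports a position lower than where the job actually is, and the job trusts it, the records in between are read again. A hypothetical check a job could run before trusting reported positions is sketched below; `find_offset_regressions` and the `(topic, partition)` tuple keys are assumptions for illustration, not part of Kafka's or Spark's API.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]  # (topic, partition), chosen for this sketch


def find_offset_regressions(
    expected: Dict[TopicPartition, int],
    reported: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return, per partition, how many records would be read again if the
    reported position were trusted instead of the expected one."""
    return {
        tp: expected[tp] - pos
        for tp, pos in reported.items()
        # Only partitions we have a baseline for, and only backwards moves.
        if tp in expected and pos < expected[tp]
    }


# Partition 0 went backwards by 380 records; partition 1 is consistent.
print(find_offset_regressions(
    {("events", 0): 500, ("events", 1): 300},
    {("events", 0): 120, ("events", 1): 300},
))
```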

2019-04-30 Austin Weaver
@deng - There was a short erroneous period when two streams reading from the same topic with the same group id were running at the same time. We saw errors from this and stopped the extra stream. That being said, I would think auto.offset.reset would kick in regardless, since the documentation says that
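For reference, Kafka's consumer documentation describes auto.offset.reset as applying only when there is no initial committed offset or the current offset no longer exists on the server (for example, because the data was deleted). A committed offset that is still in range, even a wrong one, is used as-is. The simplified model below illustrates this for a single partition; `starting_offset` and its parameters are hypothetical, and the sketch ignores explicit seeks and any Spark-specific consumer settings.

```python
from typing import Optional


def starting_offset(
    committed: Optional[int],
    log_start: int,
    log_end: int,
    auto_offset_reset: str = "latest",
) -> int:
    """Model where a Kafka consumer begins reading one partition.

    `committed` is the group's committed offset (None if there is none);
    `log_start`/`log_end` bound the offsets the broker still holds.
    """
    # A committed offset that is still in range wins: the reset policy is
    # never consulted, even if that offset is not the one you expected.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # Only a missing or out-of-range offset falls through to the policy.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    # "none" (or any other value) raises instead of picking an offset.
    raise ValueError("no valid committed offset and auto.offset.reset is not earliest/latest")
```

Under this model, a stream whose group already has a valid committed offset keeps using it, which is why changing auto.offset.reset alone does not move an existing group.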

2019-04-30 Akshay Bhardwaj
Hi Austin,

Are you using Spark Streaming or Structured Streaming? To help us understand the problem, could you also share sample code and the config params of the Spark-Kafka connector for that streaming job?

Akshay Bhardwaj
+91-97111-33849

On Mon, Apr 29, 2019 at 10:34 PM Austin Weaver wrote:
> Hey