Re: Setting startingOffsets to earliest in structured streaming never catches up

2017-01-23 Thread Michael Armbrust
+1 to Ryan's suggestion of setting maxOffsetsPerTrigger. This way you can at least see how quickly it is making progress towards catching up.

On Sun, Jan 22, 2017 at 7:02 PM, Timothy Chan wrote:
> I'm using version 2.0.2.
>
> The difference I see between using latest and earliest is a series of
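
To make the "how quickly it is catching up" point concrete, here is a rough back-of-the-envelope sketch. On the Kafka source the cap is normally set with `.option("maxOffsetsPerTrigger", ...)`, provided your Spark version supports that option. The helper name `batches_to_catch_up` and all the numbers below are hypothetical and only illustrate the arithmetic; none of them come from this thread.

```python
import math


def batches_to_catch_up(backlog, max_offsets_per_trigger, new_per_batch=0):
    """Estimate how many micro-batches are needed to drain a Kafka backlog.

    backlog: offsets still unread when the query starts from "earliest".
    max_offsets_per_trigger: cap on offsets consumed per micro-batch.
    new_per_batch: offsets that arrive while one batch runs (assumed constant).
    Returns None when the cap does not exceed the arrival rate, i.e. the
    query would never catch up under these assumptions.
    """
    net_progress = max_offsets_per_trigger - new_per_batch
    if net_progress <= 0:
        return None
    return math.ceil(backlog / net_progress)


# Illustrative numbers only.
idle_topic = batches_to_catch_up(1_000_000, 100_000)
busy_topic = batches_to_catch_up(1_000_000, 100_000, new_per_batch=50_000)
```

With a cap in place, each batch is bounded, so the progress of the query becomes visible batch by batch instead of being hidden inside one very long job.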

Re: Setting startingOffsets to earliest in structured streaming never catches up

2017-01-22 Thread Timothy Chan
I'm using version 2.0.2. The difference I see between using latest and earliest is a series of jobs that take less than a second vs. one job that goes on for over 24 hours.

On Sun, Jan 22, 2017 at 6:54 PM Shixiong(Ryan) Zhu wrote:
> Which Spark version are you using? If you are using 2.1.0, coul

Re: Setting startingOffsets to earliest in structured streaming never catches up

2017-01-22 Thread Shixiong(Ryan) Zhu
Which Spark version are you using? If you are using 2.1.0, could you use the monitoring APIs (http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries) to check the input rate and the processing rate? One possible issue is that the Kafka source l
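
As a small sketch of the rate check suggested above: in PySpark 2.1.0 and later, `query.lastProgress` returns the most recent progress report as a dict that includes `inputRowsPerSecond` and `processedRowsPerSecond`. The helper `rate_summary` and the sample values are hypothetical; the dict below merely stands in for a real progress report, which may also be `None` before the first batch completes.

```python
def rate_summary(progress):
    """Compare input and processing rates from a progress report dict.

    `progress` is assumed to have the shape of `query.lastProgress`;
    missing keys are treated as a rate of 0.0.
    """
    input_rate = progress.get("inputRowsPerSecond", 0.0)
    processed_rate = progress.get("processedRowsPerSecond", 0.0)
    return {
        "input_rate": input_rate,
        "processed_rate": processed_rate,
        # If rows arrive faster than they are processed, the query falls behind.
        "keeping_up": processed_rate >= input_rate,
    }


# Made-up sample values, not output from a real query.
sample = {"inputRowsPerSecond": 1200.0, "processedRowsPerSecond": 800.0}
summary = rate_summary(sample)
```

Against a running query you would pass `query.lastProgress` instead of `sample`, after checking that it is not `None`.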