Kafka RDDs need to start from a specified offset; you really don't want
each executor just starting at whatever offset happened to be latest at
the time it ran.
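For concreteness, here's a minimal (untested) sketch of pinning starting
offsets with Assign, along the lines of the example in the docs linked
below; the topic name, offsets, config values, and streamingContext are
placeholders:

    import org.apache.kafka.common.TopicPartition
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    // Placeholder config; use your own brokers and group id.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "my-group",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Explicit starting offset for each partition the stream should read.
    val fromOffsets = Map(
      new TopicPartition("my-topic", 0) -> 42L,
      new TopicPartition("my-topic", 1) -> 42L
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      streamingContext, // an existing StreamingContext
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Assign[String, String](
        fromOffsets.keys.toList, kafkaParams, fromOffsets)
    )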
If you need a way to figure out the latest offset at the time the driver
starts up, you can always use a consumer to read the offsets and then pass
that to Assign (just make sure that consumer is closed before the job
starts, so you don't get group id conflicts); there's a rough sketch at the
bottom of this message. You can even write your own implementation of
ConsumerStrategy, which should let you do pretty much whatever you need to
get the consumer into the state you want.

On Mon, Aug 21, 2017 at 6:57 PM, swetha kasireddy
<swethakasire...@gmail.com> wrote:
> Hi Cody,
>
> I think Assign is used if we want the stream to start from a specified
> offset. What if we want it to start from the latest offset, the way
> "auto.offset.reset" -> "latest" behaves?
>
> Thanks!
>
> On Mon, Aug 21, 2017 at 9:06 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Yes, you can start from specified offsets. See ConsumerStrategy,
>> specifically Assign:
>>
>> http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#your-own-data-store
>>
>> On Tue, Aug 15, 2017 at 1:18 PM, SRK <swethakasire...@gmail.com> wrote:
>> > Hi,
>> >
>> > How do I force Spark Kafka Direct to start from the latest offset
>> > when the lag is huge in Kafka 0.10? It seems to process from the
>> > latest offset stored for a group id. One way to do this is to change
>> > the group id, but that would mean providing a new group id each time
>> > we need to process the job from the latest offset.
>> >
>> > Is there a way to force the job to run from the latest offset when we
>> > need to, while still using the same group id?
>> >
>> > Thanks!
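P.S. For the archives, a rough, untested sketch of the probe-consumer
approach described above, against the 0.10 consumer API. Topic name,
config, and streamingContext are placeholders; seekToEnd plus position
reads the end offsets on the driver, the consumer is closed, and the
result goes to Assign:

    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    // Same placeholder params the stream itself will use.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "my-group"
    )
    val topics = List("my-topic") // placeholder

    // Throwaway consumer, used on the driver only to find the current end offsets.
    val probe = new KafkaConsumer[String, String](kafkaParams.asJava)
    val partitions = topics.flatMap { t =>
      probe.partitionsFor(t).asScala.map(pi => new TopicPartition(pi.topic, pi.partition))
    }
    probe.assign(partitions.asJava)
    probe.seekToEnd(partitions.asJava)
    val latestOffsets = partitions.map(tp => tp -> probe.position(tp)).toMap
    probe.close() // close before the job starts, to avoid group id conflicts

    val stream = KafkaUtils.createDirectStream[String, String](
      streamingContext, // an existing StreamingContext
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Assign[String, String](partitions, kafkaParams, latestOffsets)
    )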