Hey all, relatively new Spark dev here. I'm seeing some Kafka offset issues and was wondering if you could help me out.
I am currently running a Spark job on Dataproc and am getting errors when trying to re-join a group and read data from a Kafka topic. I have done some digging and am not sure what the issue is. I have auto.offset.reset set to earliest, so it should be reading from the earliest available non-committed offset, and initially my Spark logs look like this:

```
19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-11 to offset 5553330.
19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-2 to offset 5555553.
19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-3 to offset 5555484.
19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-4 to offset 5555586.
19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-5 to offset 5555502.
19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-6 to offset 5555561.
19/04/29 16:30:30 INFO org.apache.kafka.clients.consumer.internals.Fetcher: [Consumer clientId=consumer-1, groupId=demo-group] Resetting offset for partition demo.topic-7 to offset 5555542.
```

But on the very next line I get an error for trying to read from a nonexistent offset on the server (you can see that the offset for the partition differs from the one listed above, so I have no idea why it would be attempting to read from that offset):

```
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {demo.topic-11=4544296}
```

Any ideas as to why my Spark job keeps going back to this offset (4544296) instead of the one it logs initially (5553330)? It seems to be contradicting itself with a) the offset it says it is on versus the one it actually attempts to read, and b) the claim that there is no configured reset policy even though I set one. A stripped-down sketch of the consumer setup is in the P.S. below.

--
Austin Weaver
Software Engineer
FLYR, Inc.
www.flyrlabs.com
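P.S. For reference, here is a minimal sketch of roughly how the consumer is set up. This is a simplified example assuming the DStream-based spark-streaming-kafka-0-10 integration; the broker address, topic name, and batch interval below are placeholders, not my real values.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaOffsetDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-offset-demo")
    val ssc = new StreamingContext(conf, Seconds(10)) // placeholder batch interval

    // Consumer config: auto.offset.reset is "earliest", so on a missing committed
    // offset the consumer should fall back to the earliest available offset.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker-1:9092", // placeholder broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "demo-group",
      "auto.offset.reset" -> "earliest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream subscribed to the topic in question
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("demo.topic"), kafkaParams)
    )

    stream.foreachRDD { rdd =>
      // ... processing happens here ...
      // Offsets are committed back to Kafka after each batch
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```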