Hi,
I'm using spark streaming with kafka and I need to clear the offset and
re-compute all things. I deleted checkpoint directory in HDFS and reset
kafka offset with "kafka-run-class kafka.tools.ImportZkOffsets". I can
confirm the offset is set to 0 in kafka:
~ > kafka-run-class kafka.tools.ConsumerOffsetChecker --group
adhoc_data_spark --topic adhoc_data --zookeeper szq1.appadhoc.com:2181
Group Topic Pid Offset logSize
Lag Owner
adhoc_data_spark adhoc_data 0 0 5280743
5280743 none
But when I restart spark streaming, the offset is reset to logSize, I
cannot figure out why is that, can anybody help? Thanks.