Hi all, I would like to ask a question about the batch sizes I see when reading from Kafka with Spark Streaming. My pipeline pushes data (a *.csv file) into Kafka, consumes it with Spark Streaming, and then saves it to Hive using SparkSQL. The CSV file is about 100 MB with ~250K messages/rows (each row has about 10 integer fields).

Spark Streaming receives the first two batches at a normal size: the first has ~60K messages and the second ~50K. But from the third batch onward, Spark receives only ~200 messages per batch. I suspect the problem is coming from Kafka or from some configuration in Spark. I already tried setting "auto.offset.reset=largest", but every batch still gets only 200 messages.
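For context, the consuming side looks roughly like the sketch below. This is a minimal sketch assuming the receiver-less ("direct") Kafka 0.8 integration; the broker address, topic name "csv-rows", and 2-second batch interval are placeholders, not my exact values. The commented-out rate-limit settings are the Spark configs I know of that can cap records per batch:

    // Minimal sketch of the consumer, assuming the direct Kafka 0.8 API.
    // Broker list, topic name, and batch interval below are placeholders.
    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object KafkaBatchSizeCheck {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaBatchSizeCheck")
        // Settings like these cap how many records each batch may pull,
        // so they are worth checking if they appear anywhere in the job:
        //   spark.streaming.kafka.maxRatePerPartition  (direct stream)
        //   spark.streaming.receiver.maxRate           (receiver-based stream)
        //   spark.streaming.backpressure.enabled
        val ssc = new StreamingContext(conf, Seconds(2))

        val kafkaParams = Map[String, String](
          "metadata.broker.list" -> "localhost:9092", // placeholder broker
          "auto.offset.reset"    -> "largest"         // the setting mentioned above
        )
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("csv-rows"))

        // Print per-batch record counts to confirm the ~200-messages symptom.
        stream.foreachRDD(rdd => println(s"records in this batch: ${rdd.count()}"))

        ssc.start()
        ssc.awaitTermination()
      }
    }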
Could you please tell me how to fix this problem? Thank you so much. Best regards, Alex