Re: Verifying correctness of StreamingFileSink (Kafka -> S3)

2019-10-16 Thread Kostas Kloudas
Hi Amran, If you want to know which partition your input data comes from, you can always have a separate bucket for each partition. As described in [1], you can extract the offset/partition/topic information for an incoming record and, based on this, decide the appropriate bucket to put the rec
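
A minimal sketch of the per-partition bucketing idea described above, written as plain Java with no Flink dependency so it runs on its own. The `KafkaEvent` wrapper, the `PartitionBuckets` helper, and the `topic/partition=N` layout are hypothetical names chosen for illustration and do not come from the thread. In a real job the same logic would presumably live in the `getBucketId` method of a custom `BucketAssigner` handed to the sink builder via `withBucketAssigner`, with the topic, partition, and offset carried on each record by the Kafka deserialization schema.

```java
/** Hypothetical record wrapper carrying Kafka metadata next to the payload. */
final class KafkaEvent {
    final String topic;
    final int partition;
    final long offset;
    final String payload;

    KafkaEvent(String topic, int partition, long offset, String payload) {
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
        this.payload = payload;
    }
}

/** Hypothetical helper: maps each record to one bucket per topic and partition. */
final class PartitionBuckets {
    private PartitionBuckets() {}

    /** Returns a bucket path such as "orders/partition=3" (assumed layout). */
    static String bucketFor(KafkaEvent event) {
        return event.topic + "/partition=" + event.partition;
    }
}
```

With one bucket per partition, every part file under a given bucket holds records from exactly one Kafka partition, so the offsets found in that bucket can be compared against the partition's offset range.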

Verifying correctness of StreamingFileSink (Kafka -> S3)

2019-10-15 Thread amran dean
I am evaluating StreamingFileSink (Kafka 0.10.11) as a production-ready alternative to a current Kafka -> S3 solution. Is there any way to verify the integrity of data written in S3? I'm confused about how the file names (e.g. part-1-17) map to Kafka partitions, and further unsure how to ensure that no K