> > Upload to S3 is partitioned by the "key" field, i.e., one folder per key. It
> > does offset management to make sure offset commit is in sync with S3 upload.

We do this in several spots, and I wish we had built our system in such a way that we could just open source it. I’m sure many people have solved this problem repeatedly.

We’ve had significant disk performance issues when the number of keys is large (40,000-ish in our case); you can’t be expected to open a file per key. That’s why something like the fixed slicing strategy I described can make a big difference.

Wes
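
For illustration, here is a minimal sketch of one way a fixed slicing scheme could work. It assumes "fixed slicing" means hashing each key into one of a fixed number of slices, so the writer keeps a bounded number of open files no matter how many keys there are. The slice count and the names `NUM_SLICES`, `slice_for_key`, and `group_by_slice` are hypothetical and are not taken from the thread.

```python
import zlib

# Hypothetical fixed slice count; the real value would be tuned per system.
NUM_SLICES = 64


def slice_for_key(key: str, num_slices: int = NUM_SLICES) -> int:
    """Map a key to a stable slice index in [0, num_slices)."""
    # crc32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(key.encode("utf-8")) % num_slices


def group_by_slice(records, num_slices: int = NUM_SLICES):
    """Group (key, value) records by slice index.

    Each slice would correspond to one output file, so the number of
    open files is bounded by num_slices rather than by the key count.
    """
    slices = {}
    for key, value in records:
        index = slice_for_key(key, num_slices)
        slices.setdefault(index, []).append((key, value))
    return slices
```

With 40,000 distinct keys this sketch would write to at most 64 files, and a reader could locate a key's data by recomputing `slice_for_key` for that key.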