> 
> Upload to S3 is partitioned by the "key" field, i.e., one folder per key. It
> does offset management to make sure offset commits stay in sync with the S3 upload.

We do this in several places, and I wish we had built our system in a way
that would let us just open source it. I’m sure many people have solved this
problem repeatedly. We’ve had significant disk performance issues when the
number of keys is large (around 40,000 in our case); you can’t reasonably
keep one file open per key. That’s why something like the fixed slicing
strategy I described can make a big difference.
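The fixed slicing strategy isn't spelled out in this message, so the following
is only a minimal sketch under one assumption: keys are hashed into a fixed
number of slices, which keeps the number of open files (or uploaded objects)
per batch constant however many keys there are, and the offset is committed
only after every slice has been uploaded. `NUM_SLICES`, `slice_for`,
`SlicedBatch`, `upload` and `commit` are hypothetical names, not part of any
real S3 or Kafka client; the two callbacks stand in for whatever performs the
actual upload and offset commit.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical default: the number of slices is fixed, independent of key count.
NUM_SLICES = 16

Record = Tuple[str, object]  # (key, value)


def slice_for(key: str, num_slices: int = NUM_SLICES) -> int:
    """Map a key to a stable slice index.

    Uses CRC32 instead of the built-in hash(), which is randomized per process
    for str, so a key lands in the same slice across restarts.
    """
    return zlib.crc32(key.encode("utf-8")) % num_slices


class SlicedBatch:
    """Buffers records into a fixed number of slices and tracks the max offset."""

    def __init__(self, num_slices: int = NUM_SLICES) -> None:
        self.num_slices = num_slices
        self.slices: Dict[int, List[Record]] = defaultdict(list)
        self.max_offset: Optional[int] = None

    def add(self, key: str, offset: int, value: object) -> None:
        # The key is kept with each record so a later step can still split by key.
        self.slices[slice_for(key, self.num_slices)].append((key, value))
        if self.max_offset is None or offset > self.max_offset:
            self.max_offset = offset

    def flush(
        self,
        upload: Callable[[int, List[Record]], None],
        commit: Callable[[int], None],
    ) -> None:
        # One upload per non-empty slice, never one per key.
        for idx in sorted(self.slices):
            upload(idx, self.slices[idx])
        # Reached only if every upload succeeded. If upload() raises, the buffer
        # is kept for a retry, which may re-send slices that already succeeded,
        # so upload() should be idempotent (e.g. a deterministic object name).
        if self.max_offset is not None:
            # Passes the highest offset in the batch; Kafka-style consumers
            # usually commit max_offset + 1 (the next offset to read).
            commit(self.max_offset)
        self.slices.clear()
        self.max_offset = None
```

With 40,000 keys and `NUM_SLICES = 16`, a flush touches at most 16 objects
instead of 40,000. The trade-off is that the one-folder-per-key layout is
replaced by mixed-key slices.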

Wes
