Re: Manifest List Files

2019-06-05 Thread Ryan Blue
Our compaction isn't very sophisticated right now. We have a service that gets notifications when tables are updated, and for the tables that have opted in, we load the metadata for all of the files in the changed partitions. Then we run bin packing (see BinPacking
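To illustrate the bin-packing step mentioned in the message above, here is a minimal sketch. It is not the BinPacking utility the message refers to; it is a generic first-fit-decreasing heuristic, and the names DataFile, pack_files and target_bytes are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for the per-file metadata loaded for a partition."""
    path: str
    size_bytes: int


def pack_files(files: List[DataFile], target_bytes: int) -> List[List[DataFile]]:
    """Group files into bins of at most target_bytes (first-fit decreasing).

    A file larger than target_bytes ends up alone in its own bin.
    """
    bins: List[List[DataFile]] = []
    totals: List[int] = []  # running size of each bin, parallel to `bins`
    for f in sorted(files, key=lambda x: x.size_bytes, reverse=True):
        for i, total in enumerate(totals):
            if total + f.size_bytes <= target_bytes:
                bins[i].append(f)
                totals[i] += f.size_bytes
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([f])
            totals.append(f.size_bytes)
    return bins


if __name__ == "__main__":
    files = [DataFile("a", 100), DataFile("b", 60), DataFile("c", 40), DataFile("d", 30)]
    for group in pack_files(files, target_bytes=128):
        print([f.path for f in group], sum(f.size_bytes for f in group))
```

Under this sketch, only bins holding more than one file would be worth rewriting, since a single-file bin is already as compact as it can get.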

Re: Manifest List Files

2019-06-04 Thread Anton Okolnychyi
Not directly related to this topic, but still pretty interesting as we mentioned the PR for rewriting manifests. Ryan, could you also share some insights on how you do compactions? Do you compact metadata separately from bin-packing files? How frequently do you expire snapshots? Do you expos

Re: Manifest List Files

2019-06-03 Thread Erik Wright
Thanks for sharing those observations. They are very pertinent.
On Mon, Jun 3, 2019 at 5:19 PM Ryan Blue wrote:
> Repeated conflicts is something that we keep an eye on in our
> infrastructure. We have streaming tables that are written to every 10
> minutes from multiple regions, commits to move

Re: Manifest List Files

2019-06-03 Thread Ryan Blue
Repeated conflicts is something that we keep an eye on in our infrastructure. We have streaming tables that are written to every 10 minutes from multiple regions, commits to move the files back to a single region, and compaction all happening at the same time. We don't really see a significant prob

Re: Manifest List Files

2019-06-03 Thread Erik Wright
Thanks for the response, Ryan. I can certainly see what the benefits of manifest files are. I can see that with potentially long lists of valid snapshots, each having long lists of manifest files, the mere process of committing a new snapshot could, itself, become costly and increase the likelihood of c

Re: Manifest List Files

2019-06-03 Thread Ryan Blue
Hi Erik,
Manifest lists serve two purposes:
1. Reduce the amount of data tracked by the root metadata file
2. Provide a rough index over manifest files to cut down on planning time
Manifests are reused to cut down on the amount of work required in a commit, but by doing this we end up with
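As a rough illustration of the second purpose listed in the message above (a rough index over manifest files), the sketch below assumes that each manifest-list entry carries a lower and upper bound for a single partition column, so that planning can skip manifests whose range cannot match a query. The names ManifestEntry and manifests_to_scan are hypothetical and do not correspond to any real API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical manifest-list row: a manifest path plus partition bounds."""
    path: str
    lower: str  # smallest partition value covered by the manifest
    upper: str  # largest partition value covered by the manifest


def manifests_to_scan(entries: List[ManifestEntry], lo: str, hi: str) -> List[ManifestEntry]:
    """Keep only manifests whose [lower, upper] range overlaps the query range [lo, hi]."""
    return [m for m in entries if m.upper >= lo and m.lower <= hi]


if __name__ == "__main__":
    entries = [
        ManifestEntry("m1.avro", "2019-05-01", "2019-05-31"),
        ManifestEntry("m2.avro", "2019-06-01", "2019-06-03"),
        ManifestEntry("m3.avro", "2019-06-04", "2019-06-05"),
    ]
    # Only manifests overlapping the query range need to be opened during planning.
    for m in manifests_to_scan(entries, "2019-06-03", "2019-06-04"):
        print(m.path)
```

The point of the sketch is only that the overlap check runs against the small list of summaries, without reading the manifests themselves.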