Sync notes for 2 December 2020

2020-12-04 Thread Ryan Blue
Hi everyone, I just wrote up my notes for the sync on Wednesday. Feel free to comment if I’ve missed anything or didn’t remember clearly. Here’s a quick summary of the main points: - Highlights: Spark stored procedures are available, Glue and Nessie catalogs are committed, and Hive now sup

Re: Does data file deletion always rewrite the manifest file

2020-12-04 Thread Ryan Blue
Hi Vivekanand, You're right that a manifest containing a deleted file will be rewritten and replaced. Scan planning ignores any data file that is marked deleted in a manifest. Whether a file is deleted is controlled by the manifest entry's status, which can be added, existing, or deleted. Using those thr
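
The status handling described in this preview can be sketched roughly as follows. This is only a toy model, not Iceberg's implementation: `ManifestEntry`, `live_files`, and `rewrite_after_delete` are hypothetical names made up for illustration, the numeric status values are an assumption based on the Iceberg table spec, and the handling of carried-over entries is a modeling assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Assumption: numeric values as given in the Iceberg table spec.
    EXISTING = 0
    ADDED = 1
    DELETED = 2


@dataclass(frozen=True)
class ManifestEntry:
    # Hypothetical, heavily simplified stand-in for a real manifest entry.
    status: Status
    file_path: str


def live_files(entries):
    """Paths a scan would plan: every entry whose status is not DELETED."""
    return [e.file_path for e in entries if e.status is not Status.DELETED]


def rewrite_after_delete(entries, deleted_path):
    """Entries of the replacement manifest after deleting one data file."""
    rewritten = []
    for e in entries:
        if e.status is Status.DELETED:
            # Modeling assumption: older delete markers are not carried over.
            continue
        if e.file_path == deleted_path:
            # The deleted file stays in the new manifest, marked DELETED.
            rewritten.append(ManifestEntry(Status.DELETED, e.file_path))
        else:
            # Modeling assumption: carried-over entries become EXISTING.
            rewritten.append(ManifestEntry(Status.EXISTING, e.file_path))
    return rewritten
```

For example, calling `rewrite_after_delete(entries, "a.parquet")` on a manifest holding `a.parquet` and `b.parquet` yields a new entry list in which `a.parquet` is still present but marked deleted, so `live_files` on the result returns only `b.parquet`.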

Re: Iceberg - Hive schema synchronization

2020-12-04 Thread Ryan Blue
A few replies inline.
On Thu, Nov 26, 2020 at 3:49 AM Peter Vary wrote:
> I think the column mapping should also be 1-to-1. Hive would have trouble
> writing to a table if it didn't include all required columns. I think that
> the right thing is for all engines to provide uniform access to all c

Re: S3 strong read-after-write consistency

2020-12-04 Thread Ryan Blue
It isn't clear whether this S3 consistency change also fixes the negative caching (HEAD when file doesn't exist causes later HEAD to not see the file), but I think that it does not fix it because there was a PR opened to add consistency using LIST before a HEAD operation. I think it is still a goo