Re: Wide tables in V4

2025-06-02 Thread Bart Samwel
queries where > only a small number of columns are projected from a wide table. > I agree that it's an interesting idea, but it does add a lot of complexity, and I'm not convinced that it's better from a performance standpoint (metadata size increase, more I/Os). If we can g

Re: Wide tables in V4

2025-05-30 Thread Bart Samwel
On Fri, May 30, 2025 at 3:33 PM Péter Váry wrote:
> One key advantage of introducing Physical Files is the flexibility to vary
> RowGroup sizes across columns. For instance, wide string columns could
> benefit from smaller RowGroups to reduce memory pressure, while numeric
> columns could use lar

Re: Spec changes for deletion vectors

2024-10-17 Thread Bart Samwel
I hope it's OK if I chime in. I'm one of the people responsible for the format for position deletes that is used in Delta Lake and I've been reading along with the discussion. Given that the main sticking point is whether this compatibility is worth the associated "not pure" spec, I figured that ma

Re: [DISCUSS] Define calendar used in specification?

2024-09-12 Thread Bart Samwel
I have some historical context that may or may not be relevant. I still remember how we did the transition for Spark. This was ca. 2019, and there were still many people mixing Spark 2.x and 3.0. Also, many other systems were still using Java 7 which only supported Julian. As a result, Spark 3.0+ c