On 20/01/2020 at 16:26, Jacques Nadeau wrote:
> I think it is too late in the game to make this fundamental change. It
> would be very hard to assess whether it is a no-op or has massive
> implications for existing datasets. Just among Dremio customers, in the
> last 30 days we stored more than 100mm datasets that leveraged the
> current format.
To be clear, I agree that we need to check that our various validation
and integration suites pass properly. But once that is done, and
assuming all the metadata variations are properly tested, data
variations should not pose any problem.

> I'm supportive of enforcing non-nulls on the write side but I don't
> think we should change the current read behavior.

The write side is irrelevant here, since the concern is to protect
reliably against invalid input (especially input crafted with malicious
intent). The read behaviour would be kept unchanged in the face of
*valid* input - but it would become deterministic and robust in the face
of *invalid* input - which it isn't today.

Of course, we can hand-write all the NULL checks on the read side. My
concern is not the one-time cost of doing so, but the long-term
fragility of such a strategy (every refactor or format addition is a
threat to the robustness of the IPC reader). I don't think a potential
long-standing history of security issues in Arrow would help adoption.
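To illustrate the fragility, here is a minimal self-contained C++
sketch (all type and function names are hypothetical stand-ins, not
Arrow's actual reader code) of what the hand-written NULL-checking
strategy looks like:

    #include <cstdint>
    #include <iostream>

    // Stand-ins for Flatbuffers-generated accessors: optional fields
    // come back as null pointers when absent from the (untrusted)
    // input buffer.
    struct FieldNodes { int64_t length; };
    struct BufferMeta { int64_t length; };
    struct RecordBatchHeader {
      const FieldNodes* nodes;    // may be null in invalid input
      const BufferMeta* buffers;  // may be null in invalid input
    };

    // Every pointer must be checked before dereference; each field
    // added to the format means remembering yet another such check.
    bool ReadRecordBatch(const RecordBatchHeader* header) {
      if (header == nullptr) return false;
      if (header->nodes == nullptr) return false;
      if (header->buffers == nullptr) return false;
      // ... actual decoding. Forgetting any one check above means a
      // crash (or worse) on malicious input.
      return true;
    }

    int main() {
      RecordBatchHeader invalid{nullptr, nullptr};
      // With the checks, invalid input is rejected deterministically;
      // without them, decoding would dereference a null pointer.
      std::cout << (ReadRecordBatch(&invalid) ? "ok" : "rejected")
                << std::endl;
      return 0;
    }

Marking such fields required in the metadata schema instead would let
the generated Flatbuffers verifier reject that input once, up front,
rather than relying on every call site getting its check right.

Regards

Antoine.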