It's a bit outside the scope of this discussion, but I've looked at
those R Jira issues before, and I think the challenge is how the code
will "know" what fill values are being used. If you start putting
field-level metadata in a schema object, you're playing a dangerous
game if that schema gets attached to a record batch / array where the
same fill value is not being used. The only "safe" way, I think, would
be to have metadata at the ArrayData level, but I'm not sure that's a
good idea.
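
To make the hazard concrete, here is a minimal sketch in plain Python. It
does not use Arrow's API: `Column`, `FieldInfo`, `fill_nulls`,
`honors_fill`, and the `null_fill` metadata key are all hypothetical names
invented for illustration. The only outside fact assumed is that R's
integer missing-value sentinel (NA_integer_) is INT_MIN. The sketch shows
that a fill value declared on a schema field says nothing about what a
particular array actually holds in its null slots.

```python
from dataclasses import dataclass, field

# R's NA_integer_ sentinel is INT_MIN for 32-bit integers.
R_NA_INTEGER = -2**31


@dataclass
class Column:
    """Hypothetical stand-in for an array: a values buffer plus a validity mask."""
    values: list  # raw slots, including the ones masked out as null
    valid: list   # False marks a null slot


@dataclass
class FieldInfo:
    """Hypothetical stand-in for a schema field carrying key/value metadata."""
    name: str
    metadata: dict = field(default_factory=dict)


def fill_nulls(col, fill):
    """Return a copy of col whose null slots all hold the given fill value."""
    filled = [v if ok else fill for v, ok in zip(col.values, col.valid)]
    return Column(filled, list(col.valid))


def honors_fill(col, fill):
    """Check whether every null slot in col really holds the fill value."""
    return all(ok or v == fill for v, ok in zip(col.values, col.valid))


# The schema field claims that null slots hold R's sentinel ...
schema_field = FieldInfo("x", {"null_fill": R_NA_INTEGER})
claimed = schema_field.metadata["null_fill"]

# ... which is true for a column that was filled on purpose,
batch_a = fill_nulls(Column([1, 7, 3], [True, False, True]), claimed)

# ... but not for another column attached to the same schema.
batch_b = Column([4, 0, 6], [True, False, True])

print(honors_fill(batch_a, claimed))
print(honors_fill(batch_b, claimed))
```

Only the first column could be handed to R as-is. Trusting the schema
metadata for the second would expose a 0 where R expects NA, which is why
the claim would have to travel with the data itself rather than with the
schema.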

On Mon, Mar 8, 2021 at 1:07 PM Neal Richardson
<neal.p.richard...@gmail.com> wrote:
>
> What was the resolution of this discussion? Was a JIRA made?
>
> It occurred to me recently that, if we decided that values masked by null
> bits need to be filled with a known value, this could open up optimizations
> in some use cases. For example, when reading a file into R, if we could
> specify what to use for the known null values, we could use R's missing
> value sentinels and then get pure zero-copy access. Some related JIRAs:
>
> https://issues.apache.org/jira/browse/ARROW-8348
> https://issues.apache.org/jira/browse/ARROW-7767
> https://issues.apache.org/jira/browse/ARROW-3263
>
> Neal
>
> On Sat, Feb 20, 2021 at 4:30 PM Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > On 21/02/2021 at 01:05, Wes McKinney wrote:
> > > I agree that we should avoid leaking uninitialized memory in places
> > > where we have control over it. I could imagine a third party project
> > > having UBSAN warnings and then tracing the origin of them to something
> > > in Arrow that they then have to work around. As for the potential
> > > performance implications, we'll have to be vigilant with
> > > microbenchmarks.
> >
> > We're generally already doing this when we're careful, so we're already
> > paying the price (which I would intuitively estimate to be quite small).
> > Unfortunately, there doesn't seem to be an obvious way to check it
> > systematically on CI, but Valgrind can occasionally uncover it.
> >
> > Regards
> >
> > Antoine.
> >
