Thanks Bart. I'll give it a try. Presto has done something very similar
(thanks DB for finding this!). They published an article ([1]) last
year with a very thorough analysis of all the cases, which I think can be
used as a reference for the implementation in Spark.
[1]: https://prestosql.i
IMO it's worth an attempt. The previous attempts seem to be closed because
of a general sense that this gets messy and leads to lots of special cases,
but that's just how it is. This optimization would make the difference
between getting sub-par performance when using some of these datatypes to
gett