We've had evidence for a while now that the kernel functions suffer from a per-call overhead problem that prevents us from effectively utilizing cache. The most recent evidence of this is probably [1]. A number of people have made some very interesting suggestions that I think could really cut down on the overhead (e.g. preallocated buffers). However, whenever we start a discussion on implementation it gets bogged down, because there is a lot of existing code here and a massive refactor would be too difficult.
I'd like to propose we add a second kernel function registry. There doesn't need to be any user-facing API change. We could probably use an approach like [2] to proxy to the old function registry when the newer registry doesn't contain the requested function. This would allow us to focus on creating an efficient function registry without having to worry about refactoring the existing kernels all at once.

Once we are happy with the new registry we can start to migrate the existing kernel functions over to it. I don't expect the existing kernel functions will need a lot of change, but whatever changes are needed can be made incrementally.

Does this seem like a good approach? Am I missing something, or does anyone know of a better way to fix the existing implementation?

The main risk I can see is that we don't complete the migration and end up maintaining two registries forever. However, we have enough interested people here at Voltron Data that I feel confident we can get this pushed through.

[1] https://github.com/apache/arrow/pull/13179
[2] https://github.com/apache/arrow/pull/13252
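
P.S. To make the fallback idea concrete, here is a rough sketch of the lookup I have in mind. It is written in Python only for brevity and is not the Arrow API: `FunctionRegistry`, `add_function`, `get_function`, and the toy kernels are all hypothetical names made up for illustration. The only point is that a lookup in the new registry falls through to the old one when the function has not been migrated yet.

```python
from typing import Callable, Dict, Optional


class FunctionRegistry:
    """Hypothetical name -> callable registry with an optional parent to fall back to."""

    def __init__(self, parent: Optional["FunctionRegistry"] = None) -> None:
        self._functions: Dict[str, Callable] = {}
        self._parent = parent

    def add_function(self, name: str, func: Callable) -> None:
        self._functions[name] = func

    def get_function(self, name: str) -> Callable:
        # Prefer a function registered here (i.e. already migrated).
        if name in self._functions:
            return self._functions[name]
        # Otherwise proxy the lookup to the parent (old) registry.
        if self._parent is not None:
            return self._parent.get_function(name)
        raise KeyError(f"no function named {name!r}")


# Toy stand-ins for existing kernels in the old registry.
def old_add(a, b):
    return [x + y for x, y in zip(a, b)]


def old_negate(a):
    return [-x for x in a]


old_registry = FunctionRegistry()
old_registry.add_function("add", old_add)
old_registry.add_function("negate", old_negate)


# Toy stand-in for a migrated kernel that writes into a preallocated buffer.
def new_add(a, b, out=None):
    if out is None:
        out = [0] * len(a)
    for i, (x, y) in enumerate(zip(a, b)):
        out[i] = x + y
    return out


new_registry = FunctionRegistry(parent=old_registry)
new_registry.add_function("add", new_add)

migrated = new_registry.get_function("add")        # resolved in the new registry
fallback = new_registry.get_function("negate")     # proxied to the old registry
```

Callers only ever talk to `new_registry`, so moving a kernel across is a matter of registering it there; nothing changes for the caller, and kernels that have not been migrated keep working.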