Since I don't see any input from the DuckDB or Velox development teams (this discussion seems to involve primarily Arrow developers), I have filed a ticket in DuckDB requesting their consideration [1] and tried to draw attention to the existing ticket in Velox [2]. Perhaps their input will provide a way forward.

[1]: https://github.com/duckdb/duckdb/discussions/9248
[2]: https://github.com/facebookincubator/velox/discussions/4362#discussioncomment-7209755

On Tue, Oct 3, 2023 at 3:24 AM Antoine Pitrou <anto...@python.org> wrote:

>
> On 03/10/2023 at 01:36, Matt Topol wrote:
> >
> > The cost of conversion is actually significantly higher than the actual
> > overhead of simply accessing the values in either representation, leading
> > to a high potential for bottleneck. For systems like Velox and DuckDB where
> > it's important to be able to return results as fast as possible, if they
> > have an operation with a throughput of several hundred MB/s or even G/s,
> > this conversion cost would become a huge bottleneck to returning results
> > given several cases of converting Raw Pointer views to the offset-based
> > views go as low as ~22MB/s.
>
> I think you misread the benchmark numbers. It's 22 MItems/s, not 22 MB/s.
> Since that number is for the kLongAndSeldomInlineable case, I assume the
> MB/s would be two or three orders of magnitude higher.
>
> Regards
>
> Antoine.
>