> Vectorized query execution streamlines operations by processing a block
> of 1024 rows at a time.
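Roughly, the unit being discussed is a reusable 1024-row batch with per-batch flags. Here's a simplified sketch in the spirit of the Hive vectorized batch/column classes (class and field names below are illustrative, not the exact API):

```java
// Simplified sketch of a vectorized column batch and one expression kernel.
// Names are illustrative, not the real Hive classes.
final class LongColumn {
  static final int BATCH_SIZE = 1024;    // rows processed per call
  long[] vector = new long[BATCH_SIZE];  // allocated once, reused for every batch
  boolean[] isNull = new boolean[BATCH_SIZE];
  boolean isRepeating;                   // all values in the batch are identical
  boolean noNulls;                       // batch-level flag: no null checks needed
}

final class AddConstKernel {
  // out[i] = in[i] + c for one batch of up to `size` rows.
  static void evaluate(LongColumn in, LongColumn out, long c, int size) {
    if (in.isRepeating) {
      // Execute once for the whole batch instead of 1024 times.
      out.isRepeating = true;
      out.noNulls = in.noNulls;
      out.isNull[0] = in.isNull[0];
      out.vector[0] = in.vector[0] + c;
      return;
    }
    out.isRepeating = false;
    if (in.noNulls) {
      // Tight loop with no null branch - the common case pays nothing for nulls.
      out.noNulls = true;
      for (int i = 0; i < size; i++) {
        out.vector[i] = in.vector[i] + c;
      }
    } else {
      out.noNulls = false;
      for (int i = 0; i < size; i++) {
        out.isNull[i] = in.isNull[i];
        if (!in.isNull[i]) {
          out.vector[i] = in.vector[i] + c;
        }
      }
    }
  }
}
```

The arrays are allocated once and reused for every batch, which is what the GC point below hinges on.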
The real win of vectorization + columnar is that you get to take advantage of them at the same time. We get to execute a function once per 1024 rows when the values repeat - which is particularly true when the repetition naturally clusters together (like a Date field). And what are the chances that no row in a file has a null? The current mode of operation prevents a small fraction of nulls from hurting the whole runtime, because only the batches that actually contain nulls pay for the null checks.

Making it 1024 was also driven by the way Java's -XX:+UseNUMA allocates pages. If you look at the LLAP startup options, you'll notice that it does most of its allocations off the TLAB, to restrict the allocations to the same NUMA zone as the thread. No such easy solution exists for larger allocations. So:

1) we'd lose isRepeating=true if we increased the block size;
2) we'd get slower memory from NUMA interleaving if we increased the size of the allocations;
3) we'd lose the low-pause GC behaviour once allocations land in the Humongous region of G1GC.

The illusion of Java is that allocations are free - imagine a count(1) that allocates a huge array before returning, versus a reader which reuses the same memory to read chunks and operate on them; which one would pause more often for the GC?

> VQE would be very useful especially with ORC as it basically means that
> one can process the whole column separately thus improving performance of
> the query.

Why and how? The columnar layout already processes the whole column separately, because of how the chunks are read out. There's more to be done there for pure performance, of course - we could run the pre-exec filters on the String dictionaries and then only run pure int:int comparisons for the offsets, and we could execute deterministic UDFs once per dictionary offset to make the isRepeating model operate across a whole ORC stripe (a rough sketch of the dictionary-side idea is appended below).

Cheers,
Gopal
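P.S. the dictionary-side filtering idea, sketched out (names and signatures are made up for illustration): run the string predicate once per distinct dictionary entry, so the per-row work becomes a pure int lookup instead of a string comparison.

```java
// Sketch only - not an existing Hive/ORC API.
import java.util.function.Predicate;

final class DictionaryFilterSketch {
  // dictionary: distinct string values for the column within a stripe
  // codes:      per-row dictionary offsets (the int:int domain)
  // Returns the number of selected rows, with their indexes in selectedOut.
  static int filter(String[] dictionary, int[] codes, Predicate<String> pred,
                    int[] selectedOut) {
    // Run the (possibly expensive) predicate once per distinct value...
    boolean[] matches = new boolean[dictionary.length];
    for (int d = 0; d < dictionary.length; d++) {
      matches[d] = pred.test(dictionary[d]);
    }
    // ...then the per-row loop is a pure integer lookup.
    int n = 0;
    for (int r = 0; r < codes.length; r++) {
      if (matches[codes[r]]) {
        selectedOut[n++] = r;
      }
    }
    return n;
  }
}
```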