Hi,

As part of adding vector functionality to AsterixDB, I have been exploring possible optimizations for vector computations. One promising direction is using SIMD operations to accelerate these calculations. Although the JVM's JIT compiler can autovectorize some loops to use SIMD, this generally requires the loop body to be branchless (i.e., no conditional branching such as if/else), and it may not be triggered once the vector calculations get complex.
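To make the autovectorization point concrete, here is a minimal scalar sketch of the loop shapes in question. The class and method names are hypothetical and not taken from the AsterixDB code base, and the arrays are assumed to have equal length. Whether the JIT actually vectorizes either form depends on the JVM version, flags, and hardware, so this only illustrates the code shape, not a measured result.

```java
// Hypothetical helper class for illustration only; not part of AsterixDB.
final class ScalarDistances {

    // Manhattan distance with an explicit branch in the loop body.
    // This is the kind of pattern that may prevent autovectorization.
    static float manhattanBranching(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            if (d < 0) {
                d = -d;
            }
            sum += d;
        }
        return sum;
    }

    // Same computation for ordinary inputs, written without an if/else:
    // Math.abs replaces the explicit branch.
    static float manhattanBranchless(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Squared Euclidean distance; the loop body is naturally branch-free.
    static float squaredEuclidean(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

Simple metrics like these are easy to keep branch-free; the concern is with more complex calculations, where that is harder to guarantee.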
I have considered two main options for SIMD-enabled vector computation:

1. Java Vector API: First incubated in Java 16 (JEP 338) and developed under Project Panama, the Vector API is still an incubator module (jdk.incubator.vector). It is expected to remain in incubation until the Project Valhalla features it depends on become available, but the API already supports the basic operations needed for our distance metrics, such as Euclidean Distance, Manhattan Distance, Cosine Similarity, and Dot Product. It also provides a Vector<E> type (an abstract class with concrete subclasses such as FloatVector, not a primitive) that embeddings could be loaded into for computation.

2. Foreign Function & Memory API: Finalized in Java 22, this allows calling optimized C/C++ libraries directly from Java. We could either use existing highly optimized vector computation libraries or implement our own native code. However, packaging native libraries and ensuring their compatibility across different target platforms may introduce complexity.

If you are aware of other solutions or have feedback on these options, I would appreciate your insights.

Thank you,
Calvin Dani