Yes, I’ll look into the JNA project too and explore approach 2 with both FFM and JNA.
I’ll prototype both approaches 1 and 2 and post a status update here.

> On Jun 10, 2025, at 1:50 PM, Ian Maxon <ima...@apache.org> wrote:
>
> The Vector API is in OpenJDK, so I think the licensing should be OK:
> https://openjdk.org/jeps/508
>
> The main problem is the fact it isn't a stable API yet, and it relies
> on Valhalla. It would be a judgement call on how much we expect it to
> change over time, and how difficult it would be to migrate things to
> follow those changes. It would also be a bet that by the time
> everything is done, these set of JDK features are more or less
> stabilized.
>
> Using FFI/JNI would be a more traditional way to go about it. FFI is
> new and better than JNI, so if we choose to go with that, it should be
> less painful. FFI is a preview feature, which is less risky than an
> incubating feature.
>
> There is also the JNA project, which wraps JNI to make it simpler:
> https://github.com/java-native-access/jna . I'm assuming most of the
> libraries we might want to use are mostly computational, so they
> wouldn't have many platform-specific dependencies, just architecture
> specific ones. I think it also handles the build aspect of it, which
> FFI doesn't directly. Assuming the libraries we would want to use
> aren't in libc or otherwise can't be assumed to be present, we would
> have to include them in the jar somehow.
>
>
>> On Tue, Jun 10, 2025 at 8:27 AM Mike Carey <dtab...@gmail.com> wrote:
>>
>> Q: Are there licensing gotchas with approach 1 (which otherwise sounds
>> nicer from a maintenance standpoint)? We need to be sure that everything
>> we use is Apache-okay in terms of licensing. It would be fun to see
>> some preliminary numbers on perf, e.g., for KNN, each way, were it as
>> easy as changing which function(s) to call... :-) That would help
>> quantify the two options (vs. each other and vs. none) too.
>>
>>> On 6/10/25 7:24 AM, Calvin Dani wrote:
>>> Hi,
>>>
>>> As part of adding vector functionality to AsterixDB, I have been exploring
>>> possible optimizations for vector computations. One promising direction is
>>> leveraging SIMD operations to accelerate these calculations. Although Java
>>> offers autovectorization to utilize SIMD, this approach requires the
>>> operations to be branchless (i.e., no conditional branching like if/else),
>>> and it may not always be triggered when vector calculations get complex.
>>>
>>> I have considered two main options for SIMD-enabled vector computation:
>>>
>>> 1. Java Vector API: Introduced as an incubation feature since Java 17, the
>>> Vector API is part of the long-term Project Valhalla. While it remains in
>>> incubation and likely won’t be finalized until Project Valhalla completes,
>>> the API already supports the basic operations needed for our distance
>>> metrics, such as Euclidean Distance, Manhattan Distance, Cosine Similarity,
>>> and Dot Product. It also provides a primitive Vector<E> type which could
>>> serve as a native storage for embeddings.
>>>
>>> 2. Foreign Function & Memory API: This allows calling optimized C/C++
>>> libraries directly from Java. We could either leverage existing
>>> highly-optimized vector computation libraries or implement our own native
>>> code. However, packaging and ensuring compatibility of native libraries
>>> across different target platforms may introduce complexity.
>>>
>>> If you are aware of other solutions or have feedback on these options, I
>>> would appreciate your insights.
>>>
>>> Thank you,
>>> Calvin Dani
>>>
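
For reference, the four distance metrics named in the thread can be sketched as plain scalar loops, which could serve as the "none" baseline when comparing the Vector API and FFM/JNA prototypes. This is only a minimal sketch under assumptions: embeddings are taken to be float[] arrays of equal length, and the class and method names (ScalarDistanceBaseline, dot, manhattan, and so on) are hypothetical, not existing AsterixDB code. The loop bodies are kept free of if/else so they are at least candidates for autovectorization, but whether the JIT actually vectorizes them is not guaranteed.

```java
// Hypothetical scalar baseline; names are illustrative, not AsterixDB APIs.
final class ScalarDistanceBaseline {
    private ScalarDistanceBaseline() {}

    // Length check happens once, outside the hot loops.
    private static void checkSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
    }

    // Dot product: sum of element-wise products.
    static float dot(float[] a, float[] b) {
        checkSameLength(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Squared Euclidean distance; enough for ranking neighbors (no sqrt).
    static float squaredEuclidean(float[] a, float[] b) {
        checkSameLength(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Euclidean distance: square root of the squared distance.
    static float euclidean(float[] a, float[] b) {
        return (float) Math.sqrt(squaredEuclidean(a, b));
    }

    // Manhattan distance: sum of absolute differences.
    static float manhattan(float[] a, float[] b) {
        checkSameLength(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|), computed in a single pass.
    // Returns NaN if either vector has zero norm.
    static float cosineSimilarity(float[] a, float[] b) {
        checkSameLength(a, b);
        float ab = 0f;
        float aa = 0f;
        float bb = 0f;
        for (int i = 0; i < a.length; i++) {
            ab += a[i] * b[i];
            aa += a[i] * a[i];
            bb += b[i] * b[i];
        }
        return (float) (ab / Math.sqrt((double) aa * bb));
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        System.out.println("dot       = " + dot(a, b));
        System.out.println("euclidean = " + euclidean(a, b));
        System.out.println("manhattan = " + manhattan(a, b));
        System.out.println("cosine    = " + cosineSimilarity(a, b));
    }
}
```

Keeping the same method signatures across the scalar, Vector API, and native variants would make the KNN comparison a matter of swapping which function is called.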