Yes, I'll look into the JNA project too and explore approach 2 with both FFM and
JNA.

I'll prototype both approaches 1 and 2 and post a status update here.

> On Jun 10, 2025, at 1:50 PM, Ian Maxon <ima...@apache.org> wrote:
> 
> The Vector API is in OpenJDK, so I think the licensing should be OK:
> https://openjdk.org/jeps/508
> 
> The main problem is that it isn't a stable API yet, and it relies
> on Valhalla. It would be a judgement call on how much we expect it to
> change over time, and how difficult it would be to migrate things to
> follow those changes. It would also be a bet that by the time
> everything is done, this set of JDK features is more or less
> stabilized.
> 
> Using FFI/JNI would be a more traditional way to go about it. FFI (the
> Foreign Function & Memory API) is newer and better than JNI, so if we
> choose to go with that, it should be less painful. FFI has been a final
> API since JDK 22 (it was a preview feature in JDK 19 to 21), which is
> less risky than an incubating feature.
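A minimal sketch of the FFM route, assuming JDK 22 or later (where the API is final) and a hypothetical native library that exports `float dot_f32(const float *a, const float *b, size_t n)`. The function name, the library, and the mapping of `size_t` to a 64-bit `long` are assumptions for illustration, not an existing AsterixDB or third-party API. Embeddings are copied into off-heap memory owned by an `Arena`, and a pure-Java loop over the same segments is included so native results can be checked.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public final class NativeDot {
    // Hypothetical C signature: float dot_f32(const float *a, const float *b, size_t n);
    // Assumes a 64-bit platform, so size_t is mapped to JAVA_LONG.
    public static final FunctionDescriptor DOT_F32 = FunctionDescriptor.of(
            ValueLayout.JAVA_FLOAT, ValueLayout.ADDRESS, ValueLayout.ADDRESS, ValueLayout.JAVA_LONG);

    private NativeDot() {}

    /** Looks up dot_f32 in a native library; the library stays loaded while the arena is open. */
    public static MethodHandle bind(String libraryName, Arena arena) {
        SymbolLookup lib = SymbolLookup.libraryLookup(libraryName, arena);
        MemorySegment symbol = lib.find("dot_f32")
                .orElseThrow(() -> new IllegalStateException("dot_f32 not found in " + libraryName));
        return Linker.nativeLinker().downcallHandle(symbol, DOT_F32);
    }

    /** Copies a Java float[] into off-heap memory whose lifetime is tied to the arena. */
    public static MemorySegment toNative(Arena arena, float[] v) {
        return arena.allocateFrom(ValueLayout.JAVA_FLOAT, v);
    }

    /** Invokes the bound native function on two off-heap vectors of length n. */
    public static float dotNative(MethodHandle dotF32, MemorySegment a, MemorySegment b, long n)
            throws Throwable {
        return (float) dotF32.invokeExact(a, b, n);
    }

    /** Pure-Java reference over the same segments, useful for checking native results. */
    public static float dotJava(MemorySegment a, MemorySegment b, long n) {
        float sum = 0f;
        for (long i = 0; i < n; i++) {
            sum += a.getAtIndex(ValueLayout.JAVA_FLOAT, i) * b.getAtIndex(ValueLayout.JAVA_FLOAT, i);
        }
        return sum;
    }
}
```

In use, one would open an `Arena.ofConfined()` in a try-with-resources block, copy both vectors with `toNative`, bind the symbol once, and call `dotNative`. `SymbolLookup.libraryLookup(String, Arena)` passes the name to the platform loader (e.g. `dlopen` on Linux), so a full file name or the `Path` overload is usually the safer choice.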
> 
> There is also the JNA project, which wraps JNI to make it simpler:
> https://github.com/java-native-access/jna . I'm assuming the
> libraries we might want to use are mostly computational, so they
> wouldn't have many platform-specific dependencies, just
> architecture-specific ones. I think it also handles the build and
> packaging side of it, which FFI doesn't directly. Assuming the libraries
> we would want to use aren't in libc or otherwise can't be assumed to be
> present, we would have to include them in the jar somehow.
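One JDK-only way to ship native libraries inside the jar is to store one copy per OS/architecture as a classpath resource and extract the right one to a temporary file at startup, because native loaders need a real file on disk. A minimal sketch; the `/native/<os>-<arch>/` layout and the class name are hypothetical, and the OS detection is deliberately simplified (anything that is not Windows or macOS is treated as Linux).

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

public final class BundledNativeLib {
    private BundledNativeLib() {}

    /** Hypothetical layout: /native/<os>-<arch>/<file>, e.g. /native/linux-x86-64/libveclib.so. */
    public static String resourcePath(String libName) {
        String os = System.getProperty("os.name").toLowerCase(Locale.ROOT);
        String arch = System.getProperty("os.arch").toLowerCase(Locale.ROOT);
        // Simplification: every OS other than Windows and macOS is treated as Linux.
        String osDir = os.contains("win") ? "windows" : os.contains("mac") ? "darwin" : "linux";
        String archDir = switch (arch) {
            case "amd64", "x86_64" -> "x86-64";
            case "aarch64", "arm64" -> "aarch64";
            default -> arch;
        };
        // System.mapLibraryName turns "veclib" into libveclib.so, libveclib.dylib or veclib.dll.
        return "/native/" + osDir + "-" + archDir + "/" + System.mapLibraryName(libName);
    }

    /** Copies the bundled library out of the jar into a temp file deleted on JVM exit. */
    public static Path extract(String libName) throws IOException {
        String resource = resourcePath(libName);
        try (InputStream in = BundledNativeLib.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new FileNotFoundException("no bundled native library at " + resource);
            }
            Path tmp = Files.createTempFile("native-", "-" + System.mapLibraryName(libName));
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            tmp.toFile().deleteOnExit();
            return tmp;
        }
    }
}
```

The returned path could then be handed to `System.load` (JNI) or `SymbolLookup.libraryLookup(Path, Arena)` (FFM). JNA can also locate libraries bundled on the classpath under its own per-platform resource prefixes, which may remove the need for a hand-written loader like this one.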
> 
> 
>> On Tue, Jun 10, 2025 at 8:27 AM Mike Carey <dtab...@gmail.com> wrote:
>> 
>> Q:  Are there licensing gotchas with approach 1 (which otherwise sounds
>> nicer from a maintenance standpoint)? We need to be sure that everything
>> we use is Apache-okay in terms of licensing.  It would be fun to see
>> some preliminary numbers on perf, e.g., for KNN, each way, were it as
>> easy as changing which function(s) to call...  :-)  That would help
>> quantify the two options (vs. each other and vs. none) too.
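To make the KNN comparison "as easy as changing which function to call", the KNN loop can take the distance function as a parameter, so switching between a scalar baseline, a Vector API version, or a native one is a one-argument change. A minimal brute-force sketch; the class and interface names are hypothetical and not part of AsterixDB, and `squaredEuclidean` here is the plain scalar "vs. none" baseline.

```java
import java.util.PriorityQueue;

public final class BruteForceKnn {
    /** Any distance implementation (scalar, Vector API, or native) can be plugged in here. */
    @FunctionalInterface
    public interface Distance {
        float apply(float[] a, float[] b);
    }

    private BruteForceKnn() {}

    /** Plain scalar baseline; assumes a.length == b.length. */
    public static float squaredEuclidean(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the indices of the k rows nearest to query, nearest first. */
    public static int[] knn(float[][] data, float[] query, int k, Distance dist) {
        float[] d = new float[data.length];
        for (int i = 0; i < data.length; i++) {
            d[i] = dist.apply(data[i], query);
        }
        // Max-heap on distance: the head is the worst of the best k seen so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>((x, y) -> Float.compare(d[y], d[x]));
        for (int i = 0; i < data.length; i++) {
            heap.offer(i);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        // Polling a max-heap yields the farthest first, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int j = result.length - 1; j >= 0; j--) {
            result[j] = heap.poll();
        }
        return result;
    }
}
```

For example, `knn(data, query, 10, BruteForceKnn::squaredEuclidean)` runs the baseline, and another implementation with the same `(float[], float[]) -> float` shape can be passed in its place. For trustworthy numbers, a harness such as JMH is a better fit than ad-hoc `System.nanoTime()` timing, since JIT warm-up and dead-code elimination easily distort micro-benchmarks.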
>> 
>>> On 6/10/25 7:24 AM, Calvin Dani wrote:
>>> Hi,
>>> 
>>> As part of adding vector functionality to AsterixDB, I have been exploring
>>> possible optimizations for vector computations. One promising direction is
>>> leveraging SIMD operations to accelerate these calculations. Although the
>>> JVM's JIT compiler can autovectorize loops to use SIMD, this typically
>>> requires the loop body to be branchless (i.e., no conditional branching like
>>> if/else), and it may not be triggered at all once the vector calculations
>>> get complex.
>>> 
>>> I have considered two main options for SIMD-enabled vector computation:
>>> 
>>> 1. Java Vector API: First introduced as an incubating feature in JDK 16, the
>>> Vector API comes out of Project Panama but depends on the long-term Project
>>> Valhalla. While it remains in incubation and likely won't be finalized until
>>> the necessary Valhalla features are available, the API already supports the
>>> basic operations needed for our distance metrics, such as Euclidean distance,
>>> Manhattan distance, cosine similarity, and dot product. It also provides a
>>> Vector<E> type (with concrete subclasses such as FloatVector) which could
>>> serve as native storage for embeddings.
>>> 
>>> 2. Foreign Function & Memory API: This allows calling optimized C/C++
>>> libraries directly from Java. We could either leverage existing
>>> highly optimized vector computation libraries or implement our own native
>>> code. However, packaging native libraries and ensuring their compatibility
>>> across different target platforms may add complexity.
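A minimal sketch of what approach 1 might look like for the four metrics listed above, assuming the incubator module is enabled at both compile and run time with `--add-modules jdk.incubator.vector` (the JVM prints a warning whenever an incubator module is in use). Because the API is still incubating, names may change between JDK releases; the class name is hypothetical. Each loop processes `SPECIES.length()` lanes per iteration and finishes the leftover tail with scalar code.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public final class SimdDistances {
    // Widest vector shape the current CPU supports (e.g. 8 floats with AVX2).
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    private SimdDistances() {}

    /** Dot product; assumes a.length == b.length. */
    public static float dot(float[] a, float[] b) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        int bound = SPECIES.loopBound(a.length);
        for (; i < bound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            acc = va.fma(vb, acc); // acc += va * vb, lane-wise
        }
        float sum = acc.reduceLanes(VectorOperators.ADD);
        for (; i < a.length; i++) { // scalar tail
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Squared Euclidean distance (take Math.sqrt for the true distance). */
    public static float squaredEuclidean(float[] a, float[] b) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        int bound = SPECIES.loopBound(a.length);
        for (; i < bound; i += SPECIES.length()) {
            FloatVector diff = FloatVector.fromArray(SPECIES, a, i)
                    .sub(FloatVector.fromArray(SPECIES, b, i));
            acc = diff.fma(diff, acc); // acc += diff * diff
        }
        float sum = acc.reduceLanes(VectorOperators.ADD);
        for (; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Manhattan (L1) distance. */
    public static float manhattan(float[] a, float[] b) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        int bound = SPECIES.loopBound(a.length);
        for (; i < bound; i += SPECIES.length()) {
            FloatVector diff = FloatVector.fromArray(SPECIES, a, i)
                    .sub(FloatVector.fromArray(SPECIES, b, i));
            acc = acc.add(diff.abs());
        }
        float sum = acc.reduceLanes(VectorOperators.ADD);
        for (; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Cosine similarity built from dot products; returns NaN if either vector is all zeros. */
    public static float cosine(float[] a, float[] b) {
        return (float) (dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b))));
    }
}
```

Note that the lane-wise accumulation adds floats in a different order than a scalar loop, so results can differ from a scalar implementation in the last few bits; comparisons against a baseline should use a tolerance rather than exact equality.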
>>> 
>>> If you are aware of other solutions or have feedback on these options, I
>>> would appreciate your insights.
>>> 
>>> Thank you,
>>> Calvin Dani
>>> 
