Quoting the Photon paper <https://people.eecs.berkeley.edu/~matei/papers/2022/sigmod_photon.pdf>:
>
> After query planning, DBR launches tasks to execute the stages of the
> plan. In a task with Photon, the Photon execution node first serializes the
> Photon part of the plan into a Protobuf [6] message. This message is passed
> via the Java Native Interface (JNI) [8] to the Photon C++ library, which
> deserializes the Protobuf and converts it into a Photon-internal plan.
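A minimal sketch of that handoff in Java, under stated assumptions. `PlanFragment` (a linear list of operator names) and the JNI entry point `createNativePlan` are hypothetical. A hand-rolled length-prefixed encoding stands in for Protobuf so that the sketch has no dependencies. The point it illustrates is that the plan crosses the JNI boundary once, as a single serialized buffer, rather than through one native call per operator.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plan fragment: a linear pipeline such as SCAN -> FILTER -> PROJECT.
final class PlanFragment {
    final List<String> operators = new ArrayList<>();

    PlanFragment add(String op) {
        operators.add(op);
        return this;
    }
}

final class NativePlanBridge {
    // Hypothetical JNI entry point. A real engine would implement it in C++, load it
    // with System.loadLibrary(...), and decode the buffer into its own plan objects.
    // It is declared here but never called.
    static native long createNativePlan(ByteBuffer serializedPlan, int length);

    // Stand-in for Protobuf: [int count] then, per operator, [int byteLen][UTF-8 bytes].
    static ByteBuffer serialize(PlanFragment plan) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(plan.operators.size());
            for (String op : plan.operators) {
                byte[] utf8 = op.getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length);
                out.write(utf8);
            }
        } catch (IOException e) {
            // Writing to an in-memory stream does not fail in practice.
            throw new UncheckedIOException(e);
        }
        byte[] encoded = bytes.toByteArray();
        // A direct buffer lets native code read the bytes through JNI's
        // GetDirectBufferAddress instead of copying them out of the Java heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(encoded.length);
        direct.put(encoded);
        direct.flip();
        return direct;
    }

    // Mirrors what the native side would do with the message. Used here only to
    // check the round trip; it reads from a duplicate so the caller's position is kept.
    static List<String> decode(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();
        int count = view.getInt();
        List<String> ops = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] utf8 = new byte[view.getInt()];
            view.get(utf8);
            ops.add(new String(utf8, StandardCharsets.UTF_8));
        }
        return ops;
    }
}
```

Returning an opaque `long` handle is a common JNI idiom for letting Java refer to a C++ object it cannot see; the quote above does not say whether Photon does exactly this.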
Let's see what others have done, e.g., Photon, Velox (plus Apache Gluten to use Velox in Spark), and Apache DataFusion Comet (Apache DataFusion is written in Rust).

On Wed, Jun 11, 2025 at 1:55 AM Calvin Dani <calvinthomas.d...@gmail.com> wrote:

>
> Yes, I'll look into the JNA project too and explore approach 2 with both
> FFM and JNA.
>
> I'll prototype both approaches 1 and 2 and post a status update here.
>
> > On Jun 10, 2025, at 1:50 PM, Ian Maxon <ima...@apache.org> wrote:
> >
> > The Vector API is in OpenJDK, so I think the licensing should be OK:
> > https://openjdk.org/jeps/508
> >
> > The main problem is that it isn't a stable API yet, and it relies
> > on Valhalla. It would be a judgement call on how much we expect it to
> > change over time, and how difficult it would be to migrate things to
> > follow those changes. It would also be a bet that by the time
> > everything is done, this set of JDK features is more or less
> > stabilized.
> >
> > Using FFI/JNI would be a more traditional way to go about it. FFI is
> > new and better than JNI, so if we choose to go with that, it should be
> > less painful. As a final API since JDK 22 (JEP 454), it is also less
> > risky than an incubating feature.
> >
> > There is also the JNA project, which wraps JNI to make it simpler:
> > https://github.com/java-native-access/jna . I'm assuming most of the
> > libraries we might want to use are mostly computational, so they
> > wouldn't have many platform-specific dependencies, just
> > architecture-specific ones. I think it also handles the build aspect of
> > it, which FFI doesn't directly. Assuming the libraries we would want to
> > use aren't in libc or otherwise can't be assumed to be present, we would
> > have to include them in the jar somehow.
> >
> >
> >> On Tue, Jun 10, 2025 at 8:27 AM Mike Carey <dtab...@gmail.com> wrote:
> >>
> >> Q: Are there licensing gotchas with approach 1 (which otherwise sounds
> >> nicer from a maintenance standpoint)?
> >> We need to be sure that everything we use is Apache-okay in terms of
> >> licensing. It would be fun to see some preliminary numbers on perf,
> >> e.g., for KNN, each way, were it as easy as changing which function(s)
> >> to call... :-) That would help quantify the two options (vs. each other
> >> and vs. none) too.
> >>
> >>> On 6/10/25 7:24 AM, Calvin Dani wrote:
> >>> Hi,
> >>>
> >>> As part of adding vector functionality to AsterixDB, I have been exploring
> >>> possible optimizations for vector computations. One promising direction is
> >>> leveraging SIMD operations to accelerate these calculations. Although Java
> >>> offers autovectorization to utilize SIMD, this approach requires the
> >>> operations to be branchless (i.e., no conditional branching like if/else),
> >>> and it may not always be triggered when vector calculations get complex.
> >>>
> >>> I have considered two main options for SIMD-enabled vector computation:
> >>>
> >>> 1. Java Vector API: Introduced as an incubating feature in Java 16, the
> >>> Vector API depends on the long-term Project Valhalla. While it remains in
> >>> incubation and likely won't be finalized until Project Valhalla completes,
> >>> the API already supports the basic operations needed for our distance
> >>> metrics, such as Euclidean Distance, Manhattan Distance, Cosine Similarity,
> >>> and Dot Product. It also provides a Vector<E> type, which could serve as
> >>> native storage for embeddings.
> >>>
> >>> 2. Foreign Function & Memory API: This allows calling optimized C/C++
> >>> libraries directly from Java. We could either leverage existing highly
> >>> optimized vector computation libraries or implement our own native code.
> >>> However, packaging and ensuring compatibility of native libraries across
> >>> different target platforms may introduce complexity.
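Whichever option wins, the KNN comparison Mike suggests needs a plain-Java baseline. A minimal sketch, assuming embeddings are stored as `float[]`: the four metrics from the list above written as counted loops with no if/else in the body, plus a hypothetical `DistanceFunction` interface so that a brute-force KNN can switch between a scalar, Vector API, or native implementation just by passing a different function. Euclidean is computed as the squared distance, since dropping the square root does not change the neighbor ranking. Loops of this shape are what HotSpot's auto-vectorizer targets, but whether it actually emits SIMD instructions (floating-point reductions in particular) depends on the JVM version and CPU, so treat this as the baseline to measure against, not a guaranteed SIMD path.

```java
import java.util.Comparator;
import java.util.stream.IntStream;

// Hypothetical pluggable metric: a scalar, Vector API, or native (FFM/JNA) backend
// could each implement this, so a KNN benchmark only swaps the function it passes in.
interface DistanceFunction {
    float distance(float[] a, float[] b);
}

final class ScalarKernels {
    // Each kernel is a counted loop with no if/else in its body.
    // All kernels assume a.length == b.length.

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Squared Euclidean distance: skipping the sqrt keeps the same neighbor ranking.
    static float squaredEuclidean(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    static float manhattan(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Single-pass cosine similarity; NaN if either vector is all zeros.
    // Higher means closer, so KNN would use (1 - similarity) as the distance.
    static float cosineSimilarity(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt((double) normA * normB));
    }
}

final class BruteForceKnn {
    // Indices of the k rows closest to the query, smallest distance first.
    static int[] search(float[][] rows, float[] query, int k, DistanceFunction metric) {
        float[] dist = new float[rows.length];
        for (int i = 0; i < rows.length; i++) {
            dist[i] = metric.distance(rows[i], query);
        }
        return IntStream.range(0, rows.length)
                .boxed()
                .sorted(Comparator.comparingDouble(i -> dist[i]))
                .limit(k)
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

A benchmark would then call, for example, `BruteForceKnn.search(rows, query, 10, ScalarKernels::squaredEuclidean)` and repeat the run with a Vector API or native implementation of `DistanceFunction`, which keeps the comparison to "changing which function to call".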
> >>>
> >>> If you are aware of other solutions or have feedback on these options,
> >>> I would appreciate your insights.
> >>>
> >>> Thank you,
> >>> Calvin Dani
> >>>

--
Regards,
Wail Alkowaileet
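On Ian's point about shipping native libraries inside the jar: with plain JNI (JNA can handle some of this itself), a common pattern is to bundle one binary per OS/architecture as a classpath resource, copy the matching one to a temporary file, and pass its absolute path to `System.load`, because the OS loader cannot open a library that is still inside a jar. The sketch below assumes a hypothetical resource layout `/native/<os>-<arch>/` and a made-up library name `vecdist`. Its OS and architecture detection is deliberately simplified: anything that is not macOS or Windows is treated as Linux.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

final class NativeLibraryLoader {
    // Hypothetical jar layout: /native/<os>-<arch>/<platform file name>,
    // e.g. /native/linux-x86_64/libvecdist.so for a made-up library "vecdist".
    static String resourcePath(String libName, String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        // Check macOS first: "darwin" contains "win".
        String family = (os.contains("mac") || os.contains("darwin")) ? "macos"
                : os.startsWith("windows") ? "windows"
                : "linux"; // simplification: every other OS is treated as Linux
        String arch = osArch.toLowerCase(Locale.ROOT);
        String normalizedArch = (arch.equals("amd64") || arch.equals("x86_64")) ? "x86_64"
                : (arch.equals("aarch64") || arch.equals("arm64")) ? "aarch64"
                : arch;
        return "/native/" + family + "-" + normalizedArch + "/" + fileName(libName, family);
    }

    private static String fileName(String libName, String family) {
        switch (family) {
            case "windows":
                return libName + ".dll";
            case "macos":
                return "lib" + libName + ".dylib";
            default:
                return "lib" + libName + ".so";
        }
    }

    // Copies the bundled library for the current platform to a temp file and loads it.
    static void load(String libName) {
        String resource = resourcePath(libName,
                System.getProperty("os.name"), System.getProperty("os.arch"));
        try (InputStream in = NativeLibraryLoader.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new UnsatisfiedLinkError("no bundled native library at " + resource);
            }
            String name = resource.substring(resource.lastIndexOf('/') + 1);
            Path tmp = Files.createTempFile("native-", "-" + name);
            tmp.toFile().deleteOnExit(); // best effort; may not succeed for a loaded DLL
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            System.load(tmp.toAbsolutePath().toString()); // System.load needs an absolute path
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `NativeLibraryLoader.load("vecdist")` once, for example from a static initializer of the class that declares the native methods, would make the library available before the first native call. The build would still need to produce and place one binary per supported platform.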