Re: Vector Computation Optimization Approaches for AsterixDB

Wail Alkowaileet Fri, 13 Jun 2025 03:33:33 -0700

Quoting the Photon paper
<https://people.eecs.berkeley.edu/~matei/papers/2022/sigmod_photon.pdf>:
>
> After query planning, DBR launches tasks to execute the stages of the
> plan. In a task with Photon, the Photon execution node first serializes the
> Photon part of the plan into a Protobuf [6] message. This message is passed
> via the Java Native Interface (JNI) [8] to the Photon C++ library, which
> deserializes the Protobuf and converts it into a Photon-internal plan.



Let's see what others have done, e.g., Photon, Velox (with Apache Gluten to
use Velox in Spark), and Apache DataFusion Comet (Apache DataFusion is
written in Rust).

On Wed, Jun 11, 2025 at 1:55 AM Calvin Dani <calvinthomas.d...@gmail.com>
wrote:

>
> Yes, I'll look into the JNA project too and explore approach 2 with both
> FFM and JNA.
>
> I'll prototype both approaches 1 and 2 and post a status update here.
>
> > On Jun 10, 2025, at 1:50 PM, Ian Maxon <ima...@apache.org> wrote:
> >
> > The Vector API is in OpenJDK, so I think the licensing should be OK:
> > https://openjdk.org/jeps/508
> >
> > The main problem is the fact it isn't a stable API yet, and it relies
> > on Valhalla. It would be a judgement call on how much we expect it to
> > change over time, and how difficult it would be to migrate things to
> > follow those changes. It would also be a bet that by the time
> > everything is done, this set of JDK features is more or less
> > stabilized.
> >
> > Using FFI/JNI would be a more traditional way to go about it. FFI is
> > newer and better than JNI, so if we choose to go with that, it should be
> > less painful. FFI (the Foreign Function & Memory API) was finalized in
> > JDK 22, which makes it less risky than an incubating feature.
> >
> > There is also the JNA project, which wraps JNI to make it simpler:
> > https://github.com/java-native-access/jna . I'm assuming most of the
> > libraries we might want to use are mostly computational, so they
> > wouldn't have many platform-specific dependencies, just architecture
> > specific ones. I think it also handles the build aspect of it, which
> > FFI doesn't directly. Assuming the libraries we would want to use
> > aren't in libc or otherwise can't be assumed to be present, we would
> > have to include them in the jar somehow.
> >
> >
> >> On Tue, Jun 10, 2025 at 8:27 AM Mike Carey <dtab...@gmail.com> wrote:
> >>
> >> Q:  Are there licensing gotchas with approach 1 (which otherwise sounds
> >> nicer from a maintenance standpoint)? We need to be sure that everything
> >> we use is Apache-okay in terms of licensing.  It would be fun to see
> >> some preliminary numbers on perf, e.g., for KNN, each way, were it as
> >> easy as changing which function(s) to call...  :-)  That would help
> >> quantify the two options (vs. each other and vs. none) too.
> >>
> >>> On 6/10/25 7:24 AM, Calvin Dani wrote:
> >>> Hi,
> >>>
> >>> As part of adding vector functionality to AsterixDB, I have been
> >>> exploring possible optimizations for vector computations. One
> >>> promising direction is leveraging SIMD operations to accelerate these
> >>> calculations. Although Java offers autovectorization to utilize SIMD,
> >>> this approach requires the operations to be branchless (i.e., no
> >>> conditional branching like if/else), and it may not always be
> >>> triggered when vector calculations get complex.
> >>>
> >>> I have considered two main options for SIMD-enabled vector computation:
> >>>
> >>> 1. Java Vector API: An incubating feature since Java 16, the Vector
> >>> API was developed under Project Panama. It remains in incubation and
> >>> likely won't be finalized until the Project Valhalla features it
> >>> depends on are available, but the API already supports the basic
> >>> operations needed for our distance metrics, such as Euclidean
> >>> Distance, Manhattan Distance, Cosine Similarity, and Dot Product. It
> >>> also provides a Vector<E> type which could serve as native storage
> >>> for embeddings.
> >>>
> >>> 2. Foreign Function & Memory API: This allows calling optimized C/C++
> >>> libraries directly from Java. We could either leverage existing
> >>> highly-optimized vector computation libraries or implement our own
> >>> native code. However, packaging and ensuring compatibility of native
> >>> libraries across different target platforms may introduce complexity.
> >>>
> >>> If you are aware of other solutions or have feedback on these options,
> >>> I would appreciate your insights.
> >>>
> >>> Thank you,
> >>> Calvin Dani
> >>>
>
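
For the "vs. none" side of the perf comparison asked about above, a plain
scalar baseline might look like the sketch below. It covers the four metrics
named in the thread using branch-free loops, which is the shape
autovectorization needs, though nothing here guarantees the JIT will
actually vectorize them. The class name DistanceKernels and its method names
are hypothetical and not part of AsterixDB; a Vector API or FFM variant
would be expected to expose the same signatures so that switching between
them is only a matter of which function gets called.

```java
/** Hypothetical scalar baselines; names are illustrative, not AsterixDB APIs. */
final class DistanceKernels {
    private DistanceKernels() {}

    // Guard kept outside the hot loops so the loop bodies stay branch-free.
    private static void checkSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
    }

    static float dot(float[] a, float[] b) {
        checkSameLength(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static float squaredEuclidean(float[] a, float[] b) {
        checkSameLength(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    static float euclidean(float[] a, float[] b) {
        return (float) Math.sqrt(squaredEuclidean(a, b));
    }

    static float manhattan(float[] a, float[] b) {
        checkSameLength(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Single pass over both arrays; yields NaN if either vector is all zeros.
    static float cosineSimilarity(float[] a, float[] b) {
        checkSameLength(a, b);
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt((double) normA * normB));
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        System.out.println("dot       = " + dot(a, b));
        System.out.println("euclidean = " + euclidean(a, b));
        System.out.println("manhattan = " + manhattan(a, b));
        System.out.println("cosine    = " + cosineSimilarity(a, b));
    }
}
```

Timing these against Vector API and native versions with a proper harness
(after JIT warm-up) would give the preliminary numbers for each approach.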


-- 

*Regards,*
Wail Alkowaileet
