Re: Apache Arrow | Graph Algorithms & Data Structures

Bechir Ben Daadouch Fri, 30 Jun 2023 13:20:15 -0700

Hi,

Adam, thank you :)

Aldrin:

First, I tried to implement the basic data structures needed for graph
algorithms, namely a queue (for breadth-first search) and a heap (for
Dijkstra's weighted shortest path). That's when I ran into problems,
since Apache Arrow objects are immutable.

This is how I implemented the queue:
https://replit.com/@BechirBen2/ArrowQueue#arrow_array_queue.py

I used concat_arrays in the enqueue method, which means I create a new
array every time I add an object to the queue.

This is how I implemented breadth first search:
https://replit.com/@BechirBen2/Graph-Apache-Arrow#arrow_breadh_first_search.py

As you can see, I had to resort to converting the Arrow objects to Python
objects and using a Python set.

For the heapq, I haven't tried. Since I'd probably be going back and
forth between Arrow objects and Python types, I'd be better off using
Python's heapq module directly.
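
To make that concrete, here is a minimal sketch of Dijkstra using only
the standard-library heapq module. It assumes the graph has already
been converted to a Python dict of (neighbor, weight) lists; the
function name and the sample data are made up for illustration.

```python
import heapq


def dijkstra(adjacency, source):
    """Shortest distances from `source`; weights must be non-negative."""
    dist = {source: 0}
    heap = [(0, source)]  # entries are (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        for neighbor, weight in adjacency.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical weighted graph: a -> b (1), a -> c (4), b -> c (2)
weighted = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(dijkstra(weighted, "a"))
```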

Since I can't do operations elementwise, I saw no use in using adjacency
lists. Here is a screenshot of how I represented the data:
https://ibb.co/pxyFpb6
I basically saved the nodes as Int64 and the neighboring nodes in a List
column.

For the SQL approach I did something similar to this:
https://imgbb.com/0rPGB42

I basically use the pyarrow.compute module:
1- Filter the neighbors of the start Node: https://ibb.co/2gMDHnX
2- Use pc.list_flatten and then check if the target node is in the
neighboring nodes using pc.is_in(flat_neighbors, target_node)

3- If not, I convert the pyarrow table to pandas, then I explode the
neighboring_nodes, and save the start_node in a new column called
source_path: https://ibb.co/60Vj0RX

I continue like this until we reach the target or reach the last
node(s) without neighbors.

Gavin: Thank you for taking the time to give a code example. I already did
^^

On Fri, Jun 30, 2023 at 8:55 PM Gavin Ray <[email protected]> wrote:

> This isn't particularly efficient, but could you do something like this?
>
> https://replit.com/@GavinRay97/EnlightenedRichAdministration#main.py
>
> On Fri, Jun 30, 2023 at 1:10 PM Aldrin <[email protected]> wrote:
>
> > > But I found out very quickly that I won't be able to... using only
> > > Apache Arrow without resorting to other libraries.
> >
> > > I am aiming to assess the viability of Apache Arrow for graph
> > > algorithms and data structures...
> >
> > > I also gave a shot at doing it similar to a certain SQL method...
> >
> > I'm curious about these portions of what you've said.
> >
> > Could you share what you have tried and what roadblocks you're hitting?
> > Are you struggling with mutability? How are you representing your data?
> > You mention heapq, but it's not clear if you're using an adjacency
> > matrix or adjacency lists or if you're using a more normalized
> > relational format.
> >
> > Thanks!
> >
> >
> > # ------------------------------
> >
> > # Aldrin
> >
> >
> > https://github.com/drin/
> >
> > https://gitlab.com/octalene
> >
> > https://keybase.io/octalene
>
