On 6/30/23 04:21, Bechir Ben Daadouch wrote:
> Dear Apache Arrow Dev Community,
> 
> My name is Bechir, I am currently working on a project that involves
> implementing graph algorithms in Apache Arrow.
> 
> The initial plan was to construct a node structure and a subsequent graph
> that would encompass all the nodes. However, I quickly realized that due to
> Apache Arrow's columnar format, this approach was not feasible.
> 
> I tried a couple of things, including the implementation of the
> shortest-path algorithm. However, I rapidly discovered that manipulating
> arrow objects, particularly when applying graph algorithms, proved more
> complex than anticipated and it became very clear that I would need to
> resort to some data structures outside of what arrow offers (i.e.: Heapq
> wouldn't be possible using arrow).
> 
> I also gave a shot at doing it similar to a certain SQL method (see:
> https://ibb.co/0rPGB42 ), but ran into some roadblocks there too and I
> ended up having to resort to using Pandas for some transformations.
> 
> My next course of action is to experiment with compressed sparse rows,
> hoping to execute Matrix Multiplication using this method. But honestly,
> with what I know right now, I remain skeptical about the feasibility
> of it. However,
> before committing to this approach, I would greatly appreciate your opinion
> based on your experience with Apache Arrow.
> 
> Thank you very much for your time.
> 
> Looking forward to potentially discussing this further.
> 
> Many thanks,
> Bechir
> 
Arrow may not be the best choice for most graph algorithms as they
typically require random memory accesses that will be difficult to
coalesce into forms that allow for vectorization. If your data will fit
in memory of a single node, you might consider:
https://github.com/DrTimothyAldenDavis/GraphBLAS
https://pypi.org/project/python-graphblas/
https://github.com/JuliaSparse/SuiteSparseGraphBLAS.jl

Reply via email to