gsvgit opened a new issue, #13545: URL: https://github.com/apache/datafusion/issues/13545
### Is your feature request related to a problem or challenge?

The SQL standard was recently extended with property graph query features (SQL/PGQ): [ISO standard](https://www.iso.org/standard/79473.html), [theoretical foundations](https://arxiv.org/abs/2409.01102). I wonder whether DataFusion can be extended with PGQ support.

### Describe the solution you'd like

All parts of DataFusion would need to be extended. The most nontrivial part is the interconnection between traditional SQL and graph analysis (path-related evaluation). While it is possible to store a graph in columnar storage (e.g. Apache Arrow), that may be inefficient for path-related queries, even though it is quite efficient for analytical queries over vertex attributes. So dedicated path indexes may be required. Moreover, in some cases it may be a good idea to store the graph topology in separate storage in a specialized format (e.g. a sparse adjacency matrix, similar to [FalkorDB](https://docs.falkordb.com/design/#the-theory-ideas-behind-falkordb)).

On the other hand, even if we store the graph in columnar storage, linear-algebra primitives can be useful for path querying ([DuckPGQ: Efficient Property Graph Queries in an analytical RDBMS](https://www.cidrdb.org/cidr2023/papers/p66-wolde.pdf)). So logical and physical plans should not only provide graph-specific operators but also support balancing between data representations.

### Describe alternatives you've considered

_No response_

### Additional context

Can something like [this project](https://github.com/code-sam/graphblas_sparse_linear_algebra) be used for the physical level of linear algebra? [Possible theoretical foundations](https://www.irif.fr/~rogova/thesis_Alexandra_Rogova.pdf). It may be a first step towards supporting [GQL](https://www.iso.org/standard/76120.html).
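
To make the linear-algebra view of path querying mentioned above a bit more concrete, here is a minimal sketch in plain Python (not DataFusion or GraphBLAS code). It assumes the edges live in two columnar arrays, `src` and `dst`, with vertices numbered from 0. The helper names `build_csr` and `reachable` are hypothetical and chosen only for illustration. Each loop iteration in `reachable` corresponds to one product of the frontier vector with the sparse adjacency matrix over the boolean (OR, AND) semiring, masked by the vertices already visited.

```python
from typing import Dict, List, Optional, Tuple


def build_csr(num_vertices: int, src: List[int], dst: List[int]) -> Tuple[List[int], List[int]]:
    """Build a CSR adjacency structure (row offsets, column indices) from two edge columns."""
    offsets = [0] * (num_vertices + 1)
    for s in src:                      # count outgoing edges per vertex
        offsets[s + 1] += 1
    for v in range(num_vertices):      # prefix sum turns counts into row offsets
        offsets[v + 1] += offsets[v]
    cols = [0] * len(src)
    cursor = offsets[:-1].copy()       # next free slot in each row
    for s, d in zip(src, dst):
        cols[cursor[s]] = d
        cursor[s] += 1
    return offsets, cols


def reachable(offsets: List[int], cols: List[int], start: int,
              max_hops: Optional[int] = None) -> Dict[int, int]:
    """Return {vertex: hop count} for every vertex reachable from `start`."""
    dist = {start: 0}
    frontier = [start]
    hops = 0
    while frontier and (max_hops is None or hops < max_hops):
        hops += 1
        next_frontier = []
        for u in frontier:             # frontier x adjacency matrix, one row at a time
            for i in range(offsets[u], offsets[u + 1]):
                v = cols[i]
                if v not in dist:      # mask out vertices that were already visited
                    dist[v] = hops
                    next_frontier.append(v)
        frontier = next_frontier
    return dist


# Edge table stored as two columns, as it might be in a columnar format.
src = [0, 0, 1, 2, 4]
dst = [1, 2, 3, 3, 0]
offsets, cols = build_csr(5, src, dst)
print(reachable(offsets, cols, start=0))
```

The CSR arrays here play the role of the separately stored topology, while vertex and edge attributes could stay in ordinary columns. A real engine would presumably delegate the inner loops to a sparse linear-algebra library instead of Python loops.
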
I'm interested in the design and development of such a system, but I'm aware that extending DataFusion this way may amount to rebuilding the system. So I want to discuss this direction: should we extend DataFusion or create a new, independent system?
