GitHub user SYaoJun created a discussion: Proposal: Introduce Vortex Columnar Format Support in GraphAr
## Background

Several emerging columnar file formats, such as [Vortex](https://github.com/vortex-data/vortex), [Lance](https://github.com/lance-format/lance), [F3](https://github.com/future-file-format/F3), BtrBlocks, Nimble, and Parquet variants, demonstrate strong performance advantages in specific scenarios. I wonder whether supporting these formats in GraphAr could significantly reduce storage overhead and improve query performance at scale.

## Benefits

1. Introducing the Vortex columnar format can improve storage efficiency and query performance through better compression and vectorized execution.
2. It enables more flexible column-level encoding strategies, which can better align with analytical graph workloads.
3. Vortex is designed to be GPU-friendly, which is particularly relevant in AI and analytics scenarios.

## Effects of Modifications

1. The storage layer implementation and format adapters would need to change.
2. All language bindings would need to adopt the new format. For example, the `FileType` enum currently covers only:

```cpp
enum class FileType : int32_t { CSV = 0, PARQUET = 1, ORC = 2, JSON = 3 };
```

## Evidence from Other Projects

Vortex has already been integrated into DuckDB, where it demonstrates substantial performance improvements on analytical workloads such as TPC-H. Reported results show significant gains in scan efficiency and query execution time compared to traditional columnar formats. Details are in this [blog post](https://duckdb.org/2026/01/23/duckdb-vortex-extension).

What do others think about this idea?

GitHub link: https://github.com/apache/incubator-graphar/discussions/887
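
To make the enum change mentioned under "Effects of Modifications" more concrete, here is a minimal sketch of what adding a Vortex entry might look like. The `VORTEX = 4` value and the helpers `FileTypeName` and `ParseFileType` are hypothetical names chosen for illustration; they are not taken from the GraphAr codebase, and the actual change would also touch the readers, writers, and each language binding.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// The first four values are copied from the proposal.
// VORTEX = 4 is a hypothetical addition, not an existing GraphAr value.
enum class FileType : int32_t { CSV = 0, PARQUET = 1, ORC = 2, JSON = 3, VORTEX = 4 };

// Hypothetical helper: map a FileType to a lowercase format name.
constexpr std::string_view FileTypeName(FileType type) {
  switch (type) {
    case FileType::CSV:     return "csv";
    case FileType::PARQUET: return "parquet";
    case FileType::ORC:     return "orc";
    case FileType::JSON:    return "json";
    case FileType::VORTEX:  return "vortex";
  }
  return "unknown";  // reached only for values outside the enum
}

// Hypothetical helper: parse a format name, returning std::nullopt
// for formats that have no adapter yet.
constexpr std::optional<FileType> ParseFileType(std::string_view name) {
  if (name == "csv")     return FileType::CSV;
  if (name == "parquet") return FileType::PARQUET;
  if (name == "orc")     return FileType::ORC;
  if (name == "json")    return FileType::JSON;
  if (name == "vortex")  return FileType::VORTEX;
  return std::nullopt;
}
```

Appending the new value at the end, rather than renumbering, keeps the integer codes of the existing formats stable, which matters if those codes are ever persisted or shared across bindings.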
