Hey all,

Wanted to share a project I've been hacking on: Carquet, a pure C library for reading/writing Parquet files.

https://github.com/Vitruves/carquet

The pitch is simple: no C++, no Boost, no Arrow dependency. Just C99 with zstd/zlib for compression (auto-fetched by CMake if missing). Should build pretty much anywhere.

What works:
- Read/write all physical types
- Dictionary, RLE, Delta, BYTE_STREAM_SPLIT encodings
- Snappy, LZ4, ZSTD, GZIP compression (Snappy/LZ4 are internal implementations)
- SIMD paths for x86 (SSE/AVX2/AVX512) and ARM (NEON/SVE), with scalar fallbacks
- Big-endian support

What's missing (for now):
- Nested types / repetition levels (only flat schemas)
- Encryption
- Bloom filters are read-only

I've tested interop with PyArrow-generated files but would be curious if anyone spots edge cases that break.

The codebase is ~15k lines, MIT licensed. Not trying to replace parquet-cpp obviously - different tradeoffs. More aimed at embedded stuff or places where pulling in Arrow isn't practical.

Feedback welcome, happy to fix any spec violations you might notice.

Cheers,
Johan NATTER
Doctor of Pharmacy and PhD Candidate in Organic and Medicinal Chemistry
Laboratory of Therapeutic Innovation, CNRS UMR7200
Bihel-Schmitt Group | Strasbourg Drug Institute
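P.S. For anyone checking interop on the BYTE_STREAM_SPLIT encoding mentioned above, here is a minimal scalar sketch of the layout as the Parquet format spec describes it: byte k of every value goes into stream k, and the streams are concatenated. This is plain C99 for illustration only, not Carquet's actual code or API; the names bss_encode and bss_decode are made up. It assumes the input bytes are already in file order (little-endian PLAIN values), so a big-endian host would need to byte-swap before calling it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not part of Carquet.
 * n     = number of values
 * width = bytes per value (e.g. 4 for FLOAT, 8 for DOUBLE)
 * src and dst must each hold n * width bytes and must not overlap. */

/* Scatter: byte k of value i lands at position i within stream k. */
static void bss_encode(const uint8_t *src, uint8_t *dst, size_t n, size_t width)
{
    assert(src != dst); /* in-place transposition is not supported here */
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < width; k++) {
            dst[k * n + i] = src[i * width + k];
        }
    }
}

/* Gather: inverse of bss_encode, rebuilds the interleaved values. */
static void bss_decode(const uint8_t *src, uint8_t *dst, size_t n, size_t width)
{
    assert(src != dst);
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < width; k++) {
            dst[i * width + k] = src[k * n + i];
        }
    }
}
```

With two 4-byte values A0 A1 A2 A3 and B0 B1 B2 B3, the encoded buffer is A0 B0 A1 B1 A2 B2 A3 B3, and decoding returns the original bytes. A scalar loop like this is handy as a reference to compare SIMD paths against.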
