Hey all,

Wanted to share a project I've been hacking on: Carquet, a pure C library for 
reading/writing Parquet files.

https://github.com/Vitruves/carquet

The pitch is simple: no C++, no Boost, no Arrow dependency. Just C99 with 
zstd/zlib for compression (auto-fetched by CMake if missing). Should build 
pretty much anywhere.

  What works:
  - Read/write all physical types
  - Dictionary, RLE, Delta, BYTE_STREAM_SPLIT encodings
  - Snappy, LZ4, ZSTD, GZIP compression (Snappy/LZ4 are internal implementations)
  - SIMD paths for x86 (SSE/AVX2/AVX512) and ARM (NEON/SVE), with scalar fallbacks
  - Big-endian support

  What's missing (for now):
  - Nested types / repetition levels (only flat schemas)
  - Encryption
  - Bloom filters are read-only

I've tested interop with PyArrow-generated files, but I'd be curious whether 
anyone spots edge cases that break it. The codebase is ~15k lines, MIT licensed.

Obviously I'm not trying to replace parquet-cpp; the tradeoffs are different. 
It's aimed more at embedded targets or places where pulling in Arrow isn't practical.

Feedback welcome, happy to fix any spec violations you might notice.

Cheers,
Johan NATTER
Doctor of Pharmacy and PhD Candidate in Organic and Medicinal Chemistry
Laboratory of Therapeutic Innovation, CNRS UMR7200
Bihel-Schmitt Group | Strasbourg Drug Institute
