Hi Kyle,

I'm not sure whether Rust contributors monitor this list. You might have better luck opening an issue on the Rust repo [1].
[1] https://github.com/apache/arrow-rs

On Sun, Feb 27, 2022 at 7:28 PM Kyle Barron wrote:

> Hello!
>
> I've used Arrow a decent bit in Python and JS but I'm pretty new to Rust.
> I'm trying to write a minimal binding of Rust's Parquet to WebAssembly in
> order to decode Parquet files to Arrow on the web. I have code that works
> <https://github.com/kylebarron/parquet-wasm/blob/main/src/lib.rs> but
> only some of the time. For example this test data
> <https://github.com/kylebarron/parquet-wasm/blob/9495a87e00ae7073966d171bdcbfa1b87c63991b/data/works.parquet>
> (created here
> <https://github.com/kylebarron/parquet-wasm/blob/9495a87e00ae7073966d171bdcbfa1b87c63991b/data/generate_data.py#L40-L43>)
> seems to work with the js arrow.RecordBatchReader
> <https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/www/index.js#L50-L52>
> but other test data
> <https://github.com/kylebarron/parquet-wasm/blob/9495a87e00ae7073966d171bdcbfa1b87c63991b/data/not_work.parquet>
> (created here
> <https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/data/generate_data.py#L45-L48>)
> raises with "Error: Expected to read 1249648 metadata bytes, but only read
> 300.".
>
> Based on logging, it *seems* as if parsing the Parquet file goes
> smoothly. It's only writing the Arrow IPC format that fails (on the JS side
> when trying to verify it). I'm currently trying to create the StreamWriter
> <https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/src/lib.rs#L122-L123>,
> then write all the Arrow RecordBatches into the writer
> <https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/src/lib.rs#L127-L128>,
> then finish the writer
> <https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/src/lib.rs#L142>,
> and send the output back to JS
> <https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/src/lib.rs#L145-L156>.
>
> Has anyone seen a similar problem before, or any suggestions of where to
> debug further? Alternatively, an end-to-end example of reading from a
> Parquet file and returning an Arrow buffer would be very helpful to see.
>
> Best,
> Kyle Barron
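
One possible way to narrow down the "Expected to read ... metadata bytes" error quoted above is to check how the buffer handed back to JS begins. In the Arrow IPC streaming format, each message starts with a 4-byte continuation marker (0xFFFFFFFF) followed by a 4-byte little-endian metadata length; streams written before Arrow 0.15 omit the marker and start with the length directly. The sketch below uses only the Rust standard library and is not taken from parquet-wasm or arrow-rs. The names `Prefix` and `inspect_prefix` are hypothetical, introduced here for illustration only.

```rust
/// What the first bytes of a (supposed) Arrow IPC stream look like.
/// Hypothetical helper type, not part of any Arrow crate.
#[derive(Debug, PartialEq)]
enum Prefix {
    /// Continuation marker present; holds the declared metadata length.
    Modern(i32),
    /// No continuation marker; the first 4 bytes are read as the length.
    Legacy(i32),
    /// Not enough bytes to decide.
    TooShort,
}

/// Decode the message prefix at the start of `buf`.
fn inspect_prefix(buf: &[u8]) -> Prefix {
    if buf.len() < 4 {
        return Prefix::TooShort;
    }
    let first = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
    if first == 0xFFFF_FFFF {
        // Marker found: the next 4 bytes are the little-endian metadata length.
        if buf.len() < 8 {
            return Prefix::TooShort;
        }
        Prefix::Modern(i32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]))
    } else {
        // No marker: interpret the first word itself as the length.
        Prefix::Legacy(first as i32)
    }
}

fn main() {
    // A well-formed prefix: marker, then a metadata length of 300.
    let mut good = vec![0xFF, 0xFF, 0xFF, 0xFF];
    good.extend_from_slice(&300i32.to_le_bytes());
    println!("{:?}", inspect_prefix(&good));

    // The same length without the marker is reported as legacy framing.
    println!("{:?}", inspect_prefix(&300i32.to_le_bytes()));
}
```

If the length decoded from the first bytes looks implausible, or is much larger than the buffer itself, the bytes reaching the reader may not be the bytes the writer produced. In that case it may be worth comparing the buffer's length and first few bytes on the Rust side and on the JS side.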
