Hi Kyle,
I'm not sure whether the Rust contributors monitor this list; you might have
better luck opening an issue on the Rust repo [1].
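
In case it helps with debugging in the meantime: the "Expected to read N
metadata bytes" error looks like the JS reader complaining that a message's
declared metadata length is larger than the bytes it actually received. Below
is a minimal, std-only sketch for checking that by hand on the bytes you send
back to JS. It assumes the current IPC stream framing, where each message
starts with the 4-byte continuation marker 0xFFFFFFFF followed by a 4-byte
little-endian metadata length. The names `MessagePrefix` and
`read_message_prefix` are hypothetical helpers of mine, not part of any Arrow
crate, and I have not run this against your files.

```rust
/// Continuation marker that precedes each encapsulated message in the
/// Arrow IPC stream format (assumes the non-legacy framing).
const CONTINUATION: u32 = 0xFFFF_FFFF;

/// Where one message's metadata sits inside an IPC stream buffer.
/// Hypothetical type used only for this sketch.
#[derive(Debug, PartialEq)]
struct MessagePrefix {
    /// Byte offset at which the metadata begins.
    metadata_start: usize,
    /// Declared metadata length; 0 marks the end of the stream.
    metadata_len: usize,
}

/// Hypothetical helper: read the 8-byte message prefix at `offset` and check
/// that the declared metadata length fits in the rest of the buffer.
fn read_message_prefix(buf: &[u8], offset: usize) -> Result<MessagePrefix, String> {
    // Read one little-endian u32, failing if the buffer is too short.
    let read_u32 = |at: usize| -> Result<u32, String> {
        let b = buf
            .get(at..at + 4)
            .ok_or_else(|| format!("buffer ends inside the 4-byte field at offset {}", at))?;
        Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    };

    let marker = read_u32(offset)?;
    if marker != CONTINUATION {
        return Err(format!(
            "expected continuation marker at offset {}, found {:#010x}",
            offset, marker
        ));
    }

    let metadata_len = read_u32(offset + 4)? as usize;
    let metadata_start = offset + 8;
    // Safe subtraction: the second read succeeded, so buf.len() >= offset + 8.
    let available = buf.len() - metadata_start;
    if metadata_len > available {
        return Err(format!(
            "expected {} metadata bytes, but only {} remain",
            metadata_len, available
        ));
    }

    Ok(MessagePrefix { metadata_start, metadata_len })
}
```

Calling `read_message_prefix(&bytes, 0)` on the buffer right before it is
handed to JS should show whether the first message is already inconsistent on
the Rust side or whether the bytes change on the way across the boundary.
Walking past a message that has a body would need the body length from the
flatbuffer metadata, which this sketch does not parse.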

[1] https://github.com/apache/arrow-rs

On Sun, Feb 27, 2022 at 7:28 PM Kyle Barron <[email protected]> wrote:

> Hello!
>
> I've used Arrow a decent bit in Python and JS but I'm pretty new to Rust.
> I'm trying to write a minimal binding of Rust's Parquet to WebAssembly in
> order to decode Parquet files to Arrow on the web. I have code that works
> <https://github.com/kylebarron/parquet-wasm/blob/main/src/lib.rs> but
> only some of the time. For example this test data
> <https://github.com/kylebarron/parquet-wasm/blob/9495a87e00ae7073966d171bdcbfa1b87c63991b/data/works.parquet>
>  (created here
> <https://github.com/kylebarron/parquet-wasm/blob/9495a87e00ae7073966d171bdcbfa1b87c63991b/data/generate_data.py#L40-L43>)
> seems to work with the js arrow.RecordBatchReader
> <https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/www/index.js#L50-L52>
>  but other test data
> <https://github.com/kylebarron/parquet-wasm/blob/9495a87e00ae7073966d171bdcbfa1b87c63991b/data/not_work.parquet>
>  (created here
> <https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/data/generate_data.py#L45-L48>)
> raises with "Error: Expected to read 1249648 metadata bytes, but only read
> 300.".
>
> Based on logging, it *seems* as if parsing the Parquet file goes
> smoothly. It's only writing the Arrow IPC format that fails (on the JS side
> when trying to verify it). I'm currently trying to create the StreamWriter
> <https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/src/lib.rs#L122-L123>,
> then write all the Arrow RecordBatches into the writer
> <https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/src/lib.rs#L127-L128>,
> then finish the writer
> <https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/src/lib.rs#L142>,
> and send the output back to JS
> <https://github.com/kylebarron/parquet-wasm/blob/79580c64c698570fd1a8a48b55698ca0be630aa8/src/lib.rs#L145-L156>
> .
>
> Has anyone seen a similar problem before, or does anyone have suggestions
> on where to debug further? Alternatively, if an end-to-end example exists
> of reading from a Parquet file and returning an Arrow buffer, it would be
> very helpful to see.
>
> Best,
> Kyle Barron
>
>
