Usage of Azure filesystem with fsspec and adlfs and pyarrow to download a list of blobs (parquets) concurrently with columns pruning and rows filtering

2023-12-01 Thread Luca Maurelli
I'm new to these libraries, so bear with me; I am learning a lot these days. I started using fsspec and adlfs with the idea of switching between cloud storage and local storage with little effort. I read that adlfs makes use of the Azure Blob Storage Python SDK, which supports the use of async/

[Parquet] How to write hive partitioning with partitioning keys in the file

2023-12-01 Thread Haocheng Liu
Hi community, hope this email finds you well. Can anyone advise how to write hive partitioning with the partitioning keys *in the file*? Right now only a subset of the data is written. Both Python pyarrow.dataset.write_dataset(...)

Issue with Apache Arrow C++ Setup using Conda on Windows

2023-12-01 Thread Divyansh Khatri
Greetings, I am writing to report an issue I am encountering while attempting to set up Apache Arrow C++ using Conda on Windows. I have successfully followed the first two steps outlined in the official documentation ( https://arrow.apache.org/docs/developers/cpp/windows.html#using-conda-forge-for