Thank you very much for the helpful response, Alenka. This makes the
partitioning system, and how I should be interacting with it, much clearer.
I'm in the process of re-processing my dataset to use integers for the date
partitioning, while still using strings for the site identifiers. I do
Hello Kelton,

after playing around with the files you referenced and with the code you
added, the following can be observed and improved to make the code work:
*1) Defining the partitioning of a dataset*
Running *data.files* on your dataset shows that the files are
partitioned according to the *hi
An example using the pyarrow.dataset API (the "day" filter below completes
the truncated expression with an illustrative value, and fs is assumed to be
a GCS filesystem such as gcsfs.GCSFileSystem()):

import gcsfs
import pyarrow.dataset as ds

fs = gcsfs.GCSFileSystem()  # assumption: any fsspec-compatible GCS filesystem

data = ds.dataset("global-radiosondes/hires_sonde", filesystem=fs,
                  format="parquet",
                  partitioning=["year", "month", "day", "hour", "site"])

# Filter on the partition fields.
subset = (ds.field("year") == "2022") & (ds.field("month") == "01") \
    & (ds.field("day") == "01")

table = data.to_table(filter=subset)
Hello - I’m not sure if this is a bug, or if I’m not using the API correctly,
but I have a partitioned Parquet dataset stored in a Google Cloud Storage
bucket that I am attempting to load for analysis. However, when applying
filters to the dataset (using both the pyarrow.dataset and
pyarrow.parquet.ParquetDataset