Hi, I've hit an issue in Python 3.9.12 where creating a PyArrow dataset over a remote filesystem (such as a GCS filesystem), opening a batch iterator over the dataset, and then having the program immediately exit and clean up afterwards causes a PyGILState_Release fatal error. This is with pyarrow 7.0.0.
The error looks like:

```
Fatal Python error: PyGILState_Release: thread state 0x7fbfd4002380 must be current when releasing
Python runtime state: finalizing (tstate=0x55a079959380)

Thread 0x00007fbfff5ee400 (most recent call first):
<no Python frame>
```

Example reproduce code:

```python
import pandas as pd
import pyarrow.dataset as ds

# Get GCS fsspec filesystem
fs = get_gcs_fs()

dummy_df = pd.DataFrame({"a": [1, 2, 3]})

# Write out some dummy data for us to load a dataset from
data_path = "test-bucket/debug-arrow-datasets/data.parquet"
with fs.open(data_path, "wb") as f:
    dummy_df.to_parquet(f)

dummy_ds = ds.dataset([data_path], filesystem=fs)
batch_iter = dummy_ds.to_batches()

# Program finish
# Putting some buffer time after the iterator is opened causes the issue to go away:
# import time
# time.sleep(1)
```

Using local parquet files for the dataset, adding some buffer time between opening the iterator and program exit (via time.sleep or something else), or consuming the entire iterator seems to make the issue go away.

Is this reproducible if you swap in your own GCS filesystem?

Thanks,
Alex
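P.S. For anyone trying to reproduce this: get_gcs_fs() above is just a local helper, so here is a minimal stand-in using gcsfs. The bucket name in data_path and the credentials here are placeholders, not my exact setup; any fsspec-compatible GCS filesystem should trigger the same code path.

```python
import gcsfs

def get_gcs_fs():
    # Minimal fsspec GCS filesystem using default credentials
    # (e.g. picked up from GOOGLE_APPLICATION_CREDENTIALS).
    return gcsfs.GCSFileSystem()
```

And to illustrate the "consuming the entire iterator" workaround mentioned above: draining it before exit, e.g. `for _ in batch_iter: pass`, also avoids the crash for me.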