[ https://issues.apache.org/jira/browse/ARROW-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-5310:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/16728

> [Python] better error message on creating ParquetDataset from empty directory
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-5310
>                 URL: https://issues.apache.org/jira/browse/ARROW-5310
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Assignee: Joris Van den Bossche
>            Priority: Minor
>              Labels: dataset, dataset-parquet-read, parquet
>             Fix For: 1.0.0
>
>
> Currently, when {{path}} is an existing but empty directory, you get:
> {code:python}
> >>> dataset = pq.ParquetDataset(path)
> ---------------------------------------------------------------------------
> IndexError                                Traceback (most recent call last)
> <ipython-input-16-346f72ae525e> in <module>
> ----> 1 dataset = pq.ParquetDataset(path)
> ~/scipy/repos/arrow/python/pyarrow/parquet.py in __init__(self,
> path_or_paths, filesystem, schema, metadata, split_row_groups,
> validate_schema, filters, metadata_nthreads, memory_map)
>     989
>     990         if validate_schema:
> --> 991             self.validate_schemas()
>     992
>     993         if filters is not None:
> ~/scipy/repos/arrow/python/pyarrow/parquet.py in validate_schemas(self)
>    1025                 self.schema = self.common_metadata.schema
>    1026             else:
> -> 1027                 self.schema = self.pieces[0].get_metadata().schema
>    1028         elif self.schema is None:
>    1029             self.schema = self.metadata.schema
> IndexError: list index out of range
> {code}
> That could be a nicer error message.
> Or do we actually want to allow this? (Although I am not sure there are good
> use cases for supporting empty directories, since an empty directory gives us
> no schema or metadata information.)
> It only fails when validating the schemas: with {{validate_schema=False}} it
> actually returns a ParquetDataset object, just with an empty list for
> {{pieces}} and no schema. So it would be easy to also avoid the error when
> validating the schemas in this empty-directory case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
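A minimal sketch of the first option discussed in the issue (a clearer error instead of the `IndexError`), under stated assumptions: `SketchDataset` and `SketchPiece` are hypothetical, simplified stand-ins written for illustration, not pyarrow's `ParquetDataset` or its pieces, and the error message wording is invented. Only the schema-inference branch from the traceback is mirrored; the per-piece schema comparison that the real method also performs is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchPiece:
    # Hypothetical stand-in for a dataset piece; `schema` plays the role of
    # piece.get_metadata().schema in the traceback above.
    schema: str


@dataclass
class SketchDataset:
    # Hypothetical, simplified stand-in for ParquetDataset (not pyarrow's class).
    path: str
    pieces: List[SketchPiece] = field(default_factory=list)
    common_metadata_schema: Optional[str] = None
    schema: Optional[str] = None

    def validate_schemas(self) -> None:
        if self.schema is not None:
            return  # schema already known, so nothing needs to be inferred
        if self.common_metadata_schema is not None:
            self.schema = self.common_metadata_schema
        elif not self.pieces:
            # Guard: report the empty directory explicitly instead of letting
            # self.pieces[0] raise "IndexError: list index out of range".
            raise ValueError(
                f"Cannot infer a schema from {self.path!r}: the directory "
                "contains no Parquet files. Pass validate_schema=False to "
                "skip schema validation."
            )
        else:
            self.schema = self.pieces[0].schema


# Usage: an empty directory now yields a descriptive ValueError.
try:
    SketchDataset("/data/empty_dir").validate_schemas()
except ValueError as exc:
    print(exc)
```

The alternative raised in the issue, allowing empty directories, would replace the `raise` with leaving `schema` as `None`, matching what `validate_schema=False` already returns.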