This seems like a bug or a miss. I opened: https://issues.apache.org/jira/browse/ARROW-11353 to track a fix.
On Sun, Jan 17, 2021 at 9:18 PM Steve Kim <chairm...@gmail.com> wrote:
>
> > This should be possible already, at least on git master but perhaps also
> > in 2.0.0. Which problem are you encountering?
>
> With pyarrow 2.0.0, I encountered the following:
>
> ```
> >>> import pyarrow as pa
> >>> import pyarrow.parquet as pq
> >>> import pyarrow.dataset as ds
> >>> pa.__version__
> '2.0.0'
> >>> schema = pa.schema([pa.field("utf8", pa.utf8())])
> >>> table = pa.Table.from_pydict({"utf8": ["foo", "bar"]}, schema)
> >>> pq.write_table(table, "/tmp/example.parquet")
> >>> large_schema = pa.schema([pa.field("utf8", pa.large_utf8())])
> >>> ds.dataset("/tmp/example.parquet", schema=large_schema, format="parquet").to_table()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/_dataset.pyx", line 405, in pyarrow._dataset.Dataset.to_table
>   File "pyarrow/_dataset.pyx", line 2262, in pyarrow._dataset.Scanner.to_table
>   File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
> pyarrow.lib.ArrowTypeError: fields had matching names but differing types. From: utf8: string To: utf8: large_string
> ```
>
> I reproduced this behavior with pyarrow built from source on the master
> branch (5f1be953).
>