This looks like a bug or an oversight. I opened
https://issues.apache.org/jira/browse/ARROW-11353 to track a fix.

On Sun, Jan 17, 2021 at 9:18 PM Steve Kim <chairm...@gmail.com> wrote:

> > This should be possible already, at least on git master but perhaps also
> > in 2.0.0.  Which problem are you encountering?
>
> With pyarrow 2.0.0, I encountered the following:
>
> ```
> >>> import pyarrow as pa
> >>> import pyarrow.parquet as pq
> >>> import pyarrow.dataset as ds
> >>> pa.__version__
> '2.0.0'
> >>> schema = pa.schema([pa.field("utf8", pa.utf8())])
> >>> table = pa.Table.from_pydict({"utf8": ["foo", "bar"]}, schema)
> >>> pq.write_table(table, "/tmp/example.parquet")
> >>> large_schema = pa.schema([pa.field("utf8", pa.large_utf8())])
> >>> ds.dataset("/tmp/example.parquet", schema=large_schema, format="parquet").to_table()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/_dataset.pyx", line 405, in pyarrow._dataset.Dataset.to_table
>   File "pyarrow/_dataset.pyx", line 2262, in pyarrow._dataset.Scanner.to_table
>   File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
> pyarrow.lib.ArrowTypeError: fields had matching names but differing types. From: utf8: string To: utf8: large_string
> ```
>
> I reproduced this behavior with pyarrow built from source on the master
> branch (5f1be953).
>
