Micah Kornfield created ARROW-11353:
---------------------------------------
Summary: [C++][Python][Parquet] We should allow for overriding to large types by providing a schema
Key: ARROW-11353
URL: https://issues.apache.org/jira/browse/ARROW-11353
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Reporter: Micah Kornfield

{{The following shouldn't throw}}

{{>>> import pyarrow as pa}}
{{>>> import pyarrow.parquet as pq}}
{{>>> import pyarrow.dataset as ds}}
{{>>> pa.__version__}}
{{'2.0.0'}}
{{>>> schema = pa.schema([pa.field("utf8", pa.utf8())])}}
{{>>> table = pa.Table.from_pydict(\{"utf8": ["foo", "bar"]}, schema)}}
{{>>> pq.write_table(table, "/tmp/example.parquet")}}
{{>>> large_schema = pa.schema([pa.field("utf8", pa.large_utf8())])}}
{{>>> ds.dataset("/tmp/example.parquet", schema=large_schema, format="parquet").to_table()}}
{{Traceback (most recent call last):}}
{{  File "<stdin>", line 1, in <module>}}
{{  File "pyarrow/_dataset.pyx", line 405, in pyarrow._dataset.Dataset.to_table}}
{{  File "pyarrow/_dataset.pyx", line 2262, in pyarrow._dataset.Scanner.to_table}}
{{  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status}}
{{  File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status}}
{{pyarrow.lib.ArrowTypeError: fields had matching names but differing types. From: utf8: string To: utf8: large_string}}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)