[ https://issues.apache.org/jira/browse/ARROW-4267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-4267:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/15939

> [Python/C++][Parquet] Segfault when reading rowgroups with duplicated columns
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-4267
>                 URL: https://issues.apache.org/jira/browse/ARROW-4267
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 0.11.1
>            Reporter: Florian Jetter
>            Assignee: Uwe Korn
>            Priority: Minor
>              Labels: parquet, pull-request-available
>             Fix For: 0.12.1, 0.13.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When reading a row group using duplicated columns I receive a segfault.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> df = pd.DataFrame({
>     "col": ["A", "B"]
> })
> table = pa.Table.from_pandas(df)
> buf = pa.BufferOutputStream()
> pq.write_table(table, buf)
> parquet_file = pq.ParquetFile(buf.getvalue())
>
> parquet_file.read_row_group(0)
> parquet_file.read_row_group(0, columns=["col"])
> # boom
> parquet_file.read_row_group(0, columns=["col", "col"])
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
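The reproducer shows that `columns=["col"]` works while `columns=["col", "col"]` crashes, so on an affected version (0.11.1) one possible caller-side workaround is to drop duplicate names before calling `read_row_group`. The sketch below assumes callers only need each column once in the result; `dedupe_columns` and `read_row_group_deduped` are hypothetical helpers, not part of pyarrow, and this is not the upstream fix (that shipped in 0.12.1 / 0.13.0).

```python
from typing import Iterable, List, Optional


def dedupe_columns(columns: Optional[Iterable[str]]) -> Optional[List[str]]:
    """Return the column names with duplicates removed, keeping first-seen order.

    None is passed through unchanged, because read_row_group treats
    columns=None as "read all columns".
    """
    if columns is None:
        return None
    # dict keys keep insertion order (Python 3.7+), so the first occurrence wins
    return list(dict.fromkeys(columns))


def read_row_group_deduped(parquet_file, i: int, columns=None):
    """Hypothetical workaround helper (not a pyarrow API).

    Calls parquet_file.read_row_group with a duplicate-free column list so
    each requested column is read only once.
    """
    return parquet_file.read_row_group(i, columns=dedupe_columns(columns))
```

With the `parquet_file` from the reproducer, `read_row_group_deduped(parquet_file, 0, columns=["col", "col"])` should go through the same path as the working `columns=["col"]` call and return a table with a single `col` column. If the caller really needs the column repeated, it would have to be duplicated after reading.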