[ https://issues.apache.org/jira/browse/ARROW-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-375.
--------------------------------
    Resolution: Fixed

Issue resolved by pull request 204
[https://github.com/apache/arrow/pull/204]

> columns parameter in parquet.read_table() raises KeyError for valid column
> --------------------------------------------------------------------------
>
>                 Key: ARROW-375
>                 URL: https://issues.apache.org/jira/browse/ARROW-375
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Christopher Aycock
>            Assignee: Wes McKinney
>
> Using arrow commit 4fa7ac4 and parquet-cpp commit 0024665, I get a KeyError when passing a valid column name to the columns parameter:
> {code:none}
> In [1]: from pyarrow import parquet
> In [2]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet')
> In [3]: t.to_pandas()
> Out[3]: 
>    age name
> 0    1    A
> 1    2    B
> 2    3    C
> In [4]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> <ipython-input-4-5cf213819489> in <module>()
> ----> 1 t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
> /Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.read_table (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2693)()
>     143         return reader.read_all()
>     144     else:
> --> 145         column_idxs = [reader.column_name_idx(column) for column in columns]
>     146         arrays = [reader.read_column(column_idx) for column_idx in column_idxs]
>     147         return Table.from_arrays(columns, arrays)
> /Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.ParquetReader.column_name_idx (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2232)()
>     102                 self.column_idx_map[str(metadata.schema().Column(i).path().get().ToDotString())] = i
>     103 
> --> 104         return self.column_idx_map[column_name]
>     105 
>     106     def read_column(self, int column_index):
> KeyError: 'age'
> {code}
> This happens on both Mac and Linux.
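
This message does not state the root cause. One plausible reading of the traceback, offered only as an assumption: the build path shows Python 3.5, and line 102 builds the lookup key with str(...). If the C++ column path reaches Python as bytes (the usual Cython coercion for std::string), str() on Python 3 produces the repr "b'age'" instead of "age", so the later lookup by 'age' fails. The sketch below reproduces that pattern in plain Python; raw_column_paths and both helper functions are hypothetical stand-ins, not pyarrow code, and decoding is shown as one possible remedy, not as the change made in pull request 204.

```python
# Hypothetical stand-in for the column paths returned by the C++ layer.
# Assumption: they arrive in Python as bytes objects.
raw_column_paths = [b"age", b"name"]


def build_index_map_with_str(paths):
    """Mimic line 102 of the traceback: key each column by str(path).

    On Python 3, str() applied to bytes returns the repr of the bytes
    object, so the keys include the b'' wrapper.
    """
    return {str(path): i for i, path in enumerate(paths)}


def build_index_map_with_decode(paths):
    """Key each column by the decoded text of its path."""
    return {path.decode("utf-8"): i for i, path in enumerate(paths)}


broken = build_index_map_with_str(raw_column_paths)
fixed = build_index_map_with_decode(raw_column_paths)

try:
    broken["age"]
    lookup_failed = False
except KeyError:
    # Same failure mode as the KeyError: 'age' in the report.
    lookup_failed = True

print("keys built with str():   ", sorted(broken))
print("keys built with decode():", sorted(fixed))
print("lookup of 'age' failed with str() keys:", lookup_failed)
```

If this reading is right, any caller passing ordinary text column names would hit the error regardless of platform, which would be consistent with the report that it happens on both Mac and Linux.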



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)