[ https://issues.apache.org/jira/browse/ARROW-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney resolved ARROW-375.
--------------------------------
    Resolution: Fixed

Issue resolved by pull request 204
[https://github.com/apache/arrow/pull/204]

> columns parameter in parquet.read_table() raises KeyError for valid column
> --------------------------------------------------------------------------
>
>                 Key: ARROW-375
>                 URL: https://issues.apache.org/jira/browse/ARROW-375
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Christopher Aycock
>            Assignee: Wes McKinney
>
> Using arrow commit 4fa7ac4 and parquet-cpp commit 0024665, I have
> {code:none}
> In [1]: from pyarrow import parquet
> In [2]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet')
> In [3]: t.to_pandas()
> Out[3]:
>    age name
> 0    1    A
> 1    2    B
> 2    3    C
> In [4]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> <ipython-input-4-5cf213819489> in <module>()
> ----> 1 t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
> /Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.read_table (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2693)()
>     143         return reader.read_all()
>     144     else:
> --> 145         column_idxs = [reader.column_name_idx(column) for column in columns]
>     146         arrays = [reader.read_column(column_idx) for column_idx in column_idxs]
>     147         return Table.from_arrays(columns, arrays)
> /Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.ParquetReader.column_name_idx (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2232)()
>     102                 self.column_idx_map[str(metadata.schema().Column(i).path().get().ToDotString())] = i
>     103
> --> 104         return self.column_idx_map[column_name]
>     105
>     106     def read_column(self, int column_index):
> KeyError: 'age'
> {code}
> This happens on both Mac and Linux.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
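
One plausible reading of the traceback above, offered as an assumption and not a confirmed diagnosis: the build path points at Python 3.5, where calling str() on a bytes value returns its repr instead of decoding it. If ToDotString() reaches Python as bytes, the keys stored in column_idx_map at line 102 would be "b'age'" and not 'age', so the lookup at line 104 would fail exactly as reported. The sketch below reproduces that mechanism in plain Python; build_idx_map and the bytes column paths are hypothetical stand-ins, not pyarrow code.

```python
# Hypothetical stand-in for the map-building loop in
# ParquetReader.column_name_idx. Assumption: the column paths
# arrive from the C++ layer as bytes objects.
def build_idx_map(paths, decode):
    idx_map = {}
    for i, path in enumerate(paths):
        # In Python 3, str(b'age') is the repr "b'age'",
        # while b'age'.decode('utf8') is the text 'age'.
        key = path.decode('utf8') if decode else str(path)
        idx_map[key] = i
    return idx_map


paths = [b'age', b'name']

broken = build_idx_map(paths, decode=False)
fixed = build_idx_map(paths, decode=True)

print('age' in broken)  # False: the stored key is "b'age'"
print(fixed['age'])     # 0
```

Under this assumption, decoding the path before using it as a key is enough for columns=['age'] to resolve; whether pull request 204 takes that route is not stated in this message.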