Scott Burns created ARROW-5195:
----------------------------------

             Summary: read_csv ignores null_values on string types
                 Key: ARROW-5195
                 URL: https://issues.apache.org/jira/browse/ARROW-5195
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.13.0
         Environment: Python 3.6, PyArrow 0.13.0, AWS linux, debian-slim in docker
            Reporter: Scott Burns

Let's write a simple CSV with NULL values in a string column:

{quote}
from pyarrow import csv

with open('foo.csv', 'w') as fobj:
    fobj.write('col1,col2\n1,value\n2,NULL')

table = csv.read_csv('foo.csv')
table.column('col2').null_count  # => 0
{quote}

{{table.column('col2').null_count}} returns 0, but I think it should be 1. Passing {{ConvertOptions(null_values=["NULL"])}} to {{read_csv}} doesn't help.

Note that {{pandas.read_csv}} parses these NULLs correctly, so I have a workaround available. But I'd prefer to read CSV natively from pyarrow if possible :)
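
To make the expected result concrete, here is a minimal sketch that counts the null markers in the same data using only Python's standard-library {{csv}} module (not {{pyarrow.csv}}). The helper name {{null_count}} and its {{null_values}} parameter are hypothetical, chosen for illustration; this shows the count I would expect, not how pyarrow parses CSV internally.

```python
import csv  # Python's standard-library csv module, not pyarrow.csv
import io


def null_count(csv_text, column, null_values=("NULL",)):
    """Count cells in `column` whose raw text equals one of `null_values`.

    Hypothetical helper for illustration only; it does not use pyarrow.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in reader if row[column] in null_values)


# Same contents as foo.csv in the report above.
sample = "col1,col2\n1,value\n2,NULL"

print(null_count(sample, "col2"))  # prints 1
```

Treating the literal string NULL as missing gives a count of 1 for col2, which is the value I expected from {{table.column('col2').null_count}}.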