Bogdan Klichuk created ARROW-5811:
-------------------------------------

             Summary: pyarrow.csv.read_csv: Ability to not infer column types.
                 Key: ARROW-5811
                 URL: https://issues.apache.org/jira/browse/ARROW-5811
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 0.13.0
         Environment: Ubuntu Xenial
            Reporter: Bogdan Klichuk

I'm trying to read CSV files as is, with all columns as strings. I don't know the schema of these CSVs, and it will vary because the files are provided by users.

Right now I'm using pandas.read_csv(dtype=str), which works great, but since the final destination of these CSVs is Parquet files, it seems much more efficient to use pyarrow.csv.read_csv in the future, as soon as this becomes available :)

I tried things like `pyarrow.csv.read_csv(convert_types=ConvertOptions(columns_types=defaultdict(lambda: 'string')))` but it doesn't work.

Maybe I just didn't find something that already exists? :)
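
For illustration, here is a minimal sketch of one possible workaround when the column names are not known in advance: read the header row first, then map every column name to the string type explicitly. It assumes a pyarrow version in which `ConvertOptions` accepts a `column_types` mapping and `read_csv` accepts `convert_options` (not verified against 0.13.0), that the first row of the file is the header, and that the default delimiter is in use. The helper names `read_header` and `read_csv_as_strings` are hypothetical and are not part of pyarrow.

```python
import csv


def read_header(path):
    """Return the column names found in the first row of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


def read_csv_as_strings(path):
    """Read a CSV file into a pyarrow Table with every column typed as string.

    Hypothetical helper: assumes the first row is the header.
    """
    # Imported lazily so read_header() stays usable without pyarrow installed.
    import pyarrow as pa
    from pyarrow import csv as pa_csv

    # Explicitly map each column name to the string type, so that no
    # type inference is applied to any column.
    column_types = {name: pa.string() for name in read_header(path)}
    convert_options = pa_csv.ConvertOptions(column_types=column_types)
    return pa_csv.read_csv(path, convert_options=convert_options)
```

The resulting table could then be passed to `pyarrow.parquet.write_table` to produce the Parquet file. The cost of this approach is that the header row is read twice, which should be small compared with parsing the whole file.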