Dave Hirschfeld created ARROW-5568:
--------------------------------------
Summary: [Python] Allow parsing more general JSON formats
Key: ARROW-5568
URL: https://issues.apache.org/jira/browse/ARROW-5568
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Dave Hirschfeld
I have JSON data where the tabular records (which the Arrow JSON reader would normally expect as line-delimited JSON) are nested as an array under a `data` key:
{code:java}
{
"metadata": {"name": "block1"},
"data" : [
{"a": 1, "b": 2.0, "c": "foo", "d": false},
{"a": 4, "b": -5.5, "c": null, "d": true}
]
}
{code}
It would be useful if the Arrow JSON reader allowed specifying where in the
document the columnar data is stored.
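Until such an option exists, one possible workaround is to pull the nested array out with the standard-library `json` module and re-serialise it as newline-delimited JSON, which is the layout `pyarrow.json.read_json` consumes. The sketch below is illustrative only; the function name `nested_records_to_ndjson` is hypothetical and not part of pyarrow.

```python
import json


def nested_records_to_ndjson(text: str, key: str) -> bytes:
    """Hypothetical helper: take the list of row objects stored under `key`
    and re-emit it as newline-delimited JSON (one object per line)."""
    doc = json.loads(text)
    records = doc[key]
    # One compact JSON object per line, terminated by a trailing newline.
    lines = (json.dumps(record) for record in records)
    return ("\n".join(lines) + "\n").encode("utf-8")


example = """
{
  "metadata": {"name": "block1"},
  "data" : [
    {"a": 1, "b": 2.0, "c": "foo", "d": false},
    {"a": 4, "b": -5.5, "c": null, "d": true}
  ]
}
"""

ndjson = nested_records_to_ndjson(example, "data")
print(ndjson.decode("utf-8"))
```

The resulting bytes can be wrapped in `io.BytesIO` and passed to `pyarrow.json.read_json`. The cost is parsing the document twice (once in Python, once in Arrow), which is exactly the overhead a native option in the reader would avoid.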
Since the `metadata` is also important to me, it would be even better if the
rest of the JSON could be returned as a Python dict, with only the specified
keys parsed as Arrow tables, e.g.
{code:java}
>>> block1 = json.read_json(fn, tables=['data'])
>>> block1['data']
pyarrow.Table
a: int64
b: double
c: string
d: bool
>>> block1['metadata']
{'name': 'block1'}
>>> block1
{
"metadata": {"name": "block1"},
"data" : pyarrow.Table
}{code}