Re: RFC: Type inference rules

2018-11-30 Thread Wes McKinney
On Fri, Nov 30, 2018 at 10:00 AM Ben Kietzman wrote:
>
> I think the fallback graph approach might still be useful in the case of
> parsing with unions allowed, albeit with a much broader graph.
>
> For example,
>
> INT64 + STRING -> UNION(INT64, STRING)
> T + UNION (*) -> UNION(T, *)
> # ...
>

Re: RFC: Type inference rules

2018-11-30 Thread Ben Kietzman
I think the fallback graph approach might still be useful in the case of parsing with unions allowed, albeit with a much broader graph.

For example,

    INT64 + STRING -> UNION(INT64, STRING)
    T + UNION (*) -> UNION(T, *)
    # ...

Related: how should ordinarily convertible types be handled in the contex
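
The two union rules in this message can be sketched in a few lines of Python. This is only an illustration, not Arrow code: types are represented as plain strings, and the `Union` class and `unify` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Union:
    """Hypothetical stand-in for a union type: an unordered set of member types."""
    members: frozenset


def unify(a, b):
    """Merge two inferred types when unions are allowed.

    INT64 + STRING   -> UNION(INT64, STRING)
    T + UNION(*)     -> UNION(T, *)
    """
    if a == b:
        return a
    # Treat a plain type as a one-member set so both rules share one code path.
    members_a = a.members if isinstance(a, Union) else frozenset([a])
    members_b = b.members if isinstance(b, Union) else frozenset([b])
    return Union(members_a | members_b)


int_or_str = unify("INT64", "STRING")
wider = unify("BOOL", int_or_str)
```

Because the members are kept in a set, the merge is commutative and merging a type that is already a member leaves the union unchanged.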

Re: RFC: Type inference rules

2018-11-30 Thread Wes McKinney
I think there are two useful modes for schema-on-read:

* Unions allowed
* Unions not allowed

We haven't implemented union inference for converting Python sequences yet. See e.g.

In [1]: import pyarrow as pa

In [2]: pa.array([{'a': 'foo'}, {'a': 'bar'}])
Out[2]:
-- is_valid: all not null
-- c

Re: RFC: Type inference rules

2018-11-30 Thread Antoine Pitrou
On 30/11/2018 at 15:43, Ben Kietzman wrote:
> Hi Antoine,
>
> The conversion of previous blocks is part of the fall back mechanism I'm
> trying to describe. When type inference fails (even in a different block),
> conversion of all blocks of the column is attempted to the next type in the
> fa

Re: RFC: Type inference rules

2018-11-30 Thread Francois Saint-Jacques
Hello,

With JSON and other "typed" formats (msgpack, protobuf, ...) you need to take unions into account, e.g.

    {a: "herp", b: 10}
    {a: true, c: "derp"}

The type for `a` would be a union.

I think we should also evaluate investing in ingesting different schema DSLs (protobuf IDL, json-schema) to avoi

Re: RFC: Type inference rules

2018-11-30 Thread Ben Kietzman
Hi Antoine,

The conversion of previous blocks is part of the fallback mechanism I'm trying to describe. When type inference fails (even in a different block), conversion of all blocks of the column is attempted to the next type in the fallback graph. If there is no problem with the fallback grap
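
A minimal Python sketch of the mechanism described above, under several assumptions: the fallback graph is a simple linear chain (INT64 -> FLOAT64 -> STRING), values arrive as strings, and earlier blocks are reconverted from the raw input rather than from their already-converted values. `FALLBACK`, `PARSERS`, and `infer_column` are hypothetical names, not part of any Arrow API.

```python
# Hypothetical fallback chain: each type maps to the next type to try.
FALLBACK = {"INT64": "FLOAT64", "FLOAT64": "STRING", "STRING": None}

# One parser per type; each raises ValueError when a value does not fit.
PARSERS = {"INT64": int, "FLOAT64": float, "STRING": str}


def convert_block(block, type_name):
    """Convert every raw string in a block, raising ValueError on failure."""
    parse = PARSERS[type_name]
    return [parse(value) for value in block]


def infer_column(blocks, start="INT64"):
    """Return (inferred_type, converted_blocks) for one column.

    When any block fails to convert, move to the next type in the chain
    and reattempt conversion of all blocks, including earlier ones.
    """
    current = start
    converted = []
    i = 0
    while i < len(blocks):
        try:
            converted.append(convert_block(blocks[i], current))
            i += 1
        except ValueError:
            next_type = FALLBACK[current]
            if next_type is None:
                raise
            current = next_type
            converted = []  # discard earlier results and start over
            i = 0
    return current, converted


floats = infer_column([["1", "2"], ["3.5"], ["4"]])
strings = infer_column([["1"], ["x"]])
```

In the first call the second block forces a fallback from INT64 to FLOAT64, so the first block is converted again and ends up as floats.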

Re: RFC: Type inference rules

2018-11-30 Thread Antoine Pitrou
Hi Ben,

On 30/11/2018 at 02:19, Ben Kietzman wrote:
> Currently, to figure out which types may be inferred and under which
> circumstances they will be inferred involves digging through code. I think
> it would be useful to have an API for expressing type inference rules.
> Ideally this would

RFC: Type inference rules

2018-11-29 Thread Ben Kietzman
Currently, figuring out which types may be inferred, and under which circumstances they will be inferred, involves digging through code. I think it would be useful to have an API for expressing type inference rules. Ideally this would be provided as utility functions alongside StringConverter and us