Null can also come up when converting a column with only NA values in a CSV file. I don't remember for sure, but I think the same can happen with JSON files as well.
Can't we accept both forms when reading? It sounds like it should be reasonably easy. Regards Antoine. Le 06/09/2019 à 17:36, Wes McKinney a écrit : > hi Micah, > > Null wouldn't come up that often in practice. It could happen when > converting from pandas, for example > > In [8]: df = pd.DataFrame({'col1': np.array([np.nan] * 10, dtype=object)}) > > In [9]: t = pa.table(df) > > In [10]: t > Out[10]: > pyarrow.Table > col1: null > metadata > -------- > {b'pandas': b'{"index_columns": [{"kind": "range", "name": null, "start": 0, > "' > b'stop": 10, "step": 1}], "column_indexes": [{"name": null, > "field' > b'_name": null, "pandas_type": "unicode", "numpy_type": "object", > ' > b'"metadata": {"encoding": "UTF-8"}}], "columns": [{"name": > "col1"' > b', "field_name": "col1", "pandas_type": "empty", "numpy_type": > "o' > b'bject", "metadata": null}], "creator": {"library": "pyarrow", > "v' > b'ersion": "0.14.1.dev464+g40d08a751"}, "pandas_version": > "0.24.2"' > b'}'} > > I'm inclined to make the change without worrying about backwards > compatibility. If people have been persisting data against the > recommendations of the project, the remedy is to use an older version > of the library to read the files and write them to something else > (like Parquet format) in the meantime. > > Obviously come 1.0.0 we'll begin to make compatibility guarantees so > this will be less of an issue. > > - Wes > > On Thu, Sep 5, 2019 at 11:14 PM Micah Kornfield <emkornfi...@gmail.com> wrote: >> >> Hi Wes and others, >> I don't have a sense of where Null arrays get created in the existing code >> base? >> >> Also, do you think it is worth the effort make this backwards compatible. >> We could in theory tie the buffer count to having the continuation value >> for alignment. >> >> The one area were I'm slightly concerned is we seem to have users in the >> wild who are depending on backwards compatibility, and I'm try to better >> understand the odds that we break them. >> >> Thanks, >> Micah >> >> On Thu, Sep 5, 2019 at 7:25 AM Wes McKinney <wesmck...@gmail.com> wrote: >> >>> hi folks, >>> >>> One of the as-yet-untested (in integration tests) parts of the >>> columnar specification is the Null layout. In C++ we additionally >>> implemented this by writing two length-0 "placeholder" buffers in the >>> RecordBatch data header, but since the Null layout has no memory >>> allocated nor any buffers in-memory it may be more proper to write no >>> buffers (since the length of the Null layout is all you need to >>> reconstruct it). There are 3 implementations of the placeholder >>> version (C++, Go, JS, maybe also C#) but it never got implemented in >>> Java. While technically this would break old serialized data, I would >>> not expect this to be very frequently occurring in many of the >>> currently-deployed Arrow applications >>> >>> Here is my C++ patch >>> >>> https://github.com/apache/arrow/pull/5287 >>> >>> I'm not sure we need to formalize this with a vote but I'm interested >>> in the community's feedback on how to proceed here. >>> >>> - Wes >>>