[Java] Is hardcoding NullVector .getField() intentional?

2021-07-23 Thread Al Taylor
Hi, I recently encountered the fact that the .getField() method of NullVector returns a Field with a hardcoded name. https://github.com/apache/arrow/blob/apache-arrow-5.0.0/java/vector/src/main/java/org/apache/arrow/vector/NullVector.java#L66 This is currently hardcoded to public static fin

[Rust] Is Rust Arrow's deserialization code zero-copy?

2021-04-06 Thread Al Taylor
Hi, I was reading around the rust-arrow codebase, evaluating it for potential future use. I'm particularly interested in zero-copy processing. I could very well be wrong here, as I don't have a lot of Rust experience, but it looks like the code for reading buffers out of IPC messages is copyin

[Python] Dictionary Arrays with duplicate values jumbling on round-trip to parquet

2020-10-08 Thread Al Taylor
Hi, I've found the following odd behaviour when round-tripping data via parquet using pyarrow, when the data contains dictionary arrays with duplicate values. ```python import pyarrow as pa import pyarrow.parquet as pq my_table = pa.Table.from_batches( [ pa.Recor

Pyarrow RecordBatchStreamWriter and dictionaries

2020-09-28 Thread Al Taylor
Hi, I've found that when I serialize two record batches that have a dictionary-encoded field but different encoding dictionaries to a sequence of pybytes with a RecordBatchStreamWriter, then deserialize using pa.ipc.open_stream(), the dictionaries get jumbled. (or at least, on deserialization

[jira] [Created] (ARROW-8773) pyarrow schema.empty_table() does not preserve nullability of fields

2020-05-12 Thread Al Taylor (Jira)
Al Taylor created ARROW-8773: Summary: pyarrow schema.empty_table() does not preserve nullability of fields Key: ARROW-8773 URL: https://issues.apache.org/jira/browse/ARROW-8773 Project: Apache Arrow