This has been reported as https://issues.apache.org/jira/browse/ARROW-10237,
and has since been fixed.
Joris
On Thu, 8 Oct 2020 at 18:20, Wes McKinney wrote:
I haven't looked closely but it looks like a bug, can someone open a
JIRA issue and copy the reproducible example?
On Thu, Oct 8, 2020 at 10:57 AM Jadczak, Matt wrote:
I am unsure if this behaviour is intended (and duplicate values should be
forbidden), but it seems to me that the reason this is happening is that when
re-encoding an Arrow dictionary as a Parquet one, the function at
https://github.com/apache/arrow/blob/4bbb74713c6883e8523eeeb5ac80a1e1f8521674/
Hi,
I've found the following odd behaviour when round-tripping data via Parquet
using pyarrow, when the data contains dictionary arrays with duplicate values.
```python
import pyarrow as pa
import pyarrow.parquet as pq
my_table = pa.Table.from_batches(
[
pa.Recor