Hi folks,

I've encountered a bug when doing a swiss join with a big exec batch on the
probe side, say, larger than 65535 rows. It turns out that the algorithm
uses `uint16_t` to represent the row index within the probe exec batch (the
materialize_batch_ids_buf
<https://github.com/apache/arrow/blob/f951f0c42040ba6f584831621864f5c23e0f023e/cpp/src/arrow/acero/swiss_join.cc#L1897C8-L1897C33>),
so any row id larger than 65535 silently overflows and makes the result
nonsense.
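
To make the failure mode concrete, here is a minimal standalone sketch of
what happens when a row index is stored in a 16-bit slot. This is not the
actual Acero code; `StoreRowIds` is a hypothetical helper made up purely for
illustration, and the only thing it shares with the real buffer is the
`uint16_t` element width.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, NOT part of Arrow/Acero: records the row index of
// every row in a batch using 16-bit slots, the same width as the buffer
// discussed above.
std::vector<uint16_t> StoreRowIds(int64_t num_rows) {
  std::vector<uint16_t> ids(static_cast<std::size_t>(num_rows));
  for (int64_t i = 0; i < num_rows; ++i) {
    // The narrowing cast is well defined for unsigned types: the value is
    // reduced modulo 65536, so no error or warning is raised at runtime.
    ids[static_cast<std::size_t>(i)] = static_cast<uint16_t>(i);
  }
  return ids;
}
```

With a 70000-row batch, `StoreRowIds(70000)[65536]` comes back as 0, so row
65536 becomes indistinguishable from row 0 and later rows alias earlier ones
in the same way.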

One thing to note is that I'm not exactly using Acero "the Acero way".
Instead, I carved out some pieces of code from Acero and run them
individually. So I'm wondering: is this overflow considered a bug? Or is a
large exec batch something that should be avoided? (And does Acero have any
logic preventing that from happening, e.g., when some wild man like me just
throws an arbitrarily large exec batch at it?)

Thanks.

*Rossi*
