Kontinuation opened a new issue, #1615:
URL: https://github.com/apache/datafusion-comet/issues/1615

   ### Describe the bug
   
   Running TPC-DS at SF=1 with [queries-spark/q23.sql in 
datafusion-benchmarks](https://github.com/apache/datafusion-benchmarks/blob/main/tpcds/queries-spark/q23.sql)
 has failed since https://github.com/apache/datafusion-comet/pull/1605 was merged. 
The exception is raised by the native side:
   
   ```
   org.apache.comet.CometNativeException: range end index 18446744072743568078 
out of range for slice of length 0
           at 
comet::errors::init::{{closure}}(/home/wherobots/datafusion-comet/native/core/src/errors.rs:151)
           at <alloc::boxed::Box<F,A> as 
core::ops::function::Fn<Args>>::call(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/boxed.rs:2007)
           at 
std::panicking::rust_panic_with_hook(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:836)
           at 
std::panicking::begin_panic_handler::{{closure}}(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:701)
           at 
std::sys::backtrace::__rust_end_short_backtrace(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/sys/backtrace.rs:168)
           at 
rust_begin_unwind(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:692)
           at 
core::panicking::panic_fmt(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/panicking.rs:75)
           at 
core::slice::index::slice_end_index_len_fail::do_panic::runtime(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/panic.rs:218)
           at 
core::slice::index::slice_end_index_len_fail::do_panic(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/intrinsics/mod.rs:3869)
           at 
core::slice::index::slice_end_index_len_fail(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/panic.rs:223)
           at <core::ops::range::Range<usize> as 
core::slice::index::SliceIndex<[T]>>::index(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/slice/index.rs:437)
           at core::slice::index::<impl core::ops::index::Index<I> for 
[T]>::index(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/slice/index.rs:16)
           at 
arrow_data::transform::variable_size::extend_offset_values(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-data-54.2.1/src/transform/variable_size.rs:38)
           at 
arrow_data::transform::variable_size::build_extend::{{closure}}(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-data-54.2.1/src/transform/variable_size.rs:57)
           at <alloc::boxed::Box<F,A> as 
core::ops::function::Fn<Args>>::call(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/boxed.rs:2007)
           at 
arrow_data::transform::MutableArrayData::extend(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-data-54.2.1/src/transform/mod.rs:722)
           at 
comet::execution::operators::copy::copy_array(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:233)
           at 
comet::execution::operators::copy::copy_or_unpack_array(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:280)
           at 
comet::execution::operators::copy::CopyStream::copy::{{closure}}(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:196)
           at 
core::iter::adapters::map::map_try_fold::{{closure}}(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/adapters/map.rs:95)
           at 
core::iter::traits::iterator::Iterator::try_fold(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/traits/iterator.rs:2370)
           at <core::iter::adapters::map::Map<I,F> as 
core::iter::traits::iterator::Iterator>::try_fold(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/adapters/map.rs:121)
           at <core::iter::adapters::GenericShunt<I,R> as 
core::iter::traits::iterator::Iterator>::try_fold(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/adapters/mod.rs:191)
           at 
core::iter::traits::iterator::Iterator::try_for_each(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/traits/iterator.rs:2431)
           at <core::iter::adapters::GenericShunt<I,R> as 
core::iter::traits::iterator::Iterator>::next(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/adapters/mod.rs:174)
           at 
alloc::vec::Vec<T,A>::extend_desugared(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/vec/mod.rs:3535)
           at <alloc::vec::Vec<T,A> as 
alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/vec/spec_extend.rs:19)
           at <alloc::vec::Vec<T> as 
alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/vec/spec_from_iter_nested.rs:42)
           at <alloc::vec::Vec<T> as 
alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/vec/spec_from_iter.rs:34)
           at <alloc::vec::Vec<T> as 
core::iter::traits::collect::FromIterator<T>>::from_iter(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/vec/mod.rs:3427)
           at 
core::iter::traits::iterator::Iterator::collect(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/traits/iterator.rs:1971)
           at <core::result::Result<V,E> as 
core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter::{{closure}}(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/result.rs:1985)
           at 
core::iter::adapters::try_process(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/adapters/mod.rs:160)
           at <core::result::Result<V,E> as 
core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/result.rs:1985)
           at 
core::iter::traits::iterator::Iterator::collect(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/traits/iterator.rs:1971)
           at 
comet::execution::operators::copy::CopyStream::copy(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:193)
           at <comet::execution::operators::copy::CopyStream as 
futures_core::stream::Stream>::poll_next::{{closure}}(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:214)
           at 
core::task::poll::Poll<T>::map(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/task/poll.rs:54)
           at <comet::execution::operators::copy::CopyStream as 
futures_core::stream::Stream>::poll_next(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:213)
           at <core::pin::Pin<P> as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
           at <S as 
futures_core::stream::TryStream>::try_poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:206)
           at <futures_util::stream::try_stream::try_fold::TryFold<St,Fut,T,F> 
as 
core::future::future::Future>::poll(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/try_stream/try_fold.rs:81)
           at 
datafusion_physical_plan::joins::hash_join::collect_left_input::{{closure}}(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:960)
           at <futures_util::future::future::map::Map<Fut,F> as 
core::future::future::Future>::poll(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/future/future/map.rs:55)
           at <futures_util::future::future::Map<Fut,F> as 
core::future::future::Future>::poll(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/lib.rs:86)
           at <core::pin::Pin<P> as 
core::future::future::Future>::poll(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/future/future.rs:124)
           at <futures_util::future::future::shared::Shared<Fut> as 
core::future::future::Future>::poll(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/future/future/shared.rs:322)
           at 
futures_util::future::future::FutureExt::poll_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/future/future/mod.rs:558)
           at 
datafusion_physical_plan::joins::utils::OnceFut<T>::get_shared(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/utils.rs:1091)
           at 
datafusion_physical_plan::joins::hash_join::HashJoinStream::collect_build_side(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:1406)
           at 
datafusion_physical_plan::joins::hash_join::HashJoinStream::poll_next_impl(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:1381)
           at <datafusion_physical_plan::joins::hash_join::HashJoinStream as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:1628)
           at <core::pin::Pin<P> as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
           at 
futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
           at <datafusion_physical_plan::projection::ProjectionStream as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/projection.rs:354)
           at <core::pin::Pin<P> as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
           at 
futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
           at <datafusion_physical_plan::projection::ProjectionStream as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/projection.rs:354)
           at <core::pin::Pin<P> as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
           at 
futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
           at <comet::execution::operators::copy::CopyStream as 
futures_core::stream::Stream>::poll_next(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:213)
           at <core::pin::Pin<P> as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
           at 
futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
           at 
datafusion_physical_plan::joins::hash_join::HashJoinStream::fetch_probe_batch(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:1427)
           at 
datafusion_physical_plan::joins::hash_join::HashJoinStream::poll_next_impl(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:1384)
           at <datafusion_physical_plan::joins::hash_join::HashJoinStream as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:1628)
           at <core::pin::Pin<P> as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
           at 
futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
           at <datafusion_physical_plan::projection::ProjectionStream as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/projection.rs:354)
           at <core::pin::Pin<P> as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
           at 
futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
           at <datafusion_physical_plan::projection::ProjectionStream as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/projection.rs:354)
           at <core::pin::Pin<P> as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
           at 
futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
           at 
<datafusion_physical_plan::aggregates::row_hash::GroupedHashAggregateStream as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/aggregates/row_hash.rs:647)
           at <core::pin::Pin<P> as 
futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
           at 
futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
           at <futures_util::stream::stream::next::Next<St> as 
core::future::future::Future>::poll(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/next.rs:32)
           at 
futures_util::future::future::FutureExt::poll_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/future/future/mod.rs:558)
           at <futures_util::async_await::poll::PollOnce<F> as 
core::future::future::Future>::poll(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/async_await/poll.rs:37)
           at 
comet::execution::jni_api::Java_org_apache_comet_Native_executePlan::{{closure}}::{{closure}}(/home/wherobots/datafusion-comet/native/core/src/execution/jni_api.rs:544)
           at 
tokio::runtime::park::CachedParkThread::block_on::{{closure}}(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/park.rs:284)
           at 
tokio::task::coop::with_budget(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/coop/mod.rs:167)
           at 
tokio::task::coop::budget(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/coop/mod.rs:133)
           at 
tokio::runtime::park::CachedParkThread::block_on(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/park.rs:284)
           at 
tokio::runtime::context::blocking::BlockingRegionGuard::block_on(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/context/blocking.rs:66)
           at 
tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/multi_thread/mod.rs:87)
           at 
tokio::runtime::context::runtime::enter_runtime(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/context/runtime.rs:65)
           at 
tokio::runtime::scheduler::multi_thread::MultiThread::block_on(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/multi_thread/mod.rs:86)
           at 
tokio::runtime::runtime::Runtime::block_on_inner(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/runtime.rs:370)
           at 
tokio::runtime::runtime::Runtime::block_on(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/runtime.rs:342)
           at 
comet::execution::jni_api::Java_org_apache_comet_Native_executePlan::{{closure}}(/home/wherobots/datafusion-comet/native/core/src/execution/jni_api.rs:544)
           at 
comet::errors::curry::{{closure}}(/home/wherobots/datafusion-comet/native/core/src/errors.rs:485)
           at 
std::panicking::try::do_call(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:584)
           at __rust_try(__internal__:0)
           at 
std::panicking::try(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:547)
           at 
std::panic::catch_unwind(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panic.rs:358)
           at 
comet::errors::try_unwrap_or_throw(/home/wherobots/datafusion-comet/native/core/src/errors.rs:499)
           at 
Java_org_apache_comet_Native_executePlan(/home/wherobots/datafusion-comet/native/core/src/execution/jni_api.rs:498)
           at <unknown>(__internal__:0)
           at org.apache.comet.Native.executePlan(Native Method)
           at 
org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1(CometExecIterator.scala:137)
           at 
org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1$adapted(CometExecIterator.scala:135)
           at 
org.apache.comet.vector.NativeUtil.getNextBatch(NativeUtil.scala:157)
           at 
org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:135)
           at 
org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:156)
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
           at 
org.apache.comet.CometBatchIterator.hasNext(CometBatchIterator.java:50)
           at org.apache.comet.Native.executePlan(Native Method)
           at 
org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1(CometExecIterator.scala:137)
           at 
org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1$adapted(CometExecIterator.scala:135)
           at 
org.apache.comet.vector.NativeUtil.getNextBatch(NativeUtil.scala:157)
           at 
org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:135)
           at 
org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:156)
           at 
org.apache.spark.sql.comet.execution.shuffle.CometNativeShuffleWriter.write(CometNativeShuffleWriter.scala:101)
           at 
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
           at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
           at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
           at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
           at org.apache.spark.scheduler.Task.run(Task.scala:141)
           at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
           at 
org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
           at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
           at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
           at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
           at java.base/java.lang.Thread.run(Thread.java:1589)
   ```
   
   This is caused by auto-broadcasting the smaller side of the join, which 
contains empty record batches. The empty StringArrays in those batches were not 
exported correctly through the Arrow C Data Interface. The very large value 
`18446744072743568078` in the error message is the first value in the offset 
buffer. It should be `0` when the array is empty (see the [Arrow Columnar 
Format 
Spec](https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout)
 for details), but it turns out to be garbage data.
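
The size of the bogus index is itself a hint. As a rough illustration, assuming a 64-bit target and that the 32-bit offset is converted with a plain `as usize`-style cast (an assumption about the arrow-data code path, not something confirmed here), a negative `i32` sign-extends into a value just below 2^64. The helper names below are hypothetical, and `u64` stands in for `usize` so the arithmetic is the same on any platform:

```rust
/// Mimics casting a signed 32-bit offset to a 64-bit slice index with `as`.
/// Negative offsets are sign-extended, producing a huge unsigned value.
fn i32_offset_as_index(offset: i32) -> u64 {
    offset as i64 as u64
}

/// Reverses the cast: recovers the 32-bit offset that would produce `index`,
/// or `None` if no `i32` maps to it.
fn index_to_i32_offset(index: u64) -> Option<i32> {
    // Reinterpret the bits as signed, then check that the value fits in i32.
    let signed = index as i64;
    i32::try_from(signed).ok()
}
```

Under those assumptions, `index_to_i32_offset(18446744072743568078)` yields `Some(-965983538)`, which would mean the first offset slot held a negative 32-bit value rather than `0`.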
   
   There have been past efforts to fix problems with exporting empty 
variable-sized binary arrays: https://github.com/apache/arrow/issues/40038 and 
the corresponding PR https://github.com/apache/arrow/pull/40043, which exports 
a non-null offset buffer for empty arrays. However, that fix still has one 
problem: the newly allocated offset buffer is not properly initialized, which 
leaves a garbage value in the offset buffer and produces this failure.
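
For reference, the layout rule at stake can be written as a small check. This is only a sketch: `offsets_are_valid` is a hypothetical helper, not an Arrow or Comet API, and it handles 32-bit offsets only. It encodes the constraints from the columnar format: an array of `length` elements has `length + 1` offsets (so at least one, even when empty), no offset is negative, offsets never decrease, and the last one does not exceed the length of the values buffer.

```rust
/// Returns true if `offsets` is a plausible offset buffer for a
/// variable-size binary array whose values buffer holds `values_len` bytes.
fn offsets_are_valid(offsets: &[i32], values_len: usize) -> bool {
    // An array with N elements needs N + 1 offsets, so an empty array
    // still needs exactly one offset slot.
    if offsets.is_empty() {
        return false;
    }
    // The first offset must not be negative.
    if offsets[0] < 0 {
        return false;
    }
    // Offsets must be monotonically non-decreasing.
    if !offsets.windows(2).all(|w| w[0] <= w[1]) {
        return false;
    }
    // The last offset is non-negative here, so the cast cannot wrap.
    let last = offsets[offsets.len() - 1] as usize;
    last <= values_len
}
```

With this check, a correctly exported empty array (`&[0]` with an empty values buffer) passes, while a single uninitialized slot such as `&[-965983538]` is rejected before any slicing is attempted.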
   
   This problem cannot be reproduced on recent versions of macOS, because macOS 
[fills freed memory blocks with 
0](https://developer.apple.com/documentation/macos-release-notes/macos-13-release-notes#Memory-Allocation),
 which is naturally the valid initial value for the offset buffer and covers up 
the problem.
   
   ### Steps to reproduce
   
   Run TPC-DS benchmark on Linux using 
https://github.com/apache/datafusion-benchmarks:
   
   ```bash
   spark-submit \
       --master local[8] \
       --conf spark.driver.memory=3g \
       --conf spark.memory.offHeap.enabled=true \
       --conf spark.memory.offHeap.size=16g \
       --conf spark.jars=$COMET_JAR_PR \
       --conf spark.driver.extraClassPath=$COMET_JAR_PR \
       --conf spark.executor.extraClassPath=$COMET_JAR_PR \
       --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
       --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
       --conf spark.comet.enabled=true \
       --conf spark.comet.exec.shuffle.enabled=true \
       --conf spark.comet.exec.shuffle.mode=auto \
       --conf spark.comet.exec.shuffle.compression.codec=lz4 \
       --conf spark.comet.exec.replaceSortMergeJoin=false \
       --conf spark.comet.exec.sortMergeJoinWithJoinFilter.enabled=false \
       --conf spark.comet.cast.allowIncompatible=true \
       --conf spark.comet.exec.shuffle.fallbackToColumnar=true \
       tpcbench.py \
       --benchmark tpcds \
       --data $TPCDS_DATA \
       --queries ../../tpcds/queries-spark \
       --output tpc-results
   ```
   
   It will fail at the second query in Q23.
   
   
   ### Expected behavior
   
   TPC-DS Q23 should finish successfully.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org