Eliaaazzz opened a new pull request, #37532: URL: https://github.com/apache/beam/pull/37532
Updates #37531

## Summary

This PR adds an **opt-in, stateless, bundle-local, size-aware batching path** for variable-length inference workloads in `RunInference`. It introduces `SortAndBatchElements` in `apache_beam/transforms/util.py`, which:

1. Buffers elements within a bundle (`StartBundle` → `FinishBundle`)
2. Orders elements by size (default `len(x)`, overridable via `element_size_fn`)
3. Forms batches under existing constraints (`max_batch_size`, `max_batch_weight`)

Default behavior remains unchanged unless this path is enabled.

## Motivation

`BatchElements` is count-based. Under heavy-tailed length distributions, a single long outlier can inflate the padding cost of the many short elements that share its batch, increasing tail latency and reducing effective throughput. This PR provides a stateless (bundle-local, no shuffle) way to improve batch composition for variable-length inputs.

## Mechanism clarification

A strict-control ablation is included to isolate effects:

* **Fixed boundaries + intra-batch sorting only**: negligible gain on the Pareto workload
* **Size-aware ordering + `max_batch_weight`**: significant gain

In this workload, the gains are primarily attributable to **batch-boundary changes induced by size-aware ordering under the weight constraint**, rather than to intra-batch reordering alone.
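The three-step mechanism above can be sketched as a plain function; this is an illustrative simplification of the bundle-local path, not the actual `SortAndBatchElements` implementation, and the helper name `greedy_pack` is hypothetical:

```python
def greedy_pack(elements, max_batch_size, max_batch_weight, element_size_fn=len):
    """Hypothetical sketch: sort buffered elements by size, then greedily
    pack them into batches under count and weight constraints."""
    # Step 2: order elements by size (default len(x)).
    ordered = sorted(elements, key=element_size_fn)
    batches, batch, weight = [], [], 0
    for el in ordered:
        size = element_size_fn(el)
        # Step 3: close the current batch when either constraint would be exceeded.
        if batch and (len(batch) >= max_batch_size or weight + size > max_batch_weight):
            batches.append(batch)
            batch, weight = [], 0
        batch.append(el)
        weight += size
    if batch:
        batches.append(batch)
    return batches
```

Because elements are size-sorted before packing, each batch spans a narrow size range, which is what reduces padding once batches are padded to their longest element.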
## Benchmark methodology

Script: `apache_beam/transforms/sort_and_batch_benchmark.py`

* N=20 trials (3 warmup trials excluded)
* Same fixed corpus and seed for A/B
* Metrics: padding ratio, throughput (Ktok/s), E2E latency, batch p95/p99
* Invariants checked: element/token conservation

## Pareto (heavy-tail) results

Configuration:

* N=10,000 elements
* `max_batch_size=32`, `max_batch_weight=2000`
* Input stats: mean=4.2, median=1, std=18.5, max=500

Baseline → Stateless:

* Padding ratio: `13.00x → 3.19x` (**↓75.5%**)
* Throughput median: `2.2 → 7.3 Ktok/s` (**↑230.4%**)
* E2E latency p95: `18924.3 → 5728.6 ms` (**↓69.7%**)
* Batch latency p95: `283.3 → 64.4 ms` (**↓77.3%**)
* Batch latency p99: `1011.9 → 632.5 ms` (**↓37.5%**)
* Batch count: `313 → 321` (**↑3%**)
* Invariants: elements/tokens matched

## Scope

Included in this PR:

* Stateless, bundle-local implementation (`SortAndBatchElements`)
* Unit tests
* Benchmark + strict-control ablation

Not included in this PR:

* Stateful/global keying strategy
* Cross-worker/global reordering
* Auto-tuning or public weighting knobs

## Files changed

* `apache_beam/transforms/util.py`
* `apache_beam/transforms/util_test.py`
* `apache_beam/transforms/sort_and_batch_benchmark.py`

## Notes

Claims in this PR are scoped to the Pareto heavy-tail setup described above. Broader-distribution conclusions and a stateful/global strategy are left as follow-up work.

<img width="823" height="341" alt="image" src="https://github.com/user-attachments/assets/97b71315-34f1-412b-8e3b-971af0ac0f5e" />
<img width="956" height="1023" alt="image" src="https://github.com/user-attachments/assets/f7e2c445-f4a1-448a-a96d-83f9d954c2ac" />

------------------------

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- [x] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue.
If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
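As a reference for the padding-ratio numbers reported above, one plausible definition of the metric is sketched below: total padded slots divided by real tokens, assuming each batch is padded to its longest element. This is an assumption for illustration; the actual definition used by `sort_and_batch_benchmark.py` may differ.

```python
def padding_ratio(batches, element_size_fn=len):
    """Illustrative metric: padded slots / real tokens, assuming each batch
    is padded to the size of its longest element (a common inference convention)."""
    padded = sum(len(b) * max(element_size_fn(e) for e in b) for b in batches)
    real = sum(element_size_fn(e) for b in batches for e in b)
    return padded / real
```

Under this definition a ratio of 1.0 means zero padding, so the reported `13.00x → 3.19x` improvement corresponds to batches whose elements are much closer in size.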
