mattcasters opened a new pull request, #7383:
URL: https://github.com/apache/hop/pull/7383
# Walkthrough: Moving Average (Last N Events) Aggregation
We implemented the **Moving Average (Last N Events)** aggregation type for the
Group By transform, verified it with unit tests and a pipeline integration
test, and documented the feature.
Issue: #7023
## Changes Made
### 1. Type Configuration & Model
- **[Aggregation.java](file:///home/matt/git/mattcasters/hop/plugins/transforms/groupby/src/main/java/org/apache/hop/pipeline/transforms/groupby/Aggregation.java)**:
  - Added the integer constant `TYPE_GROUP_MOVING_AVERAGE = 23`.
  - Added the short label `"MOVING_AVG"` to `typeGroupLabel` and the long description key `MOVING_AVERAGE` to `typeGroupLongDesc`.
  - Added an `orderField` property with the `@HopMetadataProperty` annotation to allow specifying the sort/order field. This is persisted to XML/JSON metadata automatically.
  - Updated `clone()`, `equals()`, and `hashCode()` to support the new field.
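To illustrate why `clone()`, `equals()`, and `hashCode()` all need to know about a newly added property, here is a minimal sketch. The class `AggregationSketch` and its field layout are hypothetical stand-ins and are not the actual `Aggregation` class; only the `orderField` name and the type code 23 come from this description.

```java
import java.util.Objects;

/** Hypothetical stand-in for an aggregation entry; NOT the real Aggregation class. */
final class AggregationSketch implements Cloneable {
  private final String field;      // output field name (assumed)
  private final String subject;    // input field being aggregated (assumed)
  private final int type;          // aggregation type code
  private final String value;      // extra parameter, e.g. the window size N
  private final String orderField; // the newly added property

  AggregationSketch(String field, String subject, int type, String value, String orderField) {
    this.field = field;
    this.subject = subject;
    this.type = type;
    this.value = value;
    this.orderField = orderField;
  }

  /** Copies every property, including orderField, so clones stay equal to the original. */
  @Override
  public AggregationSketch clone() {
    return new AggregationSketch(field, subject, type, value, orderField);
  }

  /** Two entries are equal only if all properties match, orderField included. */
  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof AggregationSketch)) return false;
    AggregationSketch other = (AggregationSketch) o;
    return type == other.type
        && Objects.equals(field, other.field)
        && Objects.equals(subject, other.subject)
        && Objects.equals(value, other.value)
        && Objects.equals(orderField, other.orderField);
  }

  /** Must hash the same properties that equals() compares. */
  @Override
  public int hashCode() {
    return Objects.hash(field, subject, type, value, orderField);
  }
}
```

If `orderField` were left out of `equals()`, two entries differing only in their order field would compare as equal, which could hide configuration changes from any code that relies on equality.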
### 2. Runtime Implementation
- **[GroupByData.java](file:///home/matt/git/mattcasters/hop/plugins/transforms/groupby/src/main/java/org/apache/hop/pipeline/transforms/groupby/GroupByData.java)**:
  - Added an array of sliding windows, `movingAvgWindows` (`ArrayDeque<Double>[]`), to hold the values inside the rolling window.
  - Added list tracking fields (`movingAvgSourceIndexes`, `movingAvgTargetIndexes`, `movingAvgWidths`, `movingAvgIndexes`) for on-the-fly calculations.
- **[GroupBy.java](file:///home/matt/git/mattcasters/hop/plugins/transforms/groupby/src/main/java/org/apache/hop/pipeline/transforms/groupby/GroupBy.java)**:
  - **processRow**: Initializes the sliding windows array and tracking lists if the `MOVING_AVG` type is configured.
  - **newAggregate**: Resets and clears the sliding window array for the active aggregation index on group changes.
  - **addMovingAverages**: Folds new values into the `ArrayDeque`, trims the deque to window size $N$, and computes the average of its elements. Emits `null` while the window holds fewer than $N$ values (partial window handling).
  - Added `addMovingAverages` to the buffer replay loops so that rolling averages are calculated correctly row by row.
- **[GroupByMeta.java](file:///home/matt/git/mattcasters/hop/plugins/transforms/groupby/src/main/java/org/apache/hop/pipeline/transforms/groupby/GroupByMeta.java)**:
  - Declared `MOVING_AVG` as outputting `IValueMeta.TYPE_NUMBER`.
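The window handling described for `addMovingAverages` can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: the class `MovingAverageSketch` and its methods are hypothetical, it tracks a single window rather than an array of them, and it assumes that a `null` input is skipped and leaves the window unchanged.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names are hypothetical and not taken from GroupBy.java. */
final class MovingAverageSketch {
  private final int width; // window size N
  private final ArrayDeque<Double> window = new ArrayDeque<>();

  MovingAverageSketch(int width) {
    if (width < 1) {
      throw new IllegalArgumentException("width must be >= 1");
    }
    this.width = width;
  }

  /** Clears the window, as would happen when a new group starts. */
  void reset() {
    window.clear();
  }

  /**
   * Folds a value into the window and returns the average of the last N
   * values, or null while fewer than N values have been seen.
   */
  Double add(Double value) {
    if (value != null) { // assumption: null inputs are skipped
      window.addLast(value);
      if (window.size() > width) {
        window.removeFirst(); // trim to the last N values
      }
    }
    if (window.size() < width) {
      return null; // partial window
    }
    double sum = 0;
    for (double v : window) {
      sum += v;
    }
    return sum / width;
  }

  /** Convenience helper: runs a sequence of values through one window. */
  static List<Double> averages(int width, Double... values) {
    MovingAverageSketch sketch = new MovingAverageSketch(width);
    List<Double> out = new ArrayList<>();
    for (Double v : values) {
      out.add(sketch.add(v));
    }
    return out;
  }
}
```

With $N=3$ and the inputs 10, 20, 30, 40, `MovingAverageSketch.averages(3, 10.0, 20.0, 30.0, 40.0)` yields `[null, null, 20.0, 30.0]`: the first two rows fall in a partial window, and each later row averages the three most recent values. Summing the deque on every row costs $O(N)$ per row; keeping a running sum would avoid that at the price of accumulating floating-point drift.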
### 3. UI Dialog
- **[GroupByDialog.java](file:///home/matt/git/mattcasters/hop/plugins/transforms/groupby/src/main/java/org/apache/hop/pipeline/transforms/groupby/GroupByDialog.java)**:
  - Added a 5th column: **Order field** (populated via a dropdown from the previous step's fields).
  - Configured `getData()` to load, `ok()` to retrieve/save, and `setComboBoxes()` to suggest field values for the new column.
  - Forces the "Include all rows" checkbox to be selected when `MOVING_AVG` is selected.
### 4. Internationalization
- **[messages_en_US.properties](file:///home/matt/git/mattcasters/hop/plugins/transforms/groupby/src/main/resources/org/apache/hop/pipeline/transforms/groupby/messages/messages_en_US.properties)**:
  - Added description: `Moving average (last N rows)`.
  - Added Order Field column name and tooltip descriptions.
### 5. Documentation
- **[groupby.adoc](file:///home/matt/git/mattcasters/hop/docs/hop-user-manual/modules/ROOT/pages/pipeline/transforms/groupby.adoc)**:
  - Added `Moving average (last N rows)` to the lists of available aggregate methods.
  - Described specifying window size in the `Value` column and pre-sorting fields in the `Order field` column.
---
## Verification & Testing
### 1. Automated Unit Tests
We created a new JUnit 5 test class:
- **[MovingAverageAggregationTest.java](file:///home/matt/git/mattcasters/hop/plugins/transforms/groupby/src/test/java/org/apache/hop/pipeline/transforms/groupby/MovingAverageAggregationTest.java)**:
  - Tests partial window nulls, sliding window updates, null skipping, resets on group change, window size 1, and result pass-through.
We updated:
- **[GroupByMetaTest.java](file:///home/matt/git/mattcasters/hop/plugins/transforms/GroupByMetaTest.java)**:
  - Verifies round-trip XML configuration serialization/deserialization.
### 2. Integration Pipeline Unit Test
We created a self-contained pipeline test case inside the integration tests
project:
- **[0006-groupby-moving-average.hpl](file:///home/matt/git/mattcasters/hop/integration-tests/transforms/0006-groupby-moving-average.hpl)**: Generates a sorted dataset of values for two groups and runs the Group By transform with $N=3$ moving average, followed by a `Validate` Dummy step.
- **[golden-groupby-moving-average.csv](file:///home/matt/git/mattcasters/hop/integration-tests/transforms/datasets/golden-groupby-moving-average.csv)** & **[golden-groupby-moving-average.json](file:///home/matt/git/mattcasters/hop/integration-tests/transforms/metadata/dataset/golden-groupby-moving-average.json)**: Golden dataset containing expected rolling average outputs (retains empty fields for partial windows).
- **[0006-groupby-moving-average UNIT.json](file:///home/matt/git/mattcasters/hop/integration-tests/transforms/metadata/unit-test/0006-groupby-moving-average%20UNIT.json)**: Pipeline unit test mapping `Validate` transform results to the golden data set.
- **[main-0006-groupby.hwf](file:///home/matt/git/mattcasters/hop/integration-tests/transforms/main-0006-groupby.hwf)**: Action `Run Group By tests` now executes our moving average unit test as well.
---
## Test Executions
1. **Transform Unit Tests**:
```bash
./mvnw clean test -pl plugins/transforms/groupby -Dtest="GroupByMetaTest,MovingAverageAggregationTest"
```
**Result**: Build success. All 8 tests passed.
2. **Integration Test Workflow**:
```bash
sh hop run -e "IT transforms" -f main-0006-groupby.hwf -r local
```
**Result**: Workflow execution finished successfully. `Validate - golden-groupby-moving-average : Test passed successfully against golden data set`.