Alexey Zinoviev created IGNITE-12257:
----------------------------------------

             Summary: [ML] Add Feature Filter for ML Partitioned Dataset
                 Key: IGNITE-12257
                 URL: https://issues.apache.org/jira/browse/IGNITE-12257
             Project: Ignite
          Issue Type: Improvement
    Affects Versions: 2.9
            Reporter: Alexey Zinoviev
            Assignee: Alexey Zinoviev
             Fix For: 2.9


The method below ignores any feature selection made at earlier stages, and there is no way to do feature engineering during preprocessing in the style of simple SQL: filter, exclude, produce new features, and so on.

    public SimpleDatasetData build(
        LearningEnvironment env,
        Iterator<UpstreamEntry<K, V>> upstreamData,
        long upstreamDataSize,
        C ctx) {
        // Prepares the matrix of features in flat column-major format.
        int cols = -1;
        double[] features = null;

        int ptr = 0;
        while (upstreamData.hasNext()) {
            UpstreamEntry<K, V> entry = upstreamData.next();
            Vector row = preprocessor.apply(entry.getKey(), entry.getValue()).features();

            // The first row fixes the column count and the size of the flat array.
            if (cols < 0) {
                cols = row.size();
                features = new double[Math.toIntExact(upstreamDataSize * cols)];
            }
            else
                assert row.size() == cols : "Feature extractor must return exactly " + cols + " features";

            // Column i of row ptr is stored at index i * upstreamDataSize + ptr.
            for (int i = 0; i < cols; i++)
                features[Math.toIntExact(i * upstreamDataSize + ptr)] = row.get(i);

            ptr++;
        }

        return new SimpleDatasetData(features, Math.toIntExact(upstreamDataSize));
    }
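One possible shape for the requested row filter is sketched below. It is only an illustration in plain Java, not the Ignite ML API: the class FilteredColumnMajorBuilder, the rowFilter parameter, and the use of double[] in place of Vector are all assumptions made for this example. The point it shows is that once rows can be rejected, the flat array can no longer be sized from upstreamDataSize up front, so the sketch buffers the accepted rows first and lays them out column-major afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names are part of the Ignite ML API. */
public class FilteredColumnMajorBuilder {
    /**
     * Packs the rows accepted by rowFilter into a flat column-major array.
     * Returns an empty array if no row passes the filter.
     */
    public static double[] build(Iterator<double[]> rows, Predicate<double[]> rowFilter) {
        List<double[]> kept = new ArrayList<>();
        int cols = -1;

        while (rows.hasNext()) {
            double[] row = rows.next();

            // Assumed filter hook: rejected rows never reach the dataset.
            if (!rowFilter.test(row))
                continue;

            // The first accepted row fixes the column count.
            if (cols < 0)
                cols = row.length;
            else if (row.length != cols)
                throw new IllegalArgumentException("Expected exactly " + cols + " features");

            kept.add(row);
        }

        if (kept.isEmpty())
            return new double[0];

        // Size the array from the number of accepted rows, not the upstream size.
        int rowsCnt = kept.size();
        double[] features = new double[Math.multiplyExact(rowsCnt, cols)];

        // Column c of row r is stored at index c * rowsCnt + r.
        for (int r = 0; r < rowsCnt; r++)
            for (int c = 0; c < cols; c++)
                features[c * rowsCnt + r] = kept.get(r)[c];

        return features;
    }

    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(
            new double[] {1.0, 10.0},
            new double[] {-2.0, 20.0},
            new double[] {3.0, 30.0});

        // Keep only rows whose first feature is positive.
        double[] flat = build(data.iterator(), row -> row[0] > 0);

        System.out.println(Arrays.toString(flat)); // prints [1.0, 3.0, 10.0, 30.0]
    }
}
```

Buffering the accepted rows costs extra memory per partition. An alternative under the same assumptions would be to allocate for the full upstream size and trim the array once the number of accepted rows is known.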