[Impala-ASF-CR] IMPALA-13304: Include aggregate instance-level metrics within experimental profile(V2)

Surya Hebbar (Code Review) Tue, 19 Nov 2024 16:56:16 -0800

Surya Hebbar has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/21683 )


Change subject: IMPALA-13304: Include aggregate instance-level metrics within 
experimental profile(V2)
......................................................................

IMPALA-13304: Include aggregate instance-level metrics within experimental 
profile(V2)

Currently, instance-level details of fragment events are omitted
entirely from the experimental profile, unlike the traditional
profile. This keeps the profile size from growing rapidly as the
number of instances increases.

This patch introduces aggregate instance-level metrics to the
experimental profile (V2) without allowing the profile size to
grow rapidly.

In addition to aggregate instance-level metrics, the following
predefined aggregate info strings are included in the
experimental profile (as per AGGREGATED_INFO_STRINGS_REQUIRED):

- Table Name

When the number of instances is more than 5, only the following
aggregate metrics are included in the profile, to keep the profile size
from growing rapidly. The aggregated event metrics are calculated by
splitting the event timestamps into 5 divisions, each spanning 20% of
the range between the minimum and maximum instance timestamps.
- For each division of an event's timestamps generated from processing
    a particular plan node:
  * Maximum timestamp
  * Minimum timestamp
  * Average timestamp
  * Total no. of instances

When the number of instances is less than or equal to 5,
all instances' event timestamps are included.

To minimize the time other threads are obstructed from writing to
'event_sequences_map_', required attributes are copied into vectors
before processing the aggregated event sequences.
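
The copy-then-process idea can be illustrated with a minimal Python
sketch. The actual change is in C++ (be/src/util/runtime-profile.cc);
the class, method names, and lock below are hypothetical stand-ins, and
only the 'event_sequences_map_' name comes from this patch.

```python
import threading
from typing import Dict, List, Tuple


class EventSequences:
    """Hypothetical stand-in for a map of event sequences guarded by a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # Plays the role of 'event_sequences_map_': name -> [(label, timestamp)].
        self._event_sequences_map: Dict[str, List[Tuple[str, int]]] = {}

    def add_event(self, seq_name: str, label: str, ts: int) -> None:
        # Writers hold the lock only for the duration of the append.
        with self._lock:
            self._event_sequences_map.setdefault(seq_name, []).append((label, ts))

    def snapshot(self) -> Dict[str, List[Tuple[str, int]]]:
        # Copy the required data while holding the lock, then release it so
        # that aggregation can run on the copy without blocking writers.
        with self._lock:
            return {name: list(events)
                    for name, events in self._event_sequences_map.items()}
```

Aggregation would then operate on the value returned by snapshot(), so
the lock is held only for the copy rather than for the whole
calculation.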

The aggregate metrics are calculated with minimal overhead by
assigning each timestamp to a particular division without sorting,
giving a time complexity of O(n) with only two passes through the
entire list of timestamps.

To further optimize performance, the aggregates are calculated
without storing each division's timestamps. Only the memory needed
for a single value per metric is used, instead of the entire range
of values, and the previously allocated vectors are reused.
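
The two-pass calculation described above could look roughly like the
following Python sketch. It is not the patch's C++ code: the function
name, the integer average, the exact division-index formula, and the
use of 0 for empty divisions are all assumptions made for illustration.

```python
from typing import Dict, List

NUM_DIVISIONS = 5  # the patch splits each event's timestamps into 5 divisions


def aggregate_timestamps(timestamps: List[int],
                         num_divisions: int = NUM_DIVISIONS) -> Dict[str, List[int]]:
    """Aggregate one event's timestamps into divisions in O(n), without sorting."""
    mins = [0] * num_divisions
    maxs = [0] * num_divisions
    sums = [0] * num_divisions
    counts = [0] * num_divisions
    if timestamps:
        # Pass 1: find the overall minimum and maximum timestamps.
        lo = hi = timestamps[0]
        for ts in timestamps:
            if ts < lo:
                lo = ts
            elif ts > hi:
                hi = ts
        span = hi - lo
        # Pass 2: assign each timestamp to a division and update running
        # values, so no per-division list of timestamps is ever stored.
        for ts in timestamps:
            if span == 0:
                d = 0
            else:
                # The maximum timestamp is clamped into the last division.
                d = min((ts - lo) * num_divisions // span, num_divisions - 1)
            if counts[d] == 0:
                mins[d] = maxs[d] = ts
            else:
                mins[d] = min(mins[d], ts)
                maxs[d] = max(maxs[d], ts)
            sums[d] += ts
            counts[d] += 1
    # Assumption: empty divisions report 0 for min, max and avg.
    avgs = [sums[d] // counts[d] if counts[d] else 0 for d in range(num_divisions)]
    return {"min": mins, "max": maxs, "avg": avgs, "count": counts}
```

For example, aggregate_timestamps([0, 10, 20, 30, 40, 50, 100]) yields
counts of [2, 2, 2, 0, 1], with each division covering 20% of the
range from 0 to 100.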

If any fragment instances report only a subset of events due to failure
or error, only the missing event timestamps are skipped in the aggregate
metrics calculation; the available timestamps are still incorporated.

In case of missing events, the timestamps are ordered and aligned
through the analysis of 'label_idxs'. If there is at least a single
instance with a complete set of events, all the instances that
contain missing timestamps are ordered and aligned efficiently
by passing through a maintained reference to the reordering of labels
themselves. Otherwise, the initial ordering and alignment are kept.
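
As a rough illustration of skipping only the missing events, the sketch
below groups timestamps by label index. The data layout is an
assumption: each instance is modelled as a pair of parallel lists
(label_idxs, timestamps), where every index points into a shared list
of labels. The real 'label_idxs' handling in the patch, including the
reordering step, may differ.

```python
from typing import List, Tuple

# Hypothetical shape: (label_idxs, timestamps) reported by one instance.
Instance = Tuple[List[int], List[int]]


def timestamps_per_label(num_labels: int,
                         instances: List[Instance]) -> List[List[int]]:
    """Group timestamps by event label, skipping events an instance lacks."""
    per_label: List[List[int]] = [[] for _ in range(num_labels)]
    for label_idxs, timestamps in instances:
        # An instance that failed early simply contributes fewer pairs;
        # the events it did report still land under the correct label.
        for idx, ts in zip(label_idxs, timestamps):
            per_label[idx].append(ts)
    return per_label
```

With one complete instance ([0, 1, 2], [10, 20, 30]) and one partial
instance ([0, 2], [11, 31]), the result is [[10, 11], [20], [30, 31]]:
the partial instance is missing only from the second event.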

To copy the calculated values efficiently without internally
reallocating on each insertion, memory is preallocated for each array
of metrics using the RapidJSON library.

In the experimental JSON profile, a particular plan node's
profile uses the following structure.

When no. of instances > 5 -
{
  profile_name : <PLAN_NODE_NAME>,
  num_children : <NUM_CHILDREN>,
  node_metadata : <NODE_METADATA_OBJECT>,
  event_sequences :
  [{
    events : // An example event
    [{
      label : "Open Started",
      ts_stat :
      {
        min : [ 2257887941, ...4 other divisions' minimum timestamps ],
        max : [ 3257887941, ...4 other divisions' maximum timestamps ],
        avg : [ 2757887941, ...4 other divisions' average timestamps ],
        count : [ 2, ...4 other divisions' instance counts ]
      }
    }, ...other plan node's events
    ]
  }],
  counters : <COUNTERS_OBJECT_ARRAY>,
  child_profiles : <CHILD_PROFILES>
}

When no. of instances <= 5 -
{
  profile_name : <PLAN_NODE_NAME>,
  num_children : <NUM_CHILDREN>,
  node_metadata : <NODE_METADATA_OBJECT>,
  event_sequences :
  [{
    offset : 0,
    events : // An example event
    [{
      label : "Open Started",
      ts_list : [ 2257887941, ...4 other instances' timestamps ]
    }, ...other plan node's events
    ]
  }],
  counters : <COUNTERS_OBJECT_ARRAY>,
  child_profiles : <CHILD_PROFILES>
}

Note: In the above structures, unlike a plan node's profile,
a fragment's profile does not contain the 'node_metadata' field.
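
A consumer of the profile has to handle both event shapes. The Python
sketch below is a hypothetical reader, not part of this patch: it
returns the overall (min, max) timestamp of one event, using the
'ts_list' and 'ts_stat' field names shown above. Treating divisions
with a count of 0 as empty is an assumption.

```python
from typing import Any, Dict, Optional, Tuple


def event_time_range(event: Dict[str, Any]) -> Optional[Tuple[int, int]]:
    """Return (earliest, latest) timestamp of an event in either shape."""
    if "ts_list" in event:
        # Shape used when the number of instances is <= 5.
        values = event["ts_list"]
        return (min(values), max(values)) if values else None
    # Shape used when the number of instances is > 5.
    stat = event["ts_stat"]
    # Assumption: a division with count 0 holds no meaningful min/max.
    populated = [i for i, count in enumerate(stat["count"]) if count > 0]
    if not populated:
        return None
    return (min(stat["min"][i] for i in populated),
            max(stat["max"][i] for i in populated))
```

The same function can then be applied to every entry of an event
sequence's 'events' array regardless of how many instances ran.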

Additionally, the aggregate info strings are represented in the
following manner.

{
  "info_strings" :
  [{
    "key": "<info string's key>(s)",
    "value": [<distinct info string values>]
  }]
}

Note: The instance indexes are currently being omitted.
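
Collecting the distinct values could be sketched as follows in Python.
The function name, the per-instance dict input, and the first-seen
ordering of values are assumptions for illustration; only the
"info_strings" layout, the "(s)" key suffix, and the "Table Name" key
come from the description above.

```python
from typing import Any, Dict, List, Sequence


def aggregate_info_strings(instances: List[Dict[str, str]],
                           required: Sequence[str] = ("Table Name",)) -> Dict[str, Any]:
    """Collect distinct values of the required info strings across instances."""
    distinct: Dict[str, List[str]] = {key: [] for key in required}
    for info in instances:
        for key in required:
            value = info.get(key)
            # Keep each value once; instance indexes are not recorded.
            if value is not None and value not in distinct[key]:
                distinct[key].append(value)
    return {
        "info_strings": [
            {"key": f"{key}(s)", "value": values}
            for key, values in distinct.items()
            if values
        ]
    }
```

For instances reporting tables "a", "b" and "a" again, this produces a
single entry with key "Table Name(s)" and value ["a", "b"].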

Generated the latest expected JSON profile outputs from the
'impala-profile-tool' using the stored Impala profile logs.

Added tests in tests/observability for profile v2's JSON output,
after inclusion of the new expected JSON profile formats for
both text and JSON.

Change-Id: I49e18a7a7e1288e3e674e15b6fc86aad60a08214
---
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_v2.expected.json
A testdata/impala-profiles/impala_profile_log_tpcds_compute_stats_v2.expected.pretty_extended.json
M tests/observability/test_profile_tool.py
5 files changed, 8,078 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/21683/9
--
To view, visit http://gerrit.cloudera.org:8080/21683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49e18a7a7e1288e3e674e15b6fc86aad60a08214
Gerrit-Change-Number: 21683
Gerrit-PatchSet: 9
Gerrit-Owner: Surya Hebbar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
