Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21683 )
Change subject: IMPALA-13304: Include aggregate instance-level metrics within experimental profile(V2) ...................................................................... Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.h@940 PS6, Line 940: std::vector<std::vector<int32_t>> label_idxs; : > This will duplicate EventSequenceMap typedef in L786. Maybe rename this int Done http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc@2952 PS6, Line 2952: vector<vector<vector<int64_t>>> timestamps_vec; : // Vector of event labels, addres > Please put comment what each index in vector represent. Done http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc@2968 PS6, Line 2968: // Value : Event label's ordered(actual) index (int32_t) > I have used key value pairs from 'event_sequence_contents.labels' map for o This is AggEventSequence: struct AggEventSequence { // Unique labels in the event sequence. Values are [0, labels.size() - 1] std::map<std::string, int32_t> labels; // One entry per instance. Each entry contains the label indices for that instance's // event sequence. std::vector<std::vector<int32_t>> label_idxs; // One entry per instance. Each entry contains the timestamps for that instance's // event sequence. std::vector<std::vector<int64_t>> timestamps; }; Let say, there are two instance, of EXCHANGE_NODE with "Node Lifecycle Event Timeline" like this: EXCHANGE_NODE (id=216): ExecOption: Codegen Enabled Node Lifecycle Event Timeline: 2s270ms - Open Started: 640.998ms (640.998ms) - Open Finished: 2s270ms (1s629ms) - First Batch Requested: 2s270ms (41.261us) - First Batch Returned: 2s270ms (28.924us) - Last Batch Returned: 2s270ms (125.000ns) - Closed: 2s270ms (59.527us) But due to failure or some errors, the second instance does not report "Last Batch Returned". $ git grep -n "Last Batch Return" be/src/ be/src/exec/exec-node-util.h:68: if (*eos_) node_->events_->MarkEvent("Last Batch Returned"); Will label_idxs[0] have the same content as label_idxs[1]? I imagine: label_idxs[0] = [0, 1, 2, 3, 4, 5] label_idxs[1] = [0, 1, 2, 3, 5] And timestamps[0].size() < timestamps[1].size() accordingly? http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc@2985 PS6, Line 2985: EventSequenceToJsonHelper(&event_sequences_json, d > Done. I have added comments now along and have also created another helper I meant only new function for (timestamps_vec[es_idx].size() > BUCKET_SIZE) case, but current EventSequenceToJsonHelper is OK too. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/21683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I49e18a7a7e1288e3e674e15b6fc86aad60a08214 Gerrit-Change-Number: 21683 Gerrit-PatchSet: 7 Gerrit-Owner: Surya Hebbar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Surya Hebbar <[email protected]> Gerrit-Comment-Date: Wed, 23 Oct 2024 23:30:55 +0000 Gerrit-HasComments: Yes
