Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21683 )

Change subject: IMPALA-13304: Include aggregate instance-level metrics within 
experimental profile(V2)
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.h@940
PS6, Line 940:     std::vector<std::vector<int32_t>> label_idxs;
             :
> This will duplicate EventSequenceMap typedef in L786. Maybe rename this int
Done


http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc@2952
PS6, Line 2952:     vector<vector<vector<int64_t>>> timestamps_vec;
              :     // Vector of event labels, addres
> Please put comment what each index in vector represent.
Done


http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc@2968
PS6, Line 2968:           // Value : Event label's ordered(actual) index 
(int32_t)
> I have used key value pairs from 'event_sequence_contents.labels' map for o
This is AggEventSequence:

  struct AggEventSequence {
    // Unique labels in the event sequence. Values are [0, labels.size() - 1]
    std::map<std::string, int32_t> labels;

    // One entry per instance. Each entry contains the label indices for that 
instance's
    // event sequence.
    std::vector<std::vector<int32_t>> label_idxs;

    // One entry per instance. Each entry contains the timestamps for that 
instance's
    // event sequence.
    std::vector<std::vector<int64_t>> timestamps;
  };

Let say, there are two instance, of EXCHANGE_NODE with "Node Lifecycle Event 
Timeline" like this:

        EXCHANGE_NODE (id=216):
          ExecOption: Codegen Enabled
          Node Lifecycle Event Timeline: 2s270ms
             - Open Started: 640.998ms (640.998ms)
             - Open Finished: 2s270ms (1s629ms)
             - First Batch Requested: 2s270ms (41.261us)
             - First Batch Returned: 2s270ms (28.924us)
             - Last Batch Returned: 2s270ms (125.000ns)
             - Closed: 2s270ms (59.527us)

But due to failure or some errors, the second instance does not report "Last 
Batch Returned".

$ git grep -n "Last Batch Return" be/src/
be/src/exec/exec-node-util.h:68:    if (*eos_) node_->events_->MarkEvent("Last 
Batch Returned");

Will label_idxs[0] have the same content as label_idxs[1]? I imagine:
label_idxs[0] = [0, 1, 2, 3, 4, 5]
label_idxs[1] = [0, 1, 2, 3, 5]

And timestamps[0].size() < timestamps[1].size() accordingly?


http://gerrit.cloudera.org:8080/#/c/21683/6/be/src/util/runtime-profile.cc@2985
PS6, Line 2985: EventSequenceToJsonHelper(&event_sequences_json, d
> Done. I have added comments now along and have also created another helper
I meant only new function for (timestamps_vec[es_idx].size() > BUCKET_SIZE) 
case, but current EventSequenceToJsonHelper is OK too. Thanks!



--
To view, visit http://gerrit.cloudera.org:8080/21683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I49e18a7a7e1288e3e674e15b6fc86aad60a08214
Gerrit-Change-Number: 21683
Gerrit-PatchSet: 7
Gerrit-Owner: Surya Hebbar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
Gerrit-Comment-Date: Wed, 23 Oct 2024 23:30:55 +0000
Gerrit-HasComments: Yes

Reply via email to