Surya Hebbar has posted comments on this change. (
http://gerrit.cloudera.org:8080/21683 )
Change subject: IMPALA-13304: Include aggregate instance-level metrics within
experimental profile(V2)
......................................................................
Patch Set 8:
Also, thank you for prompting me regarding the missing events case.
Please consider this example, to see some limitations and complications arising
from the current aggregation mechanism(i.e. 'labels', 'label_idxs',
'timestamps').
The events are streamed from instances in this exact order.
Instance 1 -
"Open Started" - 2280ms
"Open Finished" - 3270ms
"Last Batch Returned" - 3470ms
"Closed" - 4100ms
Instance 2 -
"Open Started" - 1280ms
"Open Finished" - 1330ms
"First Batch Requested" - 1370ms
"Closed" - 3100ms
Result of aggregation -
labels -
"Open Started" - 0, "Open Finished" - 1, "Last Batch Returned" - 2, "Closed" -
3, "First Batch Requested" - 4
label_idxs -
0 1 2 3
0 1 4 3
timestamps -
2280 3270 3470 4100
1280 1330 1370 3100
Note : The resulting 'labels', 'timestamps' and 'label_idxs' will be consistent
as 'SortEvents' is being called before delivering events.
The proper alignment of timestamps (with zeros substituted) is -
(possible to generate using 'label_idxs')
2280 3270 3470 4100 0
1280 1330 0 3100 1370
The proper order and alignment of timestamps(with zeros substituted) is -
(functionally not possible to generate, even after using 'labels'(map),
'label_idxs' and 'timestamps')
2280 3270 0 3470 4100
1280 1330 1370 0 3100
(note: identifying the order of events, through comparison between timestamps
across instances is wrong)
In this manner, if the first few instances responsible for generating 'labels'
map have missing events, it is functionally not possible to extract the perfect
order.
Hence, the current aggregation mechanism, fails to provide an absolute method
to align and order the missing events' timestamp in many edge cases.
Although a definite ordering is not always possible, I have come up with the
following approach to generate the most probable orderings(using 'label_idxs'
from all instances).
1. Consider each index in 'label_idxs' as a node
2. While mantaining a set of visited nodes
3. Add adjacent indexes in 'label_idxs' as parent(or children), if they exist
in visited nodes(Add to visited nodes, otherwise)
4. The result will be
-> A degenerate binary tree (a successful definite order was found)
-> A single multi-terminal unrooted tree(definite order, except where there
are multiple terminals)
-> A forest of disjoint trees(not functionally possible to find definite
order, need to merge all trees in any order)
5. Level-order traversal of the final tree will provide a perfect order / one
of the most probable orderings.
I believe this would result in a space and time complexity of O(n).
In addition to the computational overhead, even after such a complex ordering
process, still there is a possibility of not finding the perfect order(when it
is not functionally definite, i.e. consider indexes [0, 1, 2] [3, 4, 5]).
Considering all of these factors, in such cases, I think any one of the
following choices can be made, to preserve the simplicity and efficiency -
Either
1. Skip the small number of missing events(accept the rarely occuring small
irregularity in timestamp's aggregate values)
or
2. Ignore all the events from irregular instance(accept the loss of all
instance's timestamp values in the aggregate)
or
3. Declare a predifined order for event labels(accept the overhead of
reordering timestamps, even if just a single event is missing)
(also, it may not be possible to have a predefined order for events in cases
like AsyncCodegen)
After inspecting all these aspects, I have currently decided to go with the
first option here.
Please let me know, if there is a much simpler way through this or if I have
misunderstood something.
--
To view, visit http://gerrit.cloudera.org:8080/21683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I49e18a7a7e1288e3e674e15b6fc86aad60a08214
Gerrit-Change-Number: 21683
Gerrit-PatchSet: 8
Gerrit-Owner: Surya Hebbar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
Gerrit-Comment-Date: Fri, 15 Nov 2024 21:03:04 +0000
Gerrit-HasComments: No