[ https://issues.apache.org/jira/browse/HIVE-14285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360203#comment-16360203 ]
liyunzhang commented on HIVE-14285:
-----------------------------------

[~kgyrtkirk]: I have a question: are the works within one stage (for example Map 1 and Map 4 in Stage-1) executed in parallel, even though Map 4 is listed after Map 1 in the explain output?
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: srcpart
                  filterExpr: ds is not null (type: boolean)
                  Statistics: Num rows: 2000 Data size: 389248 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: ds (type: string)
                    outputColumnNames: _col0
                    Statistics: Num rows: 2000 Data size: 368000 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Statistics: Num rows: 2000 Data size: 368000 Basic stats: COMPLETE Column stats: COMPLETE
            Execution mode: llap
            LLAP IO: no inputs
        Map 4
            Map Operator Tree:
                TableScan
                  alias: srcpart_date
                  filterExpr: ((date = '2008-04-08') and ds is not null) (type: boolean)
                  Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: ((date = '2008-04-08') and ds is not null) (type: boolean)
                    Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: ds (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: string)
                        outputColumnNames: _col0
                        Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          keys: _col0 (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
                          Dynamic Partitioning Event Operator
                            Target column: ds (string)
                            Target Input: srcpart
                            Partition key expr: ds
                            Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
                            Target Vertex: Map 1
            Execution mode: llap
            LLAP IO: no inputs
        Reducer 2
            Execution mode: llap
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 _col0 (type: string)
                  1 _col0 (type: string)
                Statistics: Num rows: 2200 Data size: 404800 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: count()
                  mode: hash
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    sort order:
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col0 (type: bigint)
        Reducer 3
            Execution mode: llap
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}
If the works in a stage are executed in parallel, then I think there is no problem with {{ExplainTask#getBasictypeKeyedMap}}, which sorts the works by work#name (see my previous question). The sort places Map 10 before Map 6 in the explain output, but Map 10 and Map 6 run in parallel at runtime anyway.

> Explain outputs: map-entry ordering of non-primitive objects.
> ---------------------------------------------------------------
>
> Key: HIVE-14285
> URL: https://issues.apache.org/jira/browse/HIVE-14285
> Project: Hive
> Issue Type: Improvement
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Minor
> Fix For: 2.3.0
>
> Attachments: HIVE-14285.1.patch
>
>
> In HIVE-12244 I've left behind some ugly backward compatible getters with
> {{@Explain}} decorations to keep the qtests from breaking.
> There were heavy explain plan changes when I used {{Path}} objects as keys in
> {{@Explain}} marked methods.
> I've looked into the causes of this:
> * there is a {{TreeSet}} in there to keep all the keys in order.
> * but: {{org.apache.hadoop.fs.Path}} uses a different sort order (inherited
> from {{java.net.URI}}): it sorts the paths using the
> priorities [scheme, schemeSpecificPart, host, path, query, fragment]
> Considering that the output is an explain result (possibly read by humans),
> I don't think this sophisticated sort order can be useful.
> {{ExplainTask#outputMap}} always calls toString() on the keys before using
> them, so the most painless solution would be to change all the keys inside
> the {{TreeSet}} to simple strings (in case it's not a primitive already); this
> would restore the original behaviour for me.
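
To illustrate the ordering difference described in the issue, here is a minimal sketch. It uses plain {{java.net.URI}} keys rather than Hadoop {{Path}} objects (an assumption made to keep it self-contained), and the class and helper names are hypothetical, not taken from {{ExplainTask}}. It compares a {{TreeMap}} keyed by URI with one keyed by the string form of the same URIs.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; not part of Hive.
class ExplainKeyOrder {

    // Keys stay URIs, so the TreeMap orders them with URI.compareTo
    // (scheme first, then host, then path, ...).
    static List<String> orderByUri(List<String> locations) {
        Map<URI, String> sorted = new TreeMap<>();
        for (String location : locations) {
            sorted.put(URI.create(location), location);
        }
        return new ArrayList<>(sorted.values());
    }

    // Keys are converted to strings first, so the TreeMap orders them
    // with plain String.compareTo, character by character.
    static List<String> orderByString(List<String> locations) {
        Map<String, String> sorted = new TreeMap<>();
        for (String location : locations) {
            sorted.put(URI.create(location).toString(), location);
        }
        return new ArrayList<>(sorted.keySet());
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList("hdfs://host-a/x", "hdfs://host/y");
        // URI order compares the hosts "host" and "host-a" as whole components.
        System.out.println(orderByUri(locations));    // [hdfs://host/y, hdfs://host-a/x]
        // String order compares '-' with '/' at the first differing character.
        System.out.println(orderByString(locations)); // [hdfs://host-a/x, hdfs://host/y]
    }
}
```

The two lists come out in opposite order. That is the kind of explain-plan churn the description attributes to using non-string keys, and converting the keys to strings before sorting avoids it.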