Hi everyone,
I had to kill some queries that were taking forever, and it turned out
they were doing Cartesian products (a JOIN with a missing ON clause).
I wonder how I could have spotted that in the EXPLAIN output (which I still
find a bit cryptic). Specifically, the stage they were stuck in was this:
  Stage: Stage-7
    Map Reduce
      Alias -> Map Operator Tree:
        $INTNAME
            Reduce Output Operator
              sort order:
              tag: 1
              value expressions:
                    expr: _col1
                    type: int
        $INTNAME1
            Reduce Output Operator
              sort order:
              tag: 0
              value expressions:
                    expr: _col0
                    type: bigint
                    expr: _col1
                    type: string
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col0} {VALUE._col1}
            1 {VALUE._col1}
          handleSkewJoin: false
          outputColumnNames: _col0, _col1, _col3
          File Output Operator
            compressed: true
            GlobalTableId: 0
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Is there anything in there that should have alerted me?
I found out by looking at the query, but I wonder if the query plan (if
I could read it) would have given me that information.
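
My only guess so far, which I would like someone to confirm, is that both
Reduce Output Operators above have an empty "sort order:" and no
"key expressions:" line at all, so nothing is being used as a join key.
Assuming that really is the signature of a Cartesian product, here is a
rough Python sketch of a check I could run on saved EXPLAIN text. The
function name find_keyless_shuffles and the SAMPLE_PLAN excerpt are my own
illustration, not anything Hive provides, and the "context" it reports is
simply whatever line precedes the operator in the plan.

```python
def find_keyless_shuffles(plan_text):
    """Return the line preceding each Reduce Output Operator that has
    neither key expressions nor a non-empty sort order.

    Assumption (unconfirmed): such an operator feeding a Join Operator
    means the join has no key, i.e. a Cartesian product.
    """
    operators = []
    current = None  # operator block currently being scanned
    prev = None     # previous non-empty line, used as context
    for raw in plan_text.splitlines():
        line = raw.strip()  # indentation is ignored on purpose
        if not line:
            continue
        if line == "Reduce Output Operator":
            current = {"context": prev, "has_keys": False, "sort_order": ""}
            operators.append(current)
        elif current is not None:
            if line.startswith("key expressions:"):
                current["has_keys"] = True
            elif line.startswith("sort order:"):
                current["sort_order"] = line[len("sort order:"):].strip()
            elif line.endswith("Operator Tree:") or line.startswith("Stage:"):
                current = None  # left the operator block
        prev = line
    return [op["context"] for op in operators
            if not op["has_keys"] and not op["sort_order"]]


# Excerpt of the map side of the stage quoted in this message.
SAMPLE_PLAN = """
Alias -> Map Operator Tree:
$INTNAME
Reduce Output Operator
sort order:
tag: 1
value expressions:
expr: _col1
type: int
$INTNAME1
Reduce Output Operator
sort order:
tag: 0
value expressions:
expr: _col0
type: bigint
expr: _col1
type: string
Reduce Operator Tree:
Join Operator
"""

if __name__ == "__main__":
    print(find_keyless_shuffles(SAMPLE_PLAN))
```

If the guess is wrong, the check is useless, so please correct me.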
Thanks a lot
David Morel