Hi Bing, Something seems wrong about your create table statements. You are using "LOAD DATA LOCAL INPATH" to load data into Hive tables. This makes me think that the files /home/biadmin/hivetbl/student_details.txt and /home/biadmin/hivetbl/student_score.txt are on the local drive. In such a case you want to copy them onto HDFS for use by Hive tables and if so, doing "LOAD DATA LOCAL INPATH" is the right thing to do.
However, when you create a table and specify the location ...STORED AS TEXTFILE LOCATION ‘/home/biadmin/hivetbl'... This location refers to the HDFS location. If you don't specify this, Hive will choose this by for you by default. However, if you do specify it, it's your responsibility to ensure that this location doesn't conflict with another Hive table. Long story short, a leaf level directory in HDFS stores the partitions/buckets and contents of 1 table. In your create table statement, you were pointing both Hive tables to the same HDFS location /home/biadmin/hivetbl. If you could make your two tables, point to different HDFS locations like, '/usr/hive/warehouse/student_score' and '/usr/hive/warehouse/student_details' and then do your load data inpath statements, that should fix the problem. Good luck! Mark Mark Grover, Business Intelligence Analyst OANDA Corporation www: oanda.com www: fxtrade.com e: [email protected] "Best Trading Platform" - World Finance's Forex Awards 2009. "The One to Watch" - Treasury Today's Adam Smith Awards 2009. ----- Original Message ----- From: "Bing Li" <[email protected]> To: [email protected], [email protected] Sent: Monday, January 16, 2012 5:06:00 AM Subject: Hive JOIN fails if SELECT statement contains fields from the first table. 1. I create two Hive table: Hive> CREATE EXTERNAL TABLE student_details (studentid INT,studentname STRING,age INT,gpa FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION ‘/home/biadmin/hivetbl'; Hive>CREATE EXTERNAL TABLE student_score(studentid INT, classid INT,score FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '/home/biadmin/hivetbl'; 2. Load data HIVE>LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_details.txt' OVERWRITE INTO TABLE student_details; HIVE>LOAD DATA LOCAL INPATH '/home/biadmin/hivetbl/student_score.txt' OVERWRITE INTO TABLE student_score; 3. Run inner join Hive> SELECT a.studentid,a.studentname,a.age,b.classid,b.score,c.classname FROM student_details a JOIN student_score b ON (a.studentid = b.studentid); Result: There are the following exception: cannot find field studentname from [0:studentid, 1:classid, 2:score] [My Question]: studentname is a field of the table student_details (The first table), why search it in the table student_score(the second table)? log is like that; ... ... 2012-01-15 23:24:41,727 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201201152221_0014_m_000000_3: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"studentid":106,"classid":null,"score":635.0} at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"studentid":106,"classid":null,"score":635.0} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143) ... 4 more Caused by: java.lang.RuntimeException: cannot find field studentname from [0:studentid, 1:classid, 2:score] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:345) at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:896) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:922) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:200) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531) ... 5 more
