Thanks, Mark. I found the problem. For some reason, Hive is not able to write an Avro output file when the schema has a complex field with a NULL option. It reads that structure without any problem, but it cannot write it. For example, the insert was failing on this array-of-struct field:

{ "name": "Passenger",
  "type": [
    { "type": "array",
      "items": {
        "type": "record",
        "name": "PAXStruct",
        "fields": [
          { "name": "PAXCode",     "type": ["string", "null"] },
          { "name": "PAXQuantity", "type": ["int", "null"] }
        ]
      }
    },
    "null"
  ]
}

I removed the last "null" clause and it's working okay now.
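In case it helps anyone else, this is roughly what the field looks like after dropping that outer "null" branch; the assumption here is that the field's type simply becomes the array schema itself instead of a union, with the rest of the schema unchanged:

{ "name": "Passenger",
  "type": {
    "type": "array",
    "items": {
      "type": "record",
      "name": "PAXStruct",
      "fields": [
        { "name": "PAXCode",     "type": ["string", "null"] },
        { "name": "PAXQuantity", "type": ["int", "null"] }
      ]
    }
  }
}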
Regards,
Sadu

On Thu, Apr 4, 2013 at 12:36 AM, Mark Grover <grover.markgro...@gmail.com> wrote:
> Can you please check your Jobtracker logs? That is a generic error related
> to grabbing the Task Attempt Log URL; the real error is in the JT logs.
>
>
> On Wed, Apr 3, 2013 at 7:17 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:
>> Hi Dean,
>>
>> I tried inserting into a bucketed Hive table from a non-bucketed table
>> using an INSERT OVERWRITE .... SELECT FROM clause, but I get the following
>> error.
>>
>> ----------------------------------------------------------------------------------
>> Exception in thread "Thread-225" java.lang.NullPointerException
>>     at org.apache.hadoop.hive.shims.Hadoop23Shims.getTaskAttemptLogUrl(Hadoop23Shims.java:44)
>>     at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:186)
>>     at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:142)
>>     at java.lang.Thread.run(Thread.java:662)
>> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
>> ----------------------------------------------------------------------------------
>>
>> Both tables have the same structure, except that one has a CLUSTERED BY
>> clause and the other does not.
>>
>> Some columns are defined as arrays of structs. The INSERT statement works
>> fine if I take out those complex columns. Are there any known issues
>> loading STRUCT or ARRAY OF STRUCT fields?
>>
>> Thanks for your time and help.
>>
>> Sadu
>>
>> On Sat, Mar 30, 2013 at 7:00 PM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:
>>> The table can be external. You should be able to use this data with
>>> other tools, because all bucketing does is ensure that all occurrences of
>>> records with a given key are written into the same block. This is why
>>> clustered/blocked data can be joined on those keys using map-side joins;
>>> Hive knows it can cache an individual block in memory and the block will
>>> hold all records across the table for the keys in that block.
>>>
>>> So, Java MR apps and Pig can still read the records, but they won't
>>> necessarily understand how the data is organized. I.e., it might appear
>>> unsorted. Perhaps HCatalog will allow other tools to exploit the structure,
>>> but I'm not sure.
>>>
>>> dean
>>>
>>> On Sat, Mar 30, 2013 at 5:44 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:
>>>> Thanks, Dean.
>>>>
>>>> Does that mean this bucketing is exclusively a Hive feature and not
>>>> available to others like Java, Pig, etc.?
>>>>
>>>> And also, my final tables have to be managed tables, not external
>>>> tables, right?
>>>>
>>>> Thanks again for your time and help.
>>>>
>>>> Sadu
>>>>
>>>> On Fri, Mar 29, 2013 at 5:57 PM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:
>>>>> I don't know of any way to avoid creating new tables and moving the
>>>>> data.
>>>>> In fact, that's the official way to do it, from a temp table to the
>>>>> final table, so Hive can ensure the bucketing is done correctly:
>>>>>
>>>>> https://cwiki.apache.org/Hive/languagemanual-ddl-bucketedtables.html
>>>>>
>>>>> In other words, you might have a big move now, but going forward,
>>>>> you'll want to stage your data in a temp table, use this procedure to put
>>>>> it in the final location, then delete the temp data.
>>>>>
>>>>> dean
>>>>>
>>>>> On Fri, Mar 29, 2013 at 4:58 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> We run M/R jobs to parse and process large and highly complex XML
>>>>>> files into Avro files. Then we build external Hive tables on top of the
>>>>>> parsed Avro files. The Hive tables are partitioned by day, but the
>>>>>> partitions are still huge and joins do not perform that well. So I would
>>>>>> like to try out creating buckets on the join key. How do I create the
>>>>>> buckets on the existing HDFS files? I would prefer to avoid creating
>>>>>> another set of (bucketed) tables and loading data from the non-bucketed
>>>>>> tables into the bucketed tables if at all possible. Is it possible to do
>>>>>> the bucketing in Java as part of the M/R jobs while creating the Avro
>>>>>> files?
>>>>>>
>>>>>> Any help / insight would be greatly appreciated.
>>>>>>
>>>>>> Thank you very much for your time and help.
>>>>>>
>>>>>> Sadu
>>>>>
>>>>> --
>>>>> *Dean Wampler, Ph.D.*
>>>>> thinkbiganalytics.com
>>>>> +1-312-339-1330
>>>
>>> --
>>> *Dean Wampler, Ph.D.*
>>> thinkbiganalytics.com
>>> +1-312-339-1330
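P.S. For anyone following along later, here is a minimal sketch of the staged workflow Dean describes above (and, I believe, roughly what the bucketed-tables wiki page he links to walks through). The table names, column names, and bucket count below are made up for illustration, and the Avro SerDe / file-format clauses are left out:

-- All names below are illustrative only.
-- Staging table: same layout as the final table, but not bucketed.
CREATE TABLE flights_staging (
  flight_id STRING,
  passenger ARRAY<STRUCT<paxcode:STRING, paxquantity:INT>>
)
PARTITIONED BY (flight_day STRING);

-- Final table: bucketed on the intended join key.
CREATE TABLE flights (
  flight_id STRING,
  passenger ARRAY<STRUCT<paxcode:STRING, paxquantity:INT>>
)
PARTITIONED BY (flight_day STRING)
CLUSTERED BY (flight_id) INTO 32 BUCKETS;

-- Ask Hive to set the reducer count so rows land in the correct buckets.
SET hive.enforce.bucketing = true;

-- Move one day's data from the staging table into the bucketed table.
INSERT OVERWRITE TABLE flights PARTITION (flight_day = '2013-04-01')
SELECT flight_id, passenger
FROM flights_staging
WHERE flight_day = '2013-04-01';

Once the data is bucketed on the join key, my understanding is that setting hive.optimize.bucketmapjoin = true before the join is what lets Hive use the bucket map join Dean mentions.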