Glad to hear!

On Fri, Apr 5, 2013 at 3:02 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:

Thanks, Mark.

I found the problem. For some reason, Hive is not able to write an Avro output file when the schema has a complex field with a NULL option. It reads that structure without any problem, but it cannot write it. For example, the insert was failing on this array-of-struct field:

    { "name": "Passenger",
      "type": [
        { "type": "array",
          "items": {
            "type": "record",
            "name": "PAXStruct",
            "fields": [
              { "name": "PAXCode",     "type": ["string", "null"] },
              { "name": "PAXQuantity", "type": ["int", "null"] }
            ]
          }
        },
        "null"
      ]
    }

I removed the last "null" clause and it's working okay now.

Regards,
Sadu
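For reference, a minimal sketch of what that field would look like after the fix Sadu describes: the outer union of the array type with "null" is removed, so the field is just the array type, while the per-field ["string", "null"] and ["int", "null"] unions are unchanged:

    { "name": "Passenger",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "PAXStruct",
          "fields": [
            { "name": "PAXCode",     "type": ["string", "null"] },
            { "name": "PAXQuantity", "type": ["int", "null"] }
          ]
        }
      }
    }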
On Thu, Apr 4, 2013 at 12:36 AM, Mark Grover <grover.markgro...@gmail.com> wrote:

Can you please check your JobTracker logs? This is a generic error related to grabbing the Task Attempt Log URL; the real error is in the JT logs.

On Wed, Apr 3, 2013 at 7:17 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:

Hi Dean,

I tried inserting into a bucketed Hive table from a non-bucketed table using an INSERT OVERWRITE ... SELECT FROM clause, but I get the following error:

----------------------------------------------------------------------
Exception in thread "Thread-225" java.lang.NullPointerException
        at org.apache.hadoop.hive.shims.Hadoop23Shims.getTaskAttemptLogUrl(Hadoop23Shims.java:44)
        at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:186)
        at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:142)
        at java.lang.Thread.run(Thread.java:662)
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
----------------------------------------------------------------------

Both tables have the same structure, except that one has a CLUSTERED BY clause and the other does not.

Some columns are defined as arrays of structs. The INSERT statement works fine if I take out those complex columns. Are there any known issues loading STRUCT or ARRAY OF STRUCT fields?

Thanks for your time and help.

Sadu

On Sat, Mar 30, 2013 at 7:00 PM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:

The table can be external. You should be able to use this data with other tools, because all bucketing does is ensure that all records with a given key are written into the same block. This is why clustered/bucketed data can be joined on those keys using map-side joins; Hive knows it can cache an individual block in memory, and that block will hold all records across the table for the keys in that block.

So, Java MR apps and Pig can still read the records, but they won't necessarily understand how the data is organized. I.e., it might appear unsorted. Perhaps HCatalog will allow other tools to exploit the structure, but I'm not sure.

dean

On Sat, Mar 30, 2013 at 5:44 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:

Thanks, Dean.

Does that mean this bucketing is exclusively a Hive feature and not available to other tools like Java, Pig, etc.?

And also, my final tables have to be managed tables, not external tables, right?

Thanks again for your time and help.

Sadu
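To make Dean's point concrete, here is a minimal sketch of a table bucketed on a join key, plus the session setting that lets Hive use bucketed map-side joins when both sides are bucketed on that key. The table and column names are hypothetical (not from this thread), the bucket count is arbitrary, and the Avro storage clauses are elided:

    -- Hypothetical names; a sketch of a table bucketed on the join key.
    CREATE EXTERNAL TABLE flights_bucketed (
        flight_id  STRING,
        passenger  ARRAY<STRUCT<paxcode:STRING, paxquantity:INT>>
    )
    PARTITIONED BY (flight_date STRING)
    CLUSTERED BY (flight_id) INTO 32 BUCKETS;

    -- When both join inputs are bucketed on the join key, Hive can
    -- cache one table's matching bucket in memory per mapper:
    SET hive.optimize.bucketmapjoin = true;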
On Fri, Mar 29, 2013 at 5:57 PM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:

I don't know of any way to avoid creating new tables and moving the data. In fact, that's the official way to do it, from a temp table to the final table, so Hive can ensure the bucketing is done correctly:

https://cwiki.apache.org/Hive/languagemanual-ddl-bucketedtables.html

In other words, you might have a big move now, but going forward, you'll want to stage your data in a temp table, use this procedure to put it in the final location, then delete the temp data.

dean

--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330

On Fri, Mar 29, 2013 at 4:58 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:

Hello,

We run M/R jobs to parse and process large, highly complex XML files into Avro files. Then we build external Hive tables on top of the parsed Avro files. The Hive tables are partitioned by day, but the partitions are still huge and joins do not perform that well, so I would like to try creating buckets on the join key. How do I create the buckets on the existing HDFS files? I would prefer to avoid creating another set of (bucketed) tables and loading data from the non-bucketed tables into the bucketed tables, if at all possible. Is it possible to do the bucketing in Java as part of the M/R jobs while creating the Avro files?

Any help / insight would be greatly appreciated.

Thank you very much for your time and help.

Sadu
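Following the wiki page Dean links above, the load from a staging table into the final bucketed table would look roughly like this. Table and partition names are hypothetical, reusing the sketch earlier in the thread:

    -- Have Hive enforce bucketing so the reducer count matches the
    -- declared number of buckets (per the bucketed-tables wiki page):
    SET hive.enforce.bucketing = true;

    -- Stage the parsed Avro data in a temp (non-bucketed) table, then:
    INSERT OVERWRITE TABLE flights_bucketed PARTITION (flight_date = '2013-03-29')
    SELECT flight_id, passenger
    FROM flights_staging
    WHERE flight_date = '2013-03-29';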