OK, so it sounds like you are doing A/B testing then. If it works in your sandbox but doesn't in prod, you can slowly transform your sandbox - one component at a time - to look like your prod system until it breaks. The last component you added is then an area of interest.
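A quick way to drive that bisection is to inventory what differs between the two environments up front, e.g. by diffing the installed jar lists. The sketch below uses a fabricated toy inventory purely for illustration; on a real cluster you would list the actual lib directories instead:

```shell
# Toy sketch of an environment diff: build a jar inventory for each
# environment, then diff them. File names here are illustrative only.
mkdir -p /tmp/envdiff
printf 'guava-11.0.2.jar\nhive-exec-0.9.0.jar\n' > /tmp/envdiff/sandbox.txt
printf 'guava-11.0.2.jar\nhadoop-lzo-0.4.15.jar\nhive-exec-0.9.0.jar\n' > /tmp/envdiff/prod.txt
# Lines prefixed '>' exist only in prod -- those are the components to
# port into the sandbox one at a time. ('|| true' because diff exits
# non-zero when the files differ.)
diff /tmp/envdiff/sandbox.txt /tmp/envdiff/prod.txt || true
```

Each prod-only line is a candidate for the "last component you added" test above.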
CTAS is short for "Create Table <blah> AS".

On Fri, May 17, 2013 at 11:25 AM, Sanjay Subramanian <
sanjay.subraman...@wizecommerce.com> wrote:

> Hi
> I actually did all of the following:
> - tested all UDFs… they return values correctly
> - tested left side of LEFT OUTER JOIN
> - tested right side of LEFT OUTER JOIN
>
> But when I add that ON statement
>     sh.date_seller = h.header_date
> I start getting this error… and this script has had no change for 3
> weeks… it used to run fine in production and we did 15 days of
> aggregations using this script.
> Two days back we installed LZO compression on the production servers…
> Circumstantial… but the script is failing after that LZO jar install…
> Maybe totally unrelated.
>
> As we speak I am testing this script on my sandbox, which I am fairly
> sure will work since I don't have LZO compression on my sandbox, but I
> want to verify.
>
> What are CTAS semantics? I don't know, so please tell me… But even if I
> create intermediate tables, I will eventually need to join them…
>
> Thanks
> sanjay
>
> From: Stephen Sprague <sprag...@gmail.com>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> Date: Friday, May 17, 2013 11:18 AM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Re: need help with an error - script used to work and now it
> does not :-(
>
> In the meantime, why don't you break up your single query into a series
> of queries (using CTAS semantics to create intermediate tables)?
>
> The idea is to narrow the problem down to a minimal size that _isolates
> the problem_. What you have there is overly complex to expect someone to
> troubleshoot for you. Try to minimize the failure case. Take out your
> UDFs. Does it work then or fail? Strip it down to the bare necessities!
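For reference, a CTAS-style decomposition of the failing query might look roughly like this. This is a sketch only: the column lists are abbreviated with "...", and the intermediate table names tmp_h and tmp_sh are made up for illustration.

```sql
-- Materialize each side of the LEFT OUTER JOIN first, then join the
-- intermediates with no UDFs or aggregation involved.
CREATE TABLE tmp_h AS
SELECT header_id, header_date, ...           -- rest of the h subquery
FROM product_impressions_hive_only
WHERE header_date = '${DATE_STR}';

CREATE TABLE tmp_sh AS
SELECT *
FROM prodimpr_seller_hidden
WHERE date_seller = '${DATE_STR}';

-- If this minimal join already fails, the UDFs and GROUP BY are ruled out;
-- if it succeeds, add pieces back one at a time.
SELECT h.header_id, sh.seller_id
FROM tmp_h h
LEFT OUTER JOIN tmp_sh sh
  ON h.header_id = sh.header_id
 AND sh.date_seller = h.header_date;
```

Each intermediate table can be inspected on its own before the join is reintroduced, which is the whole point of the isolation exercise.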
>
> On Fri, May 17, 2013 at 10:56 AM, Sanjay Subramanian <
> sanjay.subraman...@wizecommerce.com> wrote:
>
>> I am using Hive 0.9.0+155, as bundled in Cloudera Manager version
>> 4.1.2.
>> Still getting the errors listed below :-(
>> Any clues will be cool!!!
>> Thanks
>> sanjay
>>
>> From: Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>
>> Date: Thursday, May 16, 2013 9:42 PM
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Re: need help with an error - script used to work and now it
>> does not :-(
>>
>> :-( Still facing problems on large datasets.
>> Were you able to solve this, Edward?
>> Thanks
>> sanjay
>>
>> From: Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>
>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
>> Date: Thursday, May 16, 2013 8:25 PM
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Re: need help with an error - script used to work and now it
>> does not :-(
>>
>> Thanks Edward… I just checked all instances of guava jars… except those
>> in red (the guava-r09-jarjar and guava-12.0.1 jars below), all seem to
>> be the same version:
>>
>> /usr/lib/hadoop/client/guava-11.0.2.jar
>> /usr/lib/hadoop/client-0.20/guava-11.0.2.jar
>> /usr/lib/hadoop/lib/guava-11.0.2.jar
>> /usr/lib/hadoop-httpfs/webapps/webhdfs/WEB-INF/lib/guava-11.0.2.jar
>> /usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar
>> /usr/lib/oozie/libtools/guava-11.0.2.jar
>> /usr/lib/hive/lib/guava-11.0.2.jar
>> /usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar
>> /usr/lib/hbase/lib/guava-11.0.2.jar
>> /usr/lib/flume-ng/lib/guava-11.0.2.jar
>> /usr/share/cmf/lib/cdh3/guava-r09-jarjar.jar
>> /usr/share/cmf/lib/guava-12.0.1.jar
>>
>> But I made a small change in my query (I just removed the text marked
>> in blue) that seemed to solve it, at least for the test data set that I
>> had… Now I need to run it in production for a day's worth of data.
>>
>> Will keep you guys posted.
>>
>> ------------------------------------------------------------------------------------------------------------
>> SELECT
>>     h.header_date_donotquery *as date_*,
>>     h.header_id as *impression_id*,
>>     h.header_searchsessionid as *search_session_id*,
>>     h.cached_visitid *as visit_id*,
>>     split(h.server_name_donotquery,'[\.]')[0] *as server*,
>>     h.cached_ip *ip*,
>>     h.header_adnodeid *ad_nodes*,
>>
>> ------------------------------------------------------------------------------------------------------------
>>
>> Thanks
>> sanjay
>>
>> From: Edward Capriolo <edlinuxg...@gmail.com>
>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
>> Date: Thursday, May 16, 2013 7:51 PM
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Re: need help with an error - script used to work and now it
>> does not :-(
>>
>> Ironically, I just got a misleading error like this today. What
>> happened was I upgraded to Hive 0.10. One of my programs was linked to
>> guava 15, but Hive provides guava 09 on the classpath, confusing
>> things. I also had a similar issue with mismatched slf4j and
>> commons-logging.
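The classpath-conflict diagnosis above can be checked mechanically: list every guava jar basename reachable from the cluster's lib directories and see whether more than one distinct version turns up. The sketch below fabricates a tiny directory layout for illustration; on the real cluster you would point `find` at /usr/lib and /usr/share/cmf as in the listing above:

```shell
# Build a toy classpath layout containing mixed guava versions.
# The directories and jar versions here are illustrative, not real.
mkdir -p /tmp/cpdemo/hive/lib /tmp/cpdemo/cmf/lib
touch /tmp/cpdemo/hive/lib/guava-11.0.2.jar
touch /tmp/cpdemo/cmf/lib/guava-12.0.1.jar
# Reduce each hit to its basename and deduplicate: more than one distinct
# line means mixed guava versions are loadable from the scanned roots.
find /tmp/cpdemo -name 'guava-*.jar' | sed 's#.*/##' | sort -u
```

A single output line means one consistent version; anything more is worth investigating, since which copy wins depends on classpath ordering.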
>>
>> On Thu, May 16, 2013 at 10:34 PM, Sanjay Subramanian <
>> sanjay.subraman...@wizecommerce.com> wrote:
>>
>>> 2013-05-16 18:57:21,094 FATAL [IPC Server handler 19 on 40222]
>>> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task:
>>> attempt_1368666339740_0135_m_000104_1 - exited :
>>> java.lang.RuntimeException: Error in configuring object
>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:395)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
>>> Caused by: java.lang.reflect.InvocationTargetException
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
>>>     ... 9 more
>>> Caused by: java.lang.RuntimeException: Error in configuring object
>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>>>     ... 14 more
>>> Caused by: java.lang.reflect.InvocationTargetException
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
>>>     ... 17 more
>>> Caused by: java.lang.RuntimeException: Map operator initialization failed
>>>     at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
>>>     ... 22 more
>>> Caused by: java.lang.RuntimeException: cannot find field header_date from
>>> [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@2add5681,
>>> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@295a4523,
>>> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@6571120a,
>>> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@6257828d,
>>> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@5f3c296b,
>>> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@66c360a5,
>>> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@24fe2558,
>>> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@2945c761,
>>> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@2424c672]
>>>     at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:345)
>>>     at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldRef(UnionStructObjectInspector.java:100)
>>>     at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
>>>     at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:896)
>>>     at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:922)
>>>     at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
>>>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>>>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
>>>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
>>>     at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:78)
>>>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>>>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
>>>     at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
>>>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:166)
>>>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>>>     at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
>>>     at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>>>     at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
>>>     ... 22 more
>>>
>>> MY SCRIPT is given below
>>> =====================
>>> hive -hiveconf hive.root.logger=INFO,console -hiveconf mapred.job.priority=VERY_HIGH -e "
>>> SET hive.exec.compress.output=true;
>>> SET mapred.reduce.tasks=16;
>>> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>>> add jar ${JAR_NAME_AND_PATH};
>>> create temporary function collect as 'com.wizecommerce.utils.hive.udf.GenericUDAFCollect';
>>> create temporary function isnextagip as 'com.wizecommerce.utils.hive.udf.IsNextagIP';
>>> create temporary function isfrombot as 'com.wizecommerce.utils.hive.udf.IsFromBot';
>>> create temporary function processblankkeyword as 'com.wizecommerce.utils.hive.udf.ProcessBlankKeyword';
>>> create temporary function getSellersProdImpr as 'com.wizecommerce.utils.hive.udf.GetSellersWithValidSellerIdsProdImpr';
>>> create temporary function getProgramCode as 'com.wizecommerce.utils.hive.udf.GetProgramCodeFromSellerClickContext';
>>> INSERT OVERWRITE DIRECTORY '/user/beeswax/warehouse/${HIVE_OUTPUT_TBL}/${DATE_STR}'
>>> SELECT
>>>     h.header_date_donotquery as date_,
>>>     h.header_id as impression_id,
>>>     h.header_searchsessionid as search_session_id,
>>>     h.cached_visitid as visit_id,
>>>     split(h.server_name_donotquery,'[\.]')[0] as server,
>>>     h.cached_ip ip,
>>>     h.header_adnodeid ad_nodes,
>>>     if(concat_ws(',', getSellersProdImpr(collect_set(concat_ws('|',
>>>             if(h.seller_sellerid is null, 'null', cast(h.seller_sellerid as STRING)),
>>>             if(h.seller_tagid is null, 'null', cast(h.seller_tagid as STRING)),
>>>             cast(IF(h.seller_subtotal IS NULL, -1, h.seller_subtotal) as STRING),
>>>             cast(IF(h.seller_pricetier IS NULL, -1, h.seller_pricetier) as STRING),
>>>             cast(IF(h.seller_pricerank IS NULL, -1, h.seller_pricerank) as STRING),
>>>             cast(IF(h.seller_cpc IS NULL, -1, h.seller_cpc) as STRING),
>>>             h.program_code_notnull)))) = '',
>>>        NULL,
>>>        concat_ws(',', getSellersProdImpr(collect_set(concat_ws('|',
>>>             if(h.seller_sellerid is null, 'null', cast(h.seller_sellerid as STRING)),
>>>             if(h.seller_tagid is null, 'null', cast(h.seller_tagid as STRING)),
>>>             cast(IF(h.seller_subtotal IS NULL, -1, h.seller_subtotal) as STRING),
>>>             cast(IF(h.seller_pricetier IS NULL, -1, h.seller_pricetier) as STRING),
>>>             cast(IF(h.seller_pricerank IS NULL, -1, h.seller_pricerank) as STRING),
>>>             cast(IF(h.seller_cpc IS NULL, -1, h.seller_cpc) as STRING),
>>>             h.program_code_notnull))))) as visible_sellers,
>>>     if(concat_ws(',', getSellersProdImpr(collect_set(concat_ws('|',
>>>             if(sh.seller_id is null, 'null', cast(sh.seller_id as STRING)),
>>>             if(sh.tag_id is null, 'null', cast(sh.tag_id as STRING)),
>>>             '-1.0',
>>>             cast(IF(sh.price_tier IS NULL, -1, sh.price_tier) as STRING),
>>>             '-1',
>>>             cast(IF(sh.price_tier IS NULL, -1.0, sh.price_tier*1.0) as STRING),
>>>             h.program_code_null)))) = '',
>>>        NULL,
>>>        concat_ws(',', getSellersProdImpr(collect_set(concat_ws('|',
>>>             if(sh.seller_id is null, 'null', cast(sh.seller_id as STRING)),
>>>             if(sh.tag_id is null, 'null', cast(sh.tag_id as STRING)),
>>>             '-1.0',
>>>             cast(IF(sh.price_tier IS NULL, -1, sh.price_tier) as STRING),
>>>             '-1',
>>>             cast(IF(sh.price_tier IS NULL, -1.0, sh.price_tier*1.0) as STRING),
>>>             h.program_code_null))))) as invisible_sellers
>>> FROM
>>>     (SELECT
>>>         header_id,
>>>         header_date,
>>>         header_date_donotquery,
>>>         header_searchsessionid,
>>>         cached_visitid,
>>>         cached_ip,
>>>         header_adnodeid,
>>>         server_name_donotquery,
>>>         seller_sellerid,
>>>         seller_tagid,
>>>         cast(regexp_replace(seller_subtotal,',','.') as DOUBLE) as seller_subtotal,
>>>         seller_pricetier,
>>>         seller_pricerank,
>>>         CAST(CAST(seller_cpc as INT) as DOUBLE) as seller_cpc,
>>>         cast(getProgramCode('${THISHOST}', '${REST_API_SERVER_NAME}', seller_clickcontext) as STRING) as program_code_notnull,
>>>         cast(getProgramCode('${THISHOST}', '${REST_API_SERVER_NAME}', '') as STRING) as program_code_null
>>>     FROM
>>>         product_impressions_hive_only
>>>     WHERE
>>>         header_date='${DATE_STR}'
>>>         AND cached_recordid IS NOT NULL
>>>         AND isnextagip(cached_ip) = FALSE
>>>         AND isfrombot(cached_visitid) = FALSE
>>>         AND header_skipsellerloggingflag = 0
>>>     ) h
>>> LEFT OUTER JOIN
>>>     (SELECT
>>>         *
>>>     FROM
>>>         prodimpr_seller_hidden
>>>     WHERE
>>>         date_seller = '${DATE_STR}'
>>>     ) sh
>>> ON
>>>     h.header_id = sh.header_id
>>>     AND sh.date_seller = h.header_date
>>> GROUP BY
>>>     h.header_date_donotquery,
>>>     h.header_id,
>>>     h.header_searchsessionid,
>>>     h.cached_visitid,
>>>     h.server_name_donotquery,
>>>     h.cached_ip,
>>>     h.header_adnodeid
>>> ;
>>> "
>>>
>>> CONFIDENTIALITY NOTICE
>>> ======================
>>> This email message and any attachments are for the exclusive use of the
>>> intended recipient(s) and may contain confidential and privileged
>>> information. Any unauthorized review, use, disclosure or distribution is
>>> prohibited. If you are not the intended recipient, please contact the
>>> sender by reply email and destroy all copies of the original message along
>>> with any attachments, from your computer system. If you are the intended
>>> recipient, please be advised that the content of this message is subject to
>>> access, review and disclosure by the sender's Email System Administrator.
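Coming back to the ON clause Sanjay flagged: both inline views in the script already filter their date column to the same '${DATE_STR}', so the `sh.date_seller = h.header_date` condition is redundant for any candidate row pair. Dropping it is therefore a low-risk way to test whether that clause alone triggers the "cannot find field header_date" failure. A sketch of the simplified join (column lists abbreviated with "...", results should be verified against the original by row count):

```sql
-- Both subqueries pin the same date, so joining on header_id alone is
-- semantically equivalent here; if this runs, the removed ON condition
-- is what was tripping the map-side initialization.
SELECT ...
FROM (SELECT header_id, header_date, ...
      FROM product_impressions_hive_only
      WHERE header_date = '${DATE_STR}') h
LEFT OUTER JOIN
     (SELECT *
      FROM prodimpr_seller_hidden
      WHERE date_seller = '${DATE_STR}') sh
ON h.header_id = sh.header_id;   -- sh.date_seller = h.header_date removed
```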