Having déjà vu from when I did our HCatalog 0.4 install ... the issue was out-of-date DataNucleus JARs. I upgraded the DataNucleus JARs and am now past that issue ... the new issue is:
2013-10-09 20:20:36,411 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.hcatalog.common.HCatException : 2004 : HCatOutputFormat not initialized, setOutput has to be called
        at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:111)
        at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:97)
        at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:85)
        at org.apache.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:75)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:207)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:187)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:935)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
        at java.lang.Thread.run(Thread.java:662)
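For context on what that error means: HCatOutputFormat won't hand out its job info unless setOutput() was called on the job first, and (as I understand it) HCatStorer is supposed to trigger that from the Pig front end before the job is submitted. Here's a rough sketch of the initialization sequence the output format expects, written against the plain MapReduce API; the database/table/partition values are made up for illustration:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hcatalog.data.schema.HCatSchema;
    import org.apache.hcatalog.mapreduce.HCatOutputFormat;
    import org.apache.hcatalog.mapreduce.OutputJobInfo;

    public class HCatWriteInit {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "hcat-write-init");

        // Target partition spec (illustrative values).
        Map<String, String> partitionSpec = new HashMap<String, String>();
        partitionSpec.put("datetime_partition", "2013-10-07_0000");

        // 1) setOutput() talks to the metastore and stashes a serialized
        //    OutputJobInfo in the job conf. If this never happens,
        //    getJobInfo() throws "2004 : HCatOutputFormat not initialized".
        HCatOutputFormat.setOutput(job,
            OutputJobInfo.create("default", "signals", partitionSpec));

        // 2) setSchema() records the schema of the records being written;
        //    here we simply reuse the table schema.
        HCatSchema schema = HCatOutputFormat.getTableSchema(job);
        HCatOutputFormat.setSchema(job, schema);

        job.setOutputFormatClass(HCatOutputFormat.class);
        // ... mapper/reducer setup and job submission as usual ...
      }
    }

If that sequence never runs, or runs against a different job conf than the one actually submitted, checkOutputSpecs() fails exactly as above, so I'm wondering whether my Pig 0.10 / HCatalog 0.11.0 combination is losing the job info between the front end and job submission.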
On Wed, Oct 9, 2013 at 1:05 PM, Timothy Potter <thelabd...@gmail.com> wrote:

> Hi,
>
> Long-time user of HCatalog 0.4 here; I'm testing out an upgrade to Hive /
> HCatalog 0.11.0, as we need windowing functions and ORC.
>
> I'm testing the HCatLoader from Pig and am getting the exceptions below
> using this simple Pig script:
>
> sigs_in = load 'signals' using org.apache.hcatalog.pig.HCatLoader();
> describe sigs_in;
> sigs = filter sigs_in by datetime_partition == '2013-10-07_0000';
> ...
>
> The exceptions (see below) occur in the Pig front-end processing, while
> trying to get the input paths. The Pig describe command returns the
> schema, so I know there's some communication going on between the
> LoadFunc and the metastore. Also, if I do: hcat -e "show partitions
> signals;" I get the list of expected partitions on that table.
>
> Any ideas on where to start troubleshooting this issue? I'm using Pig
> 0.10 with Hive / HCatalog 0.11.0 running on Hadoop 2.0.0-cdh4.1.2.
>
> I built Hive/HCatalog from source using: ant clean package
> -Dmvn.hadoop.profile=hadoop23 -Dhadoop.mr.rev=23
>
> Exception:
>
> Caused by: java.io.IOException: org.shaded.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
>         at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:87)
>         at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:63)
>         at org.apache.hcatalog.pig.HCatLoader.setLocation(HCatLoader.java:119)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:380)
>         ... 17 more
> Caused by: org.shaded.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
>         at org.shaded.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>         at org.shaded.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.shaded.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>         at org.shaded.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>         at org.shaded.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>         at org.shaded.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions_by_filter(ThriftHiveMetastore.java:1738)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_by_filter(ThriftHiveMetastore.java:1722)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(HiveMetaStoreClient.java:780)
>         at org.apache.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:112)
>         at org.apache.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:85)
>         at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:85)
>         ... 20 more
> Caused by: java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>         at org.shaded.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>
> NOTE: Don't worry about the org.shaded.thrift package names, as I had to
> build a shaded JAR for my HCatalog clients to work around Thrift version
> issues on my classpath. I tested the same thing without the shading and
> received the same error.
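One aside on the read timeout in the quoted trace, for anyone who hits just that part: the metastore client's read timeout is controlled by hive.metastore.client.socket.timeout (an integer number of seconds in Hive 0.11), so raising it is a quick way to tell a slow metastore apart from a hung one while debugging. A hypothetical snippet that replays the failing call with a longer timeout (the table and filter mirror my script; the 600-second value is arbitrary):

    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

    public class MetastoreTimeoutCheck {
      public static void main(String[] args) throws Exception {
        // Assumes a hive-site.xml with hive.metastore.uris is on the classpath.
        HiveConf conf = new HiveConf();
        // The default is 20 seconds in this era, which is easy to blow past
        // if the metastore's backing database is struggling.
        conf.setInt("hive.metastore.client.socket.timeout", 600);

        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        // The same call that times out in the trace above:
        System.out.println(client.listPartitionsByFilter(
            "default", "signals",
            "datetime_partition = \"2013-10-07_0000\"", (short) -1));
        client.close();
      }
    }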