Hi All,

I am having ClassNotFound problems with a custom load function.

- I am using Pig-0.7.0 with Hadoop-0.20.2.
- Input to the job is a sequence file with custom key/value data.
- I am including the load UDF source below. Note that the UDF does not care about what is inside the sequence file itself.
- This is the Pig script I run:

    register MyUDF.jar;
    data = LOAD 'myfile.seq' USING MyLoader();
    DUMP data;

- If I use a sequence file containing primitive types (key=int, val=text), the script and my load function behave exactly as expected.
- However, when I use the intended sequence file, I get the following ClassNotFound error:

    java.lang.RuntimeException: java.io.IOException: WritableName can't load class: com...DataOutputKey
        at org.apache.hadoop.io.SequenceFile$Reader.getKeyClass(SequenceFile.java:1598)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1548)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:133)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
    Caused by: java.io.IOException: WritableName can't load class: com...DataOutputKey
        at org.apache.hadoop.io.WritableName.getClass(WritableName.java:73)
        at org.apache.hadoop.io.SequenceFile$Reader.getKeyClass(SequenceFile.java:1596)
        ... 10 more
    Caused by: java.lang.ClassNotFoundException: com...DataOutputKey
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
        at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)

- My UDF jar does contain the classes that the sequence file reader fails to find (I have also double-checked this by manually exploding the jar).
- Any suggestions on how to resolve this issue?

Thanks,
CF

===

    import java.io.IOException;
    import java.util.ArrayList;

    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader;
    import org.apache.pig.FileInputLoadFunc;
    import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;

    import com.google.protobuf.InvalidProtocolBufferException;

    public class MyLoader extends FileInputLoadFunc {

        private SequenceFileRecordReader<Writable, Writable> recordReader = null;
        private Writable value;
        private final ArrayList<Object> parsedTuple = new ArrayList<Object>(50);
        protected TupleFactory tupleFactory = TupleFactory.getInstance();

        @SuppressWarnings("unchecked")
        @Override
        public InputFormat getInputFormat() throws IOException {
            return new SequenceFileInputFormat<Writable, Writable>();
        }

        @SuppressWarnings("unchecked")
        @Override
        public void prepareToRead(final RecordReader recordReader, final PigSplit pigSplit)
                throws IOException {
            this.recordReader = (SequenceFileRecordReader<Writable, Writable>) recordReader;
        }

        @Override
        public void setLocation(final String location, final Job job) throws IOException {
            FileInputFormat.setInputPaths(job, location);
        }

        // Placeholder parser: ignores the value and emits a constant field.
        private void parseValue(final Writable value) throws InvalidProtocolBufferException {
            parsedTuple.add("one");
        }

        @Override
        public Tuple getNext() throws IOException {
            boolean goOn = true;
            Tuple t = null;
            try {
                goOn = recordReader.nextKeyValue();
            } catch (final InterruptedException ie) {
                throw new IOException(ie);
            }
            if (!goOn) {
                return t; // end of input: returns null
            }
            final Writable value = recordReader.getCurrentValue();
            if (value == null) {
                return t;
            }
            parseValue(value);
            t = tupleFactory.newTuple(parsedTuple);
            parsedTuple.clear();
            goOn = false;
            return t;
        }
    }
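
One way to narrow this down, offered as a sketch only: the trace shows the failing lookup going through sun.misc.Launcher$AppClassLoader, so a small standalone check of which class loaders can see a given class name may help tell "the class is in the jar" apart from "the jar is visible to the JVM's application class loader". The class name ClassVisibilityCheck and the helper canLoad are hypothetical names introduced for illustration; it uses only the JDK, not Pig or Hadoop.

```java
class ClassVisibilityCheck {

    /** Returns true if the given loader can resolve the class name, false otherwise. */
    public static boolean canLoad(String className, ClassLoader loader) {
        try {
            // initialize=false: only resolve the class, do not run its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // class file found but could not be linked (e.g. a missing dependency)
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the fully qualified class name as the first argument;
        // java.lang.String is only a default so the program runs without arguments.
        String className = args.length > 0 ? args[0] : "java.lang.String";

        ClassLoader system = ClassLoader.getSystemClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();

        System.out.println("class:          " + className);
        System.out.println("system loader:  " + canLoad(className, system));
        System.out.println("context loader: " + canLoad(className, context));
    }
}
```

Run with the same classpath that the Pig client JVM gets and with the fully qualified name of the key class as the argument. If the system loader reports false there, that would suggest the jar is registered with Pig but not on the JVM classpath that the lookup in the trace uses; this is an assumption to verify, not a confirmed diagnosis.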