Does anybody have any updates on this?
How can we use our own RecordReader in Hadoop Pipes? When I try to print
context.getInputSplit(), I get the filename along with some junk
characters, so opening the file fails.
Anybody got it working?
Viral.
11 Nov. wrote:
>
> I traced into the c++ recordreader code:
> WordCountReader(HadoopPipes::MapContext& context) {
>   std::string filename;
>   HadoopUtils::StringInStream stream(context.getInputSplit());
>   HadoopUtils::deserializeString(filename, stream);
>   struct stat statResult;
>   stat(filename.c_str(), &statResult);
>   bytesTotal = statResult.st_size;
>   bytesRead = 0;
>   cout << filename << endl;
>   file = fopen(filename.c_str(), "rt");
>   HADOOP_ASSERT(file != NULL, "failed to open " + filename);
> }
>
> I got nothing in the filename variable, which shows that the InputSplit is
> empty.
>
> 2008/3/4, 11 Nov. <[email protected]>:
>>
>> Hi colleagues,
>> I have set up a single-node cluster to test the pipes examples.
>> wordcount-simple and wordcount-part work just fine, but
>> wordcount-nopipe can't run. Here is my command line:
>>
>> bin/hadoop pipes -conf src/examples/pipes/conf/word-nopipe.xml -input
>> input/ -output out-dir-nopipe1
>>
>> and here is the error message printed on my console:
>>
>> 08/03/03 23:23:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>> 08/03/03 23:23:06 INFO mapred.FileInputFormat: Total input paths to process : 1
>> 08/03/03 23:23:07 INFO mapred.JobClient: Running job: job_200803032218_0004
>> 08/03/03 23:23:08 INFO mapred.JobClient: map 0% reduce 0%
>> 08/03/03 23:23:11 INFO mapred.JobClient: Task Id : task_200803032218_0004_m_000000_0, Status : FAILED
>> java.io.IOException: pipe child exception
>>     at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:138)
>>     at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:83)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1787)
>> Caused by: java.io.EOFException
>>     at java.io.DataInputStream.readByte(DataInputStream.java:250)
>>     at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:313)
>>     at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:335)
>>     at org.apache.hadoop.mapred.pipes.BinaryProtocol$UplinkReaderThread.run(BinaryProtocol.java:112)
>>
>> task_200803032218_0004_m_000000_0: Hadoop Pipes Exception: failed to open
>> at /home/hadoop/hadoop-0.15.2-single-cluster/src/examples/pipes/impl/wordcount-nopipe.cc:67 in
>> WordCountReader::WordCountReader(HadoopPipes::MapContext&)
>>
>>
>> Could anybody tell me how to fix this? Any help would be appreciated.
>> Thanks a lot!
>>
>
>
--
View this message in context:
http://www.nabble.com/Pipes-example-wordcount-nopipe.cc-failed-when-reading-from-input-splits-tp15807856p24084734.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.