SequenceFile.Sorter  design issue and class-check bug
-----------------------------------------------------

                 Key: HADOOP-6513
                 URL: https://issues.apache.org/jira/browse/HADOOP-6513
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 0.20.1
         Environment: Hadoop 0.20.1, Java 1.6.0_17, Fedora
            Reporter: robert Cook


SequenceFile.Writer takes the key/value classes as creation arguments and 
checks every append against them.
SequenceFile.Reader takes no class arguments on creation because it derives 
them from the input file.
SequenceFile.Sorter also takes the key/value classes as creation arguments, 
which serves no purpose: they should be derived from the input file.

In any case, SortPass does not compare Sorter key/value classes with input file 
classes.
No error is given for the following:

          private static void writeTest4(FileSystem fs, int count, int seed,
                          Path file,
                          SequenceFile.CompressionType compressionType,
                          CompressionCodec codec, Configuration conf)
            throws IOException {
            fs.delete(file, true);
            LOG.info("creating " + count + " records with " + compressionType +
                     " compression");
            SequenceFile.Writer writer =
              SequenceFile.createWriter(fs, conf, file,
                          StringWritable.class, FloatWritable.class,
                          compressionType, codec);
            FloatWritable x = new FloatWritable();
            StringWritable y = new StringWritable();
            for (int i = count - 1; i >= 0; i--) {
              x.set(i);
              y.set("" + i);
              writer.append(y, x);
            }
            writer.close();
          }

          private static void sortTest(FileSystem fs, int count, int megabytes,
                          int factor, boolean fast, Path file,
                          Configuration conf)
            throws IOException {
            fs.delete(new Path(file + ".sorted"), true);
            SequenceFile.Sorter sorter =
              newSorter(fs, fast, megabytes, factor, conf);
            LOG.debug("sorting " + count + " records");
            sorter.sort(file, file.suffix(".sorted"));
            LOG.info("done sorting " + count + " debug");
          }

          private static SequenceFile.Sorter newSorter(FileSystem fs,
                          boolean fast,
                          int megabytes, int factor, Configuration conf) {
            SequenceFile.Sorter sorter =
              fast
              ? new SequenceFile.Sorter(fs, new IntWritable.Comparator(),
                          FloatWritable.class, IntWritable.class, conf)
              : new SequenceFile.Sorter(fs, FloatWritable.class,
                          IntWritable.class, conf);
            sorter.setMemory(megabytes * 1024 * 1024);
            sorter.setFactor(factor);
            return sorter;
          }
--------------------- Note: the String/Float classes used by the Writer do not 
match the Float/Int classes given to the Sorter
Macintosh-2:datanode bobcook$ od -c file          
0000000    S   E   Q 006 016   S   t   r   i   n   g   W   r   i   t   a
0000020    b   l   e  \r   F   l   o   a   t   W   r   i   t   a   b   l
0000040    e  \0  \0  \0  \0  \0  \0 203   `   n   E   J   z 272   d 352
0000060    w 177 373  \n 364   M 276  \0  \0  \0  \n  \0  \0  \0 006  \0
0000100   \0  \0 001  \0   4   @ 200  \0  \0  \0  \0  \0  \n  \0  \0  \0
0000120  006  \0  \0  \0 001  \0   3   @   @  \0  \0  \0  \0  \0  \n  \0
0000140   \0  \0 006  \0  \0  \0 001  \0   2   @  \0  \0  \0  \0  \0  \0
0000160   \n  \0  \0  \0 006  \0  \0  \0 001  \0   1   ? 200  \0  \0  \0
0000200   \0  \0  \n  \0  \0  \0 006  \0  \0  \0 001  \0   0  \0  \0  \0
*
0000220
Macintosh-2:datanode bobcook$ od -c file.sorted
0000000    S   E   Q 006  \r   F   l   o   a   t   W   r   i   t   a   b
0000020    l   e  \v   I   n   t   W   r   i   t   a   b   l   e  \0  \0
0000040   \0  \0  \0  \0   6 364 343  \r 256   h   U 222 365   T   7   l
0000060  357   i   ~   }  \0  \0  \0  \n  \0  \0  \0 006  \0  \0  \0 001
0000100   \0   4   @ 200  \0  \0  \0  \0  \0  \n  \0  \0  \0 006  \0  \0
0000120   \0 001  \0   3   @   @  \0  \0  \0  \0  \0  \n  \0  \0  \0 006
0000140   \0  \0  \0 001  \0   2   @  \0  \0  \0  \0  \0  \0  \n  \0  \0
0000160   \0 006  \0  \0  \0 001  \0   1   ? 200  \0  \0  \0  \0  \0  \n
0000200   \0  \0  \0 006  \0  \0  \0 001  \0   0  \0  \0  \0  \0        
0000216
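NOTE: THE OUTPUT FILE IS TOTALLY TOASTED (its header now declares 
FloatWritable/IntWritable, but the records are still the String/Float data 
from the input), yet no error was generated!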
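
For illustration, here is a minimal sketch of the kind of check the report asks for. It is not Hadoop code: the class SeqHeaderCheck and its methods are hypothetical names, and the header layout (the bytes "SEQ", a version byte, then a length-prefixed key class name and value class name) is only what can be read off the od dumps above. The sketch also assumes each name length fits in a single byte, as it does in the dumps.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop: compares the class names stored in
// a SequenceFile header with the names a sorter was constructed with.
public class SeqHeaderCheck {

    // Reads one length-prefixed name; assumes a single-byte length (0..127).
    public static String readName(DataInputStream in) throws IOException {
        int len = in.readByte();
        if (len < 0) {
            throw new IOException("multi-byte length not handled in this sketch");
        }
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Returns {keyClassName, valueClassName} as recorded in the header.
    public static String[] readClassNames(byte[] header) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile");
        }
        in.readByte(); // version byte (6 in the dumps)
        return new String[] { readName(in), readName(in) };
    }

    // Fails fast when the declared classes differ from those in the header.
    public static void checkClasses(byte[] header, String keyClass, String valClass)
            throws IOException {
        String[] found = readClassNames(header);
        if (!found[0].equals(keyClass) || !found[1].equals(valClass)) {
            throw new IOException("wrong key/value class: file has "
                + found[0] + "/" + found[1]
                + ", sorter was given " + keyClass + "/" + valClass);
        }
    }

    // Builds header bytes in the layout seen in the dumps (test data only).
    public static byte[] header(String keyClass, String valClass) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('S');
        out.write('E');
        out.write('Q');
        out.write(6);
        for (String name : new String[] { keyClass, valClass }) {
            byte[] b = name.getBytes(StandardCharsets.UTF_8);
            out.write(b.length);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Same mismatch as in the report: file is String/Float, sorter is Float/Int.
        byte[] input = header("StringWritable", "FloatWritable");
        try {
            checkClasses(input, "FloatWritable", "IntWritable");
            System.out.println("no error");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check of this kind in SortPass, the sortTest above would fail with an exception instead of silently writing a file whose header disagrees with its records.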
PS: Your evaluation of my previous bug reports was enlightening.

