Text.toString violates its abstraction
--------------------------------------

                 Key: HADOOP-6883
                 URL: https://issues.apache.org/jira/browse/HADOOP-6883
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 0.20.1
         Environment: Linux
            Reporter: Gordon Sommers


I stumbled upon this when encoding a Google protocol buffer in base64 and
storing it in a Text object for serialization. Compare the following two lines:

byte[] decoded = b64.decode(val.getBytes());
// This does not return the same bytes as the line below; the result, after
// decoding the base64 successfully, is a very mangled protocol buffer.

byte[] decoded = b64.decode(val.toString().getBytes());
// YES, toString() FIXES IT

Elsewhere in my code I also have:
Text curline = new Text(values.next().toString());
byte[] raw = base64.decode(curline.getBytes());
// This does work.

It looks like the Text object must be toString'd (just once, somewhere, even if
it's later repacked in a Text) before it has the proper byte representation. I
would classify this as a leaky abstraction and ask that the cause be isolated
and the API fixed, so that other developers don't have to spend three days
figuring out why Text.getBytes() isn't returning the right bytes even though
Text.toString() prints exactly the right string representation and
Text.toString().getBytes() does return the right bytes.
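
One explanation consistent with the symptoms above, offered as an assumption
and not as a confirmed diagnosis: Text.getBytes() hands back the object's
internal backing array, and only the first getLength() bytes of that array are
valid. When the framework reuses one Text instance for a shorter value, stale
bytes from an earlier, longer value remain past the valid length. The sketch
below reproduces that effect with a hypothetical stand-in class
(ReusableBuffer, which is NOT Hadoop's Text) so that it runs without Hadoop on
the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BackingBufferSketch {

    /** Hypothetical stand-in for a reusable writable; NOT Hadoop's Text. */
    static final class ReusableBuffer {
        private byte[] bytes = new byte[0]; // backing array, grows but never shrinks
        private int length = 0;             // number of valid bytes in the array

        void set(String s) {
            byte[] src = s.getBytes(StandardCharsets.UTF_8);
            if (src.length > bytes.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        /** Returns the raw backing array, including any stale tail. */
        byte[] getBytes() { return bytes; }

        int getLength() { return length; }

        /** Decodes only the valid prefix, so the stale tail stays hidden. */
        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Copies just the valid prefix of the backing array. */
    static byte[] validBytes(ReusableBuffer b) {
        return Arrays.copyOf(b.getBytes(), b.getLength());
    }

    public static void main(String[] args) {
        ReusableBuffer val = new ReusableBuffer();
        val.set("AAAAAAAAAAAA"); // an earlier, longer record
        val.set("QUJD");         // the current, shorter record

        // The raw array still carries the tail of the earlier record.
        System.out.println(new String(val.getBytes(), StandardCharsets.UTF_8));
        // toString() respects the valid length.
        System.out.println(val.toString());
        // Trimming to getLength() gives the same bytes as toString().getBytes().
        System.out.println(new String(validBytes(val), StandardCharsets.UTF_8));
    }
}
```

If this is indeed the cause, the same trimming applied to a real Text object
(copying only the first getLength() bytes of getBytes() before decoding) should
yield the bytes that toString().getBytes() yields, without the round trip
through a String.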

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
