Harsh J created HIVE-13275:
------------------------------

             Summary: Add a toString method to BytesRefArrayWritable
                 Key: HIVE-13275
                 URL: https://issues.apache.org/jira/browse/HIVE-13275
             Project: Hive
          Issue Type: Improvement
          Components: Serializers/Deserializers
    Affects Versions: 1.1.0
            Reporter: Harsh J
            Assignee: Harsh J
            Priority: Trivial
         Attachments: HIVE-13275.000.patch

RCFileInputFormat cannot currently be used externally with Hadoop Streaming, 
because Streaming generally relies on the K/V pairs being able to emit text 
representations (via toString()).

Since BytesRefArrayWritable has no toString() method, using 
RCFileInputFormat prints default object representations, which are not useful.

Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an 
array), so it is important to output them in a valid, parseable manner, rather 
than simply joining the string representations of the inner elements with a 
delimiter.

I propose adding a standardised CSV formatting of the array data, such that 
users of Streaming can then parse the results in their own script. Since we 
have OpenCSV as a dependency already, we can make use of it for this purpose.
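
As a rough illustration of the kind of output intended, here is a minimal, 
self-contained sketch of CSV-quoting an array of byte columns. It does not use 
OpenCSV or the Hive classes: the class CsvRowSketch and its methods are 
hypothetical names, the quoting is hand-written in the RFC 4180 style, and it 
assumes the column bytes are UTF-8 text. The attached patch and OpenCSV's own 
writer may quote differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; not the code in HIVE-13275.000.patch.
public class CsvRowSketch {

    // Quote a field only when it contains a comma, quote, or line break,
    // doubling any embedded quotes (RFC 4180 style).
    public static String escapeField(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Render one row (a list of raw column bytes) as a single CSV line.
    // Assumption: each column holds UTF-8 encoded text.
    public static String toCsvLine(List<byte[]> columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            String text = new String(columns.get(i), StandardCharsets.UTF_8);
            sb.append(escapeField(text));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<byte[]> row = new ArrayList<>();
        row.add("a".getBytes(StandardCharsets.UTF_8));
        row.add("b,c".getBytes(StandardCharsets.UTF_8));
        row.add("say \"hi\"".getBytes(StandardCharsets.UTF_8));
        System.out.println(toCsvLine(row));
    }
}
```

A plain comma join of the same row would yield four apparent columns instead 
of three, which is why quoting is needed for a Streaming script to split the 
row back into its original values.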



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
