[ https://issues.apache.org/jira/browse/HIVE-13275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsh J reassigned HIVE-13275:
------------------------------

    Assignee:     (was: Harsh J)

> Add a toString method to BytesRefArrayWritable
> ----------------------------------------------
>
>                 Key: HIVE-13275
>                 URL: https://issues.apache.org/jira/browse/HIVE-13275
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats, Serializers/Deserializers
>    Affects Versions: 1.1.0
>            Reporter: Harsh J
>            Priority: Trivial
>         Attachments: HIVE-13275.000.patch
>
>
> RCFileInputFormat cannot be used externally for Hadoop Streaming today because
> Streaming generally relies on the K/V pairs being able to emit text
> representations (via toString()).
> Since BytesRefArrayWritable has no toString() method, using
> RCFileInputFormat prints the default object representation, which is not useful.
> Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an
> array), so it's important to output them in a valid, parseable manner, as
> opposed to choosing a simple joining delimiter over the string
> representations of the inner elements.
> I propose adding a standardised CSV formatting of the array data, such that
> users of Streaming can then parse the results in their own scripts. Since we
> have OpenCSV as a dependency already, we can make use of it for this purpose.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
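
For illustration only, the proposed CSV formatting could look roughly like the sketch below. It is not the code in HIVE-13275.000.patch, and it uses neither OpenCSV nor the Hive classes: the class CsvRowSketch and its methods are hypothetical names, a plain byte[][] stands in for the row's columns, and the column bytes are assumed to be UTF-8 text. The point it shows is why quoting matters: a value that itself contains the delimiter or a quote still parses back as a single field.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of Hive or the attached patch.
public class CsvRowSketch {

    // Wrap a field in double quotes and double any embedded quote, so that
    // commas or quotes inside a value cannot be mistaken for structure.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Stand-in for a toString() over one row of columns.
    // Assumption: every column holds UTF-8 encoded text.
    static String toCsvLine(byte[][] columns) {
        StringJoiner line = new StringJoiner(",");
        for (byte[] column : columns) {
            line.add(quote(new String(column, StandardCharsets.UTF_8)));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        byte[][] row = {
            "1".getBytes(StandardCharsets.UTF_8),
            "a,b".getBytes(StandardCharsets.UTF_8),
            "say \"hi\"".getBytes(StandardCharsets.UTF_8)
        };
        // Prints: "1","a,b","say ""hi"""
        System.out.println(toCsvLine(row));
    }
}
```

A naive join on "," would turn the second column above into two fields, which is the ambiguity the issue description wants to avoid. A Streaming script can read the quoted form with any standard CSV parser.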