On Tue, May 28, 2013 at 8:45 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> That does not really make sense. You're breaking the layered approach.
> InputFormats read/write data, serdes interpret data based on the table
> definition. It's like asking "Why can't my input format run assembly code?"
>

The current model of:

    SerDe
    Input/OutputFormat
    FileSystem

does well for text formats, but otherwise limits the input/output formats to handling binary data. That creates problems if the Input/OutputFormat has an integrated serialization mechanism. For example, ORC requires its own SerDe, and OrcSerde just passes the values through serialize and deserialize.

Also note that other formats like SequenceFile are restricted because the SerDe is placed above the FileFormat. Hive's SequenceFile input format discards the key and requires the value to be Text or BytesWritable. That covers many cases, but certainly not all. On the other hand, if it were Hive's SequenceFile InputFormat that created the ObjectInspector, it could handle more complex types and let Hive usefully read a wider range of SequenceFiles.

I propose pushing SerDes down into the Input/OutputFormats that can be parameterized by the serialization. Using SerDes with TextInput/OutputFormat and HBaseTableInput/OutputFormat makes a lot of sense, but that isn't true of formats in general.

-- Owen
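
A rough sketch of the two layerings discussed above, for illustration only. Every name here (`Deserializer`, `BinaryFormat`, `RowFormat`, `readLayered`, `textFormat`, `typedFormat`) is a hypothetical stand-in, not Hive's or Hadoop's actual API. It assumes a trivial comma-separated "SerDe" so the difference is easy to see: in the current model the engine always places a SerDe above a format that yields opaque bytes, while in the proposed model the format returns rows itself and takes a SerDe only if it can be parameterized by one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT Hive's real interfaces.
final class LayeringSketch {

    /** Interprets one raw record as a row of column values. */
    interface Deserializer {
        List<Object> deserialize(byte[] raw);
    }

    /** Current model: the format can only hand back opaque bytes. */
    interface BinaryFormat {
        List<byte[]> readRecords();
    }

    /** Proposed model: the format hands back rows directly. */
    interface RowFormat {
        List<List<Object>> readRows();
    }

    /** A toy SerDe that splits a UTF-8 record on commas. */
    static final Deserializer CSV = raw ->
            new ArrayList<Object>(Arrays.asList(
                    new String(raw, StandardCharsets.UTF_8).split(",")));

    /** Current layering: the engine always stacks a SerDe on top of the format. */
    static List<List<Object>> readLayered(BinaryFormat format, Deserializer serde) {
        List<List<Object>> rows = new ArrayList<>();
        for (byte[] raw : format.readRecords()) {
            rows.add(serde.deserialize(raw));
        }
        return rows;
    }

    /** Proposed: a text-like format that is parameterized by a SerDe. */
    static RowFormat textFormat(List<String> lines, Deserializer serde) {
        return () -> {
            List<List<Object>> rows = new ArrayList<>();
            for (String line : lines) {
                rows.add(serde.deserialize(line.getBytes(StandardCharsets.UTF_8)));
            }
            return rows;
        };
    }

    /** Proposed: a format with integrated serialization needs no SerDe at all. */
    static RowFormat typedFormat(List<List<Object>> storedRows) {
        return () -> storedRows;
    }
}
```

In this sketch, a format with its own typed storage (the role ORC plays in the message above) implements `RowFormat` directly instead of being paired with a pass-through SerDe.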