[ https://issues.apache.org/jira/browse/HIVE-22337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Mollitor updated HIVE-22337:
----------------------------------
    Attachment:     (was: HIVE-22337.2.patch)

> Improve and Expand Text-Based SerDes
> ------------------------------------
>
>                 Key: HIVE-22337
>                 URL: https://issues.apache.org/jira/browse/HIVE-22337
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 4.0.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-22337.1.patch, HIVE-22337.2.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Add a new SerDe package just for text-based formats: org.apache.hadoop.hive.serde2.text.*
> * Add a new SerDe package just for text-based log formats: org.apache.hadoop.hive.serde2.text.log.*
> * Create a coherent hierarchy for processing delimited data: AbstractSerDe -> TextSerDe -> EncodingAwareTextSerde -> DelimitedSerDe -> CsvTestSerDe
> * Create a coherent hierarchy for processing regex'ed data: AbstractSerDe -> TextSerDe -> EncodingAwareTextSerde -> RegexSerDe -> CommonFormatLogSerDe
> * Create some standard text processors for super-quick out-of-the-box processing: TSV SerDe and CSV SerDe
> * Create some standard log processors for super-quick out-of-the-box processing: Apache Common Log Format and Apache Combined Log Format (Apache HTTP Server Log Parsers)
> * Better default behaviors for processing text
>
> The default behavior should allow users to quickly query data without any failures.
> # When a blank line is encountered, insert a 'null' value for each column.
> # When there are fewer fields in the data than are defined in the table schema, shift all available fields left and fill in 'null' values for all remaining fields.
> # When there are too many fields in the data, the last field in the results will contain all remaining values. Currently, the extra data is silently swallowed and a warning is issued in the YARN logs. A normal user will never see this warning, especially if the job completes successfully. It is better to provide users with all of the data by default than to hide anything.
>
> {code:none|title=CSV SerDe}
> "1,2,3" = ["1","2","3"]
> "1,2," = ["1","2",null]
> "" = [null,null,null]
> "1,2,3,4" = ["1","2","3,4"]
> {code}
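The three default rules and the CSV examples above can be sketched in a few lines of Java. This is only an illustration of the proposed behavior, not code from the attached patches: the class name `LenientSplitSketch` and the method `splitLenient` are hypothetical, and the sketch assumes that an empty field is reported as null (which is what the `"1,2,"` example suggests). It does not handle quoting or escaping, which a real CSV SerDe would need.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch of the lenient defaults described in the issue.
class LenientSplitSketch {

    // Splits one line into exactly numColumns values.
    // Assumption (not confirmed by the issue): empty fields become null.
    static String[] splitLenient(String line, String delimiter, int numColumns) {
        if (numColumns <= 0) {
            throw new IllegalArgumentException("numColumns must be positive");
        }
        // A new String[] starts out all null, which covers blank lines
        // and any columns missing from a short row.
        String[] row = new String[numColumns];
        if (line == null) {
            return row;
        }
        // A positive limit caps the result at numColumns entries, so any
        // extra delimiters and values stay in the last field.
        String[] parts = line.split(Pattern.quote(delimiter), numColumns);
        for (int i = 0; i < parts.length; i++) {
            row[i] = parts[i].isEmpty() ? null : parts[i];
        }
        return row;
    }

    public static void main(String[] args) {
        // The four inputs from the CSV SerDe example, with a 3-column schema.
        for (String line : new String[] {"1,2,3", "1,2,", "", "1,2,3,4"}) {
            System.out.println("\"" + line + "\" = "
                    + Arrays.toString(splitLenient(line, ",", 3)));
        }
    }
}
```

Passing the column count as the `limit` argument of `String.split` is what keeps the overflow in the last field instead of discarding it, so no data is hidden from the user.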