[ https://issues.apache.org/jira/browse/HIVE-22337?focusedWorklogId=442365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442365 ]
ASF GitHub Bot logged work on HIVE-22337:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Jun/20 00:26
            Start Date: 07/Jun/20 00:26
    Worklog Time Spent: 10m
      Work Description: github-actions[bot] commented on pull request #815:
URL: https://github.com/apache/hive/pull/815#issuecomment-640136438

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 442365)
    Time Spent: 0.5h  (was: 20m)

> Improve and Expand Text-Based SerDes
> ------------------------------------
>
>                 Key: HIVE-22337
>                 URL: https://issues.apache.org/jira/browse/HIVE-22337
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 4.0.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-22337.1.patch, HIVE-22337.2.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> * Add new SerDe package just for text-based formats:
>   org.apache.hadoop.hive.serde2.text.*
> * Add new SerDe package just for text-based log formats:
>   org.apache.hadoop.hive.serde2.text.log.*
> * Create a coherent hierarchy for processing delimited data:
>   AbstractSerDe -> TextSerDe -> EncodingAwareTextSerde -> DelimitedSerDe -> CsvTextSerDe
> * Create a coherent hierarchy for processing regex'ed data:
>   AbstractSerDe -> TextSerDe -> EncodingAwareTextSerde -> RegexSerDe -> CommonFormatLogSerDe
> * Create some standard text processors for super-quick out-of-the-box
>   processing: TSV SerDe and CSV SerDe
> * Create some standard log processors for super-quick out-of-the-box
>   processing: Apache Common Log Format and Apache Combined Log Format
>   (Apache HTTP Server Log Parsers)
> * Better default behaviors for processing text
>
> The default behavior should allow users to quickly query data without any
> failures.
> # When a blank line is encountered, insert a 'null' value for each column.
> # When there are fewer fields in the data than defined in the table schema,
>   shift all available fields left, and fill in 'null' values for all
>   remaining fields.
> # When there are too many fields in the data, the last field in the results
>   will contain all remaining values. Currently, the extra data is silently
>   swallowed and a warning is issued in the YARN logs. A normal user will
>   never see this warning, especially if the job completes successfully.
>   Better to (by default) provide them all the data than to hide anything.
>
> {code:none|title=CSV SerDe}
> "1,2,3" = ["1","2","3"]
> "1,2," = ["1","2",null]
> "" = [null,null,null]
> "1,2,3,4" = ["1","2","3,4"]
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
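
The three default behaviors listed in the issue description can be illustrated with a small Python sketch. This is not the Hive implementation (which is Java and lives in the attached patches); `split_row` and its parameters are hypothetical names chosen for illustration. The sketch assumes a plain single-character delimiter with no quoting or escaping, and it assumes that an empty field maps to null, which is one reading of the `"1,2,"` row in the CSV SerDe example.

```python
from typing import List, Optional


def split_row(line: str, num_columns: int, delimiter: str = ",") -> List[Optional[str]]:
    """Split one text row into exactly num_columns fields.

    Hypothetical helper illustrating the proposed defaults; None stands in
    for SQL null. Quoting and escaping are deliberately not handled.
    """
    if num_columns < 1:
        raise ValueError("num_columns must be at least 1")

    # Rule 1: a blank line yields null for every column.
    if line == "":
        return [None] * num_columns

    # Rule 3: cap the number of splits so that any surplus fields stay
    # together, delimiters included, in the last column.
    fields: List[Optional[str]] = list(line.split(delimiter, num_columns - 1))

    # Rule 2: fewer fields than columns, so pad the right side with nulls.
    fields += [None] * (num_columns - len(fields))

    # Assumption: an empty field is reported as null rather than "".
    return [None if field == "" else field for field in fields]
```

Called with a three-column schema, `split_row` reproduces the four rows of the CSV SerDe example: `"1,2,3,4"` yields `["1", "2", "3,4"]` and `"1,2,"` yields `["1", "2", None]`.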