I wrote some custom Python parsing scripts using StingRay Reader (http://stingrayreader.sourceforge.net/cobol.html) that read in the copybooks and automatically generate the Hive table schema from them. The EBCDIC data is then extracted to tab-separated ASCII values for loading into Hive. Some tables had very sparse columns, so in those cases I bundled the sparse data into a catch-all JSON field in the Hive table.
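To illustrate the extraction step, here is a minimal sketch that uses only the Python standard library, not StingRay's API. The field layout, the set of sparse columns, and the cp037 code page are all assumptions made up for this example; in practice the layout would come from the parsed copybook. It also handles only plain text (DISPLAY) fields, not packed or binary ones.

```python
import json

# Hypothetical fixed-length layout: (field name, byte offset, byte length).
# In a real job this would be derived from the copybook.
LAYOUT = [
    ("cust_id", 0, 6),
    ("name", 6, 10),
    ("region", 16, 2),
    ("fax", 18, 8),
    ("telex", 26, 8),
]

# Assumed: these columns are rarely populated, so they go into the JSON field.
SPARSE = {"fax", "telex"}

# Assumed EBCDIC code page; cp037 is one of Python's built-in codecs.
CODEC = "cp037"


def record_to_tsv(record: bytes) -> str:
    """Decode one EBCDIC record into a tab-separated line.

    Dense fields become ordinary columns. Non-empty sparse fields are
    collected into a JSON object that is emitted as the last column.
    """
    dense = []
    extras = {}
    for name, offset, length in LAYOUT:
        value = record[offset:offset + length].decode(CODEC).strip()
        if name in SPARSE:
            if value:
                extras[name] = value
        else:
            # Keep the delimiter unambiguous if a value contains a tab.
            dense.append(value.replace("\t", " "))
    dense.append(json.dumps(extras, sort_keys=True))
    return "\t".join(dense)


if __name__ == "__main__":
    # Build a sample record by encoding text to EBCDIC, just for the demo.
    text = "000042" + "ACME".ljust(10) + "EU" + "5551234".ljust(8) + " " * 8
    print(record_to_tsv(text.encode(CODEC)))
```

On the Hive side, a table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' can load lines like these, and get_json_object can pull individual keys out of the catch-all column.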
The parser can handle both fixed-length records and variable-length VB-type records. Let me know if you have any questions regarding StingRay.

From: Nishanth S [mailto:nishanth.2...@gmail.com]
Sent: Friday, June 02, 2017 10:07 AM
To: user@hive.apache.org
Subject: Migrating Variable Length Files to Hive

Hello hive users,

We are looking at migrating files (less than 5 MB of data in total) with variable record lengths from a mainframe system to Hive. You could think of this as metadata. Each of these records can have from 3 to n columns, depending on the record type (that is, each record type has a different number of columns). What would be the best strategy to migrate this to Hive? I was thinking of converting these files into one variable-length CSV file and then importing it into a Hive table. The Hive table would consist of 4 columns, with the 4th column holding a comma-separated list of the values from columns 4 to n. Are there alternative or better approaches? I would appreciate any feedback on this.

Thanks,
Nishanth