Re: Serde for Cobol Layout to Hive table

Thejas Nair Thu, 28 May 2015 15:52:38 -0700

Yes, please create a jira.
However, there might not be other hive committers who are
knowledgeable enough about this format to be able to review it.
(Unless you are using some apache licensed library to parse the
information).
But you might want to create a JIRA and give it a try. If nobody is
able to review it, that might also imply that a better place for this
is outside the Hive project. Maybe a GitHub project that you create under
the Apache license would be better suited for it.



On Thu, May 28, 2015 at 11:02 AM, Ram Manohar Bheemana
<rammanohar1...@gmail.com> wrote:
> Hi All,
>
> I am new to the contributors group, and I want to contribute a custom SerDe
> for Cobol file conversion. Could you please let me know the details of how to
> do it? Do I need to create a JIRA for it?
>
> Below are the details of my idea:
>
> *Use Case:*
> Currently, many banking and insurance companies are
> trying to implement their business models on Hadoop technologies.
> The main issue they face is converting their existing mainframe file
> layouts to Hadoop file layouts. This is an extremely painful process, as there
> are multiple complications in the conversion.
>
> *Pain Areas:*
> Some of them are listed below:
> 1. EBCDIC to ASCII conversion
> 2. Fields are separated by offset instead of by separators.
> 3. The size of an array <structures> is determined at run time from previous
> fields (e.g. WS-FIELD OCCURS 1 TO 50 TIMES DEPENDING ON WS-FIELD-LENGTH.)
> 4. REDEFINES of particular previous fields.
> 5. ...
>
> *Current workaround:*
> Most companies add additional layers between the mainframe
> and Hadoop to convert the formats.
>
> *Proposed Solution:*
> Develop a custom SerDe with the following properties:
> 1. The Cobol layout is supplied through TBLPROPERTIES, similar to AvroSerde,
> and the SerDe builds the Hive table definition automatically.
> 2. The deserializer should be able to extract the field data based on the offset
> at runtime.
> 3. EBCDIC to ASCII conversion is enabled through a property in
> TBLPROPERTIES
> 4. ...
>
> *Benefits:*
> 1. Easier migration from mainframe systems to hadoop
> 2. Removal of additional layers.
> 3. ..
>
> *Example:*
>
> *Input Mainframe file:*
>
> Ram Manohar 10123123123123123123123123123123king
> heheh        5012012012012012comment
> Lippi        3001001001darling
> Kanu         6006006006006006006loving
>
> *Cobol Layout:*
> 01 WS-VAR.
>    05 WS-NAME PIC X(12).
>    05 WS-MARKS-LENGTH PIC 9(2).
>    05 WS-marks OCCURS 0 to 25 TIMES DEPENDING ON WS-MARKS-LENGTH.
>       10 WS-MARK PIC 999.
>    05 WS-NICKNAME PIC X(6)
>
> *Hive DDL:*
>
> CREATE TABLE Cobol2Hive
> ROW FORMAT SERDE 'com.savy3.cobolserde.CobolSerde'
> LOCATION '/home/hduser/hive/warehouse/ram.db/lolol'
> TBLPROPERTIES ('cobol.layout'='01 WS-VAR. 05 WS-NAME PIC X(12). 05
> WS-MARKS-LENGTH PIC 9(2). 05 WS-marks OCCURS 0 to 25 TIMES DEPENDING ON
> WS-MARKS-LENGTH. 10 WS-MARK PIC 999. 05 WS-NICKNAME PIC X(6)');
>
> *Output:*
> select * from Cobol2Hive;
> OK
> Ram Manohar
> [{"ws_mark":123},{"ws_mark":123},{"ws_mark":123},{"ws_mark":123},{"ws_mark":123},{"ws_mark":123},{"ws_mark":123},{"ws_mark":123},{"ws_mark":123},{"ws_mark":123}]
> king
> heheh
> [{"ws_mark":12},{"ws_mark":12},{"ws_mark":12},{"ws_mark":12},{"ws_mark":12}]
> comment
> Lippi       [{"ws_mark":1},{"ws_mark":1},{"ws_mark":1}] darlin
> Kanu
> [{"ws_mark":6},{"ws_mark":6},{"ws_mark":6},{"ws_mark":6},{"ws_mark":6},{"ws_mark":6}]
> loving
>
> Please let me know if this is the wrong group for this post.
> --
> Thanks,
> Ram Manohar. B
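
For illustration only, the offset-based extraction described in the quoted
proposal can be sketched in Python. This is not the CobolSerde code (a Hive
SerDe would be written in Java). The constants and the parse_record function
are hypothetical and hard-coded from the example layout rather than parsed from
the copybook, and cp037 is only assumed as the EBCDIC code page.

```python
# Field widths taken from the example layout in the quoted message.
NAME_LEN = 12   # WS-NAME          PIC X(12)
COUNT_LEN = 2   # WS-MARKS-LENGTH  PIC 9(2)
MARK_LEN = 3    # WS-MARK          PIC 999
NICK_LEN = 6    # WS-NICKNAME      PIC X(6)


def parse_record(raw: bytes, encoding: str = "ascii") -> dict:
    """Split one fixed-width record into fields by offset.

    Pass encoding="cp037" (an assumed EBCDIC code page) for mainframe bytes.
    Both encodings are single-byte, so character offsets equal byte offsets.
    """
    text = raw.decode(encoding)
    pos = 0

    name = text[pos:pos + NAME_LEN]
    pos += NAME_LEN

    # OCCURS ... DEPENDING ON: this field says how many WS-MARK entries follow.
    count = int(text[pos:pos + COUNT_LEN])
    pos += COUNT_LEN

    marks = []
    for _ in range(count):
        marks.append({"ws_mark": int(text[pos:pos + MARK_LEN])})
        pos += MARK_LEN

    # The offset of the last field depends on the run-time array size.
    nickname = text[pos:pos + NICK_LEN]

    return {
        "ws_name": name.rstrip(),
        "ws_marks_length": count,
        "ws_marks": marks,
        "ws_nickname": nickname.rstrip(),
    }


if __name__ == "__main__":
    # A record modeled on the first example row, padded to the full layout.
    record = "Ram Manohar " + "10" + "123" * 10 + "king  "
    print(parse_record(record.encode("ascii")))
    print(parse_record(record.encode("cp037"), encoding="cp037"))
```

REDEFINES and packed or binary numeric fields are not handled in this sketch.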
