On Nov 15, 4:41 pm, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote:
> On Sat, 15 Nov 2008 11:41:17 -0800, Ethan Furman <[EMAIL PROTECTED]>
> declaimed the following in comp.lang.python:
>
> > len wrote:
> <snip>
> > > Files are fixed format no field delimiters, fields are position and
> > > length, records are terminated by newline. In cobol the read
> > > statement which read a record from the file automatically mapped
> > > the data to the fieldnames in the cobol file definition.
>
>         Sounds like standard COBOL record definitions. Next factor would be
> if they are text format (human readable) or COBOL binary format (and if
> so, are they using comp-1 integers or COBOL standard packed decimal?)...
> Given the mention of new-line termination, probably not binary (though
> technically, COBOL's fixed width files probably don't even require a
> new-line).
>
>         In either event, use of the struct module to break the input record
> into a cluster of Python strings is probably useful, and may be more
> efficient than a series of string slicing operations.
>
>         Also, if the conversion is from file direct to database, it is
> likely safe to leave most of the fields in text format, since MySQLdb
> passes everything as delimited strings in the INSERT statement -- so why
> convert from "123.5" to float("123.5") -> 123.5 only to have
> cursor.execute() convert it back to "123.5"?
>
>         Exception: might want to convert date/time fields into Python
> date/time objects and let MySQLdb handle conversion to/from MySQL
> datetime formats.
>
> > Are the cobol file definitions available in a file that can be parsed,
> > or are they buried in the source code?
>
>         Hmmm, ever seen COBOL source? <G>
>
>         Nothing is buried in COBOL -- the data section should have nicely
> laid out record representations... (it's been some time, so this is
> pseudo-COBOL)
>
>         01  MYRECORD
>             03  NAME        PIC A(50)
>             03  DATE
>                 05  MONTH   PIC 99
>                 05  DAY     PIC 99
>                 05  YEAR    PIC 9999
>             03  AGE         PIC 999
>             03  ADDRESS
>                 05  STREET  PIC X(50)
>                 05  CITY    PIC A(50)
>                 05  STATE   PIC A(50)
>                 05  ZIP     PIC 99999-9999
>
> > What type of data is in the files? Integer, float, character, date, etc.
>
>         If new-line terminated, likely all is human readable text -- see my
> above comment re: numeric conversions and MySQL
>
> > Once you have the data out, will you need to access these same cobol
> > files in the future? (i.e. more data is being added to them that you
> > will need to migrate)
>
>         That is what I considered key also...
>
>         Best would be a one-time conversion -- once the new applications
> have been checked out -- meaning the converter may be used multiple
> times during development and testing of the new applications (to refresh
> the development database with production data), but that in the end the
> files become defunct and the new input process directly loads to the
> production database.
>
>         No indication of what type of processes the existing COBOL
> application is performing, but I can easily visualize a pre-database
> processing style, using sorted input files, with parallel readings
>
>         read EMPLOYEE (with salary rate)
>         read TIMECARD (with hours)
>
>         while EMPLOYEE.ID < TIMECARD.ID
>             write EXCEPTION No timecard for EMPLOYEE
>             read EMPLOYEE
>         while TIMECARD.ID < EMPLOYEE.ID
>             write EXCEPTION No employee for TIMECARD
>             read TIMECARD
>
>         compute and write paycheck
>
>         repeat until EOF on both EMPLOYEE and TIMECARD
>
> {side note: apologies for piggy-backing -- the original poster is using
> an address that my filters are set to kill; as most of the spam on this
> group has the same domain}
> --
>         Wulfraed        Dennis Lee Bieber               KD6MOG
>         [EMAIL PROTECTED]             [EMAIL PROTECTED]
>                 HTTP://wlfraed.home.netcom.com/
>         (Bestiaria Support Staff:               [EMAIL PROTECTED])
>                 HTTP://www.bestiaria.com/
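
For anyone following along, the matched-read pseudocode quoted above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name match_sorted, the (id, value) tuple shape, and the "rate times hours" paycheck are all made up for the example, and each ID is assumed to appear at most once per file.

```python
def match_sorted(employees, timecards):
    """Walk two ID-sorted sequences in step, as in the quoted pseudocode.

    Both inputs are iterables of (id, value) tuples sorted by id, with at
    most one record per id (an assumption for this sketch).
    Returns (paychecks, exceptions).
    """
    paychecks, exceptions = [], []
    emp_iter, card_iter = iter(employees), iter(timecards)
    emp = next(emp_iter, None)      # None stands in for EOF
    card = next(card_iter, None)

    while emp is not None or card is not None:
        if card is None or (emp is not None and emp[0] < card[0]):
            # Employee has no matching timecard
            exceptions.append("No timecard for EMPLOYEE %s" % emp[0])
            emp = next(emp_iter, None)
        elif emp is None or card[0] < emp[0]:
            # Timecard has no matching employee
            exceptions.append("No employee for TIMECARD %s" % card[0])
            card = next(card_iter, None)
        else:
            # IDs match: hypothetical paycheck = salary rate * hours
            paychecks.append((emp[0], emp[1] * card[1]))
            emp = next(emp_iter, None)
            card = next(card_iter, None)
    return paychecks, exceptions


employees = [(1, 10.0), (2, 12.5), (4, 20.0)]   # (id, salary rate)
timecards = [(2, 40), (3, 8), (4, 10)]          # (id, hours)
paychecks, exceptions = match_sorted(employees, timecards)
```

Using None as the end-of-file marker keeps the loop close to the original "repeat until EOF on both" shape; in a real conversion the tuples would come from parsed file records rather than in-memory lists.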
If anyone is interested, I have just posted on the group under the
title 'Newbie code review of parsing program Please'.

Len
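
As a point of comparison for the struct suggestion quoted earlier, here is a minimal sketch of splitting one fixed-width text record. It assumes the pseudo-COBOL layout from the quoted message is the actual layout, that the data is plain ASCII, and that ZIP occupies 10 characters; the names FIELDS, RECORD and parse_record are hypothetical and do not come from any posted program.

```python
import struct
from datetime import date

# Field names and widths follow the pseudo-COBOL layout quoted earlier;
# treating ZIP ("99999-9999") as 10 characters is an assumption.
FIELDS = [("name", 50), ("month", 2), ("day", 2), ("year", 4), ("age", 3),
          ("street", 50), ("city", 50), ("state", 50), ("zip", 10)]

# One "Ns" (N-byte string) code per field, e.g. "50s2s2s4s3s50s50s50s10s"
RECORD = struct.Struct("".join("%ds" % width for _, width in FIELDS))


def parse_record(line):
    """Split one fixed-width text record into a dict of stripped strings."""
    raw = line.rstrip("\r\n").encode("ascii")
    # Pad short lines with spaces; lines longer than the layout raise struct.error
    values = RECORD.unpack(raw.ljust(RECORD.size))
    rec = {name: value.decode("ascii").strip()
           for (name, _), value in zip(FIELDS, values)}
    # Per the quoted advice, only the date is converted; the rest stays text
    rec["date"] = date(int(rec["year"]), int(rec["month"]), int(rec["day"]))
    return rec


sample = ("Ada Lovelace".ljust(50) + "12" + "10" + "1815" + "036"
          + "12 St James Sq".ljust(50) + "London".ljust(50)
          + "NA".ljust(50) + "12345-6789" + "\n")
rec = parse_record(sample)
```

Numeric fields such as age are deliberately left as strings here, since the database layer would turn them back into text for the INSERT anyway.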