About two years ago I wrote my most ambitious program to date, a hodge-podge collection of proprietary scripting, Perl and shell files that collectively total about 20k lines of code. Amazingly, it actually works and has saved my colleagues and me much time and effort. At the time I created this mess, I was playing "guess the correct proprietary syntax to do something" and "hunt and peck Perl" games, squeezing this programming work into brief snippets of time away from what I am actually paid to do. I did not give much thought to design and knew I would regret it later, and that later is today!

So now, in my current few snippets of time, I wish to redesign this program from scratch and make it much, much easier to maintain the code and update the data tables, which change from time to time. And now that I have some version of Python available on all of our current Solaris 10 systems (Python versions 2.4.4 and 2.6.4), it seems like a fine time to (finally!) do some serious Python learning.
Right now I have separated my data into their own files. Previously I had integrated the data with my source code files (Horrors!). Currently, a snippet from one of these data files is:

NUMBER_FX:ONE; DATA_SOURCE:Timmerman; RELEASE_DATE:(11-2012);

SERIAL_ROI:Chiasm; TEST_VOLUME:< 0.2 cc; VOLUME_MAX_GY:8.0; MAX_PT_DOSE_GY:10.0; MAX_MEAN_DOSE: ;
SERIAL_ROI:Optic_Nerve_R; TEST_VOLUME:< 0.2 cc; VOLUME_MAX_GY:8.0; MAX_PT_DOSE_GY:10.0; MAX_MEAN_DOSE: ;
SERIAL_ROI:Optic_Nerve_L; TEST_VOLUME:< 0.2 cc; VOLUME_MAX_GY:8.0; MAX_PT_DOSE_GY:10.0; MAX_MEAN_DOSE: ;

[...]

PARALLEL_ROI:Lungs_Bilateral; CRITICAL_VOLUME_CC:1500.0; CRITICAL_VOLUME_DOSE_MAX_GY:7.0; V8GY: ; V20GY: ; MAX_MEAN_DOSE: ;
PARALLEL_ROI:Lungs_Bilateral; CRITICAL_VOLUME_CC:1000.0; CRITICAL_VOLUME_DOSE_MAX_GY:7.6; V8GY:< 37.0%; V20GY: ; MAX_MEAN_DOSE: ;
PARALLEL_ROI:Liver; CRITICAL_VOLUME_CC:700.0; CRITICAL_VOLUME_DOSE_MAX_GY:11.0; V8GY: ; V20GY: ; MAX_MEAN_DOSE: ;
PARALLEL_ROI:Renal_Cortex_Bilateral; CRITICAL_VOLUME_CC:200.0; CRITICAL_VOLUME_DOSE_MAX_GY:9.5; V8GY: ; V20GY: ; MAX_MEAN_DOSE: ;

[EOF]

I just noticed that copying from my data file into my Google email resulted in all extra spaces being condensed into a single space. I do not know why this has just happened. Note that there are no tab characters. The [...] indicates omitted lines of serial tissue data and [EOF] just notes the end-of-file.

I am far from ready to write any code at this point. I am trying to organize my data files so that they will be easy for the programs that process the data to use, and also easy to update every time these data values get improved upon. For the latter, I envision writing a second program to enable anyone to update the data tables when we are given new values. But until that second program gets written, the data files will have to be opened and edited manually, which is why I have included labels in all-caps ending in a colon. This is so the editor will know what he is editing.
So, basically, the actual data fields fall between ":" and ";". String representations of numbers will need to get converted to floats by the program. Some fields containing numbers are of a form like "< 0.2 cc". These will get copied as is into a GUI display, while the "0.2" will be used in a computation and/or comparison.

Also notice that in each data file there are two distinct groupings of records: one for serial tissue (SERIAL_ROI:) and one for parallel tissue (PARALLEL_ROI:). The fields used are different for each grouping. Some fields have no values in this file, but in other data files they will have values. And finally, the header line at the top of the file identifies the number of fractions (FX) for which the data is to be used, as well as the source of the data and the date that the data was released by that source.

Finally, the questions! Will I easily be able to use Python to parse this data as currently structured, or do I need to restructure it? I am not yet at the point where I am aware of what possibilities Python offers for handling these data files. Also, my efforts to search the 'net did not turn up anything that really clicked for me as the way to go. I could not seem to come up with a search string that would bring up what I was really interested in: What are the best practices for organizing plain text data?

Thanks!

--
boB
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
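
For what it is worth, the format described above (labels ending in ":", fields ending in ";", a header line, then SERIAL_ROI and PARALLEL_ROI records) looks parseable with nothing more than string splitting. The following is only a minimal sketch, not a recommended design. It assumes one record per line in the real file, that the first label on a line identifies its group, and that labels and values never contain ":" or ";" themselves. The names SAMPLE, parse_line, first_number and parse_file are made up for illustration, and the sample holds just three lines copied from the snippet.

```python
import re

# Three lines copied from the snippet above; the real file has more records.
SAMPLE = """\
NUMBER_FX:ONE; DATA_SOURCE:Timmerman; RELEASE_DATE:(11-2012);
SERIAL_ROI:Chiasm; TEST_VOLUME:< 0.2 cc; VOLUME_MAX_GY:8.0; MAX_PT_DOSE_GY:10.0; MAX_MEAN_DOSE: ;
PARALLEL_ROI:Liver; CRITICAL_VOLUME_CC:700.0; CRITICAL_VOLUME_DOSE_MAX_GY:11.0; V8GY: ; V20GY: ; MAX_MEAN_DOSE: ;
"""

# Matches an unsigned integer or decimal such as "8.0" or the "0.2" in "< 0.2 cc".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_line(line):
    """Split 'LABEL:value; LABEL:value;' into a list of (label, value) pairs."""
    pairs = []
    for chunk in line.split(";"):
        chunk = chunk.strip()
        if ":" not in chunk:
            continue  # skips empty trailing chunks and markers with no label
        label, value = chunk.split(":", 1)
        pairs.append((label.strip(), value.strip()))
    return pairs


def first_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = NUMBER.search(text)
    if match is None:
        return None
    return float(match.group())


def parse_file(lines):
    """Return (header dict, list of serial records, list of parallel records)."""
    header, serial, parallel = {}, [], []
    for line in lines:
        pairs = parse_line(line)
        if not pairs:
            continue
        record = dict(pairs)
        first_label = pairs[0][0]
        if first_label == "SERIAL_ROI":
            serial.append(record)
        elif first_label == "PARALLEL_ROI":
            parallel.append(record)
        else:
            header.update(record)  # assumption: anything else is header data
    return header, serial, parallel


header, serial, parallel = parse_file(SAMPLE.splitlines())
chiasm = serial[0]
print(chiasm["TEST_VOLUME"])                # original text, for the GUI display
print(first_number(chiasm["TEST_VOLUME"]))  # float, for computation or comparison
```

Each record keeps the original string, so a field like "< 0.2 cc" can go to the display unchanged while first_number pulls out the value for comparisons. Empty fields come back as empty strings, and first_number returns None for them. The sketch sticks to basic string methods and the re module in the hope that it also runs on the older interpreters mentioned, but that has not been checked on 2.4.4.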