On Wed, Oct 8, 2014 at 10:56 AM, boB Stepp <robertvst...@gmail.com> wrote:
> About two years ago I wrote my most ambitious program to date, a
> hodge-podge collection of proprietary scripting, perl and shell files
> that collectively total about 20k lines of code. Amazingly it actually
> works and has saved my colleagues and I much time and effort. At the
> time I created this mess, I was playing "guess the correct proprietary
> syntax to do something" and "hunt and peck perl" games and squeezing
> this programming work into brief snippets of time away from what I am
> actually paid to do. I did not give much thought to design at the time
> and knew I would regret it later, which is now today! So now in my
> current few snippets of time I wish to redesign this program from
> scratch and make it much, ... , much easier to maintain the code and
> update the data tables, which change from time to time. And now that I
> have some version of python available on all of our current Solaris 10
> systems (python versions 2.4.4 and 2.6.4), it seems like a fine time
> to (finally!) do some serious python learning.
>
> Right now I have separated my data into their own files. Previously I
> had integrated the data with my source code files (Horrors!).
> Currently, a snippet from one of these data files is:
>
> NUMBER_FX:ONE; DATA_SOURCE:Timmerman; RELEASE_DATE:(11-2012);
>
> SERIAL_ROI:Chiasm; TEST_VOLUME:< 0.2 cc; VOLUME_MAX_GY:8.0;
> MAX_PT_DOSE_GY:10.0; MAX_MEAN_DOSE: ;
> SERIAL_ROI:Optic_Nerve_R; TEST_VOLUME:< 0.2 cc; VOLUME_MAX_GY:8.0;
> MAX_PT_DOSE_GY:10.0; MAX_MEAN_DOSE: ;
> SERIAL_ROI:Optic_Nerve_L; TEST_VOLUME:< 0.2 cc; VOLUME_MAX_GY:8.0;
> MAX_PT_DOSE_GY:10.0; MAX_MEAN_DOSE: ;
>
> [...]
>
> PARALLEL_ROI:Lungs_Bilateral; CRITICAL_VOLUME_CC:1500.0;
> CRITICAL_VOLUME_DOSE_MAX_GY:7.0; V8GY: ; V20GY: ; MAX_MEAN_DOSE: ;
> PARALLEL_ROI:Lungs_Bilateral; CRITICAL_VOLUME_CC:1000.0;
> CRITICAL_VOLUME_DOSE_MAX_GY:7.6; V8GY:< 37.0%; V20GY: ; MAX_MEAN_DOSE:
> ;
> PARALLEL_ROI:Liver; CRITICAL_VOLUME_CC:700.0;
> CRITICAL_VOLUME_DOSE_MAX_GY:11.0; V8GY: ; V20GY: ; MAX_MEAN_DOSE: ;
> PARALLEL_ROI:Renal_Cortex_Bilateral; CRITICAL_VOLUME_CC:200.0;
> CRITICAL_VOLUME_DOSE_MAX_GY:9.5; V8GY: ; V20GY: ; MAX_MEAN_DOSE: ;
> [EOF]
>
> I just noticed that copying from my data file into my Google email
> resulted in all extra spaces being condensed into a single space. I do
> not know why this has just happened. Note that there are no tab
> characters. The [...] indicates omitted lines of serial tissue data
> and [EOF] just notes the end-of-file.
>
> I am far from ready to write any code at this point. I am trying to
> organize my data files, so that they will be easy to use by the
> programs that will process the data and also to be easily updated
> every time these data values get improved upon. For the latter, I
> envision writing a second program to enable anyone to update the data
> tables when we are given new values. But until that second program
> gets written, the data files would have to be opened and edited
> manually, which is why I have labels included in all-caps ending in a
> colon. This is so the editor will know what he is editing. So,
> basically the actual data fields fall between ":" and ";" . String
> representations of numbers will need to get converted to floats by the
> program. Some fields containing numbers are of a form like "< 0.2 cc"
> . These will get copied as is into a GUI display, while the "0.2" will
> be used in a computation and/or comparison. Also notice that in each
> data file there are two distinct groupings of records--one for serial
> tissue (SERIAL_ROI:) and one for parallel tissue (PARALLEL_ROI). The
> fields used are different for each grouping. Also, notice that some
> fields will have no values, but in other data files they will have
> values. And finally the header line at the top of the file identifies
> for what number of fractions (FX) the data is to be used for as well
> as the source of the data and date that the data was released by that
> source.
>
> Finally the questions! Will I easily be able to use python to parse
> this data as currently structured, or do I need to restructure this? I
> am not at the point where I am aware of what possibilities python
> offers to handle these data files. Also, my efforts to search the 'net
> did not turn up anything that really clicked for me as the way to go.
> I could not seem to come up with a search string that would bring up
> what I was really interested in: What are the best practices for
> organizing plain text data?
>
> Thanks!
>
> --
> boB
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
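The format boB describes (a header line, then SERIAL_ROI and PARALLEL_ROI records whose values sit between a label's ":" and the next ";", some of them blank) looks parseable without restructuring the file. Here is a minimal sketch under a few assumptions: the real file holds only labelled fields (no "[...]" or "[EOF]" annotations), every record begins with a SERIAL_ROI or PARALLEL_ROI label, and the code runs on Python 2.6 or 3 (str.partition does not exist in 2.4). The names parse_constraints and field_number are hypothetical.

```python
import re

# First unsigned decimal number in a field, e.g. "0.2" in "< 0.2 cc".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def field_number(text):
    """Return the first number in text as a float, or None if there is none."""
    if text is None:
        return None
    match = _NUMBER.search(text)
    if match is None:
        return None
    return float(match.group())


def parse_constraints(text):
    """Parse a whole data file into (header, serial, parallel).

    The text is split on ';' rather than on newlines, so it does not
    matter whether a record is wrapped across several physical lines.
    """
    header = {}
    serial, parallel = [], []
    current = None  # the record that following fields belong to
    for token in text.split(";"):
        token = token.strip()
        if not token:
            continue  # blank lines and the text after the final ';'
        label, sep, value = token.partition(":")
        if not sep:
            raise ValueError("field without a label: %r" % token)
        label = label.strip()
        value = value.strip() or None  # "MAX_MEAN_DOSE: " becomes None
        if label == "SERIAL_ROI":
            current = {label: value}
            serial.append(current)
        elif label == "PARALLEL_ROI":
            current = {label: value}
            parallel.append(current)
        elif current is None:
            header[label] = value  # fields before the first ROI record
        else:
            current[label] = value
    return header, serial, parallel
```

Keeping each value as its original string lets the GUI show "< 0.2 cc" unchanged, while field_number(record["TEST_VOLUME"]) supplies 0.2 for the comparison. Empty fields come back as None, so a file that leaves MAX_MEAN_DOSE blank and one that fills it in are handled by the same code.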
It looks like you have CSV-like data, except that you use a semicolon as the
separator. Look at the csv module; that should work for you.

--
Joel Goldstick
http://joelgoldstick.com
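One way to follow that suggestion is to give csv.reader a semicolon delimiter and then split each field on its first colon. This is a sketch, not code from the thread, and it assumes that in the real file each record occupies a single physical line (the wrapping in the quoted sample looks like an email artifact; csv reads line by line, so a record that really is split across lines would come back as two rows). The function name read_records is hypothetical.

```python
import csv


def read_records(lines):
    """Yield one dict per non-blank line, using csv with ';' as the delimiter."""
    # skipinitialspace drops the blank that follows each ';' in the data.
    reader = csv.reader(lines, delimiter=";", skipinitialspace=True)
    for row in reader:
        record = {}
        for field in row:
            field = field.strip()
            if not field:
                continue  # the trailing ';' on each line yields an empty field
            label, _, value = field.partition(":")
            record[label.strip()] = value.strip() or None  # blank value -> None
        if record:  # csv.reader returns [] for blank lines; skip them
            yield record
```

The first record yielded is the header line (NUMBER_FX, DATA_SOURCE, RELEASE_DATE); the others can be sorted into serial and parallel groups by checking for a SERIAL_ROI or PARALLEL_ROI key. In Python 3 the csv documentation recommends opening the file with newline=""; in Python 2 the file should be opened in "rb" mode.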