On 12/04/2014 11:46 PM, C. Ng wrote:
Hi,

Given the sample text file below (where the gibberish represents the irrelevant
portions):

....
abcddsdfffgfg
ggfhghghgfhghgh   round 5 xccdcxcfd
sdfdffdfbcvcvbbvnghg score = 0.4533
abcddsdfffgfg     round 5 level = 0.15
ggfhghghgfhghgh   round 10 dfsdfdcdsd
sdfdffdfbcvcvbbvnghg score = 0.4213
sdsdaawddddsds    round 10 level = 0.13
......and so on....


I would like to extract the values for round, score and level:
5 0.4533 0.15
10 0.4213 0.13
....and so on...

Please advise me how it can be done, and what Python functions are useful.

There's lots of ambiguity in that "specification." Can you be sure, for example, that the gibberish never includes the strings "round", "score", or "level"?

Can you be sure that the three relevant lines for a given record are adjacent, and in that order? Do you happen to know that "round" always starts in a particular column, and that "score" starts in another particular column?
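
One way to test those assumptions against the real file is to tally the column at which each keyword first shows up on a line. The following is only a rough sketch: the helper name keyword_columns is made up for illustration, and the sample list is just the lines from the question, so the real file may behave differently.

```python
from collections import Counter

def keyword_columns(lines, keywords=("round", "score", "level")):
    """For each keyword, count the 1-based columns where it first appears on a line.

    keyword_columns is a hypothetical helper, not part of any library.
    """
    columns = {kw: Counter() for kw in keywords}
    for line in lines:
        for kw in keywords:
            pos = line.find(kw)      # -1 when the keyword is absent
            if pos != -1:
                columns[kw][pos + 1] += 1
    return columns

# Sample lines copied from the question; the real file is an unknown.
sample = [
    "abcddsdfffgfg",
    "ggfhghghgfhghgh   round 5 xccdcxcfd",
    "sdfdffdfbcvcvbbvnghg score = 0.4533",
    "abcddsdfffgfg     round 5 level = 0.15",
    "ggfhghghgfhghgh   round 10 dfsdfdcdsd",
    "sdfdffdfbcvcvbbvnghg score = 0.4213",
    "sdsdaawddddsds    round 10 level = 0.13",
]

for kw, counts in keyword_columns(sample).items():
    print(kw, dict(counts))
```

If a keyword turns up in more than one column, or on more lines than there are records, the fixed-column or "keyword never appears in the gibberish" assumption does not hold. In the sample, "level" already shifts by one column when the round number goes from 5 to 10.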

How would you solve it by hand?  Something like the following?

Open the file.
Skip all lines until columns 19-23 contain "round".
Find the first space-delimited field starting in column 25, and call it round_num.

On the next line, split the line into words, and save the last word into score_val.

On the next line, take a substring of the line starting with column 23, parse it into words, and store the second word in level_num.

Save the values round_num, score_val, and level_num in a tuple, or a string, or whatever you find useful, and append it to a result list.

Repeat till end of file.
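
The steps above might be sketched in Python roughly as follows. This assumes the three lines of a record really are adjacent and in that order, and that "round" sits in columns 19-23 as it does in the sample. The function name extract_records is made up for illustration. One deliberate deviation: it takes the last word of the level line instead of a fixed-column substring, because in the sample "level" moves over by one column when the round number gains a digit.

```python
def extract_records(lines):
    """Return a list of (round_num, score_val, level_val) tuples of strings.

    Hypothetical helper; assumes each record is three adjacent lines:
    a "round" line, then a "score" line, then a "level" line.
    """
    results = []
    it = iter(lines)
    for line in it:
        # Columns 19-23 (1-based) are the slice [18:23] in Python.
        if line[18:23] != "round":
            continue
        round_words = line[24:].split()     # fields starting at column 25
        score_words = next(it, "").split()  # the line after the round line
        level_words = next(it, "").split()  # the line after the score line
        if not (round_words and score_words and level_words):
            continue  # incomplete record; real code should probably report it
        results.append((round_words[0], score_words[-1], level_words[-1]))
    return results

# Sample lines copied from the question; the real file is an unknown.
sample = [
    "....",
    "abcddsdfffgfg",
    "ggfhghghgfhghgh   round 5 xccdcxcfd",
    "sdfdffdfbcvcvbbvnghg score = 0.4533",
    "abcddsdfffgfg     round 5 level = 0.15",
    "ggfhghghgfhghgh   round 10 dfsdfdcdsd",
    "sdfdffdfbcvcvbbvnghg score = 0.4213",
    "sdsdaawddddsds    round 10 level = 0.13",
    "......and so on....",
]

for record in extract_records(sample):
    print(*record)
```

An open file object can be passed in place of the list, since iterating over a file yields its lines and split() discards the trailing newline. The values are left as strings; convert them with int() and float() if you need numbers.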

Lots more error checking is possible, and advisable, but without knowing what the file really looks like, I see no point in guessing.

--
DaveA