On 12/04/2014 11:46 PM, C. Ng wrote:
Hi,
Given the sample text file below (where the gibberish represents the irrelevant
portions):
....
abcddsdfffgfg
ggfhghghgfhghgh round 5 xccdcxcfd
sdfdffdfbcvcvbbvnghg score = 0.4533
abcddsdfffgfg round 5 level = 0.15
ggfhghghgfhghgh round 10 dfsdfdcdsd
sdfdffdfbcvcvbbvnghg score = 0.4213
sdsdaawddddsds round 10 level = 0.13
......and so on....
I would like to extract the values for round, score and level:
5 0.4533 0.15
10 0.4213 0.13
....and so on...
Please advise me how it can be done, and what Python functions are useful.
There's lots of ambiguity in that "specification." Can you be sure, for
example, that the gibberish never includes the string "round",
"score", or "level"?
Can you be sure that the three relevant lines for a given record are
adjacent, and in that order? Do you happen to know that "round" always
starts in a particular column, and that "score" starts in another
particular column?
How would you solve it by hand? Something like the following?
Open the file.
Skip all lines until columns 19-23 contain "round"
Find the first space-delimited field starting in column 25, and call it
round_num
On the next line, split the line into words, and save the last word into
score_val
On the next line, take a substring of the line starting with column 23,
parse it into words, and store the second word in level_num
Save the values round_num, score_val, and level_num in a tuple, or a
string, or whatever you find useful, and append it to a result list.
Repeat until end of file.
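The steps above might be sketched in Python roughly as follows. This is only a sketch under assumptions: it assumes the three lines of each record are adjacent and in the order round, score, level, and it finds "round" by splitting each line into words rather than by column, since the real column positions are unknown. The names extract_records and sample are made up for illustration.

```python
def extract_records(lines):
    """Collect (round_num, score_val, level_num) tuples from an iterable of lines.

    Assumption: each record is three adjacent lines. The first contains the
    word "round" followed by a number, the second ends with the score, and
    the third ends with the level.
    """
    results = []
    lines = iter(lines)          # one shared iterator, so next() advances the loop too
    for line in lines:
        words = line.split()
        if "round" not in words:
            continue             # skip lines until one contains the word "round"
        idx = words.index("round")
        if idx + 1 >= len(words):
            continue             # "round" was the last word; no number follows it
        round_num = words[idx + 1]

        # The next two lines are assumed to hold the score and the level.
        score_words = next(lines, "").split()
        level_words = next(lines, "").split()
        if not score_words or not level_words:
            break                # incomplete record (end of input or a blank line)

        results.append((round_num, score_words[-1], level_words[-1]))
    return results


sample = """\
....
abcddsdfffgfg
ggfhghghgfhghgh round 5 xccdcxcfd
sdfdffdfbcvcvbbvnghg score = 0.4533
abcddsdfffgfg round 5 level = 0.15
ggfhghghgfhghgh round 10 dfsdfdcdsd
sdfdffdfbcvcvbbvnghg score = 0.4213
sdsdaawddddsds round 10 level = 0.13
"""

for record in extract_records(sample.splitlines()):
    print(*record)
# prints:
# 5 0.4533 0.15
# 10 0.4213 0.13
```

An open file object can be passed in place of sample.splitlines(), since it is also an iterable of lines. The values are left as strings; converting them with int() or float() is a separate choice.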
Lots more error checking is possible, and advisable, but without knowing
what the file really looks like, I see no point in guessing.
--
DaveA