On Fri, 15 Jan 2010 01:56:24 -0300, Eknath Venkataramani
<eknath.i...@gmail.com> wrote:
I have a txt file in the following format:
[code]
"confident" => {
    count => 4,
    trans => {
        "ashahvasahta" => 0.74918568,
        "atahmavaishahvaasa" => 0.09095465,
        "pahraaram\.nbha" => 0.06990729,
        "mailatae" => 0.02856427,
        "utanai" => 0.01929341,
        "anaa" => 0.01578552,
        "uthaanae" => 0.01403157,
        "jaitanae" => 0.01227762,
    },
},
"consumers" => {
    count => 4,
    trans => {
        "upabhaokahtaa" => 0.75144362,
...
and I need to extract "confident", "ashahvasahta" from the first
record, "consumers", "upabhaokahtaa" from the second record, and so on,
i.e. the "word in English" and the "first word in the probable translations".
The most robust way would be to write a specific parser for such a format.
It should be easy using pyparsing: http://pyparsing.wikispaces.com/
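pyparsing is a third-party package. To show what "a specific parser for such a format" involves, here is a minimal sketch that uses a small hand-written recursive-descent parser with only the standard library. It is a swapped-in technique, not pyparsing. The grammar is an assumption inferred from the sample above: keys are quoted strings or bare words, "=>" separates key and value, values are numbers, strings or nested { ... } blocks, and trailing commas are optional. Backslash escapes such as \. are kept verbatim rather than interpreted. All names (tokenize, Parser, parse, first_translations) are hypothetical.

```python
import re

# Token patterns (assumed from the sample): whitespace, quoted strings with
# backslash escapes, numbers, bare keys like count/trans, and => { } ,
TOKEN_RE = re.compile(r'''
      (?P<ws>\s+)
    | (?P<str>"(?:[^"\\]|\\.)*")
    | (?P<num>-?\d+(?:\.\d+)?)
    | (?P<word>[A-Za-z_]\w*)
    | (?P<op>=>|[{},])
''', re.VERBOSE)


def tokenize(text):
    """Split text into (kind, token) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos:pos + 20]!r}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


class Parser:
    """pairs := (key '=>' value ','?)* ; value := number | key | '{' pairs '}'"""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        # (None, None) signals end of input
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def expect(self, value):
        _, tok = self.peek()
        if tok != value:
            raise ValueError(f"expected {value!r}, got {tok!r}")
        self.i += 1

    def key(self):
        kind, tok = self.peek()
        if kind not in ("str", "word"):
            raise ValueError(f"expected a key, got {tok!r}")
        self.i += 1
        # Strip the surrounding quotes; escapes such as \. are kept verbatim.
        return tok[1:-1] if kind == "str" else tok

    def value(self):
        kind, tok = self.peek()
        if tok == "{":
            self.i += 1
            result = self.pairs("}")
            self.expect("}")
            return result
        if kind == "num":
            self.i += 1
            return float(tok) if "." in tok else int(tok)
        return self.key()  # quoted string or bare word

    def pairs(self, closer):
        """Parse key => value pairs until `closer` (None means end of input)."""
        result = {}
        while self.peek()[1] != closer:
            k = self.key()
            self.expect("=>")
            result[k] = self.value()
            if self.peek()[1] == ",":
                self.i += 1  # trailing commas are optional
        return result


def parse(text):
    return Parser(text).pairs(None)


def first_translations(text):
    """(English word, first listed translation) for every record, in file order."""
    return [(word, next(iter(rec["trans"]))) for word, rec in parse(text).items()]


SAMPLE = r'''
"confident" => {
    count => 4,
    trans => {
        "ashahvasahta" => 0.74918568,
        "pahraaram\.nbha" => 0.06990729,
    },
},
"consumers" => {
    count => 4,
    trans => {
        "upabhaokahtaa" => 0.75144362,
    },
},
'''

print(first_translations(SAMPLE))
# [('confident', 'ashahvasahta'), ('consumers', 'upabhaokahtaa')]
```

Because Python 3.7+ dicts keep insertion order, next(iter(rec["trans"])) is the first translation as written in the file, which is what was asked for. It is not necessarily the most probable one, although in the sample the two coincide. A malformed file raises ValueError instead of silently producing wrong pairs.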
If you can guarantee certain properties (e.g. lines like "confident" and
"consumers" are always on a separate line; translations appear one per
line; there are no line breaks before/after the => sign, etc.) then you
could process the file line by line, looking for those separators. But
only do that if you are completely sure the format is fixed (e.g. the file
is computer-generated, not human-written). Anyway, it isn't much easier
than writing a real parser, and the latter is a lot more reliable. Learning
how to use a tool like pyparsing is in no way a waste of time.
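Here is a minimal sketch of that line-by-line approach. It is valid only if the properties listed above really hold. The two regular expressions encode the assumptions: each record starts with a line of the form "word" => { on its own, and each translation is a single "translation" => number, line. Every other line (count => 4, trans => {, closing braces) is ignored. The function name and the file name in the usage comment are hypothetical.

```python
import re

# Assumed layout: a record header like  "confident" => {  alone on its line.
HEADER_RE = re.compile(r'^\s*"(?P<word>(?:[^"\\]|\\.)*)"\s*=>\s*\{\s*$')
# Assumed layout: one translation per line, like  "ashahvasahta" => 0.749,
TRANS_RE = re.compile(r'^\s*"(?P<word>(?:[^"\\]|\\.)*)"\s*=>\s*[-\d.]+\s*,?\s*$')


def extract_first_translations(lines):
    """Return (English word, first translation) pairs from an iterable of lines."""
    results = []
    current = None  # English word of the record whose first translation is pending
    for line in lines:
        m = HEADER_RE.match(line)
        if m:
            current = m.group("word")
            continue
        m = TRANS_RE.match(line)
        if m and current is not None:
            results.append((current, m.group("word")))
            current = None  # keep only the first translation of each record
    return results


SAMPLE = '''"confident" => {
    count => 4,
    trans => {
        "ashahvasahta" => 0.74918568,
        "atahmavaishahvaasa" => 0.09095465,
    },
},
"consumers" => {
    count => 4,
    trans => {
        "upabhaokahtaa" => 0.75144362,
    },
},'''

print(extract_first_translations(SAMPLE.splitlines()))
# [('confident', 'ashahvasahta'), ('consumers', 'upabhaokahtaa')]

# Usage on a real file (hypothetical name):
# with open("translations.txt", encoding="utf-8") as f:
#     pairs = extract_first_translations(f)
```

This breaks as soon as a header and its opening brace are split across lines, or two translations share a line. That is the fragility that makes a real parser the safer choice.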
--
Gabriel Genellina
--
http://mail.python.org/mailman/listinfo/python-list