On Fri, 15 Jan 2010 01:56:24 -0300, Eknath Venkataramani <eknath.i...@gmail.com> wrote:

I have a txt file in the following format:

[code]
"confident" => {
  count => 4,
  trans => {
     "ashahvasahta" => 0.74918568,
    "atahmavaishahvaasa" => 0.09095465,
    "pahraaram\.nbha" => 0.06990729,
         "mailatae" => 0.02856427,
           "utanai" => 0.01929341,
             "anaa" => 0.01578552,
         "uthaanae" => 0.01403157,
         "jaitanae" => 0.01227762,
    },
},
"consumers" => {
  count => 4,
  trans => {
    "upabhaokahtaa" => 0.75144362,
...

and I need to extract "confident" , "ashahvasahta" from the first
record, "consumers",  "upabhaokahtaa" from the second record...
i.e. "word in english" and the "first word in the probable-translations"

The most robust approach is to write a parser specific to this format. That should be easy with pyparsing: http://pyparsing.wikispaces.com/
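To give an idea of what such a parser involves, here is a minimal sketch that uses only the standard library (a hand-written tokenizer plus recursive descent) rather than pyparsing. It assumes keys are either quoted strings or bare words, values are either numbers or nested { ... } blocks, and trailing commas are allowed. The names tokenize, parse_records and first_translations are made up for illustration, and SAMPLE closes the truncated second record in a way I am only guessing at. It also relies on dicts keeping insertion order (Python 3.7+), so "first translation" means the first one listed in the file.

```python
import re

# Token kinds: quoted string, number, bare word, and the punctuation => { } ,
TOKEN = re.compile(r'''
    \s*(?:
        "(?P<str>(?:[^"\\]|\\.)*)"
      | (?P<num>-?\d+(?:\.\d+)?)
      | (?P<word>[A-Za-z_]\w*)
      | (?P<punct>=>|[{},])
    )''', re.VERBOSE)

def tokenize(text):
    """Yield (kind, text) tuples; raise ValueError on unrecognized input."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected input at offset %d" % pos)
        pos = m.end()
        yield m.lastgroup, m.group(m.lastgroup)

def _parse_pairs(tokens, pos):
    # pairs := (key "=>" value ","?)*
    pairs = {}
    while pos < len(tokens) and tokens[pos][0] in ("str", "word"):
        key = tokens[pos][1]
        if tokens[pos + 1:pos + 2] != [("punct", "=>")]:
            raise ValueError("expected '=>' after key %r" % key)
        pairs[key], pos = _parse_value(tokens, pos + 2)
        if tokens[pos:pos + 1] == [("punct", ",")]:
            pos += 1
    return pairs, pos

def _parse_value(tokens, pos):
    # value := number | "{" pairs "}"
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    kind, text = tokens[pos]
    if kind == "num":
        return float(text), pos + 1
    if (kind, text) == ("punct", "{"):
        pairs, pos = _parse_pairs(tokens, pos + 1)
        if tokens[pos:pos + 1] != [("punct", "}")]:
            raise ValueError("expected '}'")
        return pairs, pos + 1
    raise ValueError("unexpected token %r" % text)

def parse_records(tokens):
    """Parse the whole token stream into nested dicts."""
    tokens = list(tokens)
    pairs, pos = _parse_pairs(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected token %r" % (tokens[pos][1],))
    return pairs

def first_translations(data):
    """Yield (english_word, first_listed_translation) for each record."""
    for word, record in data.items():
        yield word, next(iter(record["trans"]), None)

# Shortened sample; the end of the second record is an assumption.
SAMPLE = r'''
"confident" => {
  count => 4,
  trans => {
     "ashahvasahta" => 0.74918568,
    "pahraaram\.nbha" => 0.06990729,
    },
},
"consumers" => {
  count => 4,
  trans => {
    "upabhaokahtaa" => 0.75144362,
  },
},
'''

for english, translation in first_translations(parse_records(tokenize(SAMPLE))):
    print(english, translation)
```

Unlike a line-based scan, this does not care where the line breaks fall, and it complains loudly when the input does not match the expected structure.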

If you can guarantee certain properties (e.g. entries like "confident" and "consumers" always start on their own line; translations appear one per line; there are no line breaks before or after the => sign, etc.), then you could process the file line by line, looking for those separators. But only do that if you are completely sure the format is fixed (e.g. the file is computer-generated, not human-written). Anyway, it isn't much easier than writing a real parser, and the latter is a lot more reliable. Learning how to use a tool like pyparsing is in no way a waste of time.
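For comparison, the line-by-line approach could look roughly like the sketch below. It is only valid under the assumptions listed above: each quoted English word opens its record on a line of its own ("word" => {), and each translation sits on its own line ("word" => number,). The function name first_translations_by_line and the sample text are made up for illustration, and the second sample record is cut short on purpose.

```python
import re

# A record header looks like:    "confident" => {
HEADWORD = re.compile(r'^\s*"([^"]*)"\s*=>\s*\{\s*$')
# A translation line looks like:    "ashahvasahta" => 0.74918568,
TRANSLATION = re.compile(r'^\s*"((?:[^"\\]|\\.)*)"\s*=>\s*[-\d.]+\s*,?\s*$')

def first_translations_by_line(lines):
    """Yield (english_word, first_translation) pairs from an iterable of lines."""
    current = None  # headword still waiting for its first translation
    for line in lines:
        m = HEADWORD.match(line)
        if m:
            current = m.group(1)
            continue
        m = TRANSLATION.match(line)
        if m and current is not None:
            yield current, m.group(1)
            current = None  # ignore the remaining translations of this record

SAMPLE_LINES = r'''
"confident" => {
  count => 4,
  trans => {
     "ashahvasahta" => 0.74918568,
    "atahmavaishahvaasa" => 0.09095465,
    },
},
"consumers" => {
  count => 4,
  trans => {
    "upabhaokahtaa" => 0.75144362,
'''.splitlines()

for english, translation in first_translations_by_line(SAMPLE_LINES):
    print(english, translation)
```

Note that this silently skips anything it does not recognize, so a record split across lines in an unexpected way is simply lost; that is the reliability gap compared with a real parser.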

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
