Re: extracting from web pages but got disordered words sometimes

Paul McGuire Sat, 27 Jan 2007 11:31:02 -0800

After looking at the pyparsing results, I think I see the problem with 
your original code.  You are selecting only the characters after the 
rightmost "-" character, but you really want to select everything to 
the right of "- -".  In some of the titles, the encoded Chinese 
includes a "-" character, so you are chopping off everything before 
that.


Try changing your code to:
    title=full_title.split("- -")[1]

I think then your original program will work.
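
To show the difference, here is a minimal sketch. The sample title and both 
helper names are made up for illustration, and after_last_dash() is only my 
guess at what the original code does. I also pass a maxsplit of 1, which is 
an addition to the one-liner above, so that a second "- -" inside a title 
would not cut it short.

```python
# Hypothetical title: site name, then the "- -" separator, then the part
# we want, which itself contains a "-" character.
full_title = "Some Site - - part one-part two"

def after_last_dash(s):
    # Assumed behaviour of the original code: keep only the text after
    # the rightmost "-" character.
    return s.split("-")[-1]

def after_separator(s, sep="- -"):
    # Keep everything to the right of the first "- -" separator.
    # maxsplit=1 leaves any later "- -" inside the title untouched.
    return s.split(sep, 1)[1]

print(repr(after_last_dash(full_title)))   # loses "part one"
print(repr(after_separator(full_title)))   # keeps the whole title
```

The second result keeps its leading space, so you may want to call .strip() 
on it.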

-- Paul

-- 
http://mail.python.org/mailman/listinfo/python-list
