Thank you, I tried again and I figured it out.
That's something with beautiful soup, I worked with it a year ago also
dealing with Chinese html pages and nothing error happened. I read the
old code and I find the difference. Change the page to unicode before
feeding to beautiful soup, then every
After looking at the pyparsing results, I think I see the problem with
your original code. You are selecting only the characters after the
rightmost "-" character, but you really want to select everything to
the right of "- -". In some of the titles, the encoded Chinese
includes a "-" charac
On Jan 27, 5:18 am, "Frank Potter" <[EMAIL PROTECTED]> wrote:
> There are ten web pages I want to deal with.
> fromhttp://www.af.shejis.com/new_lw/html/125926.shtml
> to http://www.af.shejis.com/new_lw/html/125936.shtml
>
> Each of them uses the charset of Chinese "gb2312", and firefox
> displ
There are ten web pages I want to deal with.
from http://www.af.shejis.com/new_lw/html/125926.shtml
to http://www.af.shejis.com/new_lw/html/125936.shtml
Each of them uses the charset of Chinese "gb2312", and firefox
displays all of them in the right form, that's readable Chinese.
My job is,