In article <roy-df05da.11460324122...@news.panix.com>,
Roy Smith <r...@panix.com> wrote:
>In article <rn%Bs.693798$nB6.605938@fx21.am4>,
> Alister <alister.w...@ntlworld.com> wrote:
>
>> Indeed due to the poor quality of most websites it is not possible to be
>> 100% accurate for all sites.
>>
>> Personally I would start by checking the doc type & then the meta data as
>> these should be quick & correct, I then use chardet only if these
>> fail to provide any result.
>
>I agree that checking the metadata is the right thing to do.  But, I
>wouldn't go so far as to assume it will always be correct.  There's a
>lot of crap out there with perfectly formed metadata which just happens
>to be wrong.
>
>Although it pains me greatly to quote Ronald Reagan as a source of
>wisdom, I have to admit he got it right with "Trust, but verify".  It's

Not surprisingly, as an actor, Reagan was as good as his script.
This one he got from Stalin.

>the only way to survive in the unicode world.  Write defensive code.
>Wrap try blocks around calls that might raise exceptions if the external
>data is borked w/r/t what the metadata claims it should be.

The way to go, of course.

Groetjes Albert
-- 
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
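
For what it's worth, the "trust, but verify" advice quoted above could look roughly like this in Python. This is only a sketch under assumptions: the function name decode_with_fallback and the fallback list are made up for illustration, and nothing in the thread prescribes them. A detector such as chardet could be tried before the last resort.

```python
def decode_with_fallback(raw, declared=None, fallbacks=("utf-8", "cp1252")):
    """Return (text, encoding_used) for a bytes object.

    `declared` is whatever the metadata claims (hypothetical parameter);
    it is tried first, but never trusted blindly.
    """
    candidates = ([declared] if declared else []) + list(fallbacks)
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            # The bytes do not match what this encoding allows.
            continue
        except LookupError:
            # The metadata named an encoding Python does not know.
            continue
    # Last resort: latin-1 maps every byte value, so it cannot fail,
    # although the resulting text may be wrong.
    return raw.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    # Metadata claims ASCII, but the payload is really UTF-8.
    text, used = decode_with_fallback(b"caf\xc3\xa9", declared="ascii")
    print(used)
```

Returning the encoding that actually worked, next to the text, lets the caller log every case where the metadata turned out to be wrong.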