Re: BeautifulSoup vs. loose & chars

2006-12-26 Thread Frederic Rentsch
John Nagle wrote: > Felipe Almeida Lessa wrote: > >> On 26 Dec 2006 04:22:38 -0800, placid <[EMAIL PROTECTED]> wrote: >> >> >>> So do you want to remove "&" or replace them with "&amp;" ? If you want >>> to replace it try the following; >>> >> I think he wants to replace them, but just t

Re: SPAM-LOW: Re: BeautifulSoup vs. loose & chars

2006-12-26 Thread Duncan Booth
Andreas Lysdal <[EMAIL PROTECTED]> wrote: >> P.S. apos is handled specially as it isn't technically a >> valid html entity (and Python doesn't include it in its entity >> list), but it is an xml entity and recognised by many browsers so some >> people might use it in html. >> > Hey I found this
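The point about `apos` suggests checking each `&name;` against Python's own entity table. What follows is a minimal Python 3 sketch of that idea, not the code posted in the thread. It assumes the standard-library `html.entities.name2codepoint` table is an acceptable list of valid HTML entity names, adds `apos` by hand, keeps numeric references such as `&#169;` and `&#x41;`, and escapes every other `&`. The names `fix_ampersands`, `KNOWN_NAMES` and `_AMP` are hypothetical.

```python
import re
from html.entities import name2codepoint

# Named entities Python knows about, plus "apos" (an XML entity that many
# browsers also accept in HTML, but which is not in Python's HTML list).
KNOWN_NAMES = set(name2codepoint) | {"apos"}

# An '&', optionally followed by something shaped like an entity body
# (decimal, hex, or named) and a terminating semicolon.
_AMP = re.compile(r"&(#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)?")


def fix_ampersands(text):
    """Escape every '&' that does not start a recognised entity reference."""
    def repl(match):
        body = match.group(1)
        if body is None:
            return "&amp;"          # bare '&' with no entity-like tail
        if body.startswith("#"):
            return match.group(0)   # numeric character reference: keep
        if body[:-1] in KNOWN_NAMES:
            return match.group(0)   # known named entity: keep
        return "&amp;" + body       # entity-shaped, but not a real name
    return _AMP.sub(repl, text)
```

For example, `fix_ampersands("This & this &amp; that")` gives `"This &amp; this &amp; that"`. One limitation of this sketch: browsers also accept some entities without the trailing semicolon (for example `&copy`), and those are escaped here rather than kept.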

Re: SPAM-LOW: Re: BeautifulSoup vs. loose & chars

2006-12-26 Thread Andreas Lysdal
Duncan Booth wrote: > "Felipe Almeida Lessa" <[EMAIL PROTECTED]> wrote: > > >> On 26 Dec 2006 04:22:38 -0800, placid <[EMAIL PROTECTED]> wrote: >> >>> So do you want to remove "&" or replace them with "&amp;" ? If you >>> want to replace it try the following; >>> >> I think he wants to

Re: BeautifulSoup vs. loose & chars

2006-12-26 Thread John Nagle
Felipe Almeida Lessa wrote: > On 26 Dec 2006 04:22:38 -0800, placid <[EMAIL PROTECTED]> wrote: > >> So do you want to remove "&" or replace them with "&amp;" ? If you want >> to replace it try the following; > > > I think he wants to replace them, but just the invalid ones. I.e., > > This & this &

Re: BeautifulSoup vs. loose & chars

2006-12-26 Thread Duncan Booth
"Felipe Almeida Lessa" <[EMAIL PROTECTED]> wrote: > On 26 Dec 2006 04:22:38 -0800, placid <[EMAIL PROTECTED]> wrote: >> So do you want to remove "&" or replace them with "&amp;" ? If you >> want to replace it try the following; > > I think he wants to replace them, but just the invalid ones. I.e., >

Re: BeautifulSoup vs. loose & chars

2006-12-26 Thread Felipe Almeida Lessa
On 26 Dec 2006 04:22:38 -0800, placid <[EMAIL PROTECTED]> wrote: > So do you want to remove "&" or replace them with "&amp;" ? If you want > to replace it try the following; I think he wants to replace them, but just the invalid ones. I.e., This & this &amp; that would become This &amp; this &amp; that No, i
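Replacing "just the invalid ones" can be approximated with a regular-expression negative lookahead. This is a minimal Python 3 sketch, assuming that anything shaped like an entity reference (a name, `#` plus digits, or `#x` plus hex digits, followed by `;`) should be left alone. `escape_bare_ampersands` and `_BARE_AMP` are hypothetical names, not something from the thread.

```python
import re

# Match an '&' that is NOT followed by something shaped like an entity:
# a name (&amp;), a decimal reference (&#38;) or a hex reference (&#x26;),
# each terminated by ';'.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)")


def escape_bare_ampersands(text):
    """Replace only the '&' characters that cannot start an entity."""
    return _BARE_AMP.sub("&amp;", text)
```

With this, `"This & this &amp; that"` becomes `"This &amp; this &amp; that"`. Because the check only looks at the shape of the reference, a made-up name such as `&bogus;` passes through unchanged; catching those would need a lookup against a real entity table.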

Re: BeautifulSoup vs. loose & chars

2006-12-26 Thread placid
John Nagle wrote: > I've been parsing existing HTML with BeautifulSoup, and occasionally > hit content which has something like "Design & Advertising", that is, > an "&" instead of an "&amp;". Is there some way I can get BeautifulSoup > to clean those up? There are various parsing options related to
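One way to see what a lenient parser does with such content is Python 3's standard-library `html.parser.HTMLParser`. With `convert_charrefs=True` it passes a bare `&` to `handle_data` as a literal character, and re-escaping the collected text with `html.escape` then yields `&amp;`. The sketch below assumes you only need the text content (tags are discarded) and uses a hypothetical `TextCollector` class and `clean_text` helper; it does not show BeautifulSoup's own options.

```python
from html import escape
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collect text content, ignoring tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser unescape valid references
        # (e.g. &amp; -> &) and leave a bare '&' as a literal character.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def clean_text(fragment):
    parser = TextCollector()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    # Re-escape the plain text so every '&' comes out as '&amp;'.
    return escape("".join(parser.chunks), quote=False)
```

Because valid references are unescaped before re-escaping, text that was already written as `&amp;` comes out as `&amp;` rather than `&amp;amp;`.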