Re: Efficient: put Content of HTML file into mysql database

2007-11-19 Thread Fabian López
with this text. Is it enough? It is such a simple web crawler. Maybe I can save it without downloading the file, can I? Thanks Fabian 2007/11/19, Jesse Jaggars <[EMAIL PROTECTED]>: > > Fabian López wrote: > > Hi colleagues, > > do you know the most efficient way to put the

Efficient: put Content of HTML file into mysql database

2007-11-19 Thread Fabian López
Hi colleagues, do you know the most efficient way to put the content of an HTML file into a MySQL database? Could it be this one?: 1.- I have the HTML document on my hard disk. 2.- Then I open the file (maybe with fopen??) 3.- Read the content (fread or similar) 4.- Write all the content in a SQL s
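The numbered steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the poster's code: the table `pages` with columns `path` and `content` is hypothetical, and the demo uses an in-memory `sqlite3` database as a stand-in for MySQL because both expose the same DB-API calls (`cursor()`, `execute()`, `commit()`). MySQL drivers such as MySQLdb or PyMySQL use `%s` as the parameter placeholder, while `sqlite3` uses `?`, so the placeholder is passed in.

```python
import os
import sqlite3
import tempfile


def store_html(conn, path, placeholder="%s"):
    """Read the HTML file at `path` and insert it as one row.

    `conn` is any DB-API 2.0 connection. The table `pages(path, content)`
    is a hypothetical schema chosen for this sketch.
    """
    # Steps 2 and 3: open the file and read its whole content.
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        html = fh.read()

    # Step 4: pass the content as a query parameter instead of pasting it
    # into the SQL string, so quotes inside the HTML cannot break the query.
    sql = "INSERT INTO pages (path, content) VALUES ({0}, {0})".format(placeholder)
    cur = conn.cursor()
    cur.execute(sql, (path, html))
    conn.commit()
    return len(html)


def demo():
    """Run the sketch against an in-memory SQLite database (stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (path TEXT, content TEXT)")
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write("<html><body>It's a test</body></html>")
        store_html(conn, path, placeholder="?")
        return conn.execute("SELECT content FROM pages").fetchone()[0]
    finally:
        os.remove(path)
        conn.close()
```

With a real MySQL connection the call would presumably be `store_html(conn, "page.html")`, keeping the default `%s` placeholder. Reading the whole file at once is fine for ordinary pages; very large files may need a different approach.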

crawler in python and mysql

2007-11-12 Thread Fabian López
Hi, I would like to write code that crawls a URL and retrieves all of its HTML. I have noticed that there are different open-source web crawlers, but they are far more extensive than what I need. I only need to crawl a URL, and I don't know if it is as easy as using an HTML parser. Is it? Which lib
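For a single URL, the standard library may already be enough. The sketch below is one possible approach, not a recommendation from the thread: `fetch_html` downloads a page with `urllib.request.urlopen`, and `LinkCollector` (a hypothetical helper name) shows how `html.parser.HTMLParser` could pull out the links if the crawl ever needs to go further than one page.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download one URL and return its HTML as text."""
    with urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
        # Use the charset from the Content-Type header; fall back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

If only the raw HTML is wanted, `fetch_html(url)` alone covers it and no parser is needed.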

Re: ignoring chinese characters parsing xml file

2007-10-23 Thread Fabian López
here, but it's a good idea for future goals. Thanks a lot! Fabian 2007/10/23, limodou <[EMAIL PROTECTED]>: > > On 10/23/07, Stefan Behnel <[EMAIL PROTECTED]> wrote: > > Fabian López wrote: > > > Thanks Mark, the code is like this. The attrib name is

Re: ignoring chinese characters parsing xml file

2007-10-22 Thread Fabian López
"], elem.attrib["rssUrl"] And the XML file looks like: http://weblogli.com " when="4" /> 22 Oct 2007 20:20:16 GMT, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]>: > > On Mon, 22 Oct 2007 21:24:40 +0200, Fabian López wrote: > > > I am parsing

ignoring chinese characters parsing xml file

2007-10-22 Thread Fabian López
Hi, I am parsing an XML file that includes Chinese characters, like ヘアアイロン... The problem is that I get an error like: UnicodeEncodeError: 'charmap' codec can't encode characters in position The thing is that I would like to ignore it and parse all the characters less
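An error of this kind usually comes from encoding the parsed text for output (for example printing to a console whose code page cannot represent the characters), not from the XML parsing itself. One way to ignore such characters is to pass an error handler to `str.encode`. The sketch below assumes Python 3; the sample document and its `weblog`, `name` and `url` names are illustrative, not taken from the poster's file.

```python
import xml.etree.ElementTree as ET


def to_encodable(text, encoding="ascii", errors="ignore"):
    """Return `text` limited to what `encoding` can represent.

    errors="ignore" drops unencodable characters;
    errors="replace" substitutes "?" for each of them.
    """
    return text.encode(encoding, errors).decode(encoding)


# Hypothetical sample document; element and attribute names are illustrative.
SAMPLE = '<weblogs><weblog name="blog ヘアアイロン" url="http://example.com/" /></weblogs>'


def weblog_names(xml_text, encoding="ascii"):
    """Parse the XML and return each name attribute, made safe for `encoding`."""
    root = ET.fromstring(xml_text)
    return [to_encodable(elem.attrib["name"], encoding) for elem in root.iter("weblog")]
```

Dropping characters loses data, so where the full text matters it is probably better to write the output in UTF-8 and keep everything.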