Extract information from HTML table
Hello,

I'm trying to extract the data from an HTML table. Here is the relevant part of the HTML source:

    """
    Sat, 31.03.2007 - 20:24:00
    http://s2.bitefight.fr/bite/bericht.php?q=01bf0ba7258ad976d890379f987d444e&beid=2628033">Vous avez tendu une embuscade à votre victime !

    Sat, 31.03.2007 - 20:14:35
    http://s2.bitefight.fr/bite/bericht.php?q=01bf0ba7258ad976d890379f987d444e&beid=2628007">Vous avez tendu une embuscade à votre victime !

    Sat, 31.03.2007 - 20:11:39
    Vous avez bien accompli votre tâche de Gardien de Cimetière et vous vous voyez remis votre salaire comme récompense. Vous recevez 320 et collectez 3 d'expérience !
    """

I would like to transform this into the following:

    Date : Sat, 31.03.2007 - 20:24:00
    ContainType : Link
    LinkText : Vous avez tendu une embuscade à votre victime !
    LinkURL : http://s2.bitefight.fr/bite/bericht.php?q=01bf0ba7258ad976d890379f987d444e&beid=2628033

    Date : Sat, 31.03.2007 - 20:14:35
    ContainType : Link
    LinkText : Vous avez tendu une embuscade à votre victime !
    LinkURL : http://s2.bitefight.fr/bite/bericht.php?q=01bf0ba7258ad976d890379f987d444e&beid=2628007

    Date : Sat, 31.03.2007 - 20:11:39
    ContainType : Text
    Contain : Vous avez bien accompli votre tâche de Gardien de Cimetière et vous vous voyez remis votre salaire comme récompense. Vous recevez 320 et collectez 3 d'expérience !

Do you know a way to do it?

Thanks
--
http://mail.python.org/mailman/listinfo/python-list
Re: Extract information from HTML table
On Apr 1, 2:52 pm, [EMAIL PROTECTED] wrote:
> On Apr 1, 3:13 pm, "Ulysse" <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > I'm trying to extract the data from an HTML table. Here is the
> > relevant part of the HTML source:
> > [...]
> > Do you know a way to do it?
>
> Beautiful Soup is an easy way to parse HTML (that may be broken):
> http://www.crummy.com/software/BeautifulSoup/
>
> Here's a start of a parser for your HTML:
>
>     soup = BeautifulSoup(txt)
>     for tr in soup('tr'):
>         dateTd, textTd = tr('td')[1:]
>         print 'Date :', dateTd.contents[0].strip()
>         print textTd  # element still needs parsing
>
> where txt is the string in your message.

I have seen the Beautiful Soup online help and tried to apply that to my
problem. But it seems to be a little bit hard. I will rather try to do
this with regular expressions...
--
http://mail.python.org/mailman/listinfo/python-list
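In case someone does continue down the Beautiful Soup road: the snippet above can be grown into something that produces the records asked for in the first message. The following is an untested sketch; it assumes BeautifulSoup 3 (the version current at the time), assumes the date and the message are the last two td cells of each row, and the function name parse_rows is just a placeholder.

    from BeautifulSoup import BeautifulSoup

    def parse_rows(txt):
        """Yield one dict per table row in the Date/ContainType layout."""
        soup = BeautifulSoup(txt)
        for tr in soup('tr'):
            cells = tr('td')
            if len(cells) < 2:
                continue                        # header or spacer row
            date_td, text_td = cells[-2:]       # assumed layout, see above
            record = {'Date': ''.join(date_td.findAll(text=True)).strip()}
            link = text_td.find('a')
            if link is not None:
                record['ContainType'] = 'Link'
                record['LinkText'] = ''.join(link.findAll(text=True)).strip()
                record['LinkURL'] = link['href']
            else:
                record['ContainType'] = 'Text'
                record['Contain'] = ''.join(text_td.findAll(text=True)).strip()
            yield record

Printing each record field by field then gives the Date / ContainType / LinkText / LinkURL listing shown in the first message.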
Clean "Durty" strings
Hello,

I need to clean a string like this:

    string = """ bonne mentalité mec!:)
    \nbon pour info moi je suis un serial posteur arceleur dictateur ^^*
    \nmais pour avoir des resultats probant il faut pas faire les mariolles, comme le "fondateur" de bvs krew
    \n mais pour avoir des resultats probant il faut pas faire les mariolles, comme le "fondateur" de bvs krew
    \n """

into:

    bonne mentalité mec!:)
    bon pour info moi je suis un serial posteur arceleur dictateur ^^*
    mais pour avoir des resultats probant il faut pas faire les mariolles, comme le "fondateur" de bvs krew
    mais pour avoir des resultats probant il faut pas faire les mariolles, comme le "fondateur" de bvs krew

To do this I would like to use only standard libraries.

Thanks
--
http://mail.python.org/mailman/listinfo/python-list
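One standard-library-only approach is to strip each line and throw away the ones that end up empty; whether that is enough depends on what other kinds of "dirt" turn up. A minimal sketch (the function name clean is just a placeholder):

    def clean(text):
        """Strip each line and drop the lines that are left empty."""
        lines = (line.strip() for line in text.splitlines())
        return '\n'.join(line for line in lines if line)

print clean(string) should then give the four cleaned-up lines shown above; if runs of tabs or multiple spaces inside the lines also need collapsing, re.sub(r'\s+', ' ', line) from the standard re module can be added in the same loop.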
Launch script on Linux using Putty
Hello,

I have a Python script which runs all the time (it uses the threading library). I would like this script to run on a remote Linux OS that I reach with PuTTY. The problem is that when I close the PuTTY command-line window on my Windows PC, the Python script stops running too.

I tried to use cron tables instead, by setting the time and restarting the cron process, but it's not practical.

Do you know the right way to do this?

Regards
--
http://mail.python.org/mailman/listinfo/python-list
Re: Launch script on Linux using Putty
On Apr 2, 12:56 am, Michael Hoffman <[EMAIL PROTECTED]> wrote:
> Ulysse wrote:
> > Hello,
> >
> > I have a python script which runs all the time (using the threading
> > library). I would like this script to run on a remote Linux OS using
> > PuTTY. The problem is, when I close the PuTTY command line window
> > running on my Win PC, the python script stops running too.
> >
> > I tried to use cron tables instead, by setting the time and restarting
> > the cron process, but it's not practical.
> >
> > Do you know the right way to do this?
>
> There are a few ways to do this, in order of easiest to most involved:
>
> 1. The easiest is to run nohup on your script in the background:
>
>        $ nohup myscript.py > output.txt 2> error.txt &
>
>    Then you can disconnect but your script will keep running. Try
>    man nohup for more information.
>
> 2. Use GNU screen on your remote terminal, and detach the screen instead
>    of logging off.
>
> 3. Set up your script to fork as a daemon. Google for ["python cookbook"
>    fork daemon] to find a few recipes for this.
> --
> Michael Hoffman

Thanks a lot, but in my situation:

1. nohup seems not to be installed on my "reduced Linux distribution":
   it's OpenWrt running on my WRT54GL broadband router.
2. I have looked for a way to "detach the screen" with PuTTY but haven't
   found one (maybe you can be more precise?).
3. The "fork daemon" script found at
   http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/278731 seems to
   be as big as my own script and a little bit hard to understand (a
   trimmed-down sketch of the idea follows this message).

So maybe "detach the screen"?

Thanks
--
http://mail.python.org/mailman/listinfo/python-list
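For what it's worth, the ActiveState recipe mentioned in point 3 boils down to a couple of forks. Below is a stripped-down sketch of the general double-fork technique, not the recipe itself; error handling, umask and pid-file management are left out, and it has not been checked on a busybox/OpenWrt system.

    import os
    import sys

    def daemonize():
        # First fork: give control back to the shell immediately.
        if os.fork() > 0:
            sys.exit(0)
        os.setsid()              # new session, no controlling terminal
        # Second fork: make sure the daemon can never reacquire a terminal.
        if os.fork() > 0:
            sys.exit(0)
        os.chdir('/')
        # Send the standard streams to /dev/null.
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)

    daemonize()
    # ... the rest of the long-running script goes here ...

Once daemonize() has returned, the process no longer belongs to the PuTTY session, so closing the window does not kill it.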
Re: Launch script on Linux using Putty
On Apr 2, 8:54 pm, Michael Hoffman <[EMAIL PROTECTED]> wrote:
> [Michael Hoffman]
> >> If you are running bash, you can do this:
>
> [Grant Edwards]
> > He's not running bash. He's running busybox's shell.
>
> There's a nohup applet for busybox.
>
> > [He'd be far better off asking his question in an OpenWRT or
> > Busybox forum, since it's got absolutely nothing to do with
> > Python.]
>
> I was going to say that originally, but then I realized that the daemon
> solution is on-topic. So is masking SIGHUP.
> --
> Michael Hoffman

Actually, the "./myscript.py &" command seems to work well. I can close the
PuTTY console, and after logging back in the "top" command shows me that my
process is still running.
--
http://mail.python.org/mailman/listinfo/python-list
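The other option Michael mentions, masking SIGHUP, can be done from inside the script itself, so the interpreter survives the terminal going away even without nohup. A minimal sketch (my own addition, not from the thread; SIGHUP only exists on the Unix side, where the script actually runs):

    import signal

    # Ignore the hangup signal sent when the controlling terminal (the
    # PuTTY session) is closed, so the interpreter is not killed with it.
    signal.signal(signal.SIGHUP, signal.SIG_IGN)

    # ... rest of the long-running script ...

Combined with starting the script as "./myscript.py &" and redirecting its output somewhere, this covers most of what nohup would have done.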
Re: Extract information from HTML table
On Apr 2, 9:28 pm, [EMAIL PROTECTED] (Cameron Laird) wrote:
> In article <[EMAIL PROTECTED]>, anjesh <[EMAIL PROTECTED]> wrote:
> >On Apr 2, 12:54 am, "Dotan Cohen" <[EMAIL PROTECTED]> wrote:
> >> On 1 Apr 2007 07:56:04 -0700, Ulysse <[EMAIL PROTECTED]> wrote:
> >>
> >> > I have seen the Beautiful Soup online help and tried to apply that to
> >> > my problem. But it seems to be a little bit hard. I will rather try to
> >> > do this with regular expressions...
> >>
> >> If you think that Beautiful Soup is difficult, then wait till you try
> >> to do this with regexes. Granted, knowing the exact format of the HTML
> >> you are scraping will help, but if you ever need to parse HTML from an
> >> unknown source then Beautiful Soup is the only way to go. Not all HTML
> >> authors close their td and tr tags, and sometimes there are attributes
> >> on those tags. If you plan on ever reusing the code, or the format of
> >> the HTML may change, then you are best off sticking with Beautiful
> >> Soup.
> >>
> >> Dotan Cohen
> >> http://lyricslist.com/
> >> http://what-is-what.com/
> >
> >Have you tried HTMLParser? It can do the task you want to perform:
> >http://docs.python.org/lib/module-HTMLParser.html
> >
> >-anjesh
>
> Yes, except that these last two follow-ups UNDERstate the difficulty--in
> fact, the impossibility--of achieving adequate results on this problem
> with regular expressions. We'll help with the documentation for HTMLParser
> and BeautifulSoup. REs are an invitation to madness.
>
> <http://www.unixreview.com/documents/s=10121/ur0702e/> might amuse
> those who want to think more about REs.

    r'(\d{2}\.\d{2}\.\d{4} - \d{2}:\d{2}:\d{2})\W*? \W*?(.*?).*?'
    r'(\d{2}\.\d{2}\.\d{4} - \d{2}:\d{2}:\d{2}).*?player\.php.*?>(.*?).*?(.*?)'
    r'(\d{2}\.\d{2}\.\d{4} - \d{2}:\d{2}:\d{2})\W*? \W*?Message au clan de :([a-zA-Z0-9_\-]+?)\W*(.*?)'

These three REs extract all the data I need (they do not apply exactly to
the string given above). I read the article, but I still don't understand
why REs are an invitation to madness...
--
http://mail.python.org/mailman/listinfo/python-list
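For anyone who would rather follow anjesh's HTMLParser suggestion than fight with either regexes or an extra library, here is a rough, untested sketch. It assumes the same row layout as before (the date in the second-to-last cell, the message in the last one, with an optional link); the class and attribute names are mine, and note that HTMLParser is stricter about broken markup than Beautiful Soup.

    from HTMLParser import HTMLParser     # html.parser in Python 3

    class RowExtractor(HTMLParser):
        """Collect (date, text, href) triples from the table rows."""

        def __init__(self):
            HTMLParser.__init__(self)
            self.rows = []
            self._cells = []              # text of the td cells seen so far
            self._href = None
            self._in_td = False

        def handle_starttag(self, tag, attrs):
            if tag == 'tr':
                self._cells, self._href = [], None
            elif tag == 'td':
                self._in_td = True
                self._cells.append('')
            elif tag == 'a':
                self._href = dict(attrs).get('href')

        def handle_endtag(self, tag):
            if tag == 'td':
                self._in_td = False
            elif tag == 'tr' and len(self._cells) >= 2:
                # assume date and message are the last two cells of the row
                self.rows.append((self._cells[-2].strip(),
                                  self._cells[-1].strip(),
                                  self._href))

        def handle_data(self, data):
            if self._in_td and self._cells:
                self._cells[-1] += data

Feeding it the page source with parser = RowExtractor(); parser.feed(html_source) leaves the extracted (date, text, href) triples in parser.rows, ready to be printed in the Date/ContainType layout.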
Remote Command a Python Script
Hello,

I've installed Python 2.5 on my WRT54G Linksys router. A script runs on this router and writes a little pickle database to the router's memory. I would like to write another Python script which will be able to:

1. Stop and start the remote script from my Windows computer. At present I
   use PuTTY to connect to the router over SSH, then I manually kill the
   Python process.
2. Retrieve the little database located in the router's memory and back it
   up on my Windows PC. At present I use WinSCP (like FTP) to get the
   pickle file.

Can you help me with that (modules to use, useful code snippets)?

Thanks a lot,
Maxime
--
http://mail.python.org/mailman/listinfo/python-list
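One way to script both steps from the Windows side is the third-party paramiko module, which speaks SSH and SFTP from Python and could replace both the manual PuTTY session and WinSCP. This is only a sketch: paramiko is an assumption (it is not mentioned in the thread), the host, credentials, paths and kill/start commands are placeholders, and it has not been tried against OpenWrt.

    import paramiko

    HOST, USER, PASSWORD = '192.168.1.1', 'root', 'secret'    # placeholders

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(HOST, username=USER, password=PASSWORD)

    # 1. Stop the running script (what was done by hand in PuTTY), then
    #    restart it in the background.  Assumes the script detaches itself
    #    or ignores SIGHUP, as discussed in the PuTTY thread above.
    client.exec_command('killall python')
    client.exec_command('python /root/myscript.py > /dev/null 2>&1 &')

    # 2. Fetch the pickle file, as WinSCP does by hand.
    sftp = client.open_sftp()
    sftp.get('/root/data.pickle', r'C:\backup\data.pickle')
    sftp.close()

    client.close()

On busybox-based firmware the exact kill command may differ; kill plus a pid read from a pidfile is a safer variant than killall.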
Code to send RSS news by Sendmail
Hello,

I'm looking for code which parses each item in an RSS feed, fetches the news page of each item, converts it to text and sends it by mail. Do you know if something like that exists?

Thanks
--
http://mail.python.org/mailman/listinfo/python-list
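One way to assemble this from common pieces is feedparser for the feed (it is already used elsewhere in this digest), urllib2 to fetch each page, a crude tag-stripper, and the standard smtplib/email modules to send the result. The sketch below is untested; the feed URL, SMTP host and addresses are placeholders, and the HTML-to-text step is deliberately simplistic.

    import re
    import smtplib
    import urllib2
    import feedparser
    from email.mime.text import MIMEText
    from email.header import Header

    FEED_URL = 'http://example.com/rss.xml'                    # placeholder
    SMTP_HOST = 'localhost'                                    # placeholder
    SENDER, RECIPIENT = 'me@example.com', 'you@example.com'    # placeholders

    def html_to_text(html):
        # Very crude conversion: drop scripts/styles, then every other tag.
        html = re.sub(r'(?is)<(script|style).*?</\1>', '', html)
        text = re.sub(r'(?s)<[^>]+>', ' ', html)
        return re.sub(r'\s+', ' ', text).strip()

    server = smtplib.SMTP(SMTP_HOST)
    for item in feedparser.parse(FEED_URL)['items']:
        page = urllib2.urlopen(item['link']).read()
        msg = MIMEText(html_to_text(page), 'plain', 'utf-8')
        msg['Subject'] = Header(item['title'], 'utf-8')
        msg['From'], msg['To'] = SENDER, RECIPIENT
        server.sendmail(SENDER, [RECIPIENT], msg.as_string())
    server.quit()

A real setup would also remember which items have already been mailed, so the same news is not sent on every run.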
Problems with joining Unicode strings
Hello,

I have problems with joining strings. My program gets web page fragments, then joins them into one single web page. I get this error when I try to join the fragments:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 208: ordinal not in range(128)

Here is my code:

    # Go through the RSS items
    f = open('soupresult.html', 'w')
    d = feedparser.parse(rss_url)
    separator = ''
    data = ''
    refresh_time = 2*60*60  # seconds
    resume_news = []

    # Go through the RSS items
    if len(d['items']) > 0:
        now = datetime.datetime.now()
        for item in d['items']:
            item_date = item.date_parsed
            print item_date
            item_date2 = datetime.datetime(item_date[0], item_date[1],
                                           item_date[2], item_date[3],
                                           item_date[4], item_date[5])
            age_item = now - item_date2
            age_item = age_item.days*24*3600 + age_item.seconds
            if age_item < refresh_time:
                url = item['link']
                print item.title
                try:
                    req = urllib2.Request(url)
                    browser = urllib2.urlopen(req)
                    data = browser.read()        # raw bytes from the server
                    clean_item = data
                    resume_news.append(item.title)
                    resume_news.append(clean_item)
                except urllib2.URLError, e:
                    print e.code

    f.write(u''.join(resume_news))
--
http://mail.python.org/mailman/listinfo/python-list
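The error comes from mixing types: feedparser hands back unicode titles, while browser.read() returns raw bytes, so u''.join() has to decode those bytes and falls back to ASCII, choking on 0xe9 (an accented character in Latin-1). A possible fix, sketched below under the assumption that the pages declare a charset in their HTTP headers and that UTF-8 is an acceptable fallback (the helper name fetch_as_unicode is mine):

    import urllib2

    def fetch_as_unicode(url, fallback='utf-8'):
        """Fetch a page and decode it using the charset the server declares."""
        response = urllib2.urlopen(url)
        data = response.read()                              # raw bytes
        charset = response.headers.getparam('charset') or fallback
        return data.decode(charset, 'replace')              # unicode

With clean_item = fetch_as_unicode(url) every element of resume_news is unicode, and the final line can then encode explicitly when writing: f.write(u''.join(resume_news).encode('utf-8')).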