Extract information from HTML table

2007-04-01 Thread Ulysse
Hello,

I'm trying to extract the data from an HTML table. Here is part of
the HTML source:
"""

  

  
Sat, 31.03.2007 - 20:24:00
  
http://s2.bitefight.fr/bite/
bericht.php?q=01bf0ba7258ad976d890379f987d444e&beid=2628033">Vous
avez tendu une embuscade à votre victime !


  

  
Sat, 31.03.2007 - 20:14:35
  
http://s2.bitefight.fr/bite/
bericht.php?q=01bf0ba7258ad976d890379f987d444e&beid=2628007">Vous
avez tendu une embuscade à votre victime !


  

  
Sat, 31.03.2007 - 20:11:39
   Vous avez bien accompli votre
tâche de Gardien de Cimetière et vous vous
voyez remis votre salaire comme récompense.
Vous recevez 320

et collectez 3 d'expérience !

"""

I would like to transform this into the following:

Date : Sat, 31.03.2007 - 20:24:00
ContainType : Link
LinkText : Vous avez tendu une embuscade à votre victime !
LinkURL : 
http://s2.bitefight.fr/bite/bericht.php?q=01bf0ba7258ad976d890379f987d444e&beid=2628033

Date : Sat, 31.03.2007 - 20:14:35
ContainType : Link
LinkText : Vous avez tendu une embuscade à votre victime !
LinkURL : 
http://s2.bitefight.fr/bite/bericht.php?q=01bf0ba7258ad976d890379f987d444e&beid=2628007

Date : Sat, 31.03.2007 - 20:11:39
ContainType : Text
Contain : Vous avez bien accompli votre tâche de Gardien de Cimetière
et vous vous
voyez remis votre salaire comme récompense.
Vous recevez 320 et collectez 3 d'expérience !



Do you know the way to do it ?

Thanks

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Extract information from HTML table

2007-04-01 Thread Ulysse
On Apr 1, 2:52 pm, [EMAIL PROTECTED] wrote:
> On Apr 1, 3:13 pm, "Ulysse" <[EMAIL PROTECTED]> wrote:
>
> > Hello,
>
> > I'm trying to extract the data from an HTML table. Here is part of
> > the HTML source:
>
> > 
>
> > Do you know the way to do it ?
>
> Beautiful Soup is an easy way to parse HTML (that may be broken).
> http://www.crummy.com/software/BeautifulSoup/
>
> Here's a start of a parser for your HTML:
>
> from BeautifulSoup import BeautifulSoup
>
> soup = BeautifulSoup(txt)
> for tr in soup('tr'):
>     dateTd, textTd = tr('td')[1:]
>     print 'Date :', dateTd.contents[0].strip()
>     print textTd  # element still needs parsing
>
> where txt is the string in your message.

I have seen the Beautiful Soup online help and tried to apply it to
my problem, but it seems a little bit hard. I would rather try to
do this with regular expressions...



Clean "Dirty" strings

2007-04-01 Thread Ulysse
Hello,

I need to clean a string like this:

string =
"""
bonne mentalité mec!:) \nbon pour
info moi je suis un serial posteur arceleur dictateur ^^*
\nmais pour avoir des resultats probant il
faut pas faire les mariolles, comme le "fondateur" de bvs
krew \n
mais pour avoir des resultats probant il faut pas faire les mariolles,
comme le "fondateur" de bvs krew \n
"""

into :
bonne mentalité mec!:) bon pour info moi je suis un serial posteur
arceleur dictateur ^^* mais pour avoir des resultats probant il faut
pas faire les mariolles, comme le "fondateur" de bvs krew
mais pour avoir des resultats probant il faut pas faire les mariolles,
comme le "fondateur" de bvs krew

To do this I would like to use only standard libraries.

Thanks
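
A minimal sketch of one way to do this with only the standard library. It assumes that "dirty" here means literal backslash-n sequences, real line breaks and runs of whitespace. Any markup the original string may have carried is not visible in the post, so it is not handled. The function name clean is hypothetical, and the code is written for Python 3.

```python
import re

def clean(text):
    """Collapse line breaks and repeated whitespace into single spaces."""
    # Assumption: the two-character sequence backslash + "n" is noise too.
    text = text.replace("\\n", " ")
    # \s+ matches real newlines, tabs and runs of spaces alike.
    return re.sub(r"\s+", " ", text).strip()

dirty = "bonne mentalité mec!:) \\nbon pour\ninfo moi je suis un serial posteur"
print(clean(dirty))
```

This joins everything into one line. If paragraph breaks must be kept, the text could be split on blank lines first and each part cleaned separately.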



Launch script on Linux using Putty

2007-04-01 Thread Ulysse
Hello,

I have a Python script which runs all the time (it uses the threading
library). I would like this script to run on a remote Linux OS using
PuTTY. The problem is that when I close the PuTTY command-line window
running on my Windows PC, the Python script stops running too.

I tried to use cron tables instead, by setting the time and restarting
the cron process, but it's not practical.

Do you know the right way to do this ?

Regards



Re: Launch script on Linux using Putty

2007-04-02 Thread Ulysse
On Apr 2, 12:56 am, Michael Hoffman <[EMAIL PROTECTED]> wrote:
> Ulysse wrote:
> > Hello,
>
> > I have a Python script which runs all the time (it uses the threading
> > library). I would like this script to run on a remote Linux OS using
> > PuTTY. The problem is that when I close the PuTTY command-line window
> > running on my Windows PC, the Python script stops running too.
>
> > I tried to use cron tables instead, by setting the time and restarting
> > the cron process, but it's not practical.
>
> > Do you know the right way to do this ?
>
> There are a few ways to do this, in order of easiest to most involved:
>
> 1. The easiest is to run nohup on your script in the background:
>
> $ nohup myscript.py > output.txt 2> error.txt &
>
> Then you can disconnect but your script will keep running. Try man nohup
>   for more information.
>
> 2. Use GNU screen on your remote terminal, and detach the screen instead
> of logging off.
>
> 3. Set up your script to fork as a daemon. Google for ["python cookbook"
> fork daemon] to find a few recipes for this.
> --
> Michael Hoffman

Thanks a lot, but in my situation:

1. nohup does not seem to be installed on my "reduced Linux distribution".
It's OpenWrt running on my WRT54GL broadband router.

2. I have looked for a way to "detach the screen" with PuTTY but
have not found one. (Maybe you can be more specific?)

3. The "fork daemon" script found at
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/278731
seems to be as big as my own script and a bit hard to understand.

So maybe "detach the screen"?

Thanks
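
For reference, the "fork as a daemon" idea can be reduced to a few lines. The following is a minimal sketch of the classic double-fork technique, not the cookbook recipe itself. The function name daemonize is hypothetical, and the code assumes a POSIX system where os.fork and os.setsid are available.

```python
import os
import sys

def daemonize():
    """Detach the current process from its controlling terminal."""
    # First fork: the parent exits, so the shell gets its prompt back.
    if os.fork() > 0:
        os._exit(0)
    # Start a new session; the child no longer belongs to the terminal.
    os.setsid()
    # Second fork: the session leader exits, so the survivor can never
    # acquire a controlling terminal again.
    if os.fork() > 0:
        os._exit(0)
    os.chdir("/")
    # Point stdin, stdout and stderr at /dev/null so that a closed
    # terminal cannot break later reads or writes.
    sys.stdout.flush()
    sys.stderr.flush()
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
```

The script would call daemonize() once at startup, before starting its threads, because only the thread that calls fork survives in the child process.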



Re: Launch script on Linux using Putty

2007-04-02 Thread Ulysse
On Apr 2, 8:54 pm, Michael Hoffman <[EMAIL PROTECTED]> wrote:
> [Michael Hoffman]
>
> >> If you are running bash, you can do this:
>
> [Grant Edwards]
>
> > He's not running bash.  He's running busybox's shell.
>
> There's a nohup applet for busybox.
>
> > [He'd be far better off asking his question in an OpenWRT or
> > Busybox forum, since it's got absolutely nothing to do with
> > Python.]
>
> I was going to say that originally, but then I realized that the daemon
> solution is on-topic. So is masking SIGHUP.
> --
> Michael Hoffman

Actually the "./myscript.py &" command seems to work well. I can close
the PuTTY console, and after logging in again the "top" command shows me
that my process is still running.
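
The "masking SIGHUP" option mentioned above can be sketched in Python as follows. This is a minimal illustration assuming a POSIX system. The function name ignore_hangup is hypothetical. Whether a background job receives SIGHUP at all when the terminal closes depends on the shell.

```python
import signal

def ignore_hangup():
    """Ask the OS to ignore SIGHUP so a closing terminal does not kill us."""
    # SIGHUP only exists on POSIX systems; do nothing elsewhere.
    if hasattr(signal, "SIGHUP"):
        signal.signal(signal.SIGHUP, signal.SIG_IGN)

ignore_hangup()
```

signal.signal may only be called from the main thread, so in a script that uses threading this call belongs at startup, before the worker threads are created.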



Re: Extract information from HTML table

2007-04-02 Thread Ulysse
On Apr 2, 9:28 pm, [EMAIL PROTECTED] (Cameron Laird) wrote:
> In article <[EMAIL PROTECTED]>,
>
>
>
> anjesh <[EMAIL PROTECTED]> wrote:
> >On Apr 2, 12:54 am, "Dotan Cohen" <[EMAIL PROTECTED]> wrote:
> >> On 1 Apr 2007 07:56:04 -0700, Ulysse <[EMAIL PROTECTED]> wrote:
>
> >> > I have seen the Beautiful Soup online help and tried to apply it to
> >> > my problem, but it seems a little bit hard. I would rather try to
> >> > do this with regular expressions...
>
> >> If you think that Beautiful Soup is difficult, then wait till you try
> >> to do this with regexes. Granted, knowing the exact format of the HTML
> >> you are scraping will help, but if you ever need to parse HTML from an
> >> unknown source then Beautiful Soup is the only way to go. Not all HTML
> >> authors close their td and tr tags, and sometimes there are attributes
> >> to those tags. If you plan on ever reusing the code or the format of
> >> the HTML may change, then you are best off sticking with Beautiful
> >> Soup.
>
> >> Dotan Cohen
>
> >> http://lyricslist.com/
> >> http://what-is-what.com/
>
> >Have you tried HTMLParser. It can do the task you want to perform
> >http://docs.python.org/lib/module-HTMLParser.html
>
> >-anjesh
>
> Yes, except that these last two follow-ups UNDERstate the difficulty--in
> fact, the impossibility--of achieving adequate results on this problem
> with regular expressions.  We'll help with the documentation for HTMLParser
> and BeautifulSoup.  REs are an invitation to madness.
>
> <http://www.unixreview.com/documents/s=10121/ur0702e/> might amuse
> those who want to think more about REs.

r'(\d{2}\.\d{2}\.\d{4} - \d{2}:\d{2}:\d{2})\W*?
\W*?(.*?).*?'

r'(\d{2}\.\d{2}\.\d{4} - \d{2}:\d{2}:\d{2}).*?player\.php.*?>(.*?).*?(.*?)'

r'(\d{2}\.\d{2}\.\d{4} - \d{2}:\d{2}:\d{2})\W*?
\W*?Message au clan de :([a-zA-Z0-9_\-]+?)\W*(.*?)'

These three REs extract all the data I need. They do not apply exactly to
the string given above.
I read the article but I didn't understand why REs are an invitation to
madness...
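
For comparison, here is a minimal sketch of the parser-based approach suggested earlier in the thread, using html.parser, the Python 3 successor of the HTMLParser module. The markup of the original table is not visible in this archive, so the sample below is invented. It assumes three cells per row (one ignored, then the date, then either a link or plain text), which matches the tr('td')[1:] unpacking in the Beautiful Soup reply. The class name RowParser is hypothetical.

```python
from html.parser import HTMLParser

class RowParser(HTMLParser):
    """Collect date, text and optional link URL from each table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None   # cells of the row being read, or None
        self._text = None    # text pieces of the cell being read, or None
        self._href = None    # href of a link found in the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._text, self._href = [], None
        elif tag == "a" and self._text is not None:
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._text is not None:
            # Join the pieces and collapse line breaks and extra spaces.
            text = " ".join("".join(self._text).split())
            self._cells.append((text, self._href))
            self._text = None
        elif tag == "tr" and self._cells is not None:
            if len(self._cells) >= 3:
                date = self._cells[1][0]
                content, href = self._cells[2]
                self.rows.append({"Date": date, "Text": content, "LinkURL": href})
            self._cells = None
            self._text = None

# Invented sample markup, built from the values shown in the first post.
sample = """
<table>
<tr><td></td><td>Sat, 31.03.2007 - 20:24:00</td>
<td><a href="http://s2.bitefight.fr/bite/bericht.php?q=01bf0ba7258ad976d890379f987d444e&amp;beid=2628033">Vous
avez tendu une embuscade à votre victime !</a></td></tr>
<tr><td></td><td>Sat, 31.03.2007 - 20:11:39</td>
<td>Vous recevez 320 et collectez 3 d'expérience !</td></tr>
</table>
"""

parser = RowParser()
parser.feed(sample)
parser.close()
for row in parser.rows:
    print("Date :", row["Date"])
    print("ContainType :", "Link" if row["LinkURL"] else "Text")
    print("Contain :", row["Text"])
```

Unlike the regular expressions, this does not depend on attribute order or on whitespace between tags, which is the usual reason the parser route is recommended.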



Remote Command a Python Script

2007-09-21 Thread Ulysse
Hello,

I've installed Python 2.5 on my WRT54G Linksys router. A script runs on
this router and writes a small pickle database to the router's memory.

I would like to write another Python script which will be able to :

1. Stop and start the remote script from my Windows computer. At
present I use PuTTY to connect to the router over SSH, then I
manually kill the Python process.

2. Retrieve the small database located in the router's memory and back it
up on my Windows PC. At present I use WinSCP (like FTP) to get the pickle
file.

Can you help me with that (modules to use, useful code snippets)?

Thanks a lot,

Maxime
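
One possible starting point is to drive command-line SSH tools from Python with the subprocess module. The sketch below only builds and runs the commands. The router address, the file paths and all function names are hypothetical. It assumes ssh and scp clients are on the PATH of the Windows machine (PuTTY's plink and pscp play the same role), that key-based login is set up, and that the router provides killall.

```python
import subprocess

# Hypothetical values: adjust the address and paths to the actual router.
ROUTER = "root@192.168.1.1"
REMOTE_SCRIPT = "/root/myscript.py"
REMOTE_PICKLE = "/tmp/data.pkl"

def ssh_command(remote_command, host=ROUTER):
    """Build the argument list that runs one command on the router."""
    return ["ssh", host, remote_command]

def stop_command():
    # Assumes a killall command (e.g. the BusyBox applet) on the router.
    return ssh_command("killall python")

def start_command():
    # Redirect output and background the job so the SSH session can end.
    return ssh_command("python %s > /dev/null 2>&1 &" % REMOTE_SCRIPT)

def fetch_command(local_path, host=ROUTER):
    """Build the scp call that copies the pickle file to this machine."""
    return ["scp", "%s:%s" % (host, REMOTE_PICKLE), local_path]

def run(command):
    """Run a prepared command and return its exit status."""
    return subprocess.call(command)
```

A backup cycle would then be run(stop_command()), run(fetch_command("backup.pkl")) and run(start_command()), in that order, so that the pickle file is not being written while it is copied.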



Code to send RSS news by Sendmail

2008-03-15 Thread Ulysse
Hello,

I'm looking for code that parses each item in an RSS
feed, gets the news page of each item, converts it to text and sends it
by mail.

Do you know if such a thing exists?

Thanks
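
The pipeline described above can be sketched with the standard library alone. This is an illustration under several assumptions: the feed is plain RSS 2.0 without namespaces on item, title and link, the news pages are UTF-8, and a mail server such as Sendmail accepts SMTP on localhost. All function and class names are hypothetical, and the code is written for Python 3.

```python
import smtplib
import urllib.request
import xml.etree.ElementTree as ET
from email.message import EmailMessage
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text of a page, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    """Return the visible text of an HTML page, one piece per line."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.parts)

def rss_items(rss_xml):
    """Yield (title, link) for each <item> of an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        yield item.findtext("title", ""), item.findtext("link", "")

def build_message(title, body, sender, recipient):
    """Wrap one news item in a plain-text e-mail message."""
    msg = EmailMessage()
    msg["Subject"] = title
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg

def mail_feed(feed_url, sender, recipient, smtp_host="localhost"):
    """Fetch the feed, then mail the text of every linked page."""
    rss_xml = urllib.request.urlopen(feed_url).read()
    with smtplib.SMTP(smtp_host) as smtp:
        for title, link in rss_items(rss_xml):
            page = urllib.request.urlopen(link).read().decode("utf-8", "replace")
            body = html_to_text(page)
            smtp.send_message(build_message(title, body, sender, recipient))
```

The text extraction is deliberately crude: it keeps menus and footers along with the article. A real tool would need per-site rules to isolate the news body.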


Problems with joining Unicode strings

2008-03-23 Thread Ulysse
Hello,

I have problems with joining strings.

My program gets web page fragments, then joins them into one single web
page. I get an error when I try to join these fragments:
"UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position
208: ordinal not in range(128)"

Here is my code :

import datetime
import urllib2

import feedparser

# Set up the output file and read the RSS feed
f = open('soupresult.html', 'w')
d = feedparser.parse(rss_url)
separator = ''
data = ''
refresh_time = 2*60*60  # seconds
resume_news = []

# Walk through the RSS items
if len(d['items']) > 0:
    now = datetime.datetime.now()
    for item in d['items']:
        item_date = item.date_parsed
        print item_date
        item_date2 = datetime.datetime(item_date[0], item_date[1],
                                       item_date[2], item_date[3],
                                       item_date[4], item_date[5])
        age_item = now - item_date2
        age_item = age_item.days*24*3600 + age_item.seconds

        if age_item < refresh_time:
            url = item['link']
            print item.title
            try:
                req = urllib2.Request(url)
                browser = urllib2.urlopen(req)
                data = browser.read()
                clean_item = data
                resume_news.append(item.title)
                resume_news.append(clean_item)

            except urllib2.URLError, e:
                print e.code
f.write(u''.join(resume_news))
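
A likely cause, given the code above: item.title from feedparser is a Unicode string, while browser.read() returns raw bytes, so u''.join() tries to decode the bytes as ASCII and fails on the first accented character. The usual remedy is to decode every byte fragment explicitly before joining and to encode once when writing. A minimal sketch in Python 3 terms follows. The function name join_fragments is hypothetical, and the encoding has to come from the page itself. Byte 0xe9 is "é" in ISO-8859-1, which would fit a French page, but that is only a guess.

```python
def join_fragments(fragments, encoding="utf-8"):
    """Join text and byte fragments into one text string."""
    decoded = []
    for fragment in fragments:
        # Bytes fetched from the network must be decoded explicitly.
        if isinstance(fragment, bytes):
            fragment = fragment.decode(encoding, "replace")
        decoded.append(fragment)
    return "".join(decoded)

# Example: a title (text) followed by a Latin-1 encoded page fragment (bytes).
page = join_fragments(["Titre", b"caf\xe9"], encoding="latin-1")

# Encode once, at the boundary where the result is written out:
#     with open("soupresult.html", "w", encoding="utf-8") as f:
#         f.write(page)
```

In the Python 2 code of the post, the equivalent change would be to append data.decode(...) instead of data, and to write u''.join(resume_news).encode('utf-8').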