Re: FTP example going through a FTP Proxy

2009-01-07 Thread jakecjacobson
On Jan 7, 12:32 pm, jakecjacobson  wrote:
> Hi,
>
> I need to write a simple Python script that connects to an FTP
> server and downloads files from the server to my local box.  I am
> required to go through an FTP proxy and I don't see any examples of how
> to do this.  The FTP proxy doesn't require a username or password to
> connect, but the FTP server that I am connecting to does.
>
> Any examples on how to do this would be greatly appreciated.  I am
> limited to using Python version 2.4.3 on a Linux box.

This is what I have tried so far:

import urllib

proxies = {'ftp':'ftp://proxy_server:21'}
ftp_server = 'ftp.somecompany.com'
ftp_port = '21'
username = ''
password = 'secretPW'

# FTP URLs carry credentials as ftp://user:password@host:port/
ftp_string = 'ftp://' + username + ':' + password + '@' + ftp_server + ':' + ftp_port + '/'

data = urllib.urlopen(ftp_string, proxies=proxies).read()

print data

I get the following error:

Traceback (most recent call last):
  File "./ftptest.py", line 22, in ?
    data = urllib.urlopen(ftp_server, proxies=proxies)
  File "/usr/lib/python2.4/urllib.py", line 82, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.4/urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.4/urllib.py", line 470, in open_ftp
    host, path = splithost(url)
  File "/usr/lib/python2.4/urllib.py", line 949, in splithost
    match = _hostprog.match(url)
TypeError: expected string or buffer
--
http://mail.python.org/mailman/listinfo/python-list


Re: FTP example going through a FTP Proxy

2009-01-07 Thread jakecjacobson
On Jan 7, 2:11 pm, jakecjacobson  wrote:
> [snip]

I might be getting closer.  Now I am getting "I/O error(ftp error):
(111, 'Connection refused')" error with the following code:

import urllib2

proxies = {'ftp':'ftp://proxy_server:21'}
ftp_server = 'ftp.somecompany.com'
ftp_port = '21'
username = ''
password = 'secretPW'

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
top_level_url = ftp_server
password_mgr.add_password(None, top_level_url, username, password)

proxy_support = urllib2.ProxyHandler(proxies)
# Note: HTTPBasicAuthHandler only answers HTTP 401 challenges;
# it is not consulted for ftp:// URLs
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
# Pass both handlers to one build_opener() call; a second call would
# replace the first opener and drop proxy_support
opener = urllib2.build_opener(proxy_support, handler)
a_url = 'ftp://' + ftp_server + ':' + ftp_port + '/'
print a_url

try:
    data = opener.open(a_url)
    print data
except IOError, (errno, strerror):
    print "I/O error(%s): %s" % (errno, strerror)


FTP example going through a FTP Proxy

2009-01-08 Thread jakecjacobson
Hi,

I need to write a simple Python script that connects to an FTP
server and downloads files from the server to my local box.  I am
required to go through an FTP proxy and I don't see any examples of how
to do this.  The FTP proxy doesn't require a username or password to
connect, but the FTP server that I am connecting to does.

Any examples on how to do this would be greatly appreciated.  I am
limited to using Python version 2.4.3 on a Linux box.


Re: FTP example going through a FTP Proxy

2009-01-08 Thread jakecjacobson
On Jan 7, 3:56 pm, jakecjacobson  wrote:
> On Jan 7, 2:11 pm, jakecjacobson  wrote:
> [snip]

I tried the same code from a different box and got a different error
message:

I/O error(ftp error): 501 USER format: proxy-user:auth-met...@destination.  Closing connection.

My guess is that my original box couldn't connect to the firewall
proxy, so I was getting a "connection refused" error.  Now it appears
that the password manager has an issue, if I understand the error
correctly.  I really hope that someone in the Python community can
give me a pointer.
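For comparison, a minimal sketch that uses ftplib directly instead of urllib/urllib2, written for modern Python 3 rather than the 2.4.3 the poster is limited to. It assumes the proxy follows the common convention of accepting USER user@destination-host and relaying the login to that host. The 501 error above shows that this particular proxy wants its own USER format (proxy-user:auth-met...@destination), so proxy_login_name() is the part that would need adapting. Both function names are hypothetical.

```python
import ftplib


def proxy_login_name(user, destination_host):
    # Assumption: the proxy accepts "user@destination" in the USER command
    # and forwards the login to destination_host. Adapt for other proxies.
    return "%s@%s" % (user, destination_host)


def download_via_proxy(proxy_host, destination_host, user, password,
                       remote_name, local_name, proxy_port=21):
    """Fetch one file from destination_host by logging in through an FTP proxy."""
    # The with-block sends QUIT (ignoring errors) and closes the socket on exit
    with ftplib.FTP() as ftp:
        ftp.connect(proxy_host, proxy_port)  # talk to the proxy, not the server
        ftp.login(proxy_login_name(user, destination_host), password)
        with open(local_name, "wb") as out:
            # RETR streams the file in binary mode; each chunk goes to out.write
            ftp.retrbinary("RETR " + remote_name, out.write)
```

A call would look like download_via_proxy("proxy_server", "ftp.somecompany.com", username, password, "remote.txt", "local.txt"), with the host and file names being placeholders.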


Getting/Setting HTTP Headers

2008-09-17 Thread jakecjacobson
I need to write a feed parser that takes a URL for any Atom or RSS
feed and transforms it into an Atom feed.  I have done the transformation
part, but I want to support conditional HTTP requests.  I have not been
able to find any examples that show:

1.  How to read the Last-Modified or ETag value sent by the requester
(in its If-Modified-Since or If-None-Match header)
2.  How to set the corresponding HTTP response: either a 304 Not
Modified status or the new Last-Modified date and/or ETag values
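A minimal server-side sketch of that decision in Python 3, assuming the incoming request headers arrive as a plain dict with exact-case keys and that an ETag and a modification timestamp are stored for each transformed feed (the function and parameter names are hypothetical). It uses simplified exact-match ETag comparison (no W/ weak-tag handling): it answers 304 when the client's validators show its copy is current, and 200 otherwise, always returning fresh ETag and Last-Modified values to send.

```python
from email.utils import formatdate, parsedate_to_datetime


def conditional_response(request_headers, etag, last_modified):
    """Decide between 304 Not Modified and 200 for a conditional GET.

    request_headers: dict of incoming headers (exact-case keys assumed)
    etag: quoted ETag for the current feed, e.g. '"v42"'
    last_modified: POSIX timestamp of the feed's last change
    Returns (status, headers_to_send).
    """
    headers = {"ETag": etag,
               "Last-Modified": formatdate(last_modified, usegmt=True)}
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present, If-Modified-Since is ignored
        tags = [t.strip() for t in if_none_match.split(",")]
        return (304 if etag in tags or "*" in tags else 200), headers
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        if since is not None and int(last_modified) <= since:
            return 304, headers
    return 200, headers
```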


Processing XML File

2010-01-29 Thread jakecjacobson
I need to take an XML web resource and split it up into smaller XML
files.  I am able to retrieve the web resource but I can't find any
good XML examples.  I am just learning Python so forgive me if this
question has been answered many times in the past.

My resource is like:


 ...
 ...


 ...
 ...


So in this example, I would need to output 2 files, each containing
what is between an opening and a closing document tag.
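A minimal Python 3 sketch using the standard library's ElementTree. The markup in the example above did not survive, so the element name "document" is an assumption taken from the prose; matching on the local name (the part after any {namespace} prefix) keeps it working when the elements are namespaced. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def split_by_element(xml_text, local_name):
    """Return one serialized XML string per element whose local name matches."""
    root = ET.fromstring(xml_text)
    parts = []
    for elem in root.iter():
        # Namespaced tags look like "{uri}name"; compare only the name part
        if elem.tag.rsplit("}", 1)[-1] == local_name:
            elem.tail = None  # drop whitespace that follows the closing tag
            parts.append(ET.tostring(elem, encoding="unicode"))
    return parts


def write_parts(parts, prefix):
    """Write each part to prefix_1.xml, prefix_2.xml, ... and return the names."""
    names = []
    for i, part in enumerate(parts, 1):
        name = "%s_%d.xml" % (prefix, i)
        with open(name, "w", encoding="utf-8") as out:
            out.write(part)
        names.append(name)
    return names
```

For example, write_parts(split_by_element(text, "document"), "resource") would produce resource_1.xml and resource_2.xml for a two-document input.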


Re: Processing XML File

2010-01-29 Thread jakecjacobson
On Jan 29, 1:04 pm, Adam Tauno Williams 
wrote:
> On Fri, 2010-01-29 at 09:25 -0800, jakecjacobson wrote:
> > I need to take a XML web resource and split it up into smaller XML
> > files.  I am able to retrieve the web resource but I can't find any
> > good XML examples.  I am just learning Python so forgive me if this
> > question has been answered many times in the past.
> > My resource is like:
> > 
> >      ...
> >      ...
> > 
> > 
> >      ...
> >      ...
> > 
> > So in this example, I would need to output 2 files with the contents
> > of each file what is between the open and close document tag.
>
> Do you want to parse the document or SaX?
>
> I have a SaX example at
> <http://coils.hg.sourceforge.net/hgweb/coils/coils/file/99b227b08f7f/s...>

Thanks, but I am way over my head with XML and Python.  I am working with
DDMS and need to output each individual resource node to its own
file.  I hope this helps; I need a good example and an explanation of
how to use it.

Here is what a resource node looks like:
  https://metadata.dod.mil/mdr/ns/DDMS/1.4/
https://metadata.dod.mil/mdr/ns/DDMS/1.4/";
xmlns:ddms="https://metadata.dod.mil/mdr/ns/DDMS/1.4/";
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
xmlns:ICISM="urn:us:gov:ic:ism:v2">

https://metadata.dod.mil/mdr/
ns/MDR/1.0/MDR.owl#GovernanceNamespace" ddms:value="TBD"/>

Sample Taxonomy

  This is a sample taxonomy created for the Help page.



  
Sample
Developer
FGM, Inc.
703-885-1000
sampledevelo...@fgm.com
  



  

You can see the DDMS site at https://metadata.dod.mil/.


Re: Processing XML File

2010-02-01 Thread jakecjacobson
On Jan 29, 2:41 pm, Stefan Behnel  wrote:
> Sells, Fred, 29.01.2010 20:31:
>
> > Google is your friend.  Elementtree is one of the better documented
> > IMHO, but there are many modules to do this.
>
> Unless the OP provides some more information, "do this" is rather
> underdefined. And sending someone off to Google who is just learning the
> basics of Python and XML and trying to solve a very specific problem with
> them is not exactly the spirit I'm used to in this newsgroup.
>
> Stefan

Just want to thank everyone for their posts.  I got it working after I
discovered a namespace issue.  Here is the code:

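import libxml2

# guts holds the XML text retrieved from the web resource
xmlDoc = libxml2.parseDoc(guts)
# Ignore the namespace and just match elements whose local name is Resource
resourceNodes = xmlDoc.xpathEval('//*[local-name()="Resource"]')
for rNode in resourceNodes:
    print rNode
xmlDoc.freeDoc()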
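The same namespace-agnostic match is possible without libxml2. A Python 3 sketch using the standard library's ElementTree, assuming Python 3.8 or later, where the path syntax {*}Resource matches a Resource tag in any namespace or none. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_resources(xml_text):
    """Return every Resource element, whatever namespace it is in."""
    root = ET.fromstring(xml_text)
    # ".//{*}Resource" searches all descendants in any (or no) namespace,
    # but not the root element itself, so check the root separately
    found = list(root.iterfind(".//{*}Resource"))
    if root.tag.rsplit("}", 1)[-1] == "Resource":
        found.insert(0, root)
    return found
```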


Authenticating to web service using https and client certificate

2009-06-23 Thread jakecjacobson
Hi,

I need to post some XML files to a web service that requires a client
certificate for authentication.  I have some code that works for posting a
multipart form over HTTP, but I need to modify it to pass the proper
certificate and post the XML file.  Is there any example code that
will point me in the right direction?  Thanks for your help.
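A minimal Python 3 sketch (the thread itself is on 2.4, where the modules are httplib and socket.ssl). The form field names collection and entryxml are the ones used in the later posts; the function names are hypothetical. The client certificate is loaded into an ssl context that is handed to http.client.HTTPSConnection.

```python
import http.client
import ssl
from urllib.parse import urlencode


def build_catalog_post(collection, entry_xml):
    """Form-encode the two fields and return (body, headers)."""
    body = urlencode({"collection": collection, "entryxml": entry_xml})
    headers = {"Content-Type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}
    return body, headers


def post_with_client_cert(host, port, path, body, headers, cert_file, key_file=None):
    """POST body over HTTPS, presenting a client certificate from PEM files."""
    context = ssl.create_default_context()
    # cert_file holds your own certificate (PEM) and key_file its private key;
    # if both are in one PEM file, leave key_file as None
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    connection = http.client.HTTPSConnection(host, int(port), context=context)
    try:
        connection.request("POST", path, body, headers)
        response = connection.getresponse()
        return response.status, response.reason, response.read()
    finally:
        connection.close()
```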


exceptions.TypeError an integer is required

2009-07-24 Thread jakecjacobson
I am trying to POST to a REST API over HTTPS that requires the
script to pass a cert to the server.  I am getting an
"exceptions.TypeError an integer is required" error and can't find the
reason.  By commenting out lines of code, I found that it happens on the
connection.request() line.  Here is the problem code.  Would love some
help if possible.

head = {"Content-Type" : "application/x-www-form-urlencoded",
        "Accept" : "text/plain"}
parameters = urlencode({"collection" : collection, "entryxml" : open(file, 'r').read()})
try:
    connection = httplib.HTTPSConnection(host, port, key_file, cert_file)
    connection.request('POST', path, parameters, head)
    response = connection.getresponse()
    print response.status, response.reason
except:
    print sys.exc_type, sys.exc_value

connection.close()


Re: exceptions.TypeError an integer is required

2009-07-27 Thread jakecjacobson
On Jul 24, 3:11 pm, Steven D'Aprano  wrote:
> On Fri, 24 Jul 2009 11:24:58 -0700, jakecjacobson wrote:
> > I am trying to do a post to a REST API over HTTPS and requires the
> > script to pass a cert to the server.  I am getting "exceptions.TypeError
> > an integer is required" error and can't find the reason.  I commenting
> > out the lines of code, it is happening on the connection.request() line.
> >  Here is the problem code.  Would love some help if possible.
>
> Please post the traceback that you get.
>
> My guess is that you are passing a string instead of an integer, probably
> for the port.
>
> [...]
>
> >    except:
> >            print sys.exc_type, sys.exc_value
>
> As a general rule, a bare except of that fashion is bad practice. Unless
> you can explain why it is normally bad practice, *and* why your case is
> an exception (no pun intended) to the rule "never use bare except
> clauses", I suggest you either:
>
> * replace "except:" with "except Exception:" instead.
>
> * better still, re-write the entire try block as:
>
>     try:
>         [code goes here]
>     finally:
>         connection.close()
>
> and use the Python error-reporting mechanism instead of defeating it.
>
> --
> Steven

Steven,

You are quite correct in your statements.  My goal was not to make
great code but something that I could quickly test.  My assumption was
that the httplib.HTTPSConnection() would do the cast to int for me.
As soon as I cast it to an int, I was able to get past that issue.
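Since the port arrives as a string from a config file or the command line, one option is to convert it in a single place and fail early with a clear message. A small Python 3 helper under that assumption (the name and the default of 443 are hypothetical choices):

```python
def parse_port(value, default=443):
    """Convert a port taken from a config file or command line into an int."""
    if value is None or str(value).strip() == "":
        return default
    port = int(value)  # raises ValueError for values such as "44x"
    if not 0 < port < 65536:
        raise ValueError("port out of range: %r" % (value,))
    return port
```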

Still not able to post because I am getting a bad cert error.

Jake Jacobson


bad certificate error

2009-07-27 Thread jakecjacobson
Hi,

I am getting the following error when doing a post to REST API,

Enter PEM pass phrase:
Traceback (most recent call last):
  File "./ices_catalog_feeder.py", line 193, in ?
    main(sys.argv[1])
  File "./ices_catalog_feeder.py", line 60, in main
    post2Catalog(catalog_host, catalog_port, catalog_path, os.path.join(input_dir, file), collection_name, key_file, cert_file)
  File "./ices_catalog_feeder.py", line 125, in post2Catalog
    connection.request('POST', path, parameters, head)
  File "/usr/lib/python2.4/httplib.py", line 810, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.4/httplib.py", line 833, in _send_request
    self.endheaders()
  File "/usr/lib/python2.4/httplib.py", line 804, in endheaders
    self._send_output()
  File "/usr/lib/python2.4/httplib.py", line 685, in _send_output
    self.send(msg)
  File "/usr/lib/python2.4/httplib.py", line 652, in send
    self.connect()
  File "/usr/lib/python2.4/httplib.py", line 1079, in connect
    ssl = socket.ssl(sock, self.key_file, self.cert_file)
  File "/usr/lib/python2.4/socket.py", line 74, in ssl
    return _realssl(sock, keyfile, certfile)
socket.sslerror: (1, 'error:14094412:SSL routines:SSL3_READ_BYTES:sslv3 alert bad certificate')


My code where this error occurs is:

head = {"Content-Type" : "application/x-www-form-urlencoded",
        "Accept" : "text/plain"}
parameters = urlencode({"collection" : collection, "entryxml" : open(file, 'r').read()})
print "Sending the file to: " + host

try:
    try:
        # Default port is 443.
        # key_file is the name of a PEM formatted file that contains your private key.
        # cert_file is a PEM formatted certificate chain file.
        connection = httplib.HTTPSConnection(host, int(port), key_file, cert_file)
        connection.request('POST', path, parameters, head)
        response = connection.getresponse()
        print response.status, response.reason
    except httplib.error, (value,message):
        print value + ':' + message
finally:
    connection.close()

I was wondering if this is due to the server having an invalid server
cert?  If I go to this server in my browser, I get a "This server
tried to identify itself with invalid information" warning.  Is there a way to
ignore this issue with Python?  Can I set up a trust store and add this
server to the trust store?
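On Python 3 (not the 2.4 used here), the ssl module keeps the two things asked about separate: a private trust store (cafile) and hostname checking. A minimal sketch, with a hypothetical function name; turning off check_hostname accepts a server-name mismatch while still requiring the server's chain to verify against the trust store, which is only reasonable for a known test box. Note that the "sslv3 alert bad certificate" in the traceback was raised while reading, i.e. it is an alert the server sent, which usually means the server rejected the client certificate rather than the other way round.

```python
import ssl


def make_client_context(ca_file=None, cert_file=None, key_file=None,
                        check_hostname=True):
    """Build a TLS context with an optional private trust store and client cert."""
    # cafile=None falls back to the system's default CA certificates
    context = ssl.create_default_context(cafile=ca_file)
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    # False accepts a server-name mismatch; verify_mode stays CERT_REQUIRED,
    # so the certificate chain itself must still be trusted
    context.check_hostname = check_hostname
    return context
```

The resulting context would be passed as the context argument of http.client.HTTPSConnection.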


Re: bad certificate error

2009-07-27 Thread jakecjacobson
On Jul 27, 2:23 pm, "Gabriel Genellina" 
wrote:
> En Mon, 27 Jul 2009 12:57:40 -0300, jakecjacobson  
>  escribió:
>
> > I was wondering if this is due to the server having a invalid server
> > cert?  If I go to this server in my browser, I get a "This server
> > tried to identify itself with invalid information".  Is there a way to
> > ignore this issue with Python?  Can I setup a trust store and add this
> > server to the trust store?
>
> I don't see the point in trusting someone that you know is telling lies  
> about itself.
>
> --
> Gabriel Genellina

It is a test box that my team runs, which is why I would
trust it.


Re: bad certificate error

2009-07-28 Thread jakecjacobson
On Jul 28, 3:29 am, Nick Craig-Wood  wrote:
> jakecjacobson  wrote:
> >  I am getting the following error when doing a post to REST API,
> >  [snip]
> >  I was wondering if this is due to the server having a invalid server
> >  cert?
>
> I'd say judging from the traceback you messed up key_file or cert_file
> somehow.
>
> Try using the openssl binary on them (read the man page to see how!)
> to check them out.
>
> >  If I go to this server in my browser, I get a "This server tried to
> >  identify itself with invalid information".  Is there a way to
> >  ignore this issue with Python?  Can I setup a trust store and add
> >  this server to the trust store?
>
> Invalid how?  Self signed certificate? Domain mismatch? Expired certificate?
>
> --
> Nick Craig-Wood  --http://www.craig-wood.com/nick

Nick,

Thanks for the help on this.  I will check my steps on openssl again
and see if I messed up.  What I tried to do was:
1.  Save my PKI cert to disk.  It was saved as a P12 file.
2.  Use openssl to convert it to the needed .pem file type.
3.  Save the CA that my cert was signed by as a .crt file.

These are the 2 files that I was using for key_file and cert_file:
 * cert_file -> CA
 * key_file -> my PKI cert converted to a .pem file

"Invalid how?  Self signed certificate? Domain mismatch? Expired
certificate?"  It is a server name mismatch.

For everyone who wants to discuss why we shouldn't do this, great, but
I can't change the fact that I need to do this.  I can't use http or
even get a correct cert at this time.  This is a quick and dirty project
to demonstrate capability.  I need something more than slide-show
briefs.


Re: bad certificate error

2009-07-28 Thread jakecjacobson
On Jul 28, 9:48 am, Jean-Paul Calderone  wrote:
> On Tue, 28 Jul 2009 03:35:55 -0700 (PDT), jakecjacobson 
>  wrote:
> > [snip]
>
> >"Invalid how?  Self signed certificate? Domain mismatch? Expired
> >certificate?"  It is a server name mismatch.
>
> Python 2.4 is not capable of allowing you to customize this verification
> behavior.  It is hard coded to let OpenSSL make the decision about whether
> to accept the certificate or not.
>
> Either M2Crypto or pyOpenSSL will let you ignore verification errors.  The
> new ssl module in Python 2.6 may also as well.
>
> Jean-Paul

Thanks, I will look into these suggestions.


Re: bad certificate error

2009-07-29 Thread jakecjacobson
On Jul 29, 2:08 am, "Gabriel Genellina" 
wrote:
> En Tue, 28 Jul 2009 09:02:40 -0300, Steven D'Aprano  
>  escribió:
>
>
>
> > On Mon, 27 Jul 2009 23:16:39 -0300, Gabriel Genellina wrote:
>
> >> I don't see the point on "fixing" either the Python script or httplib to
> >> accomodate for an invalid server certificate... If it's just for
> >> internal testing, I'd use HTTP instead (at least until the certificate
> >> is fixed).
>
> > In real life, sometimes you need to drive with bad brakes on your car,
> > walk down dark alleys in the bad part of town, climb a tree without a
> > safety line, and use a hammer without wearing goggles. We can do all
> > these things.
>
> > The OP has said that, for whatever reason, he needs to ignore a bad
> > server certificate when connecting to HTTPS. Python is a language where
> > developers are allowed to shoot themselves in the foot, so long as they
> > do so in full knowledge of what they're doing.
>
> > So, putting aside all the millions of reasons why the OP shouldn't accept
> > an invalid certificate, how can he accept an invalid certificate?
>
> Yes, I understand the situation, but I'm afraid there is no way (that I  
> know of). At least not without patching _ssl.c; all the SSL negotiation is  
> handled by the OpenSSL library itself.
>
> I vaguely remember a pure Python SSL implementation somewhere that perhaps  
> could be hacked to bypass all controls. But making it work properly will  
> probably require a lot more effort than installing a self signed  
> certificate in the server...
>
> --
> Gabriel Genellina

I have it working and I want to thank everyone for their efforts and
very helpful hints.  The error was mine: I did not understand the
documentation about cert_file and key_file.  After using openssl to
divide up my p12 file into a cert file and a key file, following the
instructions at
http://security.ncsa.uiuc.edu/research/grid-howtos/usefulopenssl.php,
I got everything working.

Again, much thanks.

Jake
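A quick way to check a split like this before posting, sketched for Python 3: ask the ssl module to load the pair. SSLContext.load_cert_chain refuses a private key that does not belong to the certificate, which is what happens if a CA certificate ends up in cert_file, and its password argument avoids the interactive "Enter PEM pass phrase:" prompt. The function name is hypothetical.

```python
import ssl


def key_matches_cert(cert_file, key_file=None, password=None):
    """Check whether key_file holds the private key for the certificate in cert_file.

    Returns (ok, message). If key_file is None, the key is expected inside
    cert_file. password unlocks an encrypted key instead of prompting.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        # Raises ssl.SSLError (e.g. a key/cert mismatch) if the key does not
        # belong to the certificate, and OSError if a file cannot be read
        context.load_cert_chain(cert_file, key_file, password)
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return False, str(exc)
    return True, "certificate and private key match"
```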


Help making this script better

2009-08-06 Thread jakecjacobson
Hi,

After much Google searching and trial and error, I was able to write a
Python script that posts XML files to a REST API using HTTPS and
passing a PEM cert and key file.  It seems to be working, but I would like
some pointers on how to handle errors.  I am using Python 2.4 and can't
upgrade, even though I would like to.  I am very new to Python, so help
will be greatly appreciated, and I hope others can use this script.

#!/usr/bin/python
#
# catalog_feeder.py
#
# This script will process a directory of XML files and push them to the Enterprise Catalog.
# You configure this script by using a configuration file that describes the required variables.
# The path to this file is either passed into the script as a command line argument or hard coded
# in the script.  The script will terminate with an error if it can't process the XML file.
#

# IMPORT STATEMENTS
import httplib
import mimetypes
import os
import sys
import shutil
import time
from urllib import *
from time import strftime
from xml.dom import minidom

def main(c):
    start_time = time.time()
    # Set configuration parameters
    try:
        # Process the XML conf file
        xmldoc = minidom.parse(c)
        catalog_host = readConfFile(xmldoc, 'catalog_host')
        catalog_port = int(readConfFile(xmldoc, 'catalog_port'))
        catalog_path = readConfFile(xmldoc, 'catalog_path')
        collection_name = readConfFile(xmldoc, 'collection_name')
        cert_file = readConfFile(xmldoc, 'cert_file')
        key_file = readConfFile(xmldoc, 'key_file')
        log_file = readConfFile(xmldoc, 'log_file')
        input_dir = readConfFile(xmldoc, 'input_dir')
        archive_dir = readConfFile(xmldoc, 'archive_dir')
        hold_dir = readConfFile(xmldoc, 'hold_dir')
    except Exception, inst:
        # I had an error so report it and exit script
        print "Unexpected error opening %s: %s" % (c, inst)
        sys.exit(1)
    # Log Starting
    logOut = verifyLogging(log_file)
    if logOut:
        log(logOut, "Processing Started ...")
    # Get list of XML files to process
    if os.path.exists(input_dir):
        files = getFiles2Post(input_dir)
    else:
        if logOut:
            log(logOut, "WARNING!!! Couldn't find input directory: " + input_dir)
            cleanup(logOut)
        else:
            print "Dir doesn't exist: " + input_dir
            sys.exit(1)
    try:
        # Process each file to the catalog
        connection = httplib.HTTPSConnection(catalog_host, catalog_port, key_file, cert_file)
        for file in files:
            log(logOut, "Processing " + file + " ...")
            try:
                response = post2Catalog(connection, catalog_path, os.path.join(input_dir, file), collection_name)
                # Read the body exactly once: a second read() returns '', and the
                # connection can't be used for the next request until it is read
                body = response.read()
                if response.status == 200:
                    msg = "Successfully posted " + file + " to catalog ..."
                    print msg
                    log(logOut, msg)
                    # Move file to done directory
                    shutil.move(os.path.join(input_dir, file), os.path.join(archive_dir, file))
                else:
                    msg = "Error posting " + file + " to catalog [" + body + "] ..."
                    print msg
                    log(logOut, msg)
                    # Move file to error dir
                    shutil.move(os.path.join(input_dir, file), os.path.join(hold_dir, file))
            except IOError, e:
                print "%s" % e

    except httplib.HTTPException, e:
        print "Unexpected error %s " % e

    run_time = time.time() - start_time
    print 'Run time: %f seconds' % run_time

    # Clean up
    connection.close()
    cleanup(logOut)

# Get an array of files from the input_dir
def getFiles2Post(d):
    return os.listdir(d)

# Read out the conf file and set the needed global variable
def readConfFile(xmldoc, tag):
    return xmldoc.getElementsByTagName(tag)[0].firstChild.data

# Write out the message to the log file (skipped when no log file is open)
def log(f, m):
    if f:
        f.write(strftime("%Y-%m-%d %H:%M:%S") + " : " + m + '\n')

# Clean up and exit
def cleanup(logOut):
    if logOut:
        log(logOut, "Proce
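
One way to structure the per-file error handling, sketched in Python 3 rather than 2.4: the posting and logging functions are passed in (hypothetical stand-ins for post2Catalog and log()), so the loop can be exercised without a server. A failure on one file is logged and the file is moved to the hold directory, and processing continues with the next file. With the real post function, the response body should be read once inside it, because http.client will not use the same connection for the next request until the previous response has been read.

```python
import http.client
import os
import shutil


def process_files(input_dir, archive_dir, hold_dir, post, log):
    """Post every file in input_dir; archive successes, hold failures.

    post(path) -> (status, body) and log(message) are supplied by the caller.
    Returns a dict mapping file name -> "archived", "held" or "error".
    """
    results = {}
    for name in sorted(os.listdir(input_dir)):
        src = os.path.join(input_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories
        try:
            status, body = post(src)
        except (OSError, http.client.HTTPException) as exc:
            # Network, TLS or file errors: record them and keep going
            log("Error posting %s: %s" % (name, exc))
            shutil.move(src, os.path.join(hold_dir, name))
            results[name] = "error"
            continue
        if status == 200:
            log("Successfully posted %s" % name)
            shutil.move(src, os.path.join(archive_dir, name))
            results[name] = "archived"
        else:
            log("Error posting %s [%s]: %s" % (name, status, body))
            shutil.move(src, os.path.join(hold_dir, name))
            results[name] = "held"
    return results
```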

How to unencode a string

2009-08-27 Thread jakecjacobson
This seems like a really simple newbie question, but how can a person
unencode (URL-decode) a string?  In Perl I use something like:
"$part=~ s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;"

If I have a string like Word1%20Word2%20Word3, I want to get Word1
Word2 Word3.  I would also like to handle special characters like
'",(){}[] etc.


Re: How to unencode a string

2009-08-28 Thread jakecjacobson
On Aug 27, 6:51 pm, Piet van Oostrum  wrote:
> >>>>> jakecjacobson  (j) wrote:
> >j> This seems like a real simple newbie question but how can a person
> >j> unencode a string?  In Perl I use something like: "$part=~ s/\%([A-Fa-
> >j> f0-9]{2})/pack('C', hex($1))/seg;"
> >j> If I have a string like Word1%20Word2%20Word3 I want to get Word1
> >j> Word2 Word3.  
>
> urllib.unquote(string)
>
> >j> Would also like to handle special characters like '",(){}
> >j> [] etc/
>
> What would you like to do with them? Or do you mean to replace %27 by ' etc?
> --
> Piet van Oostrum 
> URL:http://pietvanoostrum.com[PGP 8DAE142BE17999C4]
> Private email: p...@vanoostrum.org

Yes, take '%27' and replace with ', etc.
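On Python 3 the function Piet mentions lives at urllib.parse.unquote (it is urllib.unquote in 2.4), and it handles %27 and the other escapes the same way. For comparison with the Perl one-liner in the question, here is a direct regex translation as well (hypothetical name). The two differ on multi-byte escapes: unquote decodes %C3%A9 as UTF-8 into one character, while the regex maps each byte to its own character. urllib.parse.unquote_plus additionally turns "+" into a space, as in form-encoded data.

```python
import re
from urllib.parse import unquote

# Same idea as the Perl substitution: replace each %XX escape with the
# character whose code is the hex value XX (one byte -> one character).
_PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_like_perl(s):
    return _PERCENT_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


if __name__ == "__main__":
    s = "Word1%20Word2%20Word3"
    print(unquote(s))             # Word1 Word2 Word3
    print(unescape_like_perl(s))  # Word1 Word2 Word3
```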


How to Convert IO Stream to XML Document

2010-09-10 Thread jakecjacobson
I am trying to build a Python script that reads a Sitemap file and
pushes the URLs to a Google Search Appliance.  I am able to fetch the
XML document and parse it with regular expressions, but I want to move
to using native XML tools to do this.  The problem is this: if I use
urllib.urlopen(url), I can convert the IO stream to an XML document,
but if I use urllib2.urlopen and then read the response, I get the
content, yet minidom.parse() fails with an "IOError:
[Errno 2] No such file or directory:" error.

import gzip
import StringIO
import urllib
import urllib2
from xml.dom import minidom

# THIS WORKS, but will have issues if the IO stream is a compressed file
def GetPageGuts(net, url):
    pageguts = urllib.urlopen(url)
    xmldoc = minidom.parse(pageguts)
    return xmldoc

# THIS DOESN'T WORK, but I don't understand why
def GetPageGuts(net, url):
    request = getRequest_obj(net, url)
    response = urllib2.urlopen(request)
    response.headers.items()
    pageguts = response.read()
    # Test to see if the response is a gzip/compressed data stream
    if isCompressedFile(response, url):
        compressedstream = StringIO.StringIO(pageguts)
        gzipper = gzip.GzipFile(fileobj = compressedstream)
        pageguts = gzipper.read()
    xmldoc = minidom.parse(pageguts)
    response.close()
    return xmldoc

# I am getting the following error
Starting SiteMap Manager ...
Traceback (most recent call last):
  File "./tester.py", line 267, in ?
    main()
  File "./tester.py", line 49, in main
    fetchSiteMap(ResourceDict, line)
  File "./tester.py", line 65, in fetchSiteMap
    pageguts = GetPageGuts(ResourceDict['NET'], url)
  File "./tester.py", line 89, in GetPageGuts
    xmldoc = minidom.parse(pageguts)
  File "/usr/lib/python2.4/xml/dom/minidom.py", line 1915, in parse
    return expatbuilder.parse(file)
  File "/usr/lib/python2.4/xml/dom/expatbuilder.py", line 922, in parse
    fp = open(file, 'rb')
IOError: [Errno 2] No such file or directory: '\nhttp://www.sitemaps.org/
schemas/sitemap/0.9">\n\nhttp://www.myorg.org/janes/
sitemaps/binder_sitemap.xml\n2010-09-09\n\n\nhttp://www.myorg.org/janes/sitemaps/
dir_sitemap.xml\n2010-05-05\n
\n\nhttp://www.myorg.org/janes/sitemaps/
mags_sitemap.xml\n2010-09-09\n
\n\nhttp://www.myorg.org/janes/sitemaps/
news_sitemap.xml\n2010-09-09\n
\n\nhttp://www.myorg.org/janes/sitemaps/
sent_sitemap.xml\n2010-09-09\n
\n\nhttp://www.myorg.org/janes/sitemaps/
srep_sitemap.xml\n2001-05-04\n
\n\nhttp://www.myorg.org/janes/sitemaps/yb_sitemap.xml\n2010-09-09\n\n\n'

# A couple of supporting things
def getRequest_obj(net, url):
    request = urllib2.Request(url)
    request.add_header('User-Agent', 'ICES Sitemap Bot dni-ices-searchad...@ugov.gov')
    request.add_header('Accept-encoding', 'gzip')
    return request

def isCompressedFile(r, u):
    answer = False
    if r.headers.has_key('Content-encoding'):
        answer = True
    else:
        # Check to see if the URL ends in .gz
        if u.endswith(".gz"):
            answer = True
    return answer
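The traceback shows why the second version fails: minidom.parse() treats a string argument as a file name, so after response.read() the XML text itself is handed to open(). minidom.parseString() takes the document text instead. A Python 3 sketch of the fetch-independent part (function names hypothetical); it detects gzip data by its magic bytes (1f 8b) rather than trusting the header or the .gz suffix alone.

```python
import gzip
from xml.dom import minidom


def parse_sitemap(data):
    """Parse sitemap bytes (possibly gzip-compressed) into a minidom Document."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        data = gzip.decompress(data)
    # parseString() takes the document text; parse() would treat it as a file name
    return minidom.parseString(data)


def sitemap_locations(doc):
    """Return the text of every <loc> element."""
    return [node.firstChild.data.strip()
            for node in doc.getElementsByTagName("loc")
            if node.firstChild is not None]
```

In the poster's 2.4 code, the equivalent change would be calling minidom.parseString(pageguts) on the string returned by read().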
