Re: Performance of Python 3

2009-03-01 Thread Stefan Behnel
Paul Rubin wrote: > Steve Holden writes: >> I'm not sure what you think the speed of Ruby has to do with Python. > > In the real world, people care about the relative speed of programs. Fine, but the Shootout on Alioth isn't a particularly pythonic one. It deals almost exclusively with computati

Re: Performance of Python 3

2009-03-01 Thread Stefan Behnel
Isaac Gouy wrote: > On Mar 1, 8:10 am, Stefan Behnel wrote: >> As long as that gives you improvements of >> 100-1000 times almost for free, I wouldn't bother too much with changing >> the platform just because someone shows me benchmark results of some code >> th

Re: Performance of Python 3

2009-03-02 Thread Stefan Behnel
Isaac Gouy wrote: > On Mar 1, 11:24 am, Stefan Behnel wrote: >> Isaac Gouy wrote: >>> On Mar 1, 8:10 am, Stefan Behnel wrote: >>>> As long as that gives you improvements of >>>> 100-1000 times almost for free, I wouldn't bother too much with changing

Re: Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-07 Thread Stefan Behnel
rpar...@gmail.com wrote: > I am trying to process an xml file that contains unicode characters > (see http://vyakarnam.wordpress.com/). Wordpress allows exporting the > entire content of the website into an xml file. Using > xml.dom.minidom, I wrote a few lines of python code to parse out the > xm

Re: Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-08 Thread Stefan Behnel
Martin v. Löwis wrote: >> Regarding minidom, you might be happier with the xml.etree package that >> comes with Python 2.5 and later (it's also available for older versions). >> It's a lot easier to use, more memory friendly and also much faster. > > OTOH, choice of XML library is completely irrelev
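
For readers unfamiliar with the xml.etree package mentioned above, here is a minimal sketch in Python 3 syntax of parsing UTF-8 encoded XML that contains Devanagari text. The element names and the sample text are made up for illustration; they are not taken from the WordPress export discussed in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; a real WordPress export has a different structure.
xml_bytes = '<posts><post><title>व्याकरण</title></post></posts>'.encode('utf-8')

# Parsing from bytes lets the parser handle the decoding (UTF-8 by default).
root = ET.fromstring(xml_bytes)

# Collect the text of every <title> element as ordinary Python strings.
titles = [title.text for title in root.iter('title')]
```

Handing the parser raw bytes rather than an already decoded string keeps the encoding handling in one place.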

comparing (c)ElementTree and minidom (was: Parsing unicode (devanagari) text with xml.dom.minidom)

2009-03-08 Thread Stefan Behnel
Martin v. Löwis wrote: >> The background was parsing the XML dump of an entire web site, which I >> would expect to be larger than what minidom is designed to handle >> gracefully. Switching to cElementTree before major code gets written is >> almost certainly a good idea here. > > I think minidom

Re: where is the PyString_AsString in Python 3.0?

2009-03-08 Thread Stefan Behnel
BigHand wrote: > I know that there is no PyString_AsString in Python3.0, > could you guys give me instruction about how can I do with the > following ? > > PyObject *exc_type = NULL, *exc_value = NULL, *exc_tb = NULL; > PyErr_Fetch(&exc_type, &exc_value, &exc_tb); > > how do I transfer the exc_ty

Re: where is the PyString_AsString in Python 3.0?

2009-03-08 Thread Stefan Behnel
BigHand wrote: > Finally I got the results now. This did take me 10 hours to solve > this. the docs of 3.0.. You will have to get used to Unicode. The code you used against the C-API mimics almost exactly the steps you'd use at the Python level. Stefan -- http://mail.python.org/mailman/listin

Re: xml input sanitizing method in standard lib?

2009-03-10 Thread Stefan Behnel
Gabriel Genellina wrote: > En Mon, 09 Mar 2009 15:30:31 -0200, Petr Muller escribió: > >> Thanks for response and sorry for I wasn't clear first time. I have a >> heap of data (logs), from which I build a XML document using >> xml.dom.minidom. In this data, some xml invalid characters may occur -

Re: can python import class or module directly from a zip package

2009-03-10 Thread Stefan Behnel
Flank wrote: > can python import class or module directly from a zip package Yes, just put the .zip file into your PYTHONPATH. Stefan -- http://mail.python.org/mailman/listinfo/python-list
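
A minimal sketch of the zip import mechanism, in Python 3 syntax. It adds the archive to sys.path at runtime, which should behave the same way as listing the archive in PYTHONPATH. The archive name bundle.zip and the module zipped_hello are hypothetical names used only for this illustration.

```python
import os
import sys
import tempfile
import zipfile

# Build a throwaway archive containing one hypothetical module, zipped_hello.py.
archive_path = os.path.join(tempfile.mkdtemp(), "bundle.zip")
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr("zipped_hello.py",
                     "def greet():\n    return 'hello from zip'\n")

# Put the archive itself (not a directory) on the module search path.
sys.path.insert(0, archive_path)

# The import system finds the module inside the archive.
import zipped_hello

message = zipped_hello.greet()
```

Note that only pure Python modules can be imported this way; compiled extension modules cannot be loaded directly from a zip archive.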

Re: ElementTree: How to return only unicode?

2009-03-14 Thread Stefan Behnel
Torsten Bronger wrote: > I parse an XML file with ElementTree and get the contets with > the .attrib, .text, .get etc methods of the tree's nodes. > Additionally, I use the "find" and "findtext" methods. > > My problem is that if there is only ASCII, these methods return > ordinary strings instead

Re: ElementTree: How to return only unicode?

2009-03-15 Thread Stefan Behnel
Torsten Bronger wrote: > Hallöchen! und zurück! > Stefan Behnel writes: > >> Torsten Bronger wrote: >> >>> [...] >>> >>> My problem is that if there is only ASCII, these methods return >>> ordinary strings instead of unicode. So s

Re: Regarding the lxml import error only in a web-request

2009-03-20 Thread Stefan Behnel
nagraj wrote: > I'm trying to run Django from Apache using FastCGI in a shared hosting > environment on DH. I've installed python 2.5.2 onto my home > environment. And all the necessary libraries including lxml 2.1.3, > libxml2, libxslt, flup, etc. > > > I'm facing a strange issue with lxml, whic

Re: Downloading binary files - Python3

2009-03-21 Thread Stefan Behnel
Anders Eriksson wrote: > I have made a short program that given an url will download all referenced > files on that url. > > It works, but I'm thinking it could use some optimization since it's very > slow. What's slow about it? Is downloading each file slow, is it the overhead of connecting to t

[ANN] lxml 2.2 released

2009-03-21 Thread Stefan Behnel
Hi all, I'm proud to announce the release of lxml 2.2 final. http://codespeak.net/lxml/ http://pypi.python.org/pypi/lxml/2.2 Changelog: http://codespeak.net/lxml/changes-2.2.html What is lxml? == lxml is the most feature-rich and easy-to-use library for working with XML and HTML in

Re: [ANN] lxml 2.2 released

2009-03-21 Thread Stefan Behnel
pyt...@bdurham.com wrote: > Is it possible to use the same install of lxml across multiple versions > of Python, eg. I have 2.4, 2.5, 2.6, and 3.0 installed on my workstation > - can I use a single copy of lmxl for 4 versions of Python? It would be interesting to have some more information about y

Re: iteration without storing a variable

2009-03-25 Thread Stefan Behnel
Josh Dukes wrote: > $ time python -c 'a = "A"; > for r in xrange(10): a += "A" ' > > real 0m0.109s > user 0m0.100s > sys 0m0.010s > > Anyone get different results? Sure: $ time python -c 'a = "A"; for r in xrange(10): a += "A" ' real 0m0.140s user 0m0.132s sys 0m0.008s

Re: iteration without storing a variable

2009-03-25 Thread Stefan Behnel
Josh Dukes wrote: > $ python --version > Python 2.5.2 > > $ ruby --version > ruby 1.8.6 (2008-08-11 patchlevel 287) [x86_64-linux] > > but I was more talking about the speed differences between ruby and python. I heard that Ruby 1.9 is supposed to be a lot faster than 1.8 in many aspects (as is

Re: xml to xhtml

2009-04-02 Thread Stefan Behnel
jud...@gmail.com wrote: > On Apr 1, 11:16 am, Joe Riopel wrote: >> On Wed, Apr 1, 2009 at 10:43 AM, wrote: >>> If anyone can give me some guidance what should be the best way to >>> generate html/xhtml page using python would be great. I am open to >>> other options like xsl or anything else tha

Re: HTML Generation

2009-04-03 Thread Stefan Behnel
J Kenneth King wrote: > from tags import html, head, meta, title, body, div, p, a > > mypage = html( > head( > meta(attrs={'http-equiv': "Content-Type", > 'content': "text/html;"}), > title("My Page")), > body

Re: lxml and xslt extensions

2009-04-04 Thread Stefan Behnel
Hi, dasacc22 wrote: > On Apr 4, 11:31 am, dasacc22 wrote: >> Im not sure where else to ask this. The best place to ask is the lxml mailing list: http://codespeak.net/mailman/listinfo/lxml-dev >> But basically Im having trouble >> figuring out how to successfully apply multiple extensions in a

Re: PyXML and Python-2.6

2009-04-07 Thread Stefan Behnel
Andrew MacKeith wrote: > The Python.org "SIG for XML Processing in Python" page indicates that > "The SIG, through the mailing list and the PyXML project hosted on > SourceForge...". > > The PyXML project on SourceForge " is no longer maintained. ", so > perhaps the SIG page could be updated. > >

Re: xml.dom.minidom getElementsByTagName white space issue

2009-04-09 Thread Stefan Behnel
R. David Murray wrote: > Leonardo lozanne wrote: >> I'm getting some XML tags with white spaces from a web service and >> when I try to get them with the getElements ByTagName I'm not able to >> do so. I'm getting an empty list. What I'm doing is: >> >> #XML_response is an xml string >> xml_msg =

Re: Question to python C API

2009-04-15 Thread Stefan Behnel
Andreas Otto wrote: > I have the following question ... > > I write a custom "*.init" method and expect a variable number or arguments What's a "*.init" method? Do you mean SomeType.__init__() ? > This are my questions: > > 1.I need something like a for loop to analyse this argumen

Re: Question to python C API

2009-04-16 Thread Stefan Behnel
Andreas Otto wrote: > I want to make a language binding for an existing C library > > http://libmsgque.sourceforge.net > > is this possible ? Quoting the third paragraph on Cython's homepage (i.e. the link I posted): """ This makes Cython the ideal language for wrapping external C li

Re: Question to python C API

2009-04-16 Thread Stefan Behnel
Andreas Otto wrote: > the problem with such kind of framework is usually > that you start with the easy stuff and than (after a couple > of days/weeks) you come to the difficult stuff and you > have to figure out that this kind of problem does not > fit into the tool. That is a very comm

Re: How to access C structures

2009-04-17 Thread Stefan Behnel
Chris Helck wrote: > I have a couple dozen C structures that define binary file records. I > need to read the file and access the records. I need to do this very > efficiantly. > > I am aware of the Python struct class, but the C structures contain > arrays of nested structures and I'm not sure if

Re: Question to python C API

2009-04-18 Thread Stefan Behnel
Andreas Otto wrote: > just my first step in Cython > > 1. download Cython-0.11.1 > > 2. read INSTALL.txt > > < > (1) Run the setup.py script in this directory > as follows: > > python setup.py install > > This will install the Pyrex package > into your Pyt

Re: [ANN] pyxser-0.2r --- Python XML Serialization

2009-04-18 Thread Stefan Behnel
Daniel Molina Wegener wrote: > * Every serilization is made into unicode objects. Hmm, does that mean that when I serialise, I get a unicode object back? What about the XML declaration? How can a user create well-formed XML from your output? Or is that not the intention? Stefan -- http://

Re: [ANN] pyxser-0.2r --- Python XML Serialization

2009-04-19 Thread Stefan Behnel
Daniel Molina Wegener wrote: > Stefan Behnel > on Sunday 19 April 2009 02:25 > wrote in comp.lang.python: > > >> Daniel Molina Wegener wrote: >>> * Every serilization is made into unicode objects. >> Hmm, does that mean that when I serialise, I get

Re: PEP 401

2009-04-19 Thread Stefan Behnel
alessiogiovanni.bar...@gmail.com wrote: > Are 19 days that I read this PEP; it's all true? Yep. Actually, the Cython project was lucky that the FLUFL did not recognise it as an "alternative implementation of Python". That way, we can easily finish up world domination, say, 10-20 years before Pytho

Re: [ANN] pyxser-0.2r --- Python XML Serialization

2009-04-19 Thread Stefan Behnel
Daniel Molina Wegener wrote: > Sorry, it appears that I've misunderstand your question. By /unicode > objects/ I mean /python unicode objects/ aka /python unicode strings/. Yes, that's exactly what I'm talking about. Maybe you should read up on what Unicode is. > Most of them can be reencoded

Re: [ANN] pyxser-0.2r --- Python XML Serialization

2009-04-19 Thread Stefan Behnel
Daniel Molina Wegener wrote: > By using a different encoding than the default encoding for libxml2 makes > the work hard for libxml2 since it requires that every #PCDATA section to be > reencoded to the desired encoding and comparing one string conversion in > python against many string conversio

Re: PEP 401

2009-04-20 Thread Stefan Behnel
Chris Rebert wrote: > On Mon, Apr 20, 2009 at 1:22 AM, Steven D'Aprano > wrote: >> On Sun, 19 Apr 2009 11:02:35 -0700, alessiogiovanni.baroni wrote: >> >>> Are 19 days that I read this PEP; it's all true? >> For the benefit of people who are not aware of the tradition of "April >> Fools": >> >> ht

Re: best "void" return of a member function

2009-04-20 Thread Stefan Behnel
Andreas Otto writes: > I'm writing a native language binding for a library. > > http://libmsgque.sourceforge.net/ > > Every native method called by PYTHON have to return > a PyObject* even if the function itself does not > return anything. > [...] > Question: what is the best r

Re: best "void" return of a member function

2009-04-20 Thread Stefan Behnel
Stefan Behnel wrote: > you might want to try to wrap it in a more Pythonic > look&feel style, that wraps operations and use-cases rather than plain > functions. That should make it easier to hide things like memory allocation > and other C implementation details from users, an

Re: best "void" return of a member function

2009-04-20 Thread Stefan Behnel
Stefan Behnel wrote: > define message packing formats in advance in some way, e.g. > similar to Python's "array" module. I (obviously ;) meant the format identifiers in the "struct" module here. http://docs.python.org/library/struct.html Stefan -- http://m
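
To illustrate what the format identifiers of the struct module look like, here is a small sketch in Python 3 syntax. The record layout is invented for this example and has nothing to do with the message formats of the library discussed in the thread.

```python
import struct

# Hypothetical record layout, little-endian without padding ("<"):
#   H = unsigned 16-bit integer, i = signed 32-bit integer, d = 8-byte float
record_format = "<Hid"

# Pack three Python values into a bytes object following the format string.
packed = struct.pack(record_format, 7, -42, 1.5)

# calcsize() reports how many bytes a record of this format occupies.
size = struct.calcsize(record_format)

# Unpacking with the same format string recovers the original values.
unpacked = struct.unpack(record_format, packed)
```

The format string acts as a compact, declarative description of the packing, which is the property the post refers to.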

Re: when can i expect libraries and third party tools to be updated for python 3 ?

2009-04-20 Thread Stefan Behnel
alessiogiovanni.baroni wrote: > On 20 Apr, 15:47, Deep_Feelings wrote: > > every one is telling "dont go with python 3 , 3rd party tools and > > libraries have no compitability with python 3" > > > > so from previous experience : when can i expect libraries and third > > party tools to be updated f

Re: best "void" return of a member function

2009-04-20 Thread Stefan Behnel
Andreas Otto wrote: >if you wrote one language interface you can write every language interface This is like saying: if you used one programming language, you can use every programming language. "Use" is different from "master" or "appreciate". > -> the tasks are allways the same... just

Re: Python interpreter speed

2009-04-20 Thread Stefan Behnel
Tim Roberts wrote: > The Python you're thinking of (CPython) is compiled to an intermediate > language, which is then interpreted by an interpreter loop, somewhat > remeniscent of Forth. It takes more cycles per instruction to run that > interpreter loop than it does to run the machine language, b

Re: problem with PyMapping_SetItemString()

2009-04-21 Thread Stefan Behnel
rahul wrote: > static PyObject *upadteCheck(PyObject *self,PyObject *args){ > PyObject *var_pyvalue=NULL,*newVar_pyvalue=NULL,*dict=NULL; > char *varName; > > if (!PyArg_ParseTuple(args, "s", &varName)){ > return NULL; > } > dict=PyEval_GetLocals(); >

Re: Best Python Web Framework ?

2009-04-21 Thread Stefan Behnel
Hi, SKYLAB wrote: > First , my english is not good . Note that there are many python news groups and mailing lists. There may also be one in your native language. Stefan -- http://mail.python.org/mailman/listinfo/python-list

Re: Programming in Python with a view to extending in C at a later date.

2009-04-22 Thread Stefan Behnel
Hi, dug.armad...@googlemail.com wrote: > Say you set out to program in Python knowing that you will be > converting parts of it into C ( or maybe C++) at a later date, but you > do not know which parts. > > Can you give any general Python structure / syntax advice that if > implemented from the s

Re: Cython + tuple unpacking

2009-04-23 Thread Stefan Behnel
Hugues Salamin wrote: > The following code will crash with a segfault when compiled using cython > (v0.11) > > def func(): > for (a, b) ,c ,d in zip(zip(range(3), range(3)), range(3), range(3)): > print a, b > print c > print d # This line segfault > > Compilation is

Re: ctypes

2009-04-30 Thread Stefan Behnel
luca72 wrote: > [3x the same thing] You should learn to calm down and wait for an answer. Even if the problem is urgent for you, it may not be to everyone, and spamming a newsgroup will not help to get people in a friendly mood to write a helpful reply. This is always worth a read: http://www.cat

Re: urllib2 and threading

2009-05-01 Thread Stefan Behnel
robean wrote: > I am writing a program that involves visiting several hundred webpages > and extracting specific information from the contents. I've written a > modest 'test' example here that uses a multi-threaded approach to > reach the urls with urllib2. The actual program will involve fairly >

Re: eric not working on ubuntu 9.04

2009-05-02 Thread Stefan Behnel
bvidinli wrote: > An unhandled exception occurred. Please report the problem > using the error reporting dialog or via email to > . > A log has been written to "/home/bvidinli/.eric4/eric4_error.log". Did you try that? Stefan -- http://mail.python.org/mailman/listinfo/python-list

Re: thc v0.3 - txt to html converter - better code?

2009-05-05 Thread Stefan Behnel
Florian Wollenschein wrote: > here's some code of thc, my txt to html converter programmed with Python > and pyQT4: > --- > > if self.rdioBtnTransitional.isChecked(): > if self.cmboBoxLang.currentText() == "Eng

Re: Parsing text

2009-05-06 Thread Stefan Behnel
iainemsley wrote: > for scene in text.split('Scene'): > num = re.compile("^\s\[0-9, i{1,4}, v]", re.I) > textNum = num.match(scene) Not related to your problem, but to your code - I'd write this as follows: match_scene_num = re.compile("^\s\[0-9, i{1,4}, v]", re.I).match
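
The idea of compiling the pattern once and binding its match method to a name outside the loop can be sketched as follows, in Python 3 syntax. The pattern and the sample text are assumptions made for this illustration; they are not the regular expression from the thread.

```python
import re

# Hypothetical pattern: optional whitespace, then an Arabic or Roman scene number.
# Compiled once; the bound .match method is reused on every iteration.
match_scene_num = re.compile(r"\s*([0-9]+|[ivx]+)\b", re.I).match

text = "Scene 1 Enter A. Scene ii Enter B."

scene_numbers = []
for scene in text.split("Scene"):
    found = match_scene_num(scene)
    if found:
        scene_numbers.append(found.group(1))
```

Hoisting the compilation out of the loop avoids rebuilding the same pattern object for every scene and gives the matcher a descriptive name.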

Re: What would YOU like to see in a txt to html converter?

2009-05-07 Thread Stefan Behnel
Florian Wollenschein wrote: > Will Wang wrote: >> *emphasis* >> **strong emphasis** >> ***very strong emphasis*** >> _underlined_ >> =verbatim and monospace= >> >> emacs-muse : http://mwolson.org/projects/EmacsMuse.html > > Thank you for this information. I already thought of using dots or > aster

Re: mod_python and xml.dom.minidom

2009-05-08 Thread Stefan Behnel
Daniel Fetchinson wrote: > On 5/8/09, dpapathanasiou wrote: >> I wrote a python script called xml_utils.py which parses xml using >> minidom. > > My only advice is, don't use mod_python. The project is dead, you > should use mod_wsgi instead: http://code.google.com/p/modwsgi/ Now that we're at it

Re: Conceptual flaw in pxdom?

2009-05-18 Thread Stefan Behnel
Emanuele D'Arrigo wrote: > I'm looking at pxdom and in particular at its foundation class > DOMObject I didn't know pxdom, but looking at it now I can see that it hasn't been updated since 2006. Not sure if that means that it is complete or that it has been abandoned. Anyway, seeing that it only

Re: Conceptual flaw in pxdom?

2009-05-18 Thread Stefan Behnel
Paul Boddie wrote: > On 18 Mai, 08:54, Stefan Behnel wrote: >> Emanuele D'Arrigo wrote: >>> I'm looking at pxdom and in particular at its foundation class >>> DOMObject >> I didn't know pxdom, but looking at it now I can see that it hasn't been

Re: Best library to make XSLT 2.0 transformation

2009-05-19 Thread Stefan Behnel
wdveloper wrote: > I need to make xml transformation using XSLT 2.0 (since i want to use > the powerful tag to produce multiple files). > In your experience, which kind of library out there is better? I'm not aware of a Python library that implements XSLT 2.0, although you might want to look arou

Re: lxml: traverse xml tree and retrieve element based on an attribute

2009-05-30 Thread Stefan Behnel
byron wrote: > I am using the lxml.etree library to validate an xml instance file > with a specified schema that contains the data types of each element. > This is some of the internals of a function that extracts the > elements: > > schema_doc = etree.parse(schema_fn) > schema = e

Re: Turning HTMLParser into an iterator

2009-05-31 Thread Stefan Behnel
samwyse wrote: > I'm processing some potentially large datasets stored as HTML. I've > subclassed HTMLParser so that handle_endtag() accumulates data into a > list, which I can then fetch when everything's done. I'd prefer, > however, to have handle_endtag() somehow yield values while the input >

Re: accessing the XML attribute value noNamespaceSchemaLocation thru Python 2.5

2009-06-04 Thread Stefan Behnel
tooshiny wrote: > I am currently successfully using lxml and ElementTree to validate and > to access the XML contained data. I can however not find any > functional call to access the schema location ie the attribute value > noNamespaceSchemaLocation. > > A simple function call would be so much ni

Re: how to iterate over several lists?

2009-06-04 Thread Stefan Behnel
kj wrote: > Suppose I have two lists, list_a and list_b, and I want to iterate > over both as if they were a single list. E.g. I could write: > > for x in list_a: > foo(x) > for x in list_b: > foo(x) > > But is there a less cumbersome way to achieve this? Take a look at the itertools mo
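
One itertools function that fits this question is chain(), which iterates over several iterables as if they were a single sequence. The sketch below uses Python 3 syntax; the list contents and the body of foo() are placeholders, since the thread does not show them.

```python
from itertools import chain

list_a = [1, 2, 3]
list_b = [10, 20]


def foo(x):
    # Stand-in for the foo() in the question; it simply doubles its argument.
    return x * 2


# chain() yields the items of list_a, then those of list_b, without copying.
results = [foo(x) for x in chain(list_a, list_b)]
```

Unlike list_a + list_b, chain() does not build a new combined list, which matters when the inputs are large.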

Re: Using file objects with elementtree

2008-05-15 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: > What do we render? Sur. Stefan -- http://mail.python.org/mailman/listinfo/python-list

Re: ElementTree and DTDs

2008-05-16 Thread Stefan Behnel
J. Pablo Fernández wrote: > Is ElementTree supposed to load DTDs? AFAIR, you have to provide entities by hand. > Or is there another library that would handle DTDs correctly, > performing entity replacements? http://codespeak.net/lxml http://codespeak.net/lxml/parsing.html#parser-options Stefa

Re: Using file objects with elementtree

2008-05-16 Thread Stefan Behnel
castironpi wrote: > [...], and making up words. Blah? "Blah" is not made up. Try again. Stefan PS: this might be getting slightly off-topic... -- http://mail.python.org/mailman/listinfo/python-list

Re: Getting elements and text with lxml

2008-05-17 Thread Stefan Behnel
J. Pablo Fernández wrote: > I have an XML file that starts with: > > > > > *-a > > > out of it, I'd like to extract something like (I'm just showing one > structure, any structure as long as all data is there is fine): > > [("ofc", "*"), "-", ("rad", "a")] >>> root = etree.fromstring(

Re: xpath with big files

2008-05-21 Thread Stefan Behnel
Vladimir Kropylev wrote: > I've encountered a problem when trying to use lxml.etree.xpath with > big (63Mb) file. It returns empty list on any request. > Is there any restriction on file size for lxml.etree.xpath? No. > This is what I do: > > f=open(filename) > tree = etree.parse(f) > f.close()

Cython code generation for Py3 complete

2008-05-22 Thread Stefan Behnel
Hi, just a quick announcement that I finished the port of the Cython compiler to the Py3 target platform. While you cannot currently run Cython itself in Py3, you can build the generated C sources unchanged under Py2.3 through 3.0a5. http://cython.org/ There isn't a release yet (though there

Re: where is the Write method of ElementTree??

2008-05-23 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: > I'm messing around with trying to write an xml file using > xml.etree.ElementTree. All the examples on the internet show the use > of ElementTree.write(), although when I try to use it it's not > available, gives me ... > >ElementTree(sectionElement).write("section.
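
For reference, write() is a method of ElementTree instances, so a root element has to be wrapped first. The sketch below uses Python 3 syntax and writes to an in-memory buffer instead of section.xml; the element names are invented, and the exact error the poster saw is not visible in the snippet above.

```python
import io
import xml.etree.ElementTree as ET

# Build a small hypothetical tree.
section_element = ET.Element("section")
ET.SubElement(section_element, "title").text = "Intro"

# Wrap the root element in an ElementTree, then serialise it.
buffer = io.BytesIO()
ET.ElementTree(section_element).write(buffer, encoding="utf-8")

serialized = buffer.getvalue()
```

Passing a file name instead of the buffer writes the document to disk in the same way.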

Re: simple url regexp

2008-05-23 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: > url = re.compile(r"((http|ftp|https)\:\/\/)(www)?([a-zA-Z]{1}([\w\-]+ > \.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+ > \.[\w]{3,4})?((\?\w+=\w+)?(&\w+=\w+)*)?") > > damn i hate these things. > > i want it to only match http://www.name.any/etc > > not http://wiki.x e

Re: which datastructure for fast sorted insert?

2008-05-25 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: > im writing a webcrawler. > after visiting a new site i want to store it in alphabetical order. > > so obv i want fast insert. i want to delete duplicates too. > > which datastructure is best for this? Keep the data redundantly in two data structures. Use collections.de

Re: UTF problem?

2008-05-25 Thread Stefan Behnel
Vesa-Matti Sarenius wrote: > This did not work. Neither did commenting all print lines. The problem is > somewhere else. As Tim said, the line number that the exception prints would help here. Stefan -- http://mail.python.org/mailman/listinfo/python-list

Re: confused by HTMLParser class

2008-05-28 Thread Stefan Behnel
globalrev wrote: > tried all kinds of combos to get this to work. In case you meant to say that you can't get it to work, consider using lxml instead. http://codespeak.net/lxml http://codespeak.net/lxml/lxmlhtml.html Stefan -- http://mail.python.org/mailman/listinfo/python-list

Re: BeautifulSoup: problems with parsing a website

2008-05-28 Thread Stefan Behnel
Marco Hornung wrote: > Hy guys, ... and girls? > I'm using the python-framework BeautifulSoup(BS) to parse some > information out of a german soccer-website. consider using lxml. http://codespeak.net/lxml >>> from lxml import html > I want to parse the article shown on the website.

Re: Compare 2 files and discard common lines

2008-05-29 Thread Stefan Behnel
loial wrote: > I have a requirement to compare 2 text files and write to a 3rd file > only those lines that appear in the 2nd file but not in the 1st file. lines_in_file1 = set(open("file1").readlines()) for line in open("file2"): if line not in lines_in_file1: print line
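
A self-contained sketch of the set-based approach for the stated requirement (lines that appear in the 2nd file but not in the 1st, written to a 3rd file), in Python 3 syntax. The file names and sample contents are made up so that the example can run on its own.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
first_path = os.path.join(workdir, "file1.txt")
second_path = os.path.join(workdir, "file2.txt")
output_path = os.path.join(workdir, "file3.txt")

# Sample input, invented for this illustration.
with open(first_path, "w") as first:
    first.write("a\nb\nc\n")
with open(second_path, "w") as second:
    second.write("b\nd\nc\ne\n")

# Load the 1st file into a set for fast membership tests.
with open(first_path) as first:
    lines_in_first = set(first)

# Stream the 2nd file and keep only the lines the 1st file does not contain.
with open(second_path) as second, open(output_path, "w") as out:
    for line in second:
        if line not in lines_in_first:
            out.write(line)

with open(output_path) as out:
    only_in_second = out.read().splitlines()
```

Only the first file is held in memory; the second is read line by line, and the order of its lines is preserved in the output.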

Re: Compare 2 files and discard common lines

2008-05-29 Thread Stefan Behnel
Kalibr wrote: > On May 29, 6:36 pm, loial <[EMAIL PROTECTED]> wrote: >> I have a requirement to compare 2 text files and write to a 3rd file >> only those lines that appear in the 2nd file but not in the 1st file. >> >> Rather than re-invent the wheel I am wondering if anyone has written >> anythin

Re: Writing HTML

2008-06-02 Thread Stefan Behnel
Ken Starks wrote: > [EMAIL PROTECTED] wrote: >> I've searched the standard library docs, and, while there are a couple >> options for *reading* HTML from Python, I didn't notice any for >> *writing* it. Does anyone have any recommendations (particularly ones >> not listed on PyPI)? >> >> Thanks >

Re: Trying to extend Python with C: undefined reference to `Py_BuildValue'

2008-06-04 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: > I am trying to extend Python with some C code. Have you considered using Cython instead of C? http://cython.org/ Stefan -- http://mail.python.org/mailman/listinfo/python-list

Re: ANN: Resolver One 1.1 released

2008-06-04 Thread Stefan Behnel
Giles Thomas wrote: > We are proud to announce the release of Resolver One, version 1.1 - > the largest IronPython application in the world, we think, at 38,000 > lines of production code backed up by 130,000 lines of unit and > functional tests. Is it really IronPython specific code or would it r

Re: Looking for some good python learning resources on the web

2008-06-05 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: > What are the best sites to read to learn python? http://wiki.python.org/moin/BeginnersGuide Stefan -- http://mail.python.org/mailman/listinfo/python-list

Re: BZip2 decompression and parsing XML

2008-06-06 Thread Stefan Behnel
phasma wrote: > xml.parsers.expat.ExpatError: not well-formed (invalid token): line > 538676, column 17 Looks like your XML file is broken in line 538676. > try: > handler = open(args[0], "r") This should read handler = open(args[0], "rb") Maybe t
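
The point about binary mode can be sketched like this, in Python 3 syntax. It assumes, based on the subject line, that the input is a bz2-compressed XML file; the sample document and file name are invented, and the poster's actual code is only partly visible above.

```python
import bz2
import os
import tempfile
import xml.etree.ElementTree as ET

# Create a small compressed sample so that the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.xml.bz2")
with open(path, "wb") as handle:
    handle.write(bz2.compress(b"<root><item>1</item><item>2</item></root>"))

# Compressed data is binary, so open the file with "rb" rather than "r".
with open(path, "rb") as handle:
    xml_bytes = bz2.decompress(handle.read())

root = ET.fromstring(xml_bytes)
item_count = len(root.findall("item"))
```

Text mode may translate line endings on some platforms, which corrupts compressed data before it ever reaches the decompressor.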

Re: Web Crawler - Python or Perl?

2008-06-09 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: > 1) I/O issues: my biggest constraint in terms of resource will be > bandwidth throttle neck. > 2) Efficiency issues: The crawlers have to be fast, robust and as > "memory efficient" as possible. I am running all of my crawlers on > cheap pcs with about 500 mb RAM and P3 t

Re: Web Crawler - Python or Perl?

2008-06-09 Thread Stefan Behnel
subeen wrote: > can use urllib2 module and/or beautiful soup for developing crawler Not if you care about a) speed and/or b) memory efficiency. http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/ Stefan -- http://mail.python.org/mailman/listinfo/python-list

Re: Web Crawler - Python or Perl?

2008-06-09 Thread Stefan Behnel
Ray Cote wrote: > Beautiful Soup is a bit slower, but it will actually parse some of the > bizarre HTML you'll download off the web. [...] > I don't know if some of the quicker parsers discussed require > well-formed HTML since I've not used them. You may want to consider > using one of the quicker

Re: Web Crawler - Python or Perl?

2008-06-10 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: > As to why as opposed to what, I am attempting to build a search engine > right now that plans to crawl not just html but other things too. > > I am open to learning, and I don't want to learn anything that doesn't > really contribute to building my search engine for the

Re: Functionality similar to PHP's SimpleXML?

2008-06-13 Thread Stefan Behnel
Phillip B Oldham wrote: > I'm going to throw together a quick project over the weekend: a > spider. I want to scan a website for certain elements. > > I come from a PHP background, so normally I'd: > - throw together a quick REST script to handle http request/responses Use the urllib/urllib2 mod

Re: XML validation in stdlib?

2008-06-15 Thread Stefan Behnel
Filip Gruszczyński wrote: > I took a look at the standard library and tried to find some > validation against schema tools, but found none. I googled, but found > only links to external libraries, that can do some validation. Does it > mean, that there is no validation in stdlib or have I just miss

Re: ISO dict => xml converter

2008-06-20 Thread Stefan Behnel
kj wrote: > Hi. Does anyone know of a module that will take a suitable Python > dictionary and return the corresponding XML structure? > > In Perl I use XML::Simple's handy XMLout function: > > use XML::Simple 'XMLout'; > my %h = ( 'Foo' => +{ > 'Bar' => +{ >

Re: Fwd: xml to mysql (vice versa ) too

2008-06-24 Thread Stefan Behnel
> Le Tuesday 24 June 2008 07:08:46 swapna mudavath, vous avez écrit : >> can anybody help me in this >> >> -swapna >> >> -- Forwarded message -- >> From: swapna mudavath <[EMAIL PROTECTED]> >> Date: Mon, Jun 23, 2008 at 5:27 PM >> Subject: xml to mysql (vice versa ) too >> To: P

Re: ask for a RE pattern to match TABLE in html

2008-06-26 Thread Stefan Behnel
oyster wrote: > that is, there is no TABLE tag between a TABLE, for example > something with out table tag > what is the RE pattern? thanks > > the following is not right > [^table]*? Why not use an HTML parser instead? Try lxml.html. http://codespeak.net/lxml/ Stefan -- http://mail.python.org/

Re: Design principles and architecture of an information transfer standard based on XML and SOAP

2008-06-27 Thread Stefan Behnel
xkenneth wrote: > I'm looking for a bit of advice. There's an oilfield standard > called WITSML (Wellsite Information Transfer Standard Markup Language > - witsml.org), it's basically a collection of XML schemas and a spec For implementing XML languages, I (biasedly) advocate lxml's element class

Re: lxml and links

2008-06-27 Thread Stefan Behnel
Ampedesign wrote: > I'm trying to extract all the links on a page with lxml. Ideally, I > would like it to return me a list of hrefs of each link on the page, > in a list. > > How would I go about doing this? Read the manual? http://codespeak.net/lxml/dev/lxmlhtml.html#working-with-links http://

Re: HTML Parsing

2008-06-28 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: > I am trying to build my own web crawler for an experiement and I don't > know how to access HTTP protocol with python. > > Also, Are there any Opensource Parsing engine for HTML documents > available in Python too? That would be great. Try lxml.html. It parses broken HTM

Re: lxml validation and xpath id function

2008-07-02 Thread Stefan Behnel
Floris Bruynooghe wrote: > I'm trying to use the .xpath('id("foo")') function on an lxml tree but > can't get it to work. Quick follow-up: this has been answered on the lxml mailing list: http://comments.gmane.org/gmane.comp.python.lxml.devel/3815 Stefan -- http://mail.python.org/mailman/listinf

Re: ANN: XML builder for Python

2008-07-02 Thread Stefan Behnel
Jonas Galvez wrote: > Not sure if it's been done before, but still... Obviously ;) http://codespeak.net/lxml/tutorial.html#the-e-factory ... and tons of other tools that generate XML, check PyPI. Stefan -- http://mail.python.org/mailman/listinfo/python-list

Re: ANN: XML builder for Python

2008-07-02 Thread Stefan Behnel
Stefan Behnel wrote: > Jonas Galvez wrote: >> Not sure if it's been done before, but still... > > Obviously ;) > > http://codespeak.net/lxml/tutorial.html#the-e-factory > > ... and tons of other tools that generate XML, check PyPI. Although it might be the fir

Re: ANN: XML builder for Python

2008-07-03 Thread Stefan Behnel
Hi, two comments. Gerard flanagan writes: > Nice! Here's a version that uses elementtree: [...] > def __call__(self, value='', **kargs): > self.element.text = value This should spell def __call__(self, value=None, **kargs): > class builder(element): > def

Re: ANN: XML builder for Python

2008-07-03 Thread Stefan Behnel
Hi, Walter Dörwald wrote: > XIST has been using with blocks since version 3.0. > > Take a look at: > http://www.livinglogic.de/Python/xist/Examples.html > > > from __future__ import with_statement > > from ll.xist import xsc > from ll.xist.ns import html, xml, meta > > with xsc.Frag() as node

Re: B-Soup: broken iterator, tag a keyword?

2008-07-10 Thread Stefan Behnel
Hi, Brendan wrote: > I have the following using Beautiful Soup: > > soup = BeautifulSoup(data) > tags = soup.findAll(href=re.compile("/MER_FRS_L2_Canada/MER_FRS_\S > +gz")) > for tag in tags: > print tag['href'] > print tag.parent.nextSibling.string > print tag.parent.nextSibling.next

Re: Bypassing WebFilter security

2008-07-10 Thread Stefan Behnel
pranav wrote: > I am working in an organization, which is using a very strict > webcontent filter management suite. Due to this i am unable to > download any exe file, or surf web (even the necessary downloads from > sourceforgenet are blocked). I was wondering, if python could be of > any help. Sa

Re: Help with using findAll() in BeautifulSoup

2008-07-11 Thread Stefan Behnel
Alexnb wrote: > Okay, I am not sure if there is a better way of doing this than findAll() but > that is how I am doing it right now. Consider using lxml.html and lxml.cssselect. http://codespeak.net/lxml/ > I am making an app that screen scapes > dictionary.com for definitions. Do they have a

Re: SAX XML Parse Python error message

2008-07-13 Thread Stefan Behnel
goldtech wrote: > My first attempt at SAX, but have an error message I need help with. Just in case you prefer writing readable code over debugging SAX code into existence, try lxml. http://codespeak.net/lxml/ Here is a presentation you might find interesting. http://codespeak.net/lxml/s5/lxml-

Re: Babelfish translation ...

2008-07-17 Thread Stefan Behnel
Stef Mientki writes: > Although it works functionally, > it can take lots of time waiting for the translation. > > What I basically do is, after selecting a new string to be translated: > > kwds = { 'trtext' : line_to_be_translated, 'lp' :'en_nl'} > soup = BeautifulSoup (urlop

Re: Good HTML Parser

2008-07-17 Thread Stefan Behnel
Chris wrote: > Can anyone recommend a good HTML/XHTML parser, similar to > HTMLParser.HTMLParser or htmllib.HTMLParser, but able to intelligently > know that certain tags, like , are implicitly closed? I need to > iterate through the entire DOM, building up a DOM path, but the stdlib > parsers aren
