cate that if you can drop the ratio of
documents that require a run of html5lib below 30% and use lxml's parser
for the rest, you will still be faster than with BeautifulSoup alone.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
1246993200
> % clock scan "today + 1 fortnight"
> 1248135628
>
> Does any such package exist for Python?
Is this only for English times or is I18N a concern?
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
if c == "ö":
You are reading Unicode strings, so you have to compare it to a unicode
string as in
if c == u"ö":
> print "oe"
> else:
> print c
Note that printing non-ASCII characters may not always work, depending on
your terminal.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Michiel Overtoom schrob:
> Viele Röhre. Macht spaß! Tsüsch!
LOL! :)
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
n--
which *do* fail with a SyntaxError. I think I faintly remember trying those
in my early Python days and immediately went for "+=" when I saw them fail
(as I had expected).
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
0 loops, best of 3: 222 usec per loop
$ python3.1 -m timeit 'list(x for x in range(1000) if x)'
1000 loops, best of 3: 227 usec per loop
$ python3.1 -m timeit -s 'r=[i%2 for i in range(2000)]' \
'list(x for x in r if x)'
1000 loops, best o
r,
> when going to production, we'll need the proper one.
Use a global variable in the module.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Deep_Feelings wrote:
> So you have chosen programming language "x" so shall you tell us why
> you did so , and what negatives or positives it has ?
Java, pays a living.
*duck*
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
ne, Linux distros do that by default, for example.
"many things fail" is not a very detailed problem description, though.
Could you state more exactly what you do and provide the error messages
that you see?
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
all performance would use something else anyway (do I need to mention
Cython here?)
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
t currently looks
like all three are there to stay and to keep growing better. And I'm also
happy to read that some optimisations jump from one to the other. ;)
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
The latter also has support for XML-Schema validation, and you might
be interested in lxml.objectify for handling data centric XML formats
(assuming that's the case here).
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Tycho Andersen wrote:
> Blah, forgot to include the list. When is python-list going to get Reply-To?
Hopefully never.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
can keep parsing regardless of errors and will drop the broken content.
However, it is *always* better to fix the input, if you can get your hands on it.
Broken XML is *not* XML at all. If you can't fix the source, you can never
be sure that the data you received is in any way comple
html#iterators
Note that Cython doesn't currently support the "yield" statement, but
that's certainly on the ToDo list.
http://trac.cython.org/cython_trac/ticket/83
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
arser like the one in
lxml.html? That would eliminate this kind of problem altogether, as you'd
always get a well-decoded unicode string from the tree content.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
't there as well.
What helps is to put a fake Pyrex installation into your sys.path, like
http://codespeak.net/svn/lxml/trunk/fake_pyrex/
as done at the top of
http://codespeak.net/svn/lxml/trunk/setup.py
I haven't checked whether newer setuptools versions have been fixed yet.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
em:
# ...
Then use
record = lap.find("recordtagname")
to find things inside the subtree. You can also use XPath-like expressions
such as
all_interesting_elements = lap.findall("sometag/somechild//somedescendant")
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
> Lord Of The Rings
> XML-Schema Specification
> Aladin
Use the iterparse() function of the xml.etree.ElementTree package.
http://effbot.org/zone/element-iterparse.htm
http://codespeak.net/lxml/parsing.html#iterparse-and-iterwalk
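A minimal sketch of the iterparse() approach, assuming a document shaped like the hypothetical SAMPLE below (the tag names "library", "book" and "title" are assumptions, not taken from the original post):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are assumptions.
SAMPLE = b"""<library>
  <book><title>Lord Of The Rings</title></book>
  <book><title>XML-Schema Specification</title></book>
  <book><title>Aladin</title></book>
</library>"""


def iter_titles(source):
    """Yield the text of each <title> element as soon as it is complete."""
    # By default, iterparse() reports only "end" events.
    for _event, elem in ET.iterparse(source):
        if elem.tag == "title":
            yield elem.text


titles = list(iter_titles(io.BytesIO(SAMPLE)))
for title in titles:
    print(title)
```

iterparse() also accepts a file name instead of a file-like object.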
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
nto the module
namespace *after* the compilation, either by assigning module attributes or
by importing the module into a custom namespace.
Given that both use cases are extremely rare, it was decided that
optimisations like this are more important than the ability to redefine the
most commo
me
inside your module (even the range function itself), this will disable the
optimisation.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
ead/thread/d88d02f269a7d20d#
I noticed a problem with Google in general this weekend, not even related
to mailing lists. I can't remember getting similarly bad results from a web
search for years. Even trivial queries that worked for months returned
completely unrelated pages and lacked the &
inder wrote:
> On Aug 17, 8:31 pm, John Posner wrote:
>>> Use the iterparse() function of the xml.etree.ElementTree package.
>>> http://effbot.org/zone/element-iterparse.htm
>>> http://codespeak.net/lxml/parsing.html#iterparse-and-iterwalk
>>> Stefan
>
> for elem in tree.findall('//book/title'):
> print elem.text
Is that really so much better than an iterparse() version?
from xml.etree import ElementTree
for _, elem in ElementTree.iterparse("myfile.xml"):
if elem.tag == 'book':
p
hon package ?
No, there isn't any XMLSchema support in the stdlib.
However, you may still be able to use lxml locally for development and with
validation enabled, and switch to non-validating ElementTree on
distribution/pre-prod-testing/whatever. Just use a conditional import and
write a bit
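A conditional import of this kind might look roughly like the sketch below. The HAVE_LXML flag is a hypothetical name introduced here; any validation code would have to be guarded by it, since the stdlib fallback cannot validate.

```python
# Prefer lxml when it is installed, fall back to the stdlib otherwise.
try:
    from lxml import etree  # third-party, supports XML Schema validation
    HAVE_LXML = True
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib, non-validating
    HAVE_LXML = False

# Code restricted to the common ElementTree API works with either module.
root = etree.fromstring("<doc><item>1</item></doc>")
first = root.find("item").text
print(first, "(lxml)" if HAVE_LXML else "(stdlib ElementTree)")
```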
meant for fun, I'd vote for this. This is very good advice.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
docs.python.org/library/threading.html
http://docs.python.org/library/multiprocessing.html
Both share a (mostly) common interface and are simple enough to use. They
are pretty close to the above interface already.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
y extension libraries that release the GIL (for the most common
> Python implementation), they'll run faster being called in sequence
> since you won't have the overhead of task switching.
... unless, obviously, the hardware is somewhat up to date (which is not
that uncommon for number
ritten by James Clark.
BTW, if you are new to XML and want to use it in Python, you might want to
start with the xml.etree package.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
code; use the codecs module with a suitable
encoding to read encoded text files, and use an XML parser when reading XML.
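As a small illustration of the codecs advice, here is a sketch that decodes a byte stream through a codecs stream reader. The in-memory BytesIO stands in for a real file opened in binary mode, and UTF-8 is an assumed encoding.

```python
import codecs
import io

# Stand-in for a file opened with open(path, "rb"); UTF-8 is an assumption.
raw = "ö and ü".encode("utf-8")
binary_file = io.BytesIO(raw)

# getreader() returns a StreamReader class that decodes while reading.
reader = codecs.getreader("utf-8")(binary_file)
text = reader.read()
print(repr(text))
```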
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
lxml, at least.
Note that fromstring() behaves the same as XML(), but it reads better when
parsing from a string variable. XML() reads better when parsing from a
literal string.
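A short sketch of the two spellings, shown here with the stdlib ElementTree module (which exposes the same two functions) rather than lxml:

```python
import xml.etree.ElementTree as ET

# XML() reads naturally with a literal ...
root_a = ET.XML("<root><child>text</child></root>")

# ... while fromstring() reads naturally with a variable.
xml_text = "<root><child>text</child></root>"
root_b = ET.fromstring(xml_text)

# Both calls build equivalent trees.
same = ET.tostring(root_a) == ET.tostring(root_b)
print(same)
```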
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Justin wrote:
> list 'results' from maps.google then crawl through the (engine of some
> sort) space to the 'results' website and look at it html to find the
> contact
Good idea. How do you know how to recognise the contact? He/she might come
disguised.
Stefan
--
h
, and
Cython has a running sub-project on providing better Fortran integration
(which might be of interest to you anyway).
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
who started their postings with "hi guys!",
and I stopped doing that as a) it became too tiring, especially on a
potentially-for-newbees group like c.l.py, and b) to many people it
actually *is* a figure of speech.
But reading statements like the above really makes me feel that it's best
aps the single greatest cause
>>> of human misery.
>>
>> You mean the single greatest cause of human misery isn't
>> Microsoft Windows?
>>
> No, emacs is responsible ! Hail to Vi !
Heck, where's Godwin's law when you need it?
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Mensanator wrote:
> asking how many Jews you can fit into a Volswagen.
None, because it's already full.
(or "voll" as those who design Volkswagens would put it...)
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Python 3 yet and
> (according to their development websites) will not for a very long
> time to come.
http://wiki.python.org/moin/PortingDjangoTo3k
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
s a tostring() method that returns a string. To get a pretty
printed representation, you can use the indent() function from this recipe:
http://effbot.org/zone/element-lib.htm#prettyprint
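For illustration, a sketch of serialising with tostring(), here using the stdlib ElementTree module. The hasattr() guard is there because ET.indent() only exists in Python 3.9 and later; on older versions the recipe linked above fills the same role.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
child = ET.SubElement(root, "child")
child.text = "text"

# encoding="unicode" returns a str instead of encoded bytes.
compact = ET.tostring(root, encoding="unicode")

# ET.indent() modifies the tree in place (Python 3.9+ only).
if hasattr(ET, "indent"):
    ET.indent(root)
pretty = ET.tostring(root, encoding="unicode")

print(compact)
print(pretty)
```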
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
> by using the encoding that the user pass as enc parameter to the
> serialization function. This means that Unicode strings are serialized
> in a human readable form, regarding a better interoperability with
> other platforms.
You mean, the whole XML document is serialised with that encodi
tem/Library/Frameworks/Python.framework/Versions/2.5/lib/
> python2.5/sgmllib.py"
You can use "python -m sgmllib" to call a module from the stdlib (or the
PYTHONPATH, to be more accurate).
But note that sgmllib is a particularly cumbersome way to deal with HTML.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Steven D'Aprano wrote:
> I'm amused and somewhat perplexed that somebody with the non-English
> name of Stefan, writing from a .de email address, seems to be assuming
> that (1) everybody is on the Internet, and (2) everybody on the Internet
> speaks English.
Oh, I t
Mensanator wrote:
> On Aug 23, 2:25 pm, Stefan Behnel wrote:
>> Mensanator wrote:
>>> asking how many Jews you can fit into a Volswagen.
>> None, because it's already full.
>
> A spelling error does not make it any less offensive.
As it stands, I find the jok
run the script "sgmllib.py"
*in the current directory*. According to the original post, that's clearly
not the intention of the OP.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Dave Angel wrote:
> Stefan Behnel wrote:
>> elsa wrote:
>>> I know how to turn HTML into an ElementTree object
>>
>> I don't. ;)
>>
>> ElementTree doesn't have an HTML parser, so what do you use for parsing?
>>
> Perhaps the OP was r
e further down:
http://mail.python.org/pipermail/python-list/2006-July/567400.html
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
rs with an identical code
point value. So you do not risk any failures or data loss.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
code using a other extension c-code function called
> from python code
>
> python CRASH with invalid thread-state object
You forgot to create a thread state for the new thread. See the
PyThreadState_New() function.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Steven D'Aprano wrote:
> On Mon, 24 Aug 2009 09:40:03 +0200, Stefan Behnel wrote:
>
>>> Or you could enter the 21 century and understand that "guys" has become
>>> a generic term for people of any sex.
>> Is that true for everyone who understands and
inal is configured for
US-ASCII, so you can't output anything but US-ASCII characters.
Change your terminal setup to e.g. UTF-8 and see how things start working.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
ibrary.
> But I'm having a tough time finding some good examples of that, because
> all the tutorials I've found just tell you to use the aforementioned
> magic methods, which unfortunately don't seem to be working for me.
http://effbot.org/zone/element-soap.htm
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Daniel Molina Wegener wrote:
> Stefan Behnel wrote:
>> Daniel Molina Wegener wrote:
>>> When the object is restored, by using pyxser.unserialize:
>>>
>>> pyobj = pyxser.unserialize(obj = xmldocstr, enc = "utf-8")
>> But this is XML, righ
Stefan Behnel wrote:
> for all byte
> strings, regardless of their encoding (since you can't even know if they
> represent encoded text at all).
Hmm, having written that, I guess it's actually best to encode byte strings
as base64 instead. Otherwise, null bytes and other specia
John Gordon wrote:
> Any suggestions?
Well, yes, see the link I posted.
http://effbot.org/zone/element-soap.htm
That might actually be the easiest way to get your stuff done, and it
avoids external dependencies (well, except for ElementTree, if you continue
to use Python <= 2.4).
o the iterparse() function which
supports iterative parsing of an XML file and thus allows intermediate
cleanup of used data.
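A sketch of such intermediate cleanup, assuming a hypothetical document made of many "record" elements (the tag name and the count_records helper are illustrative only):

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Count matching elements while releasing their content early."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop text, attributes and children of the finished element.
            elem.clear()
    return count


# Hypothetical input: one thousand small records.
data = b"<log>" + b"<record><msg>x</msg></record>" * 1000 + b"</log>"
total = count_records(io.BytesIO(data))
print(total)
```

Note that the emptied elements themselves stay attached to the root, so this reduces memory use rather than eliminating it.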
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
r wrote:
> As long as Java
> can be complied strait to machine code
I think you meant "compared" here.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
n most cases. I guess the unicode() function is the right
thing to use here (or the str() function in Py3 - no idea if that's
supported by Qt by now).
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
ing the file as html.
... which is obviously not the correct thing to do when it's XHTML.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
ivial. The most well known
example is clearly Erlang. Adding "synchronised" data structures to that
will not make writing race conditions much easier.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Nigel Rantor wrote:
> My comment you quoted was talking about Java and the use of
> synchronized. If that was unclear I apologise.
Well, it was clear. But it was also unrelated to what the OP wrote. He was
talking about the semantics of "synchronized" in Java, not the use.
thon code,
i.e. without basically inventing a new language. If that's required for
removing the GIL, I doubt that it will ever be done.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Richard Brodie wrote:
> "Stefan Behnel" wrote:
>> Lee wrote:
>>> Not a bug in IE (this time), which is correctly parsing the file as html.
>> ... which is obviously not the correct thing to do when it's XHTML.
>
> It isn't though; it's HTML
r, your question seems to imply that you generate the XML manually
using string concatenation, which is a rather bad idea. Python has great
XML tools like ElementTree that help in generating and serialising XML
correctly (besides parsing, searching and other things).
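A minimal sketch of why this matters: the serialiser escapes special characters that naive string concatenation would pass through unchanged. The element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_message(author, body):
    """Build a small XML document; names are illustrative only."""
    msg = ET.Element("message", author=author)
    ET.SubElement(msg, "body").text = body
    return ET.tostring(msg, encoding="unicode")


# Input that would produce broken XML if concatenated by hand.
xml_text = build_message('A "quoted" <name>', "1 < 2 & 3 > 2")
print(xml_text)

# The result parses back to exactly the original values.
round_trip = ET.fromstring(xml_text)
```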
Stefan
--
http://mail.python.or
ybe others?
See, for example, the python-dev archives from 2009-03-02.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
> Traceback (most recent call last):
> File "./a.py", line 6, in
> for row in data:
> _csv.Error: iterator should return strings, not bytes (did you open
> the file in text mode?)
See codecs.EncodedFile().
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
to the "Engineering" key was itself a dict and was
> assigned {'Analysis' : 'Simulink'} for example.
You might want to read up on recursion, i.e. a function calling itself.
You can find out if something is a dict like this:
isinstance(x, dict)
or, if you kn
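A sketch of the recursive approach, assuming the data consists of dicts nested inside dicts. The walk helper and the "Sales" entry are hypothetical.

```python
def walk(d, path=()):
    """Yield (key_path, value) pairs for every non-dict value."""
    for key, value in d.items():
        if isinstance(value, dict):
            # The function calls itself for each nested dict.
            yield from walk(value, path + (key,))
        else:
            yield path + (key,), value


# Hypothetical data modelled on the example in the question.
config = {"Engineering": {"Analysis": "Simulink"}, "Sales": "CRM"}
leaves = list(walk(config))
for key_path, value in leaves:
    print(" -> ".join(key_path), "=", value)
```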
, I actually tend to have a lot of fun per line with Java. But that's
usually not with code I have written myself.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
rl)
urls = [ img.get('src') for img in doc.xpath('//img') ]
Then use e.g. urllib2 to save the images.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
"wb" as mode to write as binary.
Otherwise you'll get automatic line ending conversion (at
least on Windows) which will give the result you describe.
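A small sketch of writing in binary mode, using a temporary file as a stand-in for the real target:

```python
import os
import tempfile

# Arbitrary bytes, including newlines that text mode could translate.
payload = b"line one\nline two\n\x00\xff"

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:  # "wb": no line ending conversion
        f.write(payload)
    with open(path, "rb") as f:
        read_back = f.read()
finally:
    os.remove(path)

print(read_back == payload)
```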
If my answer doesn't help, you probably need to describe in
more detail what you're doing, including showing some real
code.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Josh English, 27.08.2010 01:30:
solve a lot of the problems I'm running into in my own attempt to
build a python Class implementation of an XML Validation object.
How would object serialisation help here?
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
's the router, the OP might try to change their router
settings to get rid of the problem.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
ocal
Only after the assignment "x_local += y_local" x_local
points to a new object which is the result of the addition
of the previously "shared" object and y_local.
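The rebinding can be observed directly. This sketch assumes immutable operands such as ints; for mutable objects like lists, "+=" modifies the object in place instead.

```python
shared = 1000
x_local = shared  # both names refer to the same object
y_local = 5

same_before = x_local is shared

x_local += y_local  # builds a new int and rebinds x_local to it

same_after = x_local is shared
print(same_before, same_after, shared, x_local)
```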
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
'.//components' )
"." matches the current element, so the path expression looks for all
"components" nodes *below* the current element.
You can either wrap the root in an ElementTree and search globally (i.e.
without the leading "."), or you can test the root
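A sketch of the behaviour with a hypothetical document whose root is itself a "components" element. Using iter() to include the root is an alternative added here, not one of the options named above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the root and one descendant share a tag name.
root = ET.fromstring(
    "<components><group><components/></group></components>"
)

# ".//components" only looks *below* the element it is called on.
below = root.findall(".//components")

# iter() starts with the element itself, so the root is included.
including_root = list(root.iter("components"))

print(len(below), len(including_root))
```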
de if that ever changed in whatever
implementation.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
sts than
CPython as it avoids the interpreter loop. It also optimises away the
literal sequences in "in" tests such as
if x in (1,2,3):
...
which, in the best case of integer literals, even compile down into C
switch statements.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
ion
return my_var
UnboundLocalError: local variable 'my_var' referenced before assignment
as soon as the function is called.
If you want to have the global my_var modified, you need
a "global my_var" statement in the function body.
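Both cases side by side, as a sketch (the function names are made up for the illustration):

```python
my_var = 0


def broken():
    # The assignment makes my_var local to the whole function body,
    # so reading it on the right-hand side fails.
    my_var = my_var + 1
    return my_var


def fixed():
    global my_var  # refer to the module-level name instead
    my_var = my_var + 1
    return my_var


try:
    broken()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

result = fixed()
print(error_name, result)
```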
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
at.
What about using the csv (not CVS) module?
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
m instances with (most likely) cached hash values.
So even that will most likely be much faster than the spelled-out code above.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
o match the algorithmic complexities at least for
the major builtin types. It seems quite clear to me as a developer that the
set of builtin types and "collections" types was chosen in order to cover a
certain set of algorithmic complexities and not just arbitrary interfaces.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
could become a
little boring to be the first who arrives in the morning ...
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Aahz, 01.09.2010 17:40:
I still think that making a full set of
algorithmic guarantees is a Bad Idea, but I think that any implementation
that doesn't have O(1) for list element access is fundamentally broken,
and we should probably document that somewhere.
+1
Stefan
--
give up one bit of CPython compatibility to use all of that.
That alone counts as a pretty huge advantage to some people.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
il, http://ftputil.sschwarzer.net . :-)
As the name implies, it's FTP-only for now, though.
If you have any questions regarding the library, please ask.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
so you won't need the "cdef"
annotation. It won't automatically do that for "a", though, as that might
break Python's unlimited integer semantics if "imax" and/or "a" are large
enough.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
ould CPython do that?
If you want a binary extension module for CPython, you can try to push the
RPython module through Cython. However, in that case, you wouldn't be
restricted to RPython in the first place.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
"next" in each call.
That's even a well known way to implement state machines.
However, as usual, the details are a bit different in CPython, which has a
C level slot for the "next" method. So the lookup isn't as heavy as it looks.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
t gets returned from the function. C
compilers do these things to benchmarks these days.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
replaces
range() in Python 3.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
driven frameworks
(like Twisted, eventlet and others) that make asynchronous event handling
fast and easy, and that use much higher-level abstractions than pure state
machines.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
you remove it, the string will become a unicode
literal. Since the code is syntax compatible with Python 3, simply running
it in a Python 3 interpreter will also show this behaviour.
So it's redundant in Python 2, but it's no longer redundant when you plan
to migrate the code to Python 3.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
g something. So you don't take the risk of introducing side effects
somewhere because all state implementations are pure functions (at least as
far as the state machine itself is concerned).
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
should have an impact on a programmer's daily job.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
ippets between the code blocks.
What do you get if you test your text file by explicitly
calling doctest.testfile?
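Such an explicit call could look like the following sketch. The text content and the temporary file are stand-ins for the real test file; module_relative=False makes doctest treat the name as a plain file system path.

```python
import doctest
import os
import tempfile

# Stand-in for the real text file: prose with one embedded example.
TEXT = """\
Some prose before the example.

    >>> 1 + 1
    2

More prose after it.
"""

fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "w") as f:
        f.write(TEXT)
    results = doctest.testfile(path, module_relative=False)
finally:
    os.remove(path)

# results is a named tuple with "failed" and "attempted" counts.
print(results)
```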
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
different efforts that address very different issues.
All those compilers that offer loop unrolling are therefore wasting
their time...
Sometimes they do, yes.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
en situation.
This can be very fast, since
the loop counter need not be a Python object
It still has to count, though.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
VGNU Linux, 06.09.2010 13:02:
Can Python be used for embedded systems development ?
It can and has been.
What kind of embedded system with what set of capabilities are you thinking
about? TV sets? Mobile phones? Smart dust?
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
hon?
If you can tell us what these structs are being used for in the original C
code, we might be able to point you to a suitable way to implement the same
thing efficiently in Python.
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
".
For example,
f = open('d:\nice_filename.txt', 'a')
will give surprising results. :-) Either double the
backslash, use a raw string, or, in the special case of
file system paths, possibly use a forward slash.
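The difference between the spellings can be checked without touching the file system:

```python
bad = 'd:\nice_filename.txt'       # "\n" is parsed as a newline
doubled = 'd:\\nice_filename.txt'  # doubled backslash
raw = r'd:\nice_filename.txt'      # raw string
forward = 'd:/nice_filename.txt'   # forward slash

print(repr(bad))
print(repr(doubled))
print(doubled == raw)
```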
Stefan
--
http://mail.python.org/mailman/listinfo/python-list
rs by using profile based optimisation.
BTW, I wonder why the code takes a whole 0.8 seconds to run in your gcc
test. Maybe you should use a newer GCC version. It shouldn't take more than
a couple of milliseconds (for program startup, OS calls, etc.), given that
the output is
different types, whereas
the *args syntax happily accepts any iterable object.
But I think it's still a rare enough use case to require
f(*(tuple(my_list) + tuple(my_other_list)))
when you need it, although the concatenation would likely get split up and
moved into an explicit varia
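A sketch of that call next to a variant based on itertools.chain(). The chain() version is a substitution introduced here, not something proposed in the thread, and f is a placeholder function.

```python
import itertools


def f(*args):
    """Placeholder function that returns its positional arguments."""
    return args


my_list = [1, 2]
my_other_list = (3, 4)  # deliberately a different sequence type

# The explicit conversion from the post.
via_tuples = f(*(tuple(my_list) + tuple(my_other_list)))

# *args accepts any iterable, so a lazy chain works as well.
via_chain = f(*itertools.chain(my_list, my_other_list))

print(via_tuples, via_chain)
```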
ol(0)
False
It simply follows Python's boolean coercion rules.
If you consider it inconsistent w.r.t. int('32'), then what about
>>> list('[]')
['[', ']']
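A few more data points, as a sketch. The bool('False') case is an assumption about what the truncated discussion was concerned with.

```python
examples = {
    "bool(0)": bool(0),              # zero is false
    "bool('')": bool(''),            # empty string is false
    "bool('False')": bool('False'),  # any non-empty string is true
    "int('32')": int('32'),          # int() parses the text
    "list('[]')": list('[]'),        # list() iterates over characters
}

for expression, value in examples.items():
    print(expression, "->", repr(value))
```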
Stefan
--
http://mail.python.org/mailman/listinfo/python-list