Ignoring XML Namespaces with cElementTree

2010-04-27 Thread dmtr
Is there any way to configure cElementTree to ignore the XML root namespace? By default, cElementTree (Python 2.6.4) appears to add the XML root namespace URI to _every_ single tag. I know that I can strip the URIs manually from every tag, but that is a rather idiotic thing to do (performance-wise).

Re: Ignoring XML Namespaces with cElementTree

2010-04-29 Thread dmtr
I'm referring to xmlns/URI prefixes. Here's a code example:

    from xml.etree.cElementTree import iterparse
    from cStringIO import StringIO
    xml = """http://www.very_long_url.com";>"""
    for event, elem in iterparse(StringIO(xml)):
        print event, elem

The output is:

    end http://www.very_long_url.com}ch
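A minimal sketch of the behaviour being described, written for Python 3, where cElementTree has been folded into xml.etree.ElementTree. The markup of the original snippet did not survive in the archive, so the document below (a root element with one child element and a default namespace) and the helper names local_name and parse_tags are assumptions made purely for illustration.

```python
import io
from xml.etree.ElementTree import iterparse

# Hypothetical document: a default namespace declared on the root element.
XML = b'<root xmlns="http://www.very_long_url.com"><child/></root>'


def local_name(tag):
    """Return the tag without its '{uri}' prefix, if it has one."""
    return tag.rsplit("}", 1)[-1]


def parse_tags(data):
    """Collect (qualified tag, local name) pairs for every 'end' event."""
    return [(elem.tag, local_name(elem.tag))
            for _event, elem in iterparse(io.BytesIO(data))]


tags = parse_tags(XML)
for qualified, local in tags:
    print(qualified, "->", local)
```

Every tag comes back in the "{uri}name" form, and stripping the prefix costs one string operation per element, which is the overhead the post objects to.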

Re: Ignoring XML Namespaces with cElementTree

2010-04-30 Thread dmtr
> I think that's your main mistake: don't remove them. Instead, use the fully
> qualified names when comparing.
>
> Stefan

Yes. That's what I'm forced to do. Pre-calculating tags like tagChild = "{%s}child" % uri and using them instead of "child". As a result the code looks ugly and there is extra
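The pre-calculation approach mentioned above could look roughly like this in Python 3. The document content and the names URI, TAG_CHILD and the "ns" prefix are illustrative assumptions; only the "{%s}child" % uri idiom comes from the post itself.

```python
import xml.etree.ElementTree as ET

URI = "http://www.very_long_url.com"  # namespace URI from the earlier example
TAG_CHILD = "{%s}child" % URI         # computed once, reused in every comparison

# Hypothetical document with two child elements and one unrelated element.
doc = ET.fromstring(
    '<root xmlns="%s"><child/><other/><child/></root>' % URI)

# Compare against the fully qualified name instead of the bare "child".
child_count = sum(1 for elem in doc.iter() if elem.tag == TAG_CHILD)

# findall() also accepts a prefix-to-URI mapping, which keeps paths short.
children = doc.findall("ns:child", {"ns": URI})
print(child_count, len(children))
```

The prefix mapping does not remove the namespace, but it keeps the long URI out of each individual lookup.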

Re: Ignoring XML Namespaces with cElementTree

2010-04-30 Thread dmtr
Here's a link to the patch exposing this parameter: http://bugs.python.org/issue8583 -- http://mail.python.org/mailman/listinfo/python-list

Re: Ignoring XML Namespaces with cElementTree

2010-05-01 Thread dmtr
> Unless you have multiple namespaces or are working with defined schema
> or something, it's useless boilerplate.
>
> It'd be a nice feature if ElementTree could let users optionally
> ignore a namespace, unfortunately it doesn't have it.

Yep. Exactly my point. Here's a link to the patch address

Re: Parser

2010-05-02 Thread dmtr
On May 2, 12:54 pm, Andreas Löscher wrote:
> Hi,
> I am looking for an easy to use parser. I am want to get an overview
> over parsing and want to try to get some information out of a C-Header
> file. Which parser would you recommend?

ANTLR

Re: Parser

2010-05-02 Thread dmtr
> > ANTLR
>
> I don't know if it's that easy to get started with though. The
> companion for-pay book is *most excellent*, but it seems to have been
> written to the detriment of the normal online docs.
>
> Cheers,
> Chris
> --http://blog.rebertia.com

IMO ANTLR is much easier to use compared to

A python interface to google-sparsehash?

2010-05-04 Thread dmtr
Does anybody know if a Python sparsehash module exists in the wild?

An empty object with dynamic attributes (expando)

2010-06-03 Thread dmtr
How can I create an empty object with dynamic attributes? It should be something like:

>>> m = object()
>>> m.myattr = 1

But this doesn't work. And I have to resort to:

>>> class expando(object): pass
>>> m = expando()
>>> m.myattr = 1

Is there a one-liner that would do the thing?

-- Cheers, D
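A short sketch of a few ways to answer the question above, written for Python 3. The class name Expando is just a label chosen here, and types.SimpleNamespace only exists from Python 3.3 onward, so it would not have been available on the Python 2.6 of the original post.

```python
from types import SimpleNamespace


class Expando(object):
    """Empty class; instances get a __dict__, so attributes can be added."""


m1 = Expando()
m1.myattr = 1

# One-liner: build the empty class with type() and instantiate it at once.
m2 = type("Expando", (object,), {})()
m2.myattr = 1

# Python 3.3+ ships a ready-made attribute bag.
m3 = SimpleNamespace()
m3.myattr = 1

# A bare object() has no __dict__, which is why the first attempt fails.
try:
    object().myattr = 1
    plain_object_works = True
except AttributeError:
    plain_object_works = False

print(m1.myattr, m2.myattr, m3.myattr, plain_object_works)
```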

Re: getting MemoryError with dicts; suspect memory fragmentation

2010-06-03 Thread dmtr
On Jun 3, 3:43 pm, "Emin.shopper Martinian.shopper" wrote:
> Dear Experts,
>
> I am getting a MemoryError when creating a dict in a long running
> process and suspect this is due to memory fragmentation. Any
> suggestions would be welcome. Full details of the problem are below.
>
> I have a long r

Re: getting MemoryError with dicts; suspect memory fragmentation

2010-06-03 Thread dmtr
> I have a long running processing which eventually dies to a
> MemoryError exception. When it dies, it is using roughly 900 MB on a 4
> GB Windows XP machine running Python 2.5.4. If I do "import pdb;

BTW, have you tried the same code with Python 2.6.5?

-- Dmitry

Re: getting MemoryError with dicts; suspect memory fragmentation

2010-06-03 Thread dmtr
I'm still unconvinced that it is a memory fragmentation problem. That is very rare. Can you give a more concrete example that one can actually try to execute? Something like:

python -c "list([list([0]*xxx)+list([1]*xxx)+list([2]*xxx)+list([3]*xxx) for xxx in range(10)])" &

-- Dmitry

Re: An empty object with dynamic attributes (expando)

2010-06-04 Thread dmtr
> Why does it have to be a one-liner? Is the Enter key on your keyboard
> broken?

Nah. I was simply looking for something natural and intuitive, like: m = object(); m.a = 1; Usually Python is pretty good at providing these natural and intuitive solutions.

> You have a perfectly good solution: defin

Re: An empty object with dynamic attributes (expando)

2010-06-05 Thread dmtr
Right.

>>> m = lambda:expando
>>> m.myattr = 1
>>> print m.myattr
1

-- Cheers, Dmitry

Re: An empty object with dynamic attributes (expando)

2010-06-10 Thread dmtr
On Jun 9, 7:31 pm, a...@pythoncraft.com (Aahz) wrote:
> dmtr wrote:
>
> >>>> m = lambda:expando
> >>>> m.myattr = 1
> >>>> print m.myattr
> >1
>
> That's a *great* technique if your goal is to confuse people.
> -

How to print SRE_Pattern (regexp object) text for debugging purposes?

2010-06-17 Thread dmtr
I need to print the regexp pattern text (SRE_Pattern object) for debugging purposes. Is there any way to do it gracefully? I've come up with the following hack, but it is rather crude... Is there an official way to get the regexp pattern text?

>>> import re, pickle
>>> r = re.compile('^abc$', re.

Re: How to print SRE_Pattern (regexp object) text for debugging purposes?

2010-06-17 Thread dmtr
On Jun 17, 3:35 pm, MRAB wrote:
>
> >>> import re
> >>> r = re.compile('^abc$', re.I)
> >>> r.pattern
> '^abc$'
> >>> r.flags
> 2

Hey, thanks. It works. Couldn't find it in a reference somehow. And it's not in the inspect.getmembers(r). Must be doing something wrong.

-- Cheers, Dmitry
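The attributes quoted above can be used like this under Python 3. This is a sketch: the variable names are illustrative, and the raw integer in r.flags is deliberately not shown, because Python 3 adds re.UNICODE to str patterns and the value differs from the Python 2 output quoted in the thread.

```python
import re

r = re.compile('^abc$', re.I)

pattern_text = r.pattern            # the original pattern string
ignore_case = bool(r.flags & re.I)  # test individual flags with a bitmask

print("pattern:", pattern_text)
print("IGNORECASE set:", ignore_case)
```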

Is there any way to minimize str()/unicode() objects memory usage [Python 2.6.4] ?

2010-08-06 Thread dmtr
I'm running into some performance / memory bottlenecks on large lists. Is there any easy way to minimize/optimize memory usage? Simple str() and unicode() objects [Python 2.6.4/Linux/x86]:

>>> sys.getsizeof('')     24 bytes
>>> sys.getsizeof('0')    25 bytes
>>> sys.getsizeof(u'')    28 bytes
>>>
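A small sketch for repeating this kind of measurement on a current interpreter. It assumes CPython 3; the byte counts depend on the Python version and platform, so they will not match the Python 2.6.4/x86 figures above and no expected output is given. The sample strings are arbitrary.

```python
import sys

# Arbitrary sample strings, from empty to moderately long.
samples = ['', '0', 'short string', 'short string' * 10]

sizes = {}
for s in samples:
    sizes[s] = sys.getsizeof(s)  # size of the string object itself, in bytes
    print("len=%3d  getsizeof=%d" % (len(s), sizes[s]))

# The size of the empty string approximates the fixed per-object overhead.
per_object_overhead = sizes['']
```

For very short strings the fixed overhead dominates, which is the effect the post is asking about.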

Re: Is there any way to minimize str()/unicode() objects memory usage [Python 2.6.4] ?

2010-08-06 Thread dmtr
Steven, thank you for answering. See my comments inline. Perhaps I should have formulated my question a bit differently: Are there any *compact* high performance containers for unicode()/str() objects in Python? By *compact* I don't mean compression. Just optimized for memory usage, rather than per

Re: Is there any way to minimize str()/unicode() objects memory usage [Python 2.6.4] ?

2010-08-06 Thread dmtr
> > Well... 63 bytes per item for very short unicode strings... Is there
> > any way to do better than that? Perhaps some compact unicode objects?
>
> There is a certain price you pay for having full-feature Python objects.

Are there any *compact* Python objects? Optimized for compactness?

> Wha

Re: Is there any way to minimize str()/unicode() objects memory usage [Python 2.6.4] ?

2010-08-06 Thread dmtr
On Aug 6, 10:56 pm, Michael Torrie wrote:
> On 08/06/2010 07:56 PM, dmtr wrote:
> > Ultimately a dict that can store ~20,000,000 entries: (u'short
> > string' : (int, int, int, int, int, int, int)).
>
> I think you really need a real database engine. With th

Re: Is there any way to minimize str()/unicode() objects memory usage [Python 2.6.4] ?

2010-08-07 Thread dmtr
On Aug 6, 11:50 pm, Peter Otten <__pete...@web.de> wrote:
> I don't know to what extent it still applys but switching off cyclic garbage
> collection with
>
> import gc
> gc.disable()

Haven't tried it on the real dataset. On the synthetic test it (and sys.setcheckinterval(10)) gave ~2% speedu
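A rough sketch of how the gc.disable() suggestion could be tried, written for Python 3. The helper names build and timed_build, the key format and the item count are assumptions made here; the thread's actual benchmark is not reproduced, and timings will vary by machine.

```python
import gc
import time


def build(n):
    """Build a dict of n short string keys mapped to 7-int tuples."""
    return dict((str(i), (i, i + 1, i + 2, i + 3, i + 4, i + 5, i + 6))
                for i in range(n))


def timed_build(n, disable_gc):
    """Time build(n), optionally with the cyclic collector switched off."""
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    try:
        start = time.perf_counter()
        d = build(n)
        elapsed = time.perf_counter() - start
    finally:
        if was_enabled:
            gc.enable()  # restore the collector only if it was on before
    return d, elapsed


for flag in (False, True):
    d, elapsed = timed_build(100000, flag)
    print("gc disabled=%s: %d keys in %.3f s" % (flag, len(d), elapsed))
```

Restoring the collector in a finally block keeps the rest of the program unaffected if the bulk load raises.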

Re: Is there any way to minimize str()/unicode() objects memory usage [Python 2.6.4] ?

2010-08-07 Thread dmtr
Correction. I've copy-pasted it wrong! array.array('i', (i, i+1, i+2, i+3, i+4, i+5, i+6)) was the best.

>>> for i in xrange(0, 100): d[unicode(i)] = (i, i+1, i+2, i+3, i+4, i+5, i+6)
100 keys, ['VmPeak:\t 224704 kB', 'VmSize:\t 224704 kB'], 4.079240 seconds, 245143.698209 keys pe
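The array.array variant mentioned above might be used as follows in Python 3. This is a sketch with an arbitrary item count and illustrative variable names. Note that sys.getsizeof on a tuple does not count the int objects it refers to, so the two printed sizes are not a like-for-like comparison, and no expected figures are given.

```python
import sys
from array import array

n = 42
as_tuple = (n, n + 1, n + 2, n + 3, n + 4, n + 5, n + 6)
as_array = array('i', as_tuple)  # packed C ints, no per-element objects

d = {}
for k in range(1000):
    # Same seven values per key as in the post, stored as a packed array.
    d[str(k)] = array('i', (k, k + 1, k + 2, k + 3, k + 4, k + 5, k + 6))

print("tuple container:", sys.getsizeof(as_tuple), "bytes")
print("array object:   ", sys.getsizeof(as_array), "bytes")
print(d['10'].tolist())
```

The array keeps its seven values in one contiguous buffer, whereas the tuple holds references to separate int objects.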

Re: Is there any way to minimize str()/unicode() objects memory usage [Python 2.6.4] ?

2010-08-07 Thread dmtr
> Looking at your benchmark, random.choice(letters) has probably less overhead
> than letters[random.randint(...)]. You might even try to inline it as

Right... random.choice()... I'm a bit new to Python, always something to learn. But anyway in that benchmark (from http://bugs.python.org/issue952

Re: Is there any way to minimize str()/unicode() objects memory usage [Python 2.6.4] ?

2010-08-07 Thread dmtr
I guess with the actual dataset I'll be able to improve the memory usage a bit, with BioPython::trie. That would probably be enough optimization to continue working with some comfort. On this test code BioPython::trie gives a bit of improvement in terms of memory. Not much though...

>>> d = dict()