Re: lxml with python-3.12.0a5

2023-02-24 Thread Robin Becker
On 23/02/2023 18:09, Mats Wichmann wrote: I seem to always have trouble with lxml (which I know doesn't help). The cause would seem to be this: GH-101291: Refactor the `PyLongObject` struct into object header and PyLongValue struct. (GH-101292) So it looks to me like c

Re: lxml with python-3.12.0a5

2023-02-23 Thread Mats Wichmann
On 2/23/23 07:47, Mats Wichmann wrote: On 2/23/23 06:03, Robin Becker wrote: I'm trying to test python-3.12.0a5 and need to install lxml. My wheel build for lxml fails with errors like this src/lxml/etree.c: In function ‘__Pyx_PyIndex_AsSsize_t’: src/lxml/etree.c:270404:45:

Re: lxml with python-3.12.0a5

2023-02-23 Thread Mats Wichmann
On 2/23/23 06:03, Robin Becker wrote: I'm trying to test python-3.12.0a5 and need to install lxml. My wheel build for lxml fails with errors like this src/lxml/etree.c: In function ‘__Pyx_PyIndex_AsSsize_t’: src/lxml/etree.c:270404:45: error: ‘PyLongObject’ {aka ‘struct _longobject’} h

lxml with python-3.12.0a5

2023-02-23 Thread Robin Becker
I'm trying to test python-3.12.0a5 and need to install lxml. My wheel build for lxml fails with errors like this src/lxml/etree.c: In function ‘__Pyx_PyIndex_AsSsize_t’: src/lxml/etree.c:270404:45: error: ‘PyLongObject’ {aka ‘struct _longobject’} has no member named ‘ob_digit’ 2

Re: Python - working with xml/lxml/objectify/schemas, datatypes, and assignments

2023-01-19 Thread Dan Kolis
Editing text intended primarily for machine reading that involves metadata and lower level facts is a horror show. I sort of worked for a company years ago and a smart ass suggested I was making labor for myself by doing changes to a scripting language for db users, maybe a few hours a week. He

Re: Python - working with xml/lxml/objectify/schemas, datatypes, and assignments

2023-01-15 Thread aapost
On 1/11/23 13:21, Dieter Maurer wrote: aapost wrote at 2023-1-10 22:15 -0500: On 1/4/23 12:13, aapost wrote: On 1/4/23 09:42, Dieter Maurer wrote: ... You might have a look at `PyXB`, too. It tries hard to enforce schema restrictions in Python code. ... Unfortunately picking it apart for a w

Re: Python - working with xml/lxml/objectify/schemas, datatypes, and assignments

2023-01-15 Thread aapost
elem, 'nsmap'): if uri in etree_module.register_namespace._namespace_map: del etree_module.register_namespace._namespace_map[uri] else: # TODO research this for better understanding # _namespace_map is uri->prefix # DataElement.nsmap prefix->uri # lxml etree .nsmap ?

Re: Python - working with xml/lxml/objectify/schemas, datatypes, and assignments

2023-01-11 Thread Dieter Maurer
aapost wrote at 2023-1-10 22:15 -0500: >On 1/4/23 12:13, aapost wrote: >> On 1/4/23 09:42, Dieter Maurer wrote: >> ... >>> You might have a look at `PyXB`, too. >>> It tries hard to enforce schema restrictions in Python code. >> ... >Unfortunately picking it apart for a while and diving deeper in t

Re: Python - working with xml/lxml/objectify/schemas, datatypes, and assignments

2023-01-10 Thread aapost
On 1/4/23 12:13, aapost wrote: On 1/4/23 09:42, Dieter Maurer wrote: aapost wrote at 2023-1-3 22:57 -0500: ... Consider the following: from lxml import objectify, etree schema = etree.XMLSchema(file="path_to_my_xsd_schema_file") parser = objectify.makeparser(schema=schema, encod

Re: Python - working with xml/lxml/objectify/schemas, datatypes, and assignments

2023-01-04 Thread aapost
On 1/4/23 09:42, Dieter Maurer wrote: aapost wrote at 2023-1-3 22:57 -0500: ... Consider the following: from lxml import objectify, etree schema = etree.XMLSchema(file="path_to_my_xsd_schema_file") parser = objectify.makeparser(schema=schema, encoding="UTF-8") xml_o

Re: Python - working with xml/lxml/objectify/schemas, datatypes, and assignments

2023-01-04 Thread Dieter Maurer
aapost wrote at 2023-1-3 22:57 -0500: > ... >Consider the following: > >from lxml import objectify, etree >schema = etree.XMLSchema(file="path_to_my_xsd_schema_file") >parser = objectify.makeparser(schema=schema, encoding="UTF-8") >xml_obj = objectify.p

Python - working with xml/lxml/objectify/schemas, datatypes, and assignments

2023-01-03 Thread aapost
maybe I am wandering off in a wrong way of thinking.. I am looking to interact with elements directly, loaded from a template, editing them, then ultimately submitting them to an API as a modified xml document. Consider the following: from lxml import objectify, etree schema = etree.XMLSchema

Re: lxml empty versus self closed tag

2022-03-03 Thread Dieter Maurer
Robin Becker wrote at 2022-3-3 09:21 +: >On 02/03/2022 18:39, Dieter Maurer wrote: >> Robin Becker wrote at 2022-3-2 15:32 +: >>> I'm using lxml.etree.XMLParser and would like to distinguish >>> >>> <tag></tag> >>> >>> from >>> >>> <tag/> >>> >>> I seem to have e.getchildren()==[] and e.text==None for both

Re: lxml empty versus self closed tag

2022-03-03 Thread Robin Becker
ke the distinction. However, I wonder how lxml can present an empty string content deliberately or if that always has to be a semantic decision. `<tag/>' is just a shorthand notation for `<tag></tag>' and the difference has no influence on the DOM. Note that `lxml` is just a Python bindin

Re: lxml empty versus self closed tag

2022-03-02 Thread Dieter Maurer
I do not think so (at least not without a DTD): `<tag/>' is just a shorthand notation for `<tag></tag>' and the difference has no influence on the DOM. Note that `lxml` is just a Python binding for `libxml2`. All the parsing is done by this library. -- https://mail.python.org/mailman/listinfo/python-list
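The point in this reply can be checked with a small sketch. The element names "root" and "tag" are made up for illustration, and the fallback import is only there so the snippet also runs without lxml installed; the standard-library parser appears to behave the same way for this case.

```python
# Sketch: an empty element and a self-closed element parse to the same tree.
try:
    from lxml import etree
except ImportError:  # stdlib fallback, same API for the calls used here
    import xml.etree.ElementTree as etree

root = etree.fromstring("<root><tag></tag><tag/></root>")
empty, self_closed = list(root)

# Both spellings report no text and no children after parsing.
texts = [empty.text, self_closed.text]
child_counts = [len(empty), len(self_closed)]
print(texts, child_counts)
```

If the distinction matters to an application, it probably has to be carried some other way (for example an attribute), since the parsed tree does not keep it.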

lxml empty versus self closed tag

2022-03-02 Thread Robin Becker
I'm using lxml.etree.XMLParser and would like to distinguish <tag></tag> from <tag/> I seem to have e.getchildren()==[] and e.text==None for both cases. Is there a way to get the first to have e.text=='' -- Robin Becker

Re: preserving entities with lxml

2022-01-13 Thread Robin Becker
On 13/01/2022 09:29, Dieter Maurer wrote: Robin Becker wrote at 2022-1-13 09:13 +: On 12/01/2022 20:49, Dieter Maurer wrote: ... Apparently, the `resolve_entities=False` was not effective: otherwise, your tree content should have more structure (especially some entity reference children).

Re: preserving entities with lxml

2022-01-13 Thread Dieter Maurer
Robin Becker wrote at 2022-1-13 09:13 +: >On 12/01/2022 20:49, Dieter Maurer wrote: > ... >> Apparently, the `resolve_entities=False` was not effective: otherwise, >> your tree content should have more structure (especially some >> entity reference children). >> >except that the tree knows not

Re: preserving entities with lxml

2022-01-13 Thread Robin Becker
makes my life a bit easier. If I had wanted the unexpanded values in the attrib/text/tail it would be more of a problem. `&#` is not an entity reference but a character reference. It may rightfully be treated differently from entity references. I understand the difference, but lxml (and p

Re: preserving entities with lxml

2022-01-12 Thread Dieter Maurer
Robin Becker wrote at 2022-1-12 10:22 +: >I have a puzzle over how lxml & entities should be 'preserved' code below >illustrates. To preserve I change & --> &amp; >in the source and add resolve_entities=False to the parser definition. The >escaping means we

preserving entities with lxml

2022-01-12 Thread Robin Becker
I have a puzzle over how lxml & entities should be 'preserved' code below illustrates. To preserve I change & --> &amp; in the source and add resolve_entities=False to the parser definition. The escaping means we only have one kind of entity &amp; which means lxml will pre

Re: lxml parsing with validation and target?

2021-11-03 Thread Robin Becker
On 02/11/2021 12:55, Robin Becker wrote: I'm having a problem using lxml.etree to make a treebuilding parser that validates; I have test code where invalid xml is detected and an error raised when the line below target=ET.TreeBuilder(), is commented out. . I managed to overcome this p

lxml parsing with validation and target?

2021-11-02 Thread Robin Becker
lxml.py", line 78, in tree = ET.parse(sys.argv[1],parser) File "src/lxml/etree.pyx", line 3521, in lxml.etree.parse File "src/lxml/parser.pxi", line 1859, in lxml.etree._parseDocument File "src/lxml/parser.pxi", line 1885, in lxml.etree._parseDocumentFr

Re: lxml - minor problem appending new element

2020-02-03 Thread Frank Millman
"a"] for item in items: if item == "a": items.append("a") I did feel a bit uneasy doing it, but once I had got it working it did not feel too bad. I did not test for appending from the last item, so that bug has just bitten me now, but I will run with my workar

Re: lxml - minor problem appending new element

2020-02-03 Thread Peter Otten
Frank Millman wrote: > Hi all > > I usually send lxml queries to the lxml mailing list, but it appears to > be not working, so I thought I would try here. > > This is a minor issue, and I have found an ugly workaround, but I > thought I would mention it. Like this? child

lxml - minor problem appending new element

2020-02-02 Thread Frank Millman
Hi all I usually send lxml queries to the lxml mailing list, but it appears to be not working, so I thought I would try here. This is a minor issue, and I have found an ugly workaround, but I thought I would mention it. In Python I can iterate through a list, and on a certain condition

Re: lxml question -- creating an etree.Element attribute with ':' in the name

2019-12-03 Thread Karsten Hilbert
On Mon, Dec 02, 2019 at 08:58:11PM -0800, gerem...@gmail.com wrote: > Date: Mon, 2 Dec 2019 20:58:11 -0800 (PST) > From: gerem...@gmail.com > To: python-list@python.org > Subject: Re: lxml question -- creating an etree.Element attribute with ':' > in the name > User

Re: lxml question -- creating an etree.Element attribute with ':' in the name

2019-12-03 Thread geremy85
Thanks a lot

Re: lxml namespace as an attribute

2018-08-17 Thread Stefan Behnel
w/path namespace, is there a way to cleanly find all > instances of Tag? In addition to what dieter said, let me mention that you do not need to obey to XPath's dictate to use namespace prefixes. lxml provides two ways of expressing searches with qualified tag names (i.e. "{namespace

Re: lxml namespace as an attribute

2018-08-16 Thread Skip Montanaro
> You seem to think that you need to take the namespace definitions > from the XML document itself. This is not the case: you can > provide them from whatever source you want. I was under the impression that XML was a self-describing format. I've been disabused of that notion. Skip

Re: lxml namespace as an attribute

2018-08-15 Thread dieter
source you want. The important part of the namespace is the namespace uri; the namespace prefix is just an abbreviation - its exact value is of no importance; you can use whatever you want (and there is no need that your choice is the same as that of the XML document). "lxml" handles
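A minimal sketch of what this reply describes, assuming a made-up document and namespace URI (http://www.example.com/ns); the prefixes "doc" and "x" are hypothetical too. The fallback import is only so the snippet runs without lxml; findall() with a namespaces mapping and the "{uri}tag" form exist in both libraries.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback, same API for the calls used here
    import xml.etree.ElementTree as etree

NS = "http://www.example.com/ns"  # hypothetical namespace URI

# The document itself happens to use the prefix "doc".
xml = (
    '<doc:Root xmlns:doc="http://www.example.com/ns">'
    "<doc:Tag>one</doc:Tag><doc:Tag>two</doc:Tag>"
    "</doc:Root>"
)
root = etree.fromstring(xml)

# The search supplies its own prefix "x"; only the URI has to match.
by_prefix = [e.text for e in root.findall("x:Tag", namespaces={"x": NS})]

# The same search with no prefix at all, using the "{uri}tag" form.
by_qname = [e.text for e in root.iter("{%s}Tag" % NS)]
print(by_prefix, by_qname)
```

Neither search reads the prefix from the document, which is why the mapping can come from anywhere.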

Re: lxml namespace as an attribute

2018-08-15 Thread Skip Montanaro
> See https://lxml.de/tutorial.html#namespaces and > https://lxml.de/2.1/FAQ.html#how-can-i-specify-a-default-namespace-for-xpath-expressions > for direction. I had read at least the namespaces section of the tutorial. I could see the namespace definition right there in the XML and figured somehow

RE: lxml namespace as an attribute

2018-08-15 Thread Joseph L. Casale
-Original Message- From: Python-list On Behalf Of Skip Montanaro Sent: Wednesday, August 15, 2018 3:26 PM To: Python Subject: lxml namespace as an attribute > Much of XML makes no sense to me. Namespaces are one thing. If I'm > parsing a document where namespaces are defined

Re: lxml namespace as an attribute

2018-08-15 Thread Skip Montanaro
Ack. Of course I meant the subject to be "XML namespace as an attribute". I happen to be using lxml.etree. (Long day, I guess...) S On Wed, Aug 15, 2018 at 4:25 PM Skip Montanaro wrote: > > Much of XML makes no sense to me. Namespaces are one thing. If I'm > parsing a document where namespaces ar

lxml namespace as an attribute

2018-08-15 Thread Skip Montanaro
Much of XML makes no sense to me. Namespaces are one thing. If I'm parsing a document where namespaces are defined at the top level, then adding namespaces=root.nsmap works when calling the xpath method. I more-or-less get that. What I don't understand is how I'm supposed to search for a tag when

Re: LXML: can't register namespace

2018-03-09 Thread Andrew Z
nel wrote: > > > >> Andrew Z schrieb am 07.03.2018 um 05:03: > >>> Hello, > >>> with 3.6 and latest greatest lxml: > >>> > >>> from lxml import etree > >>> > >>> tree = etree.parse('Sample.xml') >

Re: LXML: can't register namespace

2018-03-09 Thread Stefan Behnel
Peter Otten schrieb am 09.03.2018 um 14:11: > Stefan Behnel wrote: > >> Andrew Z schrieb am 07.03.2018 um 05:03: >>> Hello, >>> with 3.6 and latest greatest lxml: >>> >>> from lxml import etree >>> >>> tree = etree.parse(

Re: LXML: can't register namespace

2018-03-09 Thread Peter Otten
Stefan Behnel wrote: > Andrew Z schrieb am 07.03.2018 um 05:03: >> Hello, >> with 3.6 and latest greatest lxml: >> >> from lxml import etree >> >> tree = etree.parse('Sample.xml') >> etree.register_namespace('','http

Re: LXML: can't register namespace

2018-03-09 Thread Steven D'Aprano
On Fri, 09 Mar 2018 13:08:10 +0100, Stefan Behnel wrote: >> Is there a good reason not to support "" as the empty prefix? > > Well, the "empty prefix" is not an "empty" prefix, it's *no* prefix. The > result is not ":tag" instead of "prefix:tag", the result is "tag". That makes sense, thanks.
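Since the default namespace has no prefix rather than an empty one, lxml seems to expect it under the key None in an element's nsmap instead of through register_namespace(). A short sketch, reusing the URI from the thread; the element names "root" and "child" are invented for illustration.

```python
from lxml import etree

NS = "http://www.example.com"  # URI from the thread's example

# The default (unprefixed) namespace is keyed by None in nsmap.
root = etree.Element("{%s}root" % NS, nsmap={None: NS})
etree.SubElement(root, "{%s}child" % NS)

# Tags serialize as plain "root" and "child", with a single xmlns declaration.
serialized = etree.tostring(root).decode()
print(serialized)
```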

Re: LXML: can't register namespace

2018-03-09 Thread Stefan Behnel
Steven D'Aprano schrieb am 09.03.2018 um 12:41: > On Fri, 09 Mar 2018 10:22:23 +0100, Stefan Behnel wrote: > >> Andrew Z schrieb am 07.03.2018 um 05:03: >>> Hello, >>> with 3.6 and latest greatest lxml: >>> >>> from lxm

Re: LXML: can't register namespace

2018-03-09 Thread Steven D'Aprano
On Fri, 09 Mar 2018 10:22:23 +0100, Stefan Behnel wrote: > Andrew Z schrieb am 07.03.2018 um 05:03: >> Hello, >> with 3.6 and latest greatest lxml: >> >> from lxml import etree >> >> tree = etree.parse('Sample.xml') >> etree.register_

Re: LXML: can't register namespace

2018-03-09 Thread Stefan Behnel
Andrew Z schrieb am 07.03.2018 um 05:03: > Hello, > with 3.6 and latest greatest lxml: > > from lxml import etree > > tree = etree.parse('Sample.xml') > etree.register_namespace('','http://www.example.com') The default namespace prefix is s

Re: LXML: can't register namespace

2018-03-07 Thread Andrew Z
xml later on. > > > On Mar 7, 2018 00:38, "Steven D'Aprano" pearwood.info> wrote: > >> On Tue, 06 Mar 2018 23:03:15 -0500, Andrew Z wrote: >> >> > Hello, >> > with 3.6 and latest greatest lxml: >> > >> > from lxml impo

Re: LXML: can't register namespace

2018-03-07 Thread Andrew Z
> wrote: > On Tue, 06 Mar 2018 23:03:15 -0500, Andrew Z wrote: > > > Hello, > > with 3.6 and latest greatest lxml: > > > > from lxml import etree > > > > tree = etree.parse('Sample.xml') > > etree.register_namespace('','htt

Re: LXML: can't register namespace

2018-03-06 Thread Steven D'Aprano
On Tue, 06 Mar 2018 23:03:15 -0500, Andrew Z wrote: > Hello, > with 3.6 and latest greatest lxml: > > from lxml import etree > > tree = etree.parse('Sample.xml') > etree.register_namespace('','http://www.example.com') > it seems to not

LXML: can't register namespace

2018-03-06 Thread Andrew Z
Hello, with 3.6 and latest greatest lxml: from lxml import etree tree = etree.parse('Sample.xml') etree.register_namespace('','http://www.example.com') causes: Traceback (most recent call last): File "/home/az/Work/flask/tutorial_1/src/xml_oper.py", l

Re: windows utf8 & lxml

2016-12-27 Thread Steve D'Aprano
On Tue, 20 Dec 2016 10:53 pm, Sayth Renshaw wrote: > content.read().encode('utf-8'), parser=utf8_parser) > > However doing it in such a fashion returns this error: > > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: > invalid start byte That tells you that the XML file

Re: windows utf8 & lxml

2016-12-26 Thread Stefan Behnel
> be to use the .decode('windows-1252') to correct an ascii error. > > I am using lxml to read my content and decode is not supported are there any > known ways to read with lxml and fix unicode faults? > > The key part of my script is > > for content in root

Re: windows utf8 & lxml

2016-12-21 Thread Peter Otten
w however the last error I am trying to overcome, the solution appears >> to be to use the .decode('windows-1252') to correct an ascii error. >> >> I am using lxml to read my content and decode is not supported are there >> any known ways to read with lxml and f

Re: windows utf8 & lxml

2016-12-21 Thread Sayth Renshaw
come, the solution appears to > be to use the .decode('windows-1252') to correct an ascii error. > > I am using lxml to read my content and decode is not supported are there any > known ways to read with lxml and fix unicode faults? > > The key part of m

windows utf8 & lxml

2016-12-20 Thread Sayth Renshaw
Possibly I will have to use a different method from lxml like this. http://stackoverflow.com/a/29057244/461887 Sayth

windows utf8 & lxml

2016-12-20 Thread Sayth Renshaw
ror. I am using lxml to read my content and decode is not supported are there any known ways to read with lxml and fix unicode faults? The key part of my script is for content in roots: utf8_parser = etree.XMLParser(encoding='utf-8') fix_ascii = utf8_pa

Re: lxml and xpath(?)

2016-11-02 Thread dieter
Doug OLeary writes: > ... > Any hints/tips/suggestions greatly appreciated especially with complete noob > tutorials for xpath. You can certainly do it with "XPath" (look for the "following-sibling" axis). You can also use Python (with "lxml"). If y

Re: lxml and xpath(?)

2016-10-27 Thread Peter Otten
Pete Forman wrote: > Peter Otten <__pete...@web.de> writes: > >> root = etree.fromstring(s) >> for server in root.xpath("./server"): >> servername = server.xpath("./name/text()")[0] > > When working with lxml I prefer to use this P

Re: lxml and xpath(?)

2016-10-27 Thread Pete Forman
Peter Otten <__pete...@web.de> writes: > root = etree.fromstring(s) > for server in root.xpath("./server"): > servername = server.xpath("./name/text()")[0] When working with lxml I prefer to use this Python idiom. servername, = server.xpath("
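The unpacking idiom mentioned here can be sketched as follows. The XML is a made-up stand-in for the poster's config file; only the "server" and "name" element names come from the thread.

```python
from lxml import etree

# Hypothetical config fragment for illustration.
s = (
    "<domain>"
    "<server><name>srv1</name></server>"
    "<server><name>srv2</name></server>"
    "</domain>"
)
root = etree.fromstring(s)

names = []
for server in root.xpath("./server"):
    # The trailing comma unpacks a one-element list and raises
    # ValueError if the XPath matched zero or several nodes.
    servername, = server.xpath("./name/text()")
    names.append(str(servername))
print(names)
```

Compared with indexing by [0], the unpacking form fails loudly when a server has more than one name, which may or may not be what is wanted.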

Re: lxml and xpath(?)

2016-10-26 Thread Peter Otten
rvices_MC1 > ... > > > EDIServices_MS2 > ... > EDIServices_MC2 > ... > > > EDIServices_MC1 > >EDIServices_MC1 >SSL >host001 > 7001 > > > > EDIService

lxml and xpath(?)

2016-10-24 Thread Doug OLeary
t001 7001 EDIServices_MC2 EDIServices_MC2 host002 7001 So, running it on 'normal' config, I get: $ ./lxml configs/EntsvcSoa_Domain_config.xml EntsvcSoa_CS=> host003.myco.com EntsvcSoa_CS => host004.myco.com Running

Re: xml parsing with lxml

2016-10-07 Thread Doug OLeary
On Friday, October 7, 2016 at 3:21:43 PM UTC-5, John Gordon wrote: > root = doc.getroot() > for child in root: > print(child.tag) > Excellent! Thank you, sir! That'll get me started. Appreciate the reply. Doug O'Leary

Re: xml parsing with lxml

2016-10-07 Thread John Gordon
In <622ea3b0-88b4-420b-89e0-9e7c6e866...@googlegroups.com> Doug OLeary writes: > >>> from lxml import etree > >>> doc = etree.parse('config.xml') > Now what? For instance, how do I list the top level children of > ? root = doc.getroot() for
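A runnable version of the answer above, as a sketch. The in-memory document is an invented stand-in for config.xml, and the fallback import is only so the snippet runs without lxml installed.

```python
import io

try:
    from lxml import etree
except ImportError:  # stdlib fallback, same API for the calls used here
    import xml.etree.ElementTree as etree

# Hypothetical stand-in for the poster's config.xml.
data = b"<domain><name>d</name><server/><cluster/></domain>"
doc = etree.parse(io.BytesIO(data))

# Iterating an element yields its direct children in document order.
root = doc.getroot()
tags = [child.tag for child in root]
print(tags)
```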

xml parsing with lxml

2016-10-07 Thread Doug OLeary
Hey; I'm trying to gather information from a number of weblogic configuration xml files using lxml. I've found any number of tutorials on the web but they all seem to assume a knowledge that I apparently don't have... that, or I'm just being rock stupid today - that's

Re: lxml ignore none in getchildren

2016-10-04 Thread Peter Otten
o the xml and change an n to an m as in > my example below the getchildren will return none for none matches, how > can I ignore nones? > > In [2]: from lxml import etree > > In [3]: xml = ''' n="2"/>''' > > In [

lxml ignore none in getchildren

2016-10-04 Thread Sayth Renshaw
Contents: ... print "%-10s %3s" % (content.tag, content.get("n", "0")) ... horse2 cow 17 cowboy 2 >>> If I make one minor modification to the xml and change an n to an m as in my example below the getchildren will return none for n

LXML cannot access elements of dict created

2016-06-24 Thread Sayth Renshaw
from, as a first test case i am trying to parse the file and then return the dict key numbers and all its values. def parseXML(): """ given a file XML will parse for listed attributes. using objectified lxml """ for file in getsMeet(file_list):

lxml parsing whole file, unable to access elements

2016-06-24 Thread Sayth Renshaw
Hi I have created several versions of a parser for XML with lxml both objectify and lxml. They all work in that the parser returns the whole file however I cannot access elements or keys as I have tried to create a dict of the results to make it easier to put in an sql query later. However I

Re: lxml - SubElement dump root doesn't dump like Element dump does

2016-06-20 Thread Sayth Renshaw
On Monday, 20 June 2016 16:19:31 UTC+10, Peter Otten wrote: > Sayth Renshaw wrote: > > > Afternoon > > > > Wondering has anyone much experience with lxml specifically objectify? > > > > When I pick up a file with lxml and use objectify dumping root works

Re: lxml - SubElement dump root doesn't dump like Element dump does

2016-06-20 Thread Sayth Renshaw
Thanks your way makes more sense indeed. In the example they create and access I think I just got lost in their example. Sayth

Re: lxml - SubElement dump root doesn't dump like Element dump does

2016-06-19 Thread Peter Otten
Sayth Renshaw wrote: > Afternoon > > Wondering has anyone much experience with lxml specifically objectify? > > When I pick up a file with lxml and use objectify dumping root works as > expected actually better its quite nice. This is how i do it, file > handling part

lxml - SubElement dump root doesn't dump like Element dump does

2016-06-19 Thread Sayth Renshaw
Afternoon Wondering has anyone much experience with lxml specifically objectify? When I pick up a file with lxml and use objectify dumping root works as expected actually better its quite nice. This is how i do it, file handling part left out for brevity. def getsMeet(file_list): for

Re: lxml objectify - attribute elements to list.

2015-02-08 Thread Sayth Renshaw
Awesome, thanks so much for the help. Sayth

Re: lxml objectify - attribute elements to list.

2015-02-08 Thread Stefan Behnel
Sayth Renshaw schrieb am 08.02.2015 um 12:22: > How can I actually access the values of an element with lxml objectify? > > for example if I had this element in my xml file. > > VenueCode="151" TrackName="Main" TrackCode="149"> > > I ca

Re: lxml objectify - attribute elements to list.

2015-02-08 Thread Kev Dwyer
Sayth Renshaw wrote: > > Hi > > How can I actually access the values of an element with lxml objectify? > > for example if I had this element in my xml file. > > VenueCode="151" TrackName="Main" TrackCode="149"> > > I ca

lxml objectify - attribute elements to list.

2015-02-08 Thread Sayth Renshaw
Hi How can I actually access the values of an element with lxml objectify? for example if I had this element in my xml file. I can see all the attributes using this. In [86]: for child in root.getchildren(): print(child.attrib) : {} {'RequestCode': '

Re: Trying to parse matchup.io (lxml, SGMLParser, urlparse)

2015-01-18 Thread Peter Otten
> modules but it isn't easy for me to work out the best way to do this > as most tutorials I see use complicated classes and I just want to > parse this one paragraph at a time (as I would do in Perl) and print > > 1 mizuho 26648 35315 > 2 xx 9 9 > 3 xx

Trying to parse matchup.io (lxml, SGMLParser, urlparse)

2015-01-18 Thread Jerry Rocteur
l) and print 1 mizuho 26648 35315 2 xx 9 9 3 xx 9 9 etc. (in the above case I'm ignoring 818.7 and Miles. The best way I found so far is this: from lxml import html import requests page = requests.get("https://matchup.io/players/rocteur/friends/week/

Re: beautifulsoup VS lxml

2014-12-12 Thread iMath
correctly BeautifulSoup can use the > lxml engine under the hood, so maybe it's the way to go for you, is it > gives you the most flexibility. It certainly has a good API that's easy > to use for data scraping. Try it and see if it's acceptable. tried it, very elegant

Re: beautifulsoup VS lxml

2014-12-11 Thread Michael Torrie
On 12/11/2014 07:02 PM, iMath wrote: > > which is more easy and elegant for pulling data out of HTML? Beautiful Soup is specialized for HTML parsing, and it can deal with badly formed HTML, but if I recall correctly BeautifulSoup can use the lxml engine under the hood, so maybe it's

beautifulsoup VS lxml

2014-12-11 Thread iMath
which is more easy and elegant for pulling data out of HTML?

Re: lxml and namespaces

2014-07-29 Thread Irmen de Jong
On 29-7-2014 20:35, Marc Aymerich wrote: > Got it! > xml = lxml.builder.ElementMaker( > nsmap = { > None: "urn:iso:std:iso:20022:tech:xsd:pain.008.001.02", > 'xsi': "http://www.w3.org/2001/XMLSchema-instance", > } > ) > doc = xml.Document() Thanks for taking the tim

Re: lxml and namespaces

2014-07-29 Thread Marc Aymerich
On Tue, Jul 29, 2014 at 8:19 PM, Marc Aymerich wrote: > Hi, I'm desperately trying to construct an XML with the following document > declaration: > > xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance”> > > I'm using LXML, and what I'm doing is

lxml and namespaces

2014-07-29 Thread Marc Aymerich
Hi, I'm desperately trying to construct an XML with the following document declaration: http://www.w3.org/2001/XMLSchema-instance”> I'm using LXML, and what I'm doing is this >>> from lxml import etree >>> from lxml.builder import E >>> d

Re: lxml question -- creating an etree.Element attribute with ':' in the name

2013-09-19 Thread Stefan Behnel
tance}my_node_name') > > will generate a proper xmlns declaration for you. It may not be the same > every time, but it will do the job just as well. For this specific namespace, and also a couple of other well-known namespace URIs, lxml will use the "expected" prefix by default. Stefan
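As a sketch of the approach discussed in this thread: xmlns:xsi is a namespace declaration rather than an ordinary attribute, so with lxml it can be supplied through nsmap. The xsi:nil attribute below is only an illustration of a namespaced attribute and is not taken from the thread.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The declaration goes in nsmap, keyed by the prefix to use.
elem = etree.Element("my_node_name", nsmap={"xsi": XSI})

# A namespaced attribute is named with the "{uri}name" form.
elem.set("{%s}nil" % XSI, "true")

serialized = etree.tostring(elem).decode()
print(serialized)
```

Passing nsmap explicitly pins the prefix, which may be preferable when the output has to match a fixed sample document.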

Re: lxml question -- creating an etree.Element attribute with ':' in the name

2013-09-18 Thread dieter
Roy Smith writes: > But, how do I handle something like: > > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance", since "xmlns:xsi" > isn't a valid python identifier? Read about "lxml"'s "namespace" support.

Re: lxml question -- creating an etree.Element attribute with ':' in the name

2013-09-18 Thread Burak Arslan
On 09/18/13 21:59, Roy Smith wrote: > I can create an Element with a 'foo' attribute by doing: > > etree.Element('my_node_name', foo="spam") > > But, how do I handle something like: > > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance", since "xmlns:xsi" > isn't a valid python identifier? > >

Re: lxml question -- creating an etree.Element attribute with ':' in the name

2013-09-18 Thread Zachary Ware
On Wed, Sep 18, 2013 at 1:59 PM, Roy Smith wrote: > I can create an Element with a 'foo' attribute by doing: > > etree.Element('my_node_name', foo="spam") > > But, how do I handle something like: > > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance", since "xmlns:xsi" > isn't a valid python

lxml question -- creating an etree.Element attribute with ':' in the name

2013-09-18 Thread Roy Smith
I can create an Element with a 'foo' attribute by doing: etree.Element('my_node_name', foo="spam") But, how do I handle something like: xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance", since "xmlns:xsi" isn't a valid python identifier? --- Roy Smith r...@panix.com

Re: lxml tostring quoting too much

2013-08-07 Thread andrea crotti
2013/8/6 Chris Down : > On 2013-08-06 18:38, andrea crotti wrote: >> I would really like to do the following: >> >> from lxml import etree as ET >> from lxml.builder import E >> >> url = "http://something?x=10&y=20" >> l = E

Re: lxml tostring quoting too much

2013-08-06 Thread Chris Down
On 2013-08-06 18:38, andrea crotti wrote: > I would really like to do the following: > > from lxml import etree as ET > from lxml.builder import E > > url = "http://something?x=10&y=20" > l = E.link(url) > ET.tostring(l) -> "http://something?x=10&

lxml tostring quoting too much

2013-08-06 Thread andrea crotti
I would really like to do the following: from lxml import etree as ET from lxml.builder import E url = "http://something?x=10&y=20" l = E.link(url) ET.tostring(l) -> "http://something?x=10&y=20" However the lxml tostring always quotes the &, I can't f
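For context, a bare & is not allowed in XML character data, so a serializer has to write it as &amp;amp; and a parser turns it back. The sketch below shows the round trip; it builds the element with Element() rather than the E factory from the post, only so that it also runs on the standard-library fallback.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback, same API for the calls used here
    import xml.etree.ElementTree as etree

url = "http://something?x=10&y=20"
link = etree.Element("link")
link.text = url

# Serializing escapes the ampersand so the output is well-formed XML.
serialized = etree.tostring(link)

# Parsing the serialized form gives back the original, unescaped text.
round_trip = etree.fromstring(serialized).text
print(serialized, round_trip)
```

So the escaped form is what belongs on the wire; any consumer that parses the XML should see the original URL.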

Re: how to get the source of html in lxml?

2012-12-30 Thread Dave Angel
On 12/31/2012 01:32 AM, contro opinion wrote: > import urllibimport lxml.html > down='http://blog.sina.com.cn/s/blog_71f3890901017hof.html' > file=urllib.urlopen(down).read() > root=lxml.html.document_fromstring(file) > body=root.xpath('//div[@class="articalContent "]')[0]print > body.text_conten

Re: how to get the source of html in lxml?

2012-12-30 Thread Chris Rebert
On Sun, Dec 30, 2012 at 10:32 PM, contro opinion wrote: > import urllib > import lxml.html > down='http://blog.sina.com.cn/s/blog_71f3890901017hof.html' > file=urllib.urlopen(down).read() > root=lxml.html.document_fromstring(file) > body=root.xpath('//div[@class="articalContent "]')[0] > print bo

Re: lxml 3.0 final released - efficient XML and HTML processing with Python

2012-10-10 Thread D.M. Procida
Stefan Behnel wrote: > it's been a while since the last stable release series appeared, so I'm > proud to announce the final release of lxml 3.0. Great. We use it in <https://bitbucket.org/spookylukey/semanticeditor/wiki/Home>. Thanks. Daniele

lxml 3.0 final released - efficient XML and HTML processing with Python

2012-10-09 Thread Stefan Behnel
Hi everyone, it's been a while since the last stable release series appeared, so I'm proud to announce the final release of lxml 3.0. http://lxml.de/ http://pypi.python.org/pypi/lxml/3.0 Changelog: http://lxml.de/changes-3.0.html In short, lxml is the most feature-rich and easy-to-u

Re: lxml can't output right unicode result

2012-09-06 Thread MRAB
.. 你 is you in english, "\xc4\xe3" is the gbk encode of it. "\xe4\xbd\xe3" is the utf-8 encode of it. "u\x4f\x60" is the unicode encode of it. now i parse it in lxml >>> "你" '\xe4\xbd\xa0' >>> "你".decode(

lxml can't output right unicode result

2012-09-06 Thread contro opinion
" is the gbk encode of it. "\xe4\xbd\xe3" is the utf-8 encode of it. "u\x4f\x60" is the unicode encode of it. now i parse it in lxml >>> "你" '\xe4\xbd\xa0' >>> "你".decode("utf-8") u'\u4f60' >>>

Re: docx/lxml

2012-07-31 Thread Cyrille Leroux
On Tuesday, July 31, 2012 5:13:12 PM UTC+2, Stefan Behnel wrote: > Cyrille Leroux, 31.07.2012 17:01: > > > I'm giving pip a try : > > > > > > > > > 1/ Linux (debian lenny) > > > - (as root) sh setuptools-0.6c11-py2.7.egg (ok) > > > - (as root) cd pip-1.1 ; python setup.py install (ok) > >

Re: docx/lxml

2012-07-31 Thread Stefan Behnel
Cyrille Leroux, 31.07.2012 17:01: > I'm giving pip a try : > > > 1/ Linux (debian lenny) > - (as root) sh setuptools-0.6c11-py2.7.egg (ok) > - (as root) cd pip-1.1 ; python setup.py install (ok) > - pip : ImportError : No module named pkg_resources > - damn, I guess it's going to be a pain, again

Re: docx/lxml

2012-07-31 Thread Cyrille Leroux
es ? > > > > > > Regards, > > > > > > Cyrille > > > > Hi, > > > > May I suggest you use pip and, possibly, virtualenv? > > pip makes it easy to install Python packages while virtualenv creates an > isolated Python envi

Re: docx/lxml

2012-07-31 Thread Pedro Kroger
thon packages while virtualenv creates an isolated Python environment For instance, I just installed docx and its dependencies with: pip install docx lxml datutils PIL And I did that inside a testing virtualenv, so I wouldn't mess up my Python setup. pip and virtualenv make it really easy

docx/lxml

2012-07-31 Thread cyrille . leroux
First, I downloaded docx package, copied the docx directory I found. Then, I added its path (sys.path.append()) But it complained it missed a lxml package. Ok, I downloaded it, copied the lxml directory. This time, it wanted a etree.py file. I search and found it is generated during installat

Re: Install lxml package on Windows 7

2012-05-29 Thread Irmen de Jong
On 29-5-2012 22:41, David Fanning wrote: > Folks, > > I need some help. I need the lxml package to run a particular > Python program. > >http://lxml.de/ > > I downloaded the appropriate binary egg package for lxml, and > I found easy_install.exe in my Python 2.7
