date:20220112

preserving entities with lxml

2022-01-12 Thread Robin Becker

I have a puzzle over how lxml & entities should be 'preserved'; the code below illustrates. To preserve them I change & --> &amp;
in the source and add resolve_entities=False to the parser definition. The escaping means we only have one kind of
entity, &amp;, which means lxml will preserve it. For whatever reason lxml won't preserve character entities, eg &#33;.


The simple parse from string and conversion tostring shows that the parsing at 
least took notice of it.

However, I want to create a tuple tree so have to use tree.text, 
tree.getchildren() and tree.tail for access.

When I use those I expected to have to undo the escaping to get back the original entities, but it seems they are 
already done.


Good for me, but if the tree knows how it was created (tostring shows that) why 
is it ignored with attribute access?

if __name__=='__main__':
    from lxml import etree as ET
    #initial xml (the root tag was lost in the archive; <a> is assumed)
    xml = b'<a>a &mysym; &lt; &amp; &gt; &#33; A</a>'
    #escaped xml
    xxml = xml.replace(b'&',b'&amp;')

    myparser = ET.XMLParser(resolve_entities=False)
    tree = ET.fromstring(xxml,parser=myparser)

    #use tostring
    print(f'using tostring\n{xxml=!r}\n{ET.tostring(tree)=!r}\n')

    #now access the items using text & children & tail
    print(f'using attributes\n{tree.text=!r}\n{tree.getchildren()=!r}\n{tree.tail=!r}')

when run I see this

$ python tmp/tlp.py
using tostring
xxml=b'<a>a &amp;mysym; &amp;lt; &amp;amp; &amp;gt; &amp;#33; A</a>'
ET.tostring(tree)=b'<a>a &amp;mysym; &amp;lt; &amp;amp; &amp;gt; &amp;#33; A</a>'

using attributes
tree.text='a &mysym; &lt; &amp; &gt; &#33; A'
tree.getchildren()=[]
tree.tail=None
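
The difference between tostring and attribute access can be reproduced without lxml. The following is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of lxml, on the assumption that both handle the predefined &amp; entity the same way. The <a> root tag and the variable names are illustrative and not taken from the post.

```python
import xml.etree.ElementTree as ET

# Assumed document: the <a> root tag is a placeholder for illustration.
original = b'<a>a &mysym; &lt; &amp; &gt; &#33; A</a>'

# Escape every ampersand so the parser only ever sees the predefined &amp; entity.
escaped = original.replace(b'&', b'&amp;')

root = ET.fromstring(escaped)

# The parser resolves &amp; while reading, so .text already holds the
# original entity text and no unescaping step is needed.
text = root.text

# Serializing escapes each '&' again, which is why the tostring() result
# looks like the escaped source rather than like .text.
serialized = ET.tostring(root)

print(repr(text))
print(repr(serialized))
```

In this sketch the tree stores plain characters only; the escaped form exists in the input and in the serialized output, not in the tree itself.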
--
Robin Becker
--
https://mail.python.org/mailman/listinfo/python-list

Re: preserving entities with lxml

2022-01-12 Thread Dieter Maurer

Robin Becker wrote at 2022-1-12 10:22 +:
>I have a puzzle over how lxml & entities should be 'preserved'; the code below
>illustrates. To preserve them I change & --> &amp;
>in the source and add resolve_entities=False to the parser definition. The
>escaping means we only have one kind of
>entity, &amp;, which means lxml will preserve it. For whatever reason lxml won't
>preserve character entities, eg &#33;.
>
>The simple parse from string and conversion tostring shows that the parsing at 
>least took notice of it.
>
>However, I want to create a tuple tree so have to use tree.text, 
>tree.getchildren() and tree.tail for access.
>
>When I use those I expected to have to undo the escaping to get back the 
>original entities, but it seems they are
>already done.
>
>Good for me, but if the tree knows how it was created (tostring shows that) 
>why is it ignored with attribute access?
>
>if __name__=='__main__':
>    from lxml import etree as ET
>    #initial xml (the root tag was lost in the archive; <a> is assumed)
>    xml = b'<a>a &mysym; &lt; &amp; &gt; &#33; A</a>'
>    #escaped xml
>    xxml = xml.replace(b'&',b'&amp;')
>
>    myparser = ET.XMLParser(resolve_entities=False)
>    tree = ET.fromstring(xxml,parser=myparser)
>
>    #use tostring
>    print(f'using tostring\n{xxml=!r}\n{ET.tostring(tree)=!r}\n')
>
>    #now access the items using text & children & tail
>    print(f'using attributes\n{tree.text=!r}\n{tree.getchildren()=!r}\n{tree.tail=!r}')
>
>when run I see this
>
>$ python tmp/tlp.py
>using tostring
>xxml=b'<a>a &amp;mysym; &amp;lt; &amp;amp; &amp;gt; &amp;#33; A</a>'
>ET.tostring(tree)=b'<a>a &amp;mysym; &amp;lt; &amp;amp; &amp;gt; &amp;#33; A</a>'
>
>using attributes
>tree.text='a &mysym; &lt; &amp; &gt; &#33; A'
>tree.getchildren()=[]
>tree.tail=None

Apparently, the `resolve_entities=False` was not effective: otherwise,
your tree content should have more structure (especially some
entity reference children).

`&#` is not an entity reference but a character reference.
It may rightfully be treated differently from entity references.
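
The distinction can be seen in a small sketch. It uses the standard library's xml.etree.ElementTree rather than lxml, so it shows how a conforming parser treats the two kinds of reference and not lxml's own options; the sample documents are made up.

```python
import xml.etree.ElementTree as ET

# Character references name a code point and need no declaration,
# so the parser replaces them while reading.
char_ref_text = ET.fromstring('<a>&#33; &#x21;</a>').text

# An entity reference needs a declaration; without one, the
# standard-library parser rejects the document.
try:
    ET.fromstring('<a>&mysym;</a>')
    undefined_entity_error = None
except ET.ParseError as exc:
    undefined_entity_error = exc

print(repr(char_ref_text))
print(undefined_entity_error)
```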

Re: ast.parse, ast.dump, but with comment preservation?

2022-01-12 Thread samue...@gmail.com

On Thursday, December 16, 2021 at 5:56:51 AM UTC-5, lucas wrote:
> Hi ! 
> 
> Maybe RedBaron may help you ? 
> 
> https://github.com/PyCQA/redbaron 
> 
> IIRC, it aims to conserve the exact same representation of the source 
> code, including comments and empty lines. 
> 
> --lucas
> On 16/12/2021 04:37, samue...@gmail.com wrote: 
> > I wrote a little open-source tool to expose internal constructs in OpenAPI. 
> > Along the way, I added related functionality to: 
> > - Generate/update a function prototype to/from a class 
> > - JSON schema 
> > - Automatically add type annotations to all function arguments, class 
> > attributes, declarations, and assignments 
> > 
> > alongside a bunch of other features. All implemented using just the builtin 
> > modules (plus astor on Python < 3.9; and optionally black). 
> > 
> > Now I'm almost at the point where I can run it—without issue—against, e.g., 
> > the entire TensorFlow codebase. Unfortunately this is causing huge `diff`s 
> > because the comments aren't preserved (and there are some whitespace 
> > issues… but I should be able to resolve the latter). 
> > 
> > Is the only viable solution available to rewrite around redbaron | libcst? 
> > - I don't need to parse the comments just dump them out unedited whence 
> > they're found… 
> > 
> > Thanks for any suggestions 
> > 
> > PS: Library is https://github.com/SamuelMarks/cdd-python (might relicense 
> > with CC0… anyway too early for others to use; wait for the 0.1.0 release ;])

Ended up writing my own CST and added it to that library of mine (link above).

My target is adding/removing/changing of: docstrings, function return types, 
function arguments, and Assign/AnnAssign. All but the last are now implemented.

I was careful not to replace code elsewhere in my codebase, so everything 
except my new CST code (in its own files) stays, and everything else works 
exclusively with the builtin `ast` module as before.
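
For readers wondering why the builtin `ast` module alone is not enough here, the following is a minimal sketch of the underlying problem. It is not the CST from cdd-python; the sample source and variable names are invented for illustration, and `ast.unparse` assumes Python 3.9 or later.

```python
import ast
import io
import tokenize

source = "x = 1  # the answer\ny = x + 1\n"

# ast.parse discards comments, so unparsing cannot bring them back.
round_tripped = ast.unparse(ast.parse(source))

# The tokenizer still sees them, together with their (line, column) position.
comments = [
    (tok.start, tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.COMMENT
]

print(round_tripped)
print(comments)
```

Any approach that keeps comments has to carry this token-level information alongside the tree, which is what a CST does.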

Re: What to write or search on github to get the code for what is written below:

2022-01-12 Thread Dennis Lee Bieber


*** Apologies for the repost. Since Gmane made the list a read-only
group, I finally broke down and reinstated Giganews comp.lang.python.
Unfortunately I'd missed that this came back with X-NoArchive active and
Google doesn't even let such messages show up for a day -- so the OP hasn't
seen any of my responses.

As a courtesy, I will NOT be reposting the other four responses I've
made over the last few days. {If I do, it will be as a single consolidated
response} ***

On Mon, 10 Jan 2022 22:31:00 -0800 (PST), NArshad 
declaimed the following:

>-“How are the relevant cells identified in the spreadsheet?”
>The column headings are:
>BOOK_NAME
>BOOK_AUTHOR
>BOOK_ISBN
>TOTAL_COPIES
>COPIES_LEFT
>BORROWER’S_NAME
>ISSUE_DATE
>RETURN_DATE
>

So... Besides "BORROWER'S_NAME" you also have a pair of dates you have
to track in parallel, and which should also need to be updated whenever you
change the borrower field. Furthermore, if you plan to separate those with
commas, you'll need to escape any embedded commas or you'll find that names
like "John Doe, Jr" will mess up the correspondence as you'd treat that as
two names on reading the borrower field. Also you need to be aware of the
limits for Excel text cells -- while you could stuff 32kB of text into a
cell, Excel itself will only display the first 1024 characters. That might
be sufficient if the average name is around 31 characters (32 with your
comma separator) as it would allow 32 names to be entered and still display
in Excel itself. Oh, and to track multiple dates in a cell, you'll have to
convert from date to text when writing the cell, and from text back to date
when reading the cell -- since you can't comma separate multiple dates.
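
As a rough sketch of the escaping problem, the standard csv module can quote names that contain commas when they are packed into one cell. The helper names `join_names` and `split_names` are hypothetical, and the sketch deals only with the text of a single cell, not with reading or writing the Excel file.

```python
import csv
import io


def join_names(names):
    """Pack several names into one cell, quoting any embedded commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='').writerow(names)
    return buffer.getvalue()


def split_names(cell):
    """Recover the list of names from a cell written by join_names."""
    return next(csv.reader([cell]))


borrowers = ['John Doe, Jr', 'Jane Roe']
cell = join_names(borrowers)

print(cell)
print(split_names(cell))   # two names
print(cell.split(','))     # naive split: three pieces
```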

Total_Copies - Copies_Left should be equal to the number of names (and
dates). In short, this is a very messy structure to be maintaining. If not
using an RDBMS, at the very least borrower/issue date/return date should be
moved to a separate sheet which also has "Book ID" (the row number in the
first sheet with the book). That way you'd have one record per borrower,
and can easily add new records at the bottom of the sheet. You might need to
use a "Book ID" of "0" to indicate a deleted record (when a borrower
returns the book) so you can reuse the slot, since you'd need some way to
identify the end of the data -- most likely by a blank record.
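
For comparison, here is a minimal sketch of the one-record-per-borrower layout in a relational database, using the sqlite3 module from the standard library. The table and column names, the sample rows, and the convention that `return_date` stays NULL until the book comes back are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE book (
        book_id      INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        total_copies INTEGER NOT NULL
    );
    CREATE TABLE loan (
        loan_id     INTEGER PRIMARY KEY,
        book_id     INTEGER NOT NULL REFERENCES book(book_id),
        borrower    TEXT NOT NULL,
        issue_date  TEXT NOT NULL,
        return_date TEXT              -- NULL while the book is still out
    );
""")

# Made-up sample data: one book with two copies, one of them on loan.
conn.execute("INSERT INTO book VALUES (1, 'Example Book', 2)")
conn.execute(
    "INSERT INTO loan (book_id, borrower, issue_date) VALUES (?, ?, ?)",
    (1, 'John Doe, Jr', '2022-01-10'),
)


def copies_left(conn, book_id):
    """Total copies minus the loans that have not been returned yet."""
    row = conn.execute(
        """
        SELECT b.total_copies - COUNT(l.loan_id)
        FROM book AS b
        LEFT JOIN loan AS l
               ON l.book_id = b.book_id AND l.return_date IS NULL
        WHERE b.book_id = ?
        GROUP BY b.book_id
        """,
        (book_id,),
    ).fetchone()
    return row[0]


print(copies_left(conn, 1))
```

With this layout the number of copies left is derived from the loan records instead of being stored and kept in step by hand, and a borrower name containing a comma needs no special handling.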

>-“If that's what you have in your spreadsheet, then read the cells on the 
>first row for the column labels and put them in a dict to map from column 
>label to column number.”
>
>This written above I do not understand how to code. 

Have you gone through the Python Tutorial? Dictionaries are one of
Python's basic data structures. https://docs.python.org/3/tutorial/

You are unlikely to find anything near to your application on-line --
pretty much anyone doing something like a library check-out system will be
using a relational database rather than spread sheets. At worst, they may
have a spread sheet import operation to do initial population of the
database, though even that might be using SQL operations (Windows supports
Excel files as an ODBC data source). See:
https://docs.microsoft.com/en-us/cpp/data/odbc/data-source-managing-connections-odbc?view=msvc-170
They are unlikely to be doing any exports to Excel -- that's the realm of
report logic. According to
https://support.sas.com/documentation/onlinedoc/dfdmstudio/2.5/dmpdmsug/Content/dfDMStd_T_Excel_ODBC.html
"""
Note: You cannot use a DSN to write output to Excel format. You can,
however, use a Text File Output node in a data job to write output in CSV
format. You can then import the CSV file into Excel.
"""
A Java-biased (old Java -- the interface to ODBC has been removed from
current Java) example that doesn't seem to need "named ranges" is
https://querysurge.zendesk.com/hc/en-us/articles/205766136-Writing-SQL-Queries-against-Excel-files-using-ODBC-connection-Deprecated-Excel-SQL-

Or...
https://www.red-gate.com/simple-talk/databases/sql-server/database-administration-sql-server/getting-data-between-excel-and-sql-server-using-odbc/
(which also indicates that it is possible to update the file via ODBC...
But note the constraints regarding having 64-bit vs 32-bit drivers).
Obviously you'll need to translate the PowerShell syntax into Python's ODBC
DB-API interface (which is a bit archaic as I recall -- does not match
current DB-API specifications).



-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/



Re: What to write or search on github to get the code for what is written below:

2022-01-12 Thread NArshad

-“How are the relevant cells identified in the spreadsheet?”
The column headings are:
BOOK_NAME
BOOK_AUTHOR
BOOK_ISBN
TOTAL_COPIES
COPIES_LEFT
BORROWER’S_NAME
ISSUE_DATE
RETURN_DATE


-“It's often the case that the cells on the first row contain text as column 
labels.” 

These I have written above.


-“If that's what you have in your spreadsheet, then read the cells on the first 
row for the column labels and put them in a dict to map from column label to 
column number.”

This written above I do not understand how to code. 

Re: ast.parse, ast.dump, but with comment preservation?

2022-01-12 Thread samue...@gmail.com

> > PS: Library is https://github.com/SamuelMarks/cdd-python (might relicense 
> > with CC0… anyway too early for others to use; wait for the 0.1.0 release ;])
Ended up writing my own CST and added it to that library of mine (link above).

My target is adding/removing/changing of: docstrings, function return types, 
function arguments, and Assign/AnnAssign. All but the last are now implemented.

I was careful not to replace code elsewhere in my codebase, so everything 
except my new CST code (in its own files) stays, and everything else works 
exclusively with the builtin `ast` module as before.

Re: What to write or search on github to get the code for what is written below:

2022-01-12 Thread Dennis Lee Bieber


*** Going back to the post in the thread as I've other concerns (and
have turned off the old X-NoArchive setting) ***

On Thu, 6 Jan 2022 10:55:30 -0800 (PST), NArshad 
declaimed the following:

>All this is going to be in python’s flask and HTML only
>
>1. First, I have to check in the Excel sheet or table whether the book user 
>has entered is present in the book bank or not.
>
>2. If a book is present and the quantity of the required book is greater than 
>0 (COPIES_LEFT column in excel file) and if the user wants the book, it will 
>be assigned to the user which he will take from the book bank physically. When 
>COPIES_LEFT is less than or equal to 0 the message will be “Book finished 
>or not present”.
>
>3. The quantity of the book in the Excel file will be reduced by 1 in the 
>COPIES_LEFT column and the name of the borrower or user will be entered/added 
>in the Excel file table or sheet already made and the column name is 
>BORROWER’S NAME.
>
>4. The borrower’s or user name can be more than one so they will be separated 
>with a comma in the Excel file BORROWER’S NAME column.
>
>
>- All functions mentioned above are to be deployed on the website 
>pythonhow.com so make according to 
>https://pythonhow.com/python-tutorial/flask/web-development-with-python-and-flask/
>
>- Do you know any other websites to deploy a python web application??

There are likely plenty -- How much do you want to pay? How much
support do you need? How much traffic do you expect?

Note that the PythonHOW tutorial is suggesting creating a student/hobby
account on Heroku (free, and fairly limited). Heroku provides Linux
containers, and for Python you can only make use of add-ons that can be
installed using PIP. As Linux, none of the Windows specific modules will be
available (no Excel ODBC, no use of pythonwin extensions calling directly
into the Excel DLLs).

Who is going to be using the Excel file? and how are they going to get
to it? Your Heroku container does not run Excel, and I'm not even sure how
you would get it to the Heroku container (possibly it can be done as part
of the Python application upload). 

I don't even know if SQLite3 is viable -- as I recall, Linux Python
installs rely upon system installed SQLite3 libraries, not installed via
PIP. Heroku pushes PostgreSQL for data storage. It may cost to add others
(MariaDB/MySQL).


https://elements.heroku.com/buildpacks/heroku/heroku-buildpack-python

>
>- No time to switch from Excel to anywhere else. Please do not make any 
>changes to the Excel file.
>

Again, if you are deploying to something like Heroku for the
application -- the Excel file will have to be deployed also, and no one
except your application will be able to see it there. Under this situation,
there is no reason/excuse to keep the data in the very inefficient format
you've defined in the most recent message. Import into some supported
database and normalize the data to make updates easier.


-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/

Re: What to write or search on github to get the code for what is written below:

2022-01-12 Thread MRAB


On 2022-01-11 06:31, NArshad wrote:

> -“How are the relevant cells identified in the spreadsheet?”
> The column headings are:
> BOOK_NAME
> BOOK_AUTHOR
> BOOK_ISBN
> TOTAL_COPIES
> COPIES_LEFT
> BORROWER’S_NAME
> ISSUE_DATE
> RETURN_DATE
>
> -“It's often the case that the cells on the first row contain text as column
> labels.”
>
> These I have written above.
>
> -“If that's what you have in your spreadsheet, then read the cells on the first
> row for the column labels and put them in a dict to map from column label to
> column number.”
>
> This written above I do not understand how to code.


Well, you know how to read the contents of a cell, and how to put items 
into a dict (the key will be the cell contents and the value will be the 
column number).


The column numbers will go from 1 to sheet.last_column, although some of 
them might be empty (their value will be None), which you can remove 
from the dict afterwards.
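
A minimal sketch of that mapping, assuming the first-row cell values have already been read into a list from left to right. The list `header_cells` and the dict name `column_of` are made up for illustration, and reading the cells from the actual spreadsheet is left out.

```python
# Hypothetical first-row values as read from the sheet, left to right;
# None stands for an empty cell.
header_cells = [
    'BOOK_NAME', 'BOOK_AUTHOR', 'BOOK_ISBN', 'TOTAL_COPIES',
    'COPIES_LEFT', 'BORROWER’S_NAME', 'ISSUE_DATE', 'RETURN_DATE', None,
]

# Key is the cell contents, value is the column number (starting at 1).
column_of = {}
for column_number, label in enumerate(header_cells, start=1):
    column_of[label] = column_number

# Remove the entry left behind by empty header cells, if there is one.
column_of.pop(None, None)

print(column_of['COPIES_LEFT'])
```

After this, `column_of['COPIES_LEFT']` gives the column to read or update for any row, without hard-coding column positions.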

