preserving entities with lxml
I have a puzzle over how lxml & entities should be 'preserved'; the code
below illustrates it. To preserve them I change & --> &amp; in the source
and add resolve_entities=False to the parser definition. The escaping means
we only have one kind of entity, &amp;, which means lxml will preserve it.
For whatever reason lxml won't preserve character entities, e.g. &#33;.

The simple parse from string and conversion with tostring shows that the
parsing at least took notice of it.

However, I want to create a tuple tree, so I have to use tree.text,
tree.getchildren() and tree.tail for access.

When I use those I expected to have to undo the escaping to get back the
original entities, but it seems that has already been done.

Good for me, but if the tree knows how it was created (tostring shows that),
why is it ignored with attribute access?

if __name__=='__main__':
    from lxml import etree as ET
    #initial xml
    xml = b'<a>a &mysym; &lt; &amp; &gt; &#33; A</a>'
    #escaped xml
    xxml = xml.replace(b'&',b'&amp;')

    myparser = ET.XMLParser(resolve_entities=False)
    tree = ET.fromstring(xxml,parser=myparser)

    #use tostring
    print(f'using tostring\n{xxml=!r}\n{ET.tostring(tree)=!r}\n')

    #now access the items using text & children & tail
    print(f'using attributes\n{tree.text=!r}\n{tree.getchildren()=!r}\n{tree.tail=!r}')

when run I see this

$ python tmp/tlp.py
using tostring
xxml=b'<a>a &amp;mysym; &amp;lt; &amp;amp; &amp;gt; &amp;#33; A</a>'
ET.tostring(tree)=b'<a>a &amp;mysym; &amp;lt; &amp;amp; &amp;gt; &amp;#33; A</a>'

using attributes
tree.text='a &mysym; &lt; &amp; &gt; &#33; A'
tree.getchildren()=[]
tree.tail=None
--
Robin Becker

--
https://mail.python.org/mailman/listinfo/python-list
Re: preserving entities with lxml
Robin Becker wrote at 2022-1-12 10:22:
>I have a puzzle over how lxml & entities should be 'preserved' code below
>illustrates. To preserve I change & --> &amp;
>in the source and add resolve_entities=False to the parser definition. The
>escaping means we only have one kind of
>entity &amp; which means lxml will preserve it. For whatever reason lxml won't
>preserve character entities eg &#33;.
>
>The simple parse from string and conversion tostring shows that the parsing at
>least took notice of it.
>
>However, I want to create a tuple tree so have to use tree.text,
>tree.getchildren() and tree.tail for access.
>
>When I use those I expected to have to undo the escaping to get back the
>original entities, but it seems they are
>already done.
>
>Good for me, but if the tree knows how it was created (tostring shows that)
>why is it ignored with attribute access?
>
>if __name__=='__main__':
>    from lxml import etree as ET
>    #initial xml
>    xml = b'<a>a &mysym; &lt; &amp; &gt; &#33; A</a>'
>    #escaped xml
>    xxml = xml.replace(b'&',b'&amp;')
>
>    myparser = ET.XMLParser(resolve_entities=False)
>    tree = ET.fromstring(xxml,parser=myparser)
>
>    #use tostring
>    print(f'using tostring\n{xxml=!r}\n{ET.tostring(tree)=!r}\n')
>
>    #now access the items using text & children & text
>    print(f'using attributes\n{tree.text=!r}\n{tree.getchildren()=!r}\n{tree.tail=!r}')
>
>when run I see this
>
>$ python tmp/tlp.py
>using tostring
>xxml=b'<a>a &amp;mysym; &amp;lt; &amp;amp; &amp;gt; &amp;#33; A</a>'
>ET.tostring(tree)=b'<a>a &amp;mysym; &amp;lt; &amp;amp; &amp;gt; &amp;#33; A</a>'
>
>using attributes
>tree.text='a &mysym; &lt; &amp; &gt; &#33; A'
>tree.getchildren()=[]
>tree.tail=None

Apparently, the `resolve_entities=False` was not effective: otherwise,
your tree content should have more structure (especially some
entity reference children).

`&#33;` is not an entity reference but a character reference.
It may rightfully be treated differently from entity references.
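The "more structure" mentioned above can be seen by parsing an unescaped
reference. The following is a minimal sketch, not taken from the thread: the
DOCTYPE, the replacement text "SYM" and the surrounding words are made up for
illustration, and the internal DTD subset is only there so that &mysym; is a
declared entity. With resolve_entities=False the reference is expected to
show up as a child node of its own, while the character reference &#33; is
folded into ordinary text.

```python
from lxml import etree

# Hypothetical input: the internal DTD subset declares &mysym; so that the
# unescaped reference is well-formed. "SYM" is an arbitrary replacement text.
xml = b'<!DOCTYPE a [<!ENTITY mysym "SYM">]><a>x &mysym; y &#33; z</a>'

parser = etree.XMLParser(resolve_entities=False)
root = etree.fromstring(xml, parser)

# Text up to the first child node.
print(repr(root.text))

for child in root:
    # Entity reference nodes use the etree.Entity factory as their tag and
    # expose the entity name; the text after the reference is in .tail.
    print(child.tag is etree.Entity, child.name, repr(child.tail))

# Serialising writes the reference back out unexpanded.
print(etree.tostring(root))
```

A tuple tree built from text, children and tail would therefore need a case
for entity nodes alongside elements, rather than undoing an escaping step.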
Re: ast.parse, ast.dump, but with comment preservation?
On Thursday, December 16, 2021 at 5:56:51 AM UTC-5, lucas wrote:
> Hi !
>
> Maybe RedBaron may help you ?
>
> https://github.com/PyCQA/redbaron
>
> IIRC, it aims to conserve the exact same representation of the source
> code, including comments and empty lines.
>
> --lucas
>
> On 16/12/2021 04:37, samue...@gmail.com wrote:
> > I wrote a little open-source tool to expose internal constructs in OpenAPI.
> > Along the way, I added related functionality to:
> > - Generate/update a function prototype to/from a class
> > - JSON schema
> > - Automatically add type annotations to all function arguments, class
> >   attributes, declarations, and assignments
> >
> > alongside a bunch of other features. All implemented using just the builtin
> > modules (plus astor on Python < 3.9; and optionally black).
> >
> > Now I'm almost at the point where I can run it, without issue, against, e.g.,
> > the entire TensorFlow codebase. Unfortunately this is causing huge `diff`s
> > because the comments aren't preserved (and there are some whitespace
> > issues… but I should be able to resolve the latter).
> >
> > Is the only viable solution available to rewrite around redbaron | libcst?
> > - I don't need to parse the comments just dump them out unedited whence
> >   they're found…
> >
> > Thanks for any suggestions
> >
> > PS: Library is https://github.com/SamuelMarks/cdd-python (might relicense
> > with CC0… anyway too early for others to use; wait for the 0.1.0 release ;])

Ended up writing my own CST and added it to that library of mine (link above).

My target is adding/removing/changing of: docstrings, function return types,
function arguments, and Assign/AnnAssign. All but the last are now implemented.

I was careful not to replace code elsewhere in my codebase: the new CST code
lives in its own files, and everything else still works exclusively with the
builtin `ast` module as before.
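For readers who have not hit the problem described in this thread, here is a
small standard-library sketch of it. The sample function is invented for
illustration and this is not the cdd-python code: it only shows that a round
trip through `ast` drops comments, while `tokenize` still reports them with
positions that a CST-style layer could use to put them back.

```python
import ast
import io
import tokenize

# Made-up sample source with one trailing and one standalone comment.
source = (
    "def area(r):  # radius in metres\n"
    "    # crude approximation of pi\n"
    "    return 3.14 * r * r\n"
)

# Round-tripping through the builtin ast module discards every comment.
# (ast.unparse needs Python 3.9+; older versions would need astor.)
round_tripped = ast.unparse(ast.parse(source))

# tokenize keeps COMMENT tokens together with their (row, column) start.
comments = [
    (tok.start, tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.COMMENT
]

print(round_tripped)
for start, text in comments:
    print(start, text)
```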
Re: What to write or search on github to get the code for what is written below:
*** Apologies for the repost. Since Gmane made the list a read-only group, I
finally broke down and reinstated Giganews comp.lang.python. Unfortunately
I'd missed that this came back with X-NoArchive active and Google doesn't
even let such messages show up for a day -- so the OP hasn't seen any of my
responses. As a courtesy, I will NOT be reposting the other four responses
I've made over the last few days. {If I do, it will be as a single
consolidated response} ***

On Mon, 10 Jan 2022 22:31:00 -0800 (PST), NArshad declaimed the following:

>-“How are the relevant cells identified in the spreadsheet?”
>The column headings are:
>BOOK_NAME
>BOOK_AUTHOR
>BOOK_ISBN
>TOTAL_COPIES
>COPIES_LEFT
>BORROWER’S_NAME
>ISSUE_DATE
>RETURN_DATE
>

So... Besides "BORROWER'S_NAME" you also have a pair of dates you have to
track in parallel, and which should also need to be updated whenever you
change the borrower field.

Furthermore, if you plan to separate those with commas, you'll need to
escape any embedded commas or you'll find that names like "John Doe, Jr"
will mess up the correspondence as you'd treat that as two names on reading
the borrower field.

Also you need to be aware of the limits for Excel text cells -- while you
could stuff 32kB of text into a cell, Excel itself will only display the
first 1024 characters. That might be sufficient if the average name is
around 31 characters (32 with your comma separator) as it would allow 32
names to be entered and still display in Excel itself.

Oh, and to track multiple dates in a cell, you'll have to convert from date
to text when writing the cell, and from text back to date when reading the
cell -- since you can't comma separate multiple dates.

Total_Copies - Copies_Left should be equal to the number of names (and
dates).

In short, this is a very messy structure to be maintaining.
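To make the escaping and date-conversion points concrete, here is a minimal
sketch using only the standard library. The helper names `pack_cell` and
`unpack_cell` are invented for illustration, and the sketch deals only with
the text that would go into one cell, not with reading or writing the
spreadsheet itself.

```python
import csv
import io
from datetime import date


def pack_cell(values):
    """Join strings into one cell text, quoting any embedded commas."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(values)
    return buf.getvalue()


def unpack_cell(cell_text):
    """Split a cell text produced by pack_cell back into its strings."""
    if not cell_text:
        return []
    return next(csv.reader([cell_text]))


names = ["John Doe, Jr", "Jane Roe"]
cell = pack_cell(names)

# A naive split treats the embedded comma as a separator.
naive = cell.split(",")
# The csv reader honours the quoting and recovers the two names.
recovered = unpack_cell(cell)

# Dates have to become text on the way in and dates again on the way out.
issue_dates = [date(2022, 1, 6), date(2022, 1, 10)]
date_cell = pack_cell(d.isoformat() for d in issue_dates)
dates_back = [date.fromisoformat(s) for s in unpack_cell(date_cell)]

print(cell, naive, recovered, date_cell, dates_back, sep="\n")
```

Even with this in place, the name list and both date lists still have to be
kept in the same order by hand, which is the maintenance burden described
above.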
If not using an RDBMS, at the very least borrower/issue date/return date
should be moved to a separate sheet which also has "Book ID" (the row number
in the first sheet with the book). That way you'd have one record per
borrower, and can easily add new records at the bottom of the sheet (you
might need to use a "Book ID" of "0" to indicate a deleted record (when a
borrower returns the book) so you can reuse the slot, since you'd need some
way to identify the end of the data -- most likely by a blank record).

>-“If that's what you have in your spreadsheet, then read the cells on the
>first row for the column labels and put them in a dict to map from column
>label to column number.”
>
>This written above I do not understand how to code.

Have you gone through the Python Tutorial? Dictionaries are one of Python's
basic data structures.

https://docs.python.org/3/tutorial/

You are unlikely to find anything near to your application on-line --
pretty much anyone doing something like a library check-out system will be
using a relational database rather than spread sheets. At worst, they may
have a spread sheet import operation to do initial population of the
database, though even that might be using SQL operations (Windows supports
Excel files as an ODBC data source). See:
https://docs.microsoft.com/en-us/cpp/data/odbc/data-source-managing-connections-odbc?view=msvc-170

They are unlikely to be doing any exports to Excel -- that's the realm of
report logic. According to
https://support.sas.com/documentation/onlinedoc/dfdmstudio/2.5/dmpdmsug/Content/dfDMStd_T_Excel_ODBC.html
"""
Note: You cannot use a DSN to write output to Excel format. You can,
however, use a Text File Output node in a data job to write output in CSV
format. You can then import the CSV file into Excel.
"""

A Java-biased (old Java -- the interface to ODBC has been removed from
current Java) example that doesn't seem to need "named ranges" is
https://querysurge.zendesk.com/hc/en-us/articles/205766136-Writing-SQL-Queries-against-Excel-files-using-ODBC-connection-Deprecated-Excel-SQL-

Or...
https://www.red-gate.com/simple-talk/databases/sql-server/database-administration-sql-server/getting-data-between-excel-and-sql-server-using-odbc/
(which also indicates that it is possible to update the file via ODBC... But
note the constraints regarding having 64-bit vs 32-bit drivers). Obviously
you'll need to translate the PowerShell syntax into Python's ODBC DB-API
interface (which is a bit archaic as I recall -- does not match current
DB-API specifications).

--
	Wulfraed                 Dennis Lee Bieber         AF6VN
	wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
Re: What to write or search on github to get the code for what is written below:
-“How are the relevant cells identified in the spreadsheet?”
The column headings are:
BOOK_NAME
BOOK_AUTHOR
BOOK_ISBN
TOTAL_COPIES
COPIES_LEFT
BORROWER’S_NAME
ISSUE_DATE
RETURN_DATE

-“It's often the case that the cells on the first row contain text as
column labels.”
These I have written above.

-“If that's what you have in your spreadsheet, then read the cells on the
first row for the column labels and put them in a dict to map from column
label to column number.”

This written above I do not understand how to code.
Re: What to write or search on github to get the code for what is written below:
*** Going back to the post in the thread as I've other concerns (and have
turned off the old X-NoArchive setting) ***

On Thu, 6 Jan 2022 10:55:30 -0800 (PST), NArshad declaimed the following:

>All this is going to be in python’s flask and HTML only
>
>1. First, I have to check in the Excel sheet or table whether the book user
>has entered is present in the book bank or not.
>
>2. If a book is present and the quantity of the required book is greater than
>0 (COPIES_LEFT column in excel file) and if the user wants the book, it will
>be assigned to the user which he will take from the book bank physically. When
>COPIES_LEFT will is less than or equal to 0 the message will be “Book finished
>or not present”.
>
>3. The quantity of the book in the Excel file will be reduced by 1 in the
>COPIES_LEFT column and the name of the borrower or user will be entered/added
>in the Excel file table or sheet already made and the column name is
>BORROWER’S NAME.
>
>4. The borrower’s or user name can be more than one so they will be separated
>with a comma in the Excel file BORROWER’S NAME column.
>
>
>- All functions mentioned above are to be deployed on the website
>pythonhow.com so make according to
>https://pythonhow.com/python-tutorial/flask/web-development-with-python-and-flask/
>
>- Do you know any other websites to deploy a python web application??

There are likely plenty -- How much do you want to pay? How much support do
you need? How much traffic do you expect?

Note that the PythonHOW tutorial is suggesting creating a student/hobby
account on Heroku (free, and fairly limited). Heroku provides Linux
containers, and for Python you can only make use of add-ons that can be
installed using PIP. As Linux, none of the Windows specific modules will be
available (no Excel ODBC, no use of pythonwin extensions calling directly
into the Excel DLLs).

Who is going to be using the Excel file? And how are they going to get to
it?
Your Heroku container does not run Excel, and I'm not even sure how you
would get it to the Heroku container (possibly it can be done as part of the
Python application upload).

I don't even know if SQLite3 is viable -- as I recall, Linux Python installs
rely upon system installed SQLite3 libraries, not installed via PIP. Heroku
pushes PostgreSQL for data storage. It may cost to add others
(MariaDB/MySQL).
https://elements.heroku.com/buildpacks/heroku/heroku-buildpack-python

>
>- No time to switch from Excel to anywhere else. Please do not make any
>changes to the Excel file.
>

Again, if you are deploying to something like Heroku for the application --
the Excel file will have to be deployed also, and no one except your
application will be able to see it there. Under this situation, there is no
reason/excuse to keep the data in the very inefficient format you've defined
in the most recent message. Import into some supported database and
normalize the data to make updates easier.

--
	Wulfraed                 Dennis Lee Bieber         AF6VN
	wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
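What "normalize the data" could look like is sketched below. Everything in
it is an assumption made for illustration: the table and column names are
invented, and an in-memory SQLite database is used only so the sketch is
self-contained (the message above recommends whatever database the host
supports, for example PostgreSQL on Heroku). The idea is one row per book,
one row per loan, and COPIES_LEFT computed instead of stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        author       TEXT,
        isbn         TEXT,
        total_copies INTEGER NOT NULL
    );
    -- One row per copy currently out on loan; the row is deleted on return.
    CREATE TABLE loan (
        id          INTEGER PRIMARY KEY,
        book_id     INTEGER NOT NULL REFERENCES book(id),
        borrower    TEXT NOT NULL,
        issue_date  TEXT NOT NULL,
        return_date TEXT
    );
    """
)


def copies_left(conn, book_id):
    """Total copies minus open loans, or None if the book is unknown."""
    row = conn.execute(
        "SELECT b.total_copies - COUNT(l.id) "
        "FROM book b LEFT JOIN loan l ON l.book_id = b.id "
        "WHERE b.id = ? GROUP BY b.id",
        (book_id,),
    ).fetchone()
    return row[0] if row else None


# Made-up sample data.
conn.execute(
    "INSERT INTO book (id, name, author, isbn, total_copies) "
    "VALUES (1, 'Example Book', 'A. Author', '000-0', 2)"
)
conn.execute(
    "INSERT INTO loan (book_id, borrower, issue_date, return_date) "
    "VALUES (1, 'John Doe, Jr', '2022-01-06', '2022-01-20')"
)

print(copies_left(conn, 1))
```

With this layout a borrower name containing a comma is just a value in its
own row, and issuing or returning a book is a single INSERT or DELETE.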
Re: What to write or search on github to get the code for what is written below:
On 2022-01-11 06:31, NArshad wrote:
> -“How are the relevant cells identified in the spreadsheet?”
> The column headings are:
> BOOK_NAME
> BOOK_AUTHOR
> BOOK_ISBN
> TOTAL_COPIES
> COPIES_LEFT
> BORROWER’S_NAME
> ISSUE_DATE
> RETURN_DATE
>
> -“It's often the case that the cells on the first row contain text as
> column labels.”
> These I have written above.
>
> -“If that's what you have in your spreadsheet, then read the cells on the
> first row for the column labels and put them in a dict to map from column
> label to column number.”
>
> This written above I do not understand how to code.

Well, you know how to read the contents of a cell, and how to put items
into a dict (the key will be the cell contents and the value will be the
column number).

The column numbers will go from 1 to sheet.last_column, although some of
them might be empty (their value will be None), which you can remove from
the dict afterwards.
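The dict described above can be sketched in a few lines. To keep the sketch
independent of any particular spreadsheet library, the first row is
represented here by a plain list of cell values (with a trailing None
standing in for an empty cell); in real code those values would come from
reading the cells of row 1. The function name `header_map` is made up for
illustration, and it skips empty cells while building the dict instead of
removing them afterwards.

```python
def header_map(first_row):
    """Map each column label to its 1-based column number, skipping empties."""
    labels = {}
    for column, value in enumerate(first_row, start=1):
        if value is not None:
            labels[value] = column
    return labels


# Stand-in for the values read from the first row of the sheet.
first_row = [
    "BOOK_NAME",
    "BOOK_AUTHOR",
    "BOOK_ISBN",
    "TOTAL_COPIES",
    "COPIES_LEFT",
    "BORROWER’S_NAME",
    "ISSUE_DATE",
    "RETURN_DATE",
    None,
]

columns = header_map(first_row)
print(columns["COPIES_LEFT"])
```

Once the dict exists, the rest of the code can refer to a column by its
label, so it keeps working if the columns are later reordered.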