Re: Problem using cx_Freeze > auto-py-to-exe

2022-08-19 Thread Daniel Lee
Thank you so much, I really appreciate it.

From: Chris Angelico
Sent: 19 August 2022 8:39
To: python-list@python.org
Subject: Re: Problem using cx_Freeze > auto-py-to-exe

On Fri, 19 Aug 2022 at 10:07, Grant Edwards  wrote:
>
> On 2022-08-18, Chris Angelico  wrote:
> > On Fri, 19 Aug 2022 at 05:05, Grant Edwards wrote:
> >> On 2022-08-18, Chris Angelico  wrote:
> >>
> >> > It's one of the frustrations with JSON, since that format doesn't
> >> > allow the trailing comma :)
> >>
> >> Yep, that's a constant, low-level pain for all the C code I deal with
> >> which generates JSON. You'd think after 10+ years of maintaining code
> >> that outputs JSON, I wouldn't trip over that any longer...
> >
> > With some JSON files, I just cheat and define a shim at the end of arrays...
> >
> > https://raw.githubusercontent.com/Rosuav/MustardMine/master/template.json
>
> That's OK if it's strictly internal. Almost all of the JSON data I
> work with is part of published APIs ― many of which are defined by
> industry consortiums or corporate-wide "standards".
>

That's an export/import format that I defined, so I mandated (a) that
there's an empty-string key as a signature (on import, it can be
anywhere, but on export, it's that final shim), and (b) all arrays are
allowed to have an empty string at the end, which is ignored on
import. Saves so much trouble.
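
A minimal sketch of that convention, for illustration only. The helper names `strip_shims` and `add_shims`, the sample data, and the choice to add the signature key only at the top level are assumptions, not MustardMine's actual code. On import the trailing empty string in any array and the empty-string key are dropped; on export they are put back, so every real entry can end with a comma.

```python
import json

def strip_shims(value):
    """Import side: drop a trailing "" from arrays and the "" signature key."""
    if isinstance(value, list):
        items = [strip_shims(v) for v in value]
        if items and items[-1] == "":
            items.pop()
        return items
    if isinstance(value, dict):
        return {k: strip_shims(v) for k, v in value.items() if k != ""}
    return value

def add_shims(value, signature=None):
    """Export side: end every array with "" and (top level only) add the "" key last."""
    if isinstance(value, list):
        return [add_shims(v) for v in value] + [""]
    if isinstance(value, dict):
        out = {k: add_shims(v) for k, v in value.items()}
        if signature is not None:
            out[""] = signature  # dicts keep insertion order, so this is the final entry
        return out
    return value

text = '''{
    "commands": [
        "hello",
        "world",
        ""
    ],
    "": "example-signature"
}'''
data = strip_shims(json.loads(text))
print(data)  # {'commands': ['hello', 'world']}
print(json.dumps(add_shims(data, "example-signature"), indent=4))
```

The cost of the convention is that an array can no longer legitimately end with an empty string, which is acceptable only when you control the format.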

That particular export format is actually designed as a git-managed
config file as well, which is why the line breaks are done the way
they are (anything on a single line is intended to be added/removed as
a single unit), and why I definitely don't want the "add a comma
to the previous line" deltas.

"Strictly internal" is a subset of "protocols/standards that you are
in control of". :)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: setup.py + cython == chicken and the egg problem

2022-08-19 Thread Daniel Lee
Thank you!

From: Dan Stromberg
Sent: 19 August 2022 8:35
To: Python List
Subject: Re: setup.py + cython == chicken and the egg problem

On Tue, Aug 16, 2022 at 2:03 PM Dan Stromberg  wrote:

> Hi folks.
>
> I'm attempting to package up a python package that uses Cython.
>
> Rather than build binaries for everything under the sun, I've been
> focusing on including the .pyx file and running cython on it at install
> time.  This requires a C compiler, but I'm OK with that.
>
> BTW, the pure python version works fine, and the cython version works too
> as long as you preinstall cython - but I don't want users to have to know
> that :)
>

For the actual chicken-and-egg problem, I needed to include my
pyproject.toml in my MANIFEST.in, like:
    include pyx_treap.pyx pyx_treap.c pyproject.toml
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Problem using cx_Freeze

2022-08-19 Thread Daniel Lee
Thank you!
From: subin
Sent: 19 August 2022 8:02
To: python-list@python.org
Subject: Re: Problem using cx_Freeze

Hope you had a good time.

On Wed, Aug 17, 2022 at 10:19 PM Peter J. Holzer  wrote:

> On 2022-08-17 12:09:14 -0600, David at Booomer wrote:
> > Executable(
> >     "prjui.py","Maiui.py","about.py","dict.py","geometry.py","getEquation.py",
> >     "gtrail.py","main.py","matchingstring.py","producelatex.py","readfile.py",
> >     "separete.py","speak.py",
> > )
> [...]
> > I am/was worried about the trailing ',' after "speak.py", <- but
> > deleting it or moving it after the ] didn't help.
>
> This has nothing to do with your problem but:
>
> Python allows a trailing comma in any comma-separated list of values. It
> will just be ignored.
>
> This is really common in modern programming languages (read: programming
> languages younger than 30 years or so), because it makes it much more
> convenient to extend/shorten/reorder a list. Otherwise you always have to
> remember to add or remove a comma in the right place. (Some people
> (especially SQL programmers for some reason) resorted to putting the comma
> at the start of each line to get around this, which is really ugly.)
>
> hp
>
> --
>    _  | Peter J. Holzer    | Story must make more sense than reality.
> |_|_) |                    |
> | |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |       challenge!"
-- 
https://mail.python.org/mailman/listinfo/python-list
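
The trailing-comma point quoted above is easy to check directly. A small sketch follows; the variable and function names are illustrative. It also shows the one case where the comma does change the meaning (a single bare value becomes a tuple) and that Python's own `json` module rejects the same syntax.

```python
import json

# A trailing comma in a list display or a call is simply ignored.
scripts = [
    "separete.py",
    "speak.py",
]
print(len(scripts))  # 2

def count_args(*args):
    return len(args)

print(count_args("a.py", "b.py",))  # 2

# Caveat: after a single bare value, the comma makes a one-element tuple.
single = "speak.py",
print(type(single).__name__)  # tuple

# JSON, by contrast, does not allow the trailing comma.
try:
    json.loads('["separete.py", "speak.py",]')
    json_accepts_trailing_comma = True
except json.JSONDecodeError:
    json_accepts_trailing_comma = False
print(json_accepts_trailing_comma)  # False
```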


Re: UTF-8 and latin1

2022-08-19 Thread Daniel Lee
Thanks!

From: Stefan Ram
Sent: 19 August 2022 6:23
To: python-list@python.org
Subject: Re: UTF-8 and latin1

Tobiah  writes:
>  When a person enters
>Montréal, Quebéc into a form field, what are they
>doing on the keyboard to make that happen?

  Depends on the OS and its configuration. Some devices might
  not even have a keyboard as hardware.

>As the
>string sits there in the text box, is it latin1, or utf-8
>or something else?

  This is an internal implementation detail of the browser.

>How does the browser know what
>sort of data it has in that text box?

  This is an internal implementation detail of the browser.

  You usually do not need to know this internal information
  about the browser in order to use it.


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Problem using cx_Freeze > auto-py-to-exe

2022-08-19 Thread Dennis Lee Bieber
On Thu, 18 Aug 2022 12:17:25 -0600, David at Booomer 
declaimed the following:

>
>I did count but hadn’t noticed this argument list before you mentioned it. 
>However, I still don’t see any of these argument names in the Executable list 
>or anywhere else.
>

It's your responsibility to provide them when you called Executable().
As I said, you are (were?) providing a whole bunch of .py files, which were
being mapped to these arguments.


>"""
>argument name  description
>
>#1
>script         the name of the file containing the script which is to be
>               frozen
>
prjui.py

>#2
>init_script    the name of the initialization script that will be executed
>               before the actual script is executed; this script is used to
>               set up the environment for the executable; if a name is given
>               without an absolute path the names of files in the
>               initscripts subdirectory of the cx_Freeze package is searched
>
Maiui.py

>#3
>base           the name of the base executable; if a name is given without
>               an absolute path the names of files in the bases subdirectory
>               of the cx_Freeze package is searched
>
about.py

>#4
>target_name    the name of the target executable; the default value is the
>               name of the script; the extension is optional (automatically
>               added on Windows); support for names with version; if
>               specified a pathname, raise an error.
>
dict.py

>#5
>icon           name of icon which should be included in the executable
>               itself on Windows or placed in the target directory for other
>               platforms (ignored in Microsoft Store Python app)
>
geometry.py

>#6
>manifest       name of manifest which should be included in the executable
>               itself (Windows only - ignored by Python app from Microsoft
>               Store)
>
getEquation.py

>#7
>uac-admin      creates a manifest for an application that will request
>               elevation (Windows only - ignored by Python app from
>               Microsoft Store)
>
gtrail.py

>#8
>shortcut_name  the name to give a shortcut for the executable when included
>               in an MSI package (Windows only).
>
main.py

>#9
>shortcut_dir   the directory in which to place the shortcut when being
>               installed by an MSI package; see the MSI Shortcut table
>               documentation for more information on what values can be
>               placed here (Windows only).

matchingstring.py

>#10
>copyright      the copyright value to include in the version resource
>               associated with executable (Windows only).
>
producelatex.py

>#11
>trademarks     the trademarks value to include in the version resource
>               associated with the executable (Windows only).

readfile.py

and
separete.py
speak.py
are not mapped to anything, hence the too-many arguments error.
>"""

As you can see, a lot of those don't even fit with the data type of the
argument.
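
To make that mapping concrete, here is a sketch with a stand-in function whose parameter names follow the documentation quoted above (with `uac-admin` spelled `uac_admin` so that it is a valid identifier). `executable_stub` is hypothetical: it is not cx_Freeze's `Executable`, and it only reports which value lands in which parameter.

```python
def executable_stub(script, init_script=None, base=None, target_name=None,
                    icon=None, manifest=None, uac_admin=None,
                    shortcut_name=None, shortcut_dir=None,
                    copyright=None, trademarks=None):
    """Hypothetical stand-in: return the argument binding as a dict."""
    return dict(locals())

files = ["prjui.py", "Maiui.py", "about.py", "dict.py", "geometry.py",
         "getEquation.py", "gtrail.py", "main.py", "matchingstring.py",
         "producelatex.py", "readfile.py", "separete.py", "speak.py"]

# The first eleven file names silently fill the eleven parameters.
bound = executable_stub(*files[:11])
print(bound["script"])      # prjui.py
print(bound["icon"])        # geometry.py
print(bound["trademarks"])  # readfile.py

# All thirteen are two more than the signature accepts.
try:
    executable_stub(*files)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError
```

In this sketch, passing a single script per call (for example `executable_stub("main.py")`) binds only `script` and leaves everything else at its default.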
>
>I tried passing just main.py or one of the others that might be a starting 
>point but just got 'NoneType has no len()'

What did the traceback say? Just reporting the last line message is
meaningless.

>Then I searched for ‘python executable’ and found auto-py-to-exe and 
>pyinstaller which I must/might explore later. First tries ran into PyQt4 to 
>PyQt5 conversions. Good start at 
>https://towardsdatascience.com/how-to-easily-convert-a-python-script-to-an-executable-file-exe-4966e253c7e9
>

Note that pretty much every such python->executable scheme just makes
an archive of the required Python source files and packages the core of
the Python interpreter in such a way that running the archive simply
extracts the source files and runs the packaged Python interpreter
with them.
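
The standard library's `zipapp` module shows the same idea in miniature. This is only an analogy, not what cx_Freeze or PyInstaller do internally: the sketch archives a one-file program (the directory name and message are made up) and then lets the interpreter run the sources straight out of the archive. It uses `runpy` in-process; running `python app.pyz` from a shell should behave the same way.

```python
import contextlib
import io
import pathlib
import runpy
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "app"
    src.mkdir()
    # zipapp uses __main__.py as the entry point of the archive.
    (src / "__main__.py").write_text('print("hello from the archive")\n')

    archive = pathlib.Path(tmp) / "app.pyz"
    zipapp.create_archive(src, archive)

    # Execute the archive and capture what the archived program prints.
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        runpy.run_path(str(archive))
    output = captured.getvalue().strip()

print(output)  # hello from the archive
```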


-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UTF-8 and latin1

2022-08-19 Thread Dennis Lee Bieber
On Thu, 18 Aug 2022 11:33:59 -0700, Tobiah  declaimed the
following:

>
>So how does this break down?  When a person enters
>Montréal, Quebéc into a form field, what are they
>doing on the keyboard to make that happen?  As the
>string sits there in the text box, is it latin1, or utf-8
>or something else?  How does the browser know what
>sort of data it has in that text box?
>

If this were my ancient Amiga -- most of the accented characters in
ISO-Latin-1 were entered by using one of the meta/alt keys simultaneously
with one of five or six designated "dead keys" (in days of typewriters, a
dead key was one that did not advance the carriage to the next character
space). The dead key indicated which accent mark was to be applied to the
subsequent "regular" character.

On Windows, many of the characters might be entered using Alt+<digits>
(where <digits> are keys on the numeric pad!) (such as 1254 => µ).

As for what the browser receives? Unless the browser is asking for raw
key codes and translating them internally to some encoding, it is likely
receiving characters in whatever encoding has been defined for the
computer/OS (Windows, most likely CP1252, which is a superset of latin-1 as
I recall). Whether the browser then re-encodes that to UTF-8 is something I
can't answer.
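
For reference, the byte-level difference between these encodings can be seen in Python. This is only a sketch of the encodings themselves; it says nothing about what any particular browser does internally. One detail on the "superset" recollection: CP1252 matches Latin-1 for the accented letters, but it assigns printable characters such as the euro sign to the 0x80-0x9F range, where ISO-8859-1 has control codes.

```python
text = "Montréal"

latin1 = text.encode("latin-1")
cp1252 = text.encode("cp1252")
utf8 = text.encode("utf-8")

print(latin1)  # b'Montr\xe9al'
print(cp1252)  # b'Montr\xe9al'
print(utf8)    # b'Montr\xc3\xa9al'

# Reading UTF-8 bytes as Latin-1 produces the familiar mojibake.
mojibake = utf8.decode("latin-1")
print(mojibake)  # MontrÃ©al

# The euro sign exists in CP1252 but not in Latin-1.
print("€".encode("cp1252"))  # b'\x80'
try:
    "€".encode("latin-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False
print(euro_in_latin1)  # False
```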



-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
-- 
https://mail.python.org/mailman/listinfo/python-list


Mutating an HTML file with BeautifulSoup

2022-08-19 Thread Chris Angelico
What's the best way to precisely reconstruct an HTML file after
parsing it with BeautifulSoup?

Using the Alice example from the BS4 docs:

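>>> html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and
their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
>>> print(soup)
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and
their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
>>>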

Note two distinct changes: firstly, whitespace has been removed, and
secondly, attributes are reordered (I think alphabetically). There are
other canonicalizations being done, too.

I'm trying to make some automated changes to a huge number of HTML
files, with minimal diffs so they're easy to validate. That means that
spurious changes like these are very much unwanted. Is there a way to
get BS4 to reconstruct the original precisely?

The mutation itself would be things like finding an anchor tag and
changing its href attribute. Fairly simple changes, but might alter
the length of the file (eg changing "http://example.com/" into
"https://example.com/"). I'd like to do them intelligently rather than
falling back on element.sourceline and element.sourcepos, but worst
case, that's what I'll have to do (which would be fiddly).

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Mutating an HTML file with BeautifulSoup

2022-08-19 Thread Barry


> On 19 Aug 2022, at 19:33, Chris Angelico  wrote:
> 
> What's the best way to precisely reconstruct an HTML file after
> parsing it with BeautifulSoup?

I recall that bs4 parses into an object tree and loses the detail of the
input.
I recently ported from very old bs to bs4 and hit the same issue.
So no, it will not output the same as went in.

If you can trust the input to parse as XML, meaning all the rules of closing
tags have been followed, then I think you can parse and unparse through XML to
do what you want.

Barry


> 
> Using the Alice example from the BS4 docs:
> 
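> >>> html_doc = """<html><head><title>The Dormouse's story</title></head>
> <body>
> <p class="title"><b>The Dormouse's story</b></p>
> 
> <p class="story">Once upon a time there were three little sisters; and
> their names were
> <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
> <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
> <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
> and they lived at the bottom of a well.</p>
> 
> <p class="story">...</p>
> """
> >>> print(soup)
> <html><head><title>The Dormouse's story</title></head>
> <body>
> <p class="title"><b>The Dormouse's story</b></p>
> <p class="story">Once upon a time there were three little sisters; and
> their names were
> <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
> <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
> <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
> and they lived at the bottom of a well.</p>
> <p class="story">...</p>
> </body></html>
> >>>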
> 
> Note two distinct changes: firstly, whitespace has been removed, and
> secondly, attributes are reordered (I think alphabetically). There are
> other canonicalizations being done, too.
> 
> I'm trying to make some automated changes to a huge number of HTML
> files, with minimal diffs so they're easy to validate. That means that
> spurious changes like these are very much unwanted. Is there a way to
> get BS4 to reconstruct the original precisely?
> 
> The mutation itself would be things like finding an anchor tag and
> changing its href attribute. Fairly simple changes, but might alter
> the length of the file (eg changing "http://example.com/" into
> "https://example.com/"). I'd like to do them intelligently rather than
> falling back on element.sourceline and element.sourcepos, but worst
> case, that's what I'll have to do (which would be fiddly).
> 
> ChrisA
> -- 
> https://mail.python.org/mailman/listinfo/python-list
> 

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Mutating an HTML file with BeautifulSoup

2022-08-19 Thread 2QdxY4RzWzUUiLuE
On 2022-08-19 at 20:12:35 +0100,
Barry  wrote:

> > On 19 Aug 2022, at 19:33, Chris Angelico  wrote:
> > 
> > What's the best way to precisely reconstruct an HTML file after
> > parsing it with BeautifulSoup?
> 
> I recall that in bs4 it parses into an object tree and loses the
> detail of the input.  I recently ported from very old bs to bs4 and
> hit the same issue.  So no it will not output the same as went in.
> 
> If you can trust the input to be parsed as xml, meaning all the rules
> of closing tags have been followed. Then I think you can parse and
> unparse thru xml to do what you want.

XML is in the same boat.  Except for "canonical form" (which underlies
cryptographically signed XML documents) the standards explicitly don't
require tools to round-trip the "source code."  The preferred method of
comparing XML documents is at the structural level rather than with
textual representations.  That way, the following two elements are the
same (and similar with a collection of sub-elements in a different order
in another document):



and



Dan
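
A small illustration of structural comparison using the standard library. The two `book` elements are made up for this sketch; they differ only in attribute order, spacing inside the tag, and the empty-element form. The helper `same_element` is likewise an assumption, and it compares children in document order, which is stricter than some schemas require.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<book id="1" lang="en"/>')
b = ET.fromstring('<book lang="en"   id="1"></book>')

def same_element(x, y):
    """Compare tag, attributes, text and children, ignoring source layout."""
    return (x.tag == y.tag
            and x.attrib == y.attrib
            and (x.text or "").strip() == (y.text or "").strip()
            and len(x) == len(y)
            and all(same_element(c, d) for c, d in zip(x, y)))

print(same_element(a, b))  # True
```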
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Mutating an HTML file with BeautifulSoup

2022-08-19 Thread Chris Angelico
On Sat, 20 Aug 2022 at 05:12, Barry  wrote:
>
>
>
> > On 19 Aug 2022, at 19:33, Chris Angelico  wrote:
> >
> > What's the best way to precisely reconstruct an HTML file after
> > parsing it with BeautifulSoup?
>
> I recall that in bs4 it parses into an object tree and loses the detail of 
> the input.
> I recently ported from very old bs to bs4 and hit the same issue.
> So no it will not output the same as went in.
>
> If you can trust the input to be parsed as xml, meaning all the rules of 
> closing
> tags have been followed. Then I think you can parse and unparse thru xml to
> do what you want.
>


Yeah, no I can't, this is HTML 4 with a ton of inconsistencies. Oh
well. Thanks for trying, anyhow.

So I'm left with a few options:

1) Give up on validation, give up on verification, and just run this
thing on the production site with my fingers crossed
2) Instead of doing an intelligent reconstruction, just str.replace()
one URL with another within the file
3) Split the file into lines, find the Nth line (elem.sourceline) and
str.replace that line only
4) Attempt to use elem.sourceline and elem.sourcepos to find the start
of the tag, manually find the end, and replace one tag with the
reconstructed form.
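
One way option 4 could look, as a rough sketch rather than a tested tool. It swaps in the standard library's `html.parser` instead of BS4's `sourceline`/`sourcepos`: `getpos()` gives the line and column of each start tag and `get_starttag_text()` gives its exact source text, so only the characters inside matching `<a>` tags are touched. The class and function names and the sample page are made up for illustration.

```python
from html.parser import HTMLParser

class AnchorLocator(HTMLParser):
    """Record the source span and href of every <a> start tag."""

    def __init__(self, source):
        super().__init__()
        # Index of the first character of each line; the parser counts "\n" only.
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.spans = []  # (start, end, href)
        self.feed(source)
        self.close()

    def handle_starttag(self, tag, attrs):
        if tag != "a":  # tag names arrive lowercased
            return
        line, col = self.getpos()  # position of the "<" of this tag
        start = self.line_starts[line - 1] + col
        raw = self.get_starttag_text()
        self.spans.append((start, start + len(raw), dict(attrs).get("href")))

def rewrite_hrefs(source, replacements):
    """Replace href values inside <a> start tags only; leave all else as-is."""
    edits = []
    for start, end, href in AnchorLocator(source).spans:
        if href in replacements:
            raw = source[start:end]
            # Limitation: if the raw attribute contains entities such as &amp;,
            # the decoded href will not match and the tag is left unchanged.
            edits.append((start, end, raw.replace(href, replacements[href], 1)))
    # Apply from the end so that earlier offsets stay valid.
    for start, end, new_raw in reversed(edits):
        source = source[:start] + new_raw + source[end:]
    return source

page = '''<P>See <A  HREF="http://example.com/"   class=sister>this</A>
and <a href="http://other.example/">that</a>, not http://example.com/ in text.
'''
fixed = rewrite_hrefs(page, {"http://example.com/": "https://example.com/"})
print(fixed)
```

Because everything outside the matched tags is copied through byte for byte, the resulting diff should contain only the changed URLs; odd spacing, attribute order, and tag case are all preserved.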

I'm inclined to the first option, honestly. The others just seem like
hard work, and I became a programmer so I could be lazy...

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Mutating an HTML file with BeautifulSoup

2022-08-19 Thread David
On Sat, 20 Aug 2022 at 04:31, Chris Angelico  wrote:

> What's the best way to precisely reconstruct an HTML file after
> parsing it with BeautifulSoup?

> Note two distinct changes: firstly, whitespace has been removed, and
> secondly, attributes are reordered (I think alphabetically). There are
> other canonicalizations being done, too.

> I'm trying to make some automated changes to a huge number of HTML
> files, with minimal diffs so they're easy to validate. That means that
> spurious changes like these are very much unwanted. Is there a way to
> get BS4 to reconstruct the original precisely?

On Sat, 20 Aug 2022 at 07:02, Chris Angelico  wrote:
> On Sat, 20 Aug 2022 at 05:12, Barry  wrote:

> > I recall that in bs4 it parses into an object tree and loses the detail
> > of the input.  I recently ported from very old bs to bs4 and hit the
> > same issue.  So no it will not output the same as went in.

> So I'm left with a few options:

> 1) Give up on validation, give up on verification, and just run this
>thing on the production site with my fingers crossed

> 2) Instead of doing an intelligent reconstruction, just str.replace() one
>URL with another within the file

> 3) Split the file into lines, find the Nth line (elem.sourceline) and
>str.replace that line only

> 4) Attempt to use elem.sourceline and elem.sourcepos to find the start of
>the tag, manually find the end, and replace one tag with the
>reconstructed form.

> I'm inclined to the first option, honestly. The others just seem like
> hard work, and I became a programmer so I could be lazy...

Hi, I don't know if you will like this option, but I don't see it on the
list yet so ...

I'm assuming that the phrase "with minimal diffs so they're easy to
validate" means being eyeballed by a human.

Have you considered two passes through BS? Do the first pass with no
modification, so that the intermediate result gets the BS default
"spurious" changes.

Then do the second pass with the desired changes, so that the human will
see only the desired changes in the diff.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Mutating an HTML file with BeautifulSoup

2022-08-19 Thread Chris Angelico
On Sat, 20 Aug 2022 at 10:04, David  wrote:
>
> On Sat, 20 Aug 2022 at 04:31, Chris Angelico  wrote:
>
> > What's the best way to precisely reconstruct an HTML file after
> > parsing it with BeautifulSoup?
>
> > Note two distinct changes: firstly, whitespace has been removed, and
> > secondly, attributes are reordered (I think alphabetically). There are
> > other canonicalizations being done, too.
>
> > I'm trying to make some automated changes to a huge number of HTML
> > files, with minimal diffs so they're easy to validate. That means that
> > spurious changes like these are very much unwanted. Is there a way to
> > get BS4 to reconstruct the original precisely?
>
> On Sat, 20 Aug 2022 at 07:02, Chris Angelico  wrote:
> > On Sat, 20 Aug 2022 at 05:12, Barry  wrote:
>
> > > I recall that in bs4 it parses into an object tree and loses the detail
> > > of the input.  I recently ported from very old bs to bs4 and hit the
> > > same issue.  So no it will not output the same as went in.
>
> > So I'm left with a few options:
>
> > 1) Give up on validation, give up on verification, and just run this
> >thing on the production site with my fingers crossed
>
> > 2) Instead of doing an intelligent reconstruction, just str.replace() one
> >URL with another within the file
>
> > 3) Split the file into lines, find the Nth line (elem.sourceline) and
> >str.replace that line only
>
> > 4) Attempt to use elem.sourceline and elem.sourcepos to find the start of
> >the tag, manually find the end, and replace one tag with the
> >reconstructed form.
>
> > I'm inclined to the first option, honestly. The others just seem like
> > hard work, and I became a programmer so I could be lazy...
>
> Hi, I don't know if you will like this option, but I don't see it on the
> list yet so ...

Hey, all options are welcomed :)

> I'm assuming that the phrase "with minimal diffs so they're easy to
> validate" means being eyeballed by a human.
>
> Have you considered two passes through BS? Do the first pass with no
> modification, so that the intermediate result gets the BS default
> "spurious" changes.
>
> Then do the second pass with the desired changes, so that the human will
> see only the desired changes in the diff.

I'm 100% confident of the actual changes, so that wouldn't really
solve anything. The problem is that, without eyeballing the actual
changes, I can't easily see if there's been something else changed or
broken. This is a scripted change that will affect probably hundreds
of HTML files across a large web site, so making sure I don't break
anything means either (a) minimize the diff so it's clearly correct,
or (b) eyeball the rendered versions of every page - manually - to see
if there were any unintended changes. (There WILL be intended visual
changes, so I can't render the page to bitmap and ensure that it
hasn't changed. This is not React snapshot testing, which IMO is one
of the most useless testing features ever devised. No, actually, that
can't be true, someone MUST have made a worse one.)

Appreciate the suggestion, though!

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Mutating an HTML file with BeautifulSoup

2022-08-19 Thread dn
On 20/08/2022 09.01, Chris Angelico wrote:
> On Sat, 20 Aug 2022 at 05:12, Barry  wrote:
>>
>>
>>
>>> On 19 Aug 2022, at 19:33, Chris Angelico  wrote:
>>>
>>> What's the best way to precisely reconstruct an HTML file after
>>> parsing it with BeautifulSoup?
>>
>> I recall that in bs4 it parses into an object tree and loses the detail of 
>> the input.
>> I recently ported from very old bs to bs4 and hit the same issue.
>> So no it will not output the same as went in.
>>
>> If you can trust the input to be parsed as xml, meaning all the rules of 
>> closing
>> tags have been followed. Then I think you can parse and unparse thru xml to
>> do what you want.
>>
> 
> 
> Yeah, no I can't, this is HTML 4 with a ton of inconsistencies. Oh
> well. Thanks for trying, anyhow.
> 
> So I'm left with a few options:
> 
> 1) Give up on validation, give up on verification, and just run this
> thing on the production site with my fingers crossed
> 2) Instead of doing an intelligent reconstruction, just str.replace()
> one URL with another within the file
> 3) Split the file into lines, find the Nth line (elem.sourceline) and
> str.replace that line only
> 4) Attempt to use elem.sourceline and elem.sourcepos to find the start
> of the tag, manually find the end, and replace one tag with the
> reconstructed form.
> 
> I'm inclined to the first option, honestly. The others just seem like
> hard work, and I became a programmer so I could be lazy...
+1 - but I've noticed that sometimes I have to work quite hard to be
this lazy!


Am assuming that http -> https is not the only 'change' (if it were,
you'd just do that without BS). How many such changes are planned/need
checking? Care to list them?

-- 
Regards,
=dn
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Mutating an HTML file with BeautifulSoup

2022-08-19 Thread Chris Angelico
On Sat, 20 Aug 2022 at 10:19, dn  wrote:
>
> On 20/08/2022 09.01, Chris Angelico wrote:
> > On Sat, 20 Aug 2022 at 05:12, Barry  wrote:
> >>
> >>
> >>
> >>> On 19 Aug 2022, at 19:33, Chris Angelico  wrote:
> >>>
> >>> What's the best way to precisely reconstruct an HTML file after
> >>> parsing it with BeautifulSoup?
> >>
> >> I recall that in bs4 it parses into an object tree and loses the detail of 
> >> the input.
> >> I recently ported from very old bs to bs4 and hit the same issue.
> >> So no it will not output the same as went in.
> >>
> >> If you can trust the input to be parsed as xml, meaning all the rules of 
> >> closing
> >> tags have been followed. Then I think you can parse and unparse thru xml to
> >> do what you want.
> >>
> >
> >
> > Yeah, no I can't, this is HTML 4 with a ton of inconsistencies. Oh
> > well. Thanks for trying, anyhow.
> >
> > So I'm left with a few options:
> >
> > 1) Give up on validation, give up on verification, and just run this
> > thing on the production site with my fingers crossed
> > 2) Instead of doing an intelligent reconstruction, just str.replace()
> > one URL with another within the file
> > 3) Split the file into lines, find the Nth line (elem.sourceline) and
> > str.replace that line only
> > 4) Attempt to use elem.sourceline and elem.sourcepos to find the start
> > of the tag, manually find the end, and replace one tag with the
> > reconstructed form.
> >
> > I'm inclined to the first option, honestly. The others just seem like
> > hard work, and I became a programmer so I could be lazy...
> +1 - but I've noticed that sometimes I have to work quite hard to be
> this lazy!

Yeah, that's very true...

> Am assuming that http -> https is not the only 'change' (if it were,
> you'd just do that without BS). How many such changes are planned/need
> checking? Care to list them?
>

Assumption is correct. The changes are more of the form "find all the
problems, add to the list of fixes, try to minimize the ones that need
to be done manually". So far, what I have is:

1) A bunch of http -> https, but not all of them - only domains where
I've confirmed that it's valid
2) Some absolute to relative conversions:
https://www.gsarchive.net/whowaswho/index.htm should be referred to as
/whowaswho/index.htm instead
3) A few outdated URLs for which we know the replacement, eg
http://www.cris.com/~oakapple/gasdisc/ to
http://www.gasdisc.oakapplepress.com/ (this one can't go on
HTTPS, which is one reason I can't shortcut that)
4) Some internal broken links where the path is wrong - anything that
resolves to /books/ but can't be found might be better
rewritten as /html/perf_grps/websites/ if the file can be
found there
5) Any external link that yields a permanent redirect should, to save
clientside requests, get replaced by the destination. We have some
Creative Commons badges that have moved to new URLs.
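
A sketch of how rules 1 to 3 might be expressed as a single URL-rewriting function. The `HTTPS_OK` set and its contents are placeholders, the rule ordering is an assumption, and rules 4 and 5 are left out because they need the file system and the network.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed configuration, for illustration only.
SITE_HOST = "www.gsarchive.net"
HTTPS_OK = {"example.com"}  # placeholder for domains confirmed to work over HTTPS
MOVED = {
    "http://www.cris.com/~oakapple/gasdisc/": "http://www.gasdisc.oakapplepress.com/",
}

def fix_url(url):
    """Apply rule 3 (known replacements), then rule 2 (relative), then rule 1 (https)."""
    if url in MOVED:
        return MOVED[url]
    parts = urlsplit(url)
    if parts.netloc == SITE_HOST:
        # Absolute link to our own site: keep only path, query and fragment.
        return urlunsplit(("", "", parts.path or "/", parts.query, parts.fragment))
    if parts.scheme == "http" and parts.netloc in HTTPS_OK:
        return urlunsplit(parts._replace(scheme="https"))
    return url

print(fix_url("https://www.gsarchive.net/whowaswho/index.htm"))  # /whowaswho/index.htm
print(fix_url("http://www.cris.com/~oakapple/gasdisc/"))  # http://www.gasdisc.oakapplepress.com/
print(fix_url("http://example.com/page"))  # https://example.com/page
```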

And there'll be other fixes to be done too. So it's a bit complicated,
and no simple solution is really sufficient. At the very very least, I
*need* to properly parse with BS4; the only question is whether I
reconstruct from the parse tree, or go back to the raw file and try to
edit it there.

For the record, I have very long-term plans to migrate parts of the
site to Markdown, which would make a lot of things easier. But for
now, I need to fix the existing problems in the existing HTML files,
without doing gigantic wholesale layout changes.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Python scripts in .exe form

2022-08-19 Thread Mona Lee
I'm pretty new to Python, and I had to do some tinkering because I was running 
into issues trying to download a package with pip. In the process I must have 
caused some problems in my setup that I don't know how to fix.

1. It started when I was unable to update pip to the newest version because of 
an "Unknown error" (VS Code error: unable to read file, Unknown 
(FileSystemError)), where I believe some file was not saved in the right 
location.

2. In my command line in VS Code, the prompt used to look something like 
"PS C:\Users\[name]>", but now it is "PS 
C:\Users\[name]\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\Scripts>".

From there I reinstalled VS Code, but I still have issue 2.

Also, my scripts are now in .exe form, which I cannot open because "it is 
either binary or in an unsupported text encoding". I've tried to extract them 
back into .py form using pyinstxtractor and decompile-python3, but I couldn't 
get either to work.

3. I also wanted to mention that some of my old Python programs are missing.
-- 
https://mail.python.org/mailman/listinfo/python-list