[Python-ideas] Re: Native support for units [was: custom literals]

Brian McCall Sun, 03 Apr 2022 21:25:05 -0700

In the previous thread (Custom C++ literals), ChrisA raised some good 
questions, some of which I can actually answer :D


> Part of the problem here is that Python has to be many many things.
> Which set of units is appropriate? For instance, in a lot of contexts,
> it's fine to simply attach K to the end of something to mean "a
> thousand", while still keeping it unitless; but in other contexts,
> 273K clearly is a unit of temperature. (Although I think the solution
> there is to hard-disallow prefixes without units, as otherwise there'd
> be all manner of collisions.) Is it valid to refer to fifteen
> Angstroms as 15A, or do you have to say 15Å, or 15e-10m and accept
> that it's now a float not an int? Similarly, what if you want to write
> a Python script that works in natural units - the Planck length, mass,
> time, and temperature?

I think if you look into CGPM standards (they're the grand pooh bahs who decide 
what SI units are) then you'd find that a lot of these potential collisions 
have already been encountered and resolved. Under SI, there is no ambiguity 
regarding K. K means Kelvin and only Kelvin, whereas k means 1000. Some units, 
like Å do pose challenges. We often substitute u instead of μ, which works fine 
since there don't seem to be any SI units that start with u. But we can't do 
likewise for Å, since A is already reserved for Amperes. The easy way out is to 
say Å is not SI, so it's out. But I would rather not see this feature limited 
to SI units only (although SI should be preferred). A somewhat gentler approach 
would be to let Å be Å. Unicode letters are allowed in Python these days. I use 
theta, mu, lambda - the whole bunch of them, in my code all the time. If 
someone wants to use Å badly enough, let them use the Unicode character for it; 
otherwise, use nm.
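
As a rough illustration of the case-sensitive reading described above, here is 
a minimal sketch. The tables and the name `split_suffix` are hypothetical and 
list only a handful of prefixes and unit symbols; `u` is treated as an ASCII 
stand-in for μ.

```python
# Hypothetical, deliberately tiny tables (assumptions, not a full SI list).
PREFIXES = {"k": 1e3, "m": 1e-3, "u": 1e-6, "μ": 1e-6, "n": 1e-9}
UNITS = {"m", "K", "A", "s", "g", "mol"}


def split_suffix(suffix):
    """Return (scale, unit) for a suffix such as 'km', 'um' or 'K'."""
    # A bare unit symbol wins, so 'm' is metre and 'K' is kelvin.
    if suffix in UNITS:
        return 1.0, suffix
    # Otherwise try a one-letter prefix followed by a known unit.
    prefix, unit = suffix[:1], suffix[1:]
    if prefix in PREFIXES and unit in UNITS:
        return PREFIXES[prefix], unit
    # A prefix with no unit (a bare 'k') lands here and is rejected.
    raise ValueError(f"unknown unit suffix: {suffix!r}")
```

With these tables `split_suffix("K")` is kelvin, `split_suffix("km")` is a 
scaled metre, and a bare `"k"` is rejected, which matches the idea of 
disallowing prefixes without units.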

Units like Planck's length are valid, and I don't see any reason to exclude 
them. The problem is that neither CGPM nor anyone else, as far as I can tell, 
has created symbols for the Planck length and other similar units that are 
lexicographically distinct from other units. And creating one would only be 
worth the trouble if all of the physicists who might use it could immediately 
recognize it. Not that I speak for them, but I'm guessing the folks who run 
SciPy or astropy could be of help in answering these sorts of questions, rather 
than trying to get a Python steering committee to work with a possibly more 
bureaucratic organization like CGPM.

Regarding precision, this is not something that most scientists and engineers 
understand as well as computer scientists and software engineers do. I'd rather 
see units available for integers as well as floats. I think that as long as a 
unit is defined, it makes sense to allow integer quantities of them. If they 
are to be built-in types, as I would prefer, then I suppose unfortunately one 
would not be able to define fractions of these units as new units. But again, 
most of this work is done with floats anyway, so if units were only available 
for floats, I would still see this as a big step forward.
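
To make the integer question a bit more concrete, here is one possible sketch 
of how prefix scaling could avoid turning integers into floats. The `SCALE` 
table and `to_base` are hypothetical names, and using exact rationals is my 
assumption, not something any proposal specifies.

```python
from fractions import Fraction

# Hypothetical prefix table; exact rationals avoid premature float conversion.
SCALE = {"": Fraction(1), "m": Fraction(1, 1000), "k": Fraction(1000)}


def to_base(value, prefix):
    """Scale value by a prefix, staying an int whenever the result is whole."""
    if isinstance(value, float):
        # Floats stay floats; nothing is gained by going through Fraction.
        return value * float(SCALE[prefix])
    scaled = value * SCALE[prefix]
    # Collapse whole results back to int, otherwise keep the exact fraction.
    return int(scaled) if scaled.denominator == 1 else scaled
```

Here `to_base(3, "k")` stays an integer, `to_base(10, "m")` becomes an exact 
fraction, and `to_base(2.5, "k")` remains a float.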

Related to these questions, there is the question of what to do about mixed 
systems. Should 2.54 cm / 1 in evaluate to 2.54 cm/in or should it evaluate to 
1? I'd much rather it evaluate to 1, but if anyone else has a stronger opinion, 
I would not let a dispute over such a thing stand in the way of getting units. 
Regarding 1m / 1mm, though, I have a much stronger opinion. It should be 1000, 
without any units.
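
A minimal sketch of the behavior I have in mind, where dividing two lengths 
converts both to a common base unit so that the units cancel. The `Length` 
class and the `TO_METRES` table are hypothetical and cover lengths only.

```python
# Assumed conversion factors to metres; a real table would be far larger.
TO_METRES = {"m": 1.0, "mm": 1e-3, "cm": 1e-2, "km": 1e3, "in": 0.0254}


class Length:
    """A number tagged with a length unit (illustrative only)."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __truediv__(self, other):
        if isinstance(other, Length):
            # Length / Length: express both in metres so the units cancel
            # and a plain, unitless number comes back.
            return (self.value * TO_METRES[self.unit]) / (
                other.value * TO_METRES[other.unit]
            )
        # Length / number: still a length in the same unit.
        return Length(self.value / other, self.unit)


ratio = Length(1, "m") / Length(1, "mm")     # unitless, approximately 1000
mixed = Length(2.54, "cm") / Length(1, "in")  # unitless, approximately 1
```

Because the factors are floats, the results are only approximately 1000 and 1; 
exact results would need something like rational conversion factors.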

There is yet another question related to the interpretation of K as 1000 vs 
Kelvin. As I said, SI is clear that K means Kelvin, but what about Python users 
who are not familiar with SI? What about those in the financial industry? To 
them, K means 1000, and they might not even know what Kelvin is. Now, unless 
adding a suffix K to a number is supported later on, a financial person would 
have to go pretty far out of their way, or be looking at the wrong code, to be 
confused by something referring to Kelvin. But it would indeed be a mistake to 
assume that everyone who uses Python wants and can live with SI units, or even 
that they would be using the same set of units! Which brings me to the next 
part of ChrisA's reply...

> 
> Purity and practicality are at odds here. Practicality says that you
> should be able to have "miles" as a unit, purity says that the only
> valid units are pure SI fundamentals and everything else is
> transformed into those. Leaving it to libraries would allow different
> Python programs to make different choices.
> 
> But I would very much like to see a measure of language support for
> "number with alphabetic tag", without giving it any semantic meaning
> whatsoever. Python currently has precisely one such tag, and one
> conflicting piece of syntax: "10j" means "complex(imag=10)", and
> "10e1" means "100.0". (They can of course be combined, 10e1j does
> indeed mean 100*sqrt(-1).) This is what could be expanded.
> 

As I mentioned above, I am not a purist. I keep a set of Thorlabs thread 
adapters handy in my lab so that I can screw imperial cage plates onto metric 
posts.

I think I diverge from (or perhaps just don't understand) the statement on 
"semantic meaning". To me, semantic meaning of the units seems pretty 
essential. Wherever possible, units should be simplified in a prescribed 
manner. 1W * 1s = 1J, 10km/1cm = 1000000. The meaning of these suffixes should 
be explicit, not implicit.
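
One common way to get this kind of simplification is to track each unit as a 
tuple of base-dimension exponents. The sketch below is only an assumption about 
how it could work; `DIMS`, `combine` and `name_of` are hypothetical names and 
only three units are modeled.

```python
# Units as exponents of (kg, m, s); a hypothetical, minimal model.
DIMS = {
    "s": (0, 0, 1),
    "J": (1, 2, -2),  # kg * m^2 / s^2
    "W": (1, 2, -3),  # kg * m^2 / s^3
}


def combine(a, b, sign=1):
    """Add exponents to multiply units (sign=1) or subtract to divide (-1)."""
    return tuple(x + sign * y for x, y in zip(a, b))


def name_of(dims):
    """Return the named unit with these dimensions, else the raw tuple."""
    for symbol, d in DIMS.items():
        if d == dims:
            return symbol
    return dims


watt_seconds = name_of(combine(DIMS["W"], DIMS["s"]))       # power * time
joule_per_sec = name_of(combine(DIMS["J"], DIMS["s"], -1))  # energy / time
```

With this table, watts times seconds simplifies to joules and joules per second 
simplifies to watts; a quantity whose exponents are all zero is unitless.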

Also, see above about precision of unit-aware data types. Floating point only 
would be fine, but I don't see why integers cannot be supported as well.

> C++ does things differently, since it can actually compile things in,
> and declarations earlier in the file can redefine how later parts of
> the file get parsed. In Python, I think it'd make sense to
> syntactically accept *any* suffix, and then have a run-time
> translation table that can have anything registered; if you use a
> suffix that isn't registered, it's a run-time error. Something like
> this:
> 
> import sys
> # sys.register_numeric_suffix("j", lambda n: complex(imag=n))
> sys.register_numeric_suffix("m", lambda n: unit(n, "meter"))
> sys.register_numeric_suffix("mol", lambda n: unit(n, "mole"))
> 
> (For backward compatibility, the "j" suffix probably still has to be
> handled at compilation time, which would mean you can't actually do
> that first one.)
> 
> Using it would look something like this:
> 
> def spread():
>     """Calculate the thickness of avocado when spread on
>     a single slice of bread"""
>     qty = 1.5mol
>     area = 200mm * 200mm
>     return qty / area
> 
> Unfortunately, these would no longer be "literals" in the same way
> that imaginary numbers are, but let's call them "unit displays". To
> evaluate a unit display, you take the literal (1.5) and the unit
> (stored as a string, "mol"), and do a lookup into the core table
> (CPython would probably have an opcode for this, rather than doing it
> with a method that could be overridden, but it would basically be
> "sys.lookup_unit(1.5, 'mol')" or something). Whatever it gives back is
> the object you use.
> 
> Does this seem like a plausible way to go about it?

As for registering units, I think registering individual units is a bit 
much. Of course, several of these statements could be put inside a module or 
package to make things easier. But I also don't like that it means the syntax 
of the "literals" needs to be allowed during parsing, and that it is left to 
the interpreter to figure out whether the unit was registered. I do think it is 
reasonable to require programmers to "opt in" to using SI or other units, and 
possibly even specify which set or sets of units they intend to use. But if 
their constants are ill-formed, then that should still be caught during parsing 
and raise a SyntaxError.
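
For reference, here is roughly how I picture the run-time table from the quoted 
proposal, with the registrations bundled into one call. Everything here is a 
stand-in: `register_numeric_suffix` and `lookup_unit` mimic the names in the 
quoted example but do not exist in `sys`, `register_si` is my own hypothetical 
helper, and a plain tuple stands in for a real unit object.

```python
# Hypothetical stand-in for the proposed interpreter-level suffix table.
_SUFFIXES = {}


def register_numeric_suffix(suffix, factory):
    """Associate a suffix such as 'm' with a callable that builds the value."""
    _SUFFIXES[suffix] = factory


def lookup_unit(value, suffix):
    """What evaluating `1.5mol` might do at run time under the proposal."""
    try:
        return _SUFFIXES[suffix](value)
    except KeyError:
        # Unregistered suffixes are only detected here, at run time.
        raise NameError(f"numeric suffix {suffix!r} is not registered") from None


def register_si():
    """Bundle many registrations so users opt in with a single call."""
    for suffix, name in {"m": "meter", "mol": "mole", "s": "second"}.items():
        # Bind `name` as a default argument so each lambda keeps its own unit.
        register_numeric_suffix(suffix, lambda n, name=name: (n, name))
```

The sketch also shows the part I dislike: a misspelled suffix is only noticed 
when `lookup_unit` runs, not when the file is parsed.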

How that would be implemented behind the scenes, I don't know, but from a 
syntax point of view, I am envisioning something like a namespace statement 
with a new keyword (I propose `measure`). Here, I am referring to namespace 
declarations like `global` and `nonlocal`, not something like 
`argparse.Namespace`. Consider the following example as of today:

```
A = 1
global A
A = 2
```

This will generate a syntax error during parsing:
SyntaxError: name 'A' is assigned to before global declaration

Similarly, what I envision is something like this:
```
length = 12m
```
SyntaxError: invalid syntax

```
measure SI
length = 12m
width = 10mm
area = length * width
print(area)
```
... with no SyntaxErrors and a result of "0.12 m2"

After the "measure SI" statement, all literals that are formed with SI units 
are considered valid syntax and are evaluated accordingly. Prior to "measure 
SI", only the unitless primitives are allowed. Clearly this works differently 
than the `global` or `nonlocal` statements, which modify how names are bound. 
Also, the choice of keyword matters, because making "measure" a keyword would 
probably break a lot of existing code (3to4.py!!!). But it is dead simple, and 
it does behave in a way that is actually quite similar to modifying the 
existing namespace.
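
Purely to illustrate the intended rule, here is a toy checker that scans source 
text line by line. It is not how a real parser would do it: the regex, the 
`SI_SUFFIXES` set and `check_source` are all hypothetical, prefixes are not 
decomposed, and strings and comments are not handled.

```python
import re

# Assumed, deliberately tiny set of accepted suffixes.
SI_SUFFIXES = {"m", "mm", "km", "s", "kg", "K", "A", "mol"}
# A number immediately followed by letters, e.g. "12m" or "1.5mol".
UNIT_LITERAL = re.compile(r"\b(\d+(?:\.\d+)?)([A-Za-z]+)\b")


def check_source(source):
    """Raise SyntaxError for unit literals that are unknown or not enabled."""
    enabled = False
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.strip() == "measure SI":
            enabled = True  # unit literals are allowed from here on
            continue
        for match in UNIT_LITERAL.finditer(line):
            suffix = match.group(2)
            if suffix in ("j", "J"):
                continue  # existing imaginary literals stay valid
            if not enabled or suffix not in SI_SUFFIXES:
                raise SyntaxError(
                    f"line {lineno}: invalid unit literal {match.group(0)!r}"
                )
```

Calling `check_source("length = 12m")` raises SyntaxError, while the same line 
preceded by `measure SI` passes.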

This ended up being a much longer reply than I anticipated, but I hope it helps.