[Metamath] Re: Philosophy and goals for set.mm

Norman Megill Wed, 29 Jan 2020 20:08:13 -0800

On Tuesday, January 28, 2020 at 1:53:14 PM UTC-5, Alexander van der Vekens 
wrote:
>
> I want to come back to the first post of this topic, and I want to 
> concentrate on the "definition" parts (2 and 4):
>


Thanks for your comments!
 

>
> On Friday, January 24, 2020 at 5:58:14 AM UTC+1, Norman Megill wrote:
>>
>> ...
>> *2.* A very important thing to me is the philosophy of striving for 
>> simplicity wherever possible.  This particularly impacts what definitions 
>> we choose.
>>
>> *"Rules" for introducing new definitions*
>>
>> Definitions are at the heart of achieving simplicity.  While developing 
>> set.mm, two rules of thumb that evolved in my own work were (1) there 
>> shouldn't be two definitions that mean almost the same thing and (2) 
>> definitions shouldn't be introduced unless there is an actual need in the 
>> work at hand.  Maybe I should add (3), don't formally define something that 
>> doesn't help proofs and can be stated simply in English in a theorem's 
>> comment.
>>
> I fully agree with this in principle, but what does "almost the same", 
> "actual need" and "help proofs" exactly mean? There are major discrepancies 
> in how to interpret these terms.
>

I agree they are vague at this point, partly to initiate discussion, and 
partly because there will always be borderline cases where a subjective 
decision is required.  Also, perhaps I should have called them "guidelines" 
rather than "rules" (they certainly aren't official rules at this point).

"Almost the same" is somewhat subjective, but "B > A" the very clearly the 
same as "A < B" except for argument order.  Similarly, we don't introduce 
backwards epsilon even though we verbally sometimes say "B contains A" to 
mean "A e. B".  We don't try to introduce a new notation for every possible 
phrase that occurs in a comment.  Instead, we explain it in the comment 
when necessary or when the notation is introduced.  See e.g. the comment of 
~wel where we describe 4 different English phrases for "e." but don't 
introduce 4 notations.

"Actual need" - If we want a strict, well-defined rule, we could require 
that the definition reduces the file size of set.mm (or will reduce it 
eventually with its expected usage).  That's probably too strict though.  A 
better meaning of "actual need" would be a definition whose need is driven 
by the work in progress in order to simplify that work.  Sometimes it will 
be a definition that might not even be in the literature, and sometimes 
literature definitions will not be needed.

"Helps proofs" means the definition should ideally lead to shorter proofs.  
For example, defining A C_ B (subset) clearly makes proofs simpler by not 
having to work with A. x ( x e. A -> x e. B ) all the time.  If we replaced 
A C_ B by its definiens everywhere, many proofs would be significantly 
larger.


>> Concerning the 1st rule, very early on, when I was starting to work with 
>> ordinals, I noticed that many books would redefine "e."  (epsilon) as "<" 
>> and "C_" (subset) as "<_" (less than or equal) since those were the 
>> ordering roles they played in von Neumann ordinal theory.  I figured since 
>> books did it, it must be important, so I also did that in the early 
>> set.mm.  (I may have even used those symbols because reals were a 
>> distant vision at that point.)  Books casually interchanged them in 
>> theorems and proofs, but formalization meant I might have 2 or more 
>> versions of many theorems.  It was a constant nuisance to have to convert 
>> back and forth inside of proofs, making proofs longer.  I eventually 
>> decided to use just "e." and "C_", and it made formalizations much more 
>> pleasant with fewer theorems and shorter proofs, sometimes providing more 
>> general theorems that could be used outside of ordinal theory.  I never 
>> missed not having the ordering symbols to remind me of their roles.  I 
>> liked the feeling of being closer to the underlying set theory rather than 
>> being an abstraction layer away from it.  If the "less than" role of "e." 
>> was important for human understanding in a particular case, I could simply 
>> use the English language "less than" the comment (3rd rule) while 
>> continuing to use "e." in the $p statement.
>>
> Yes, that's absolutely right. These would be exactly identical definitions 
> for the same concept, only using different symbols, so here it is clear 
> what "almost the same" means (namely "exactly the same"). In this case, the 
> symbols "<" and "C_" are used to denote different things anyway. But do you 
> think the definition of ">" would be "almost the same" as the available 
> definition of "<"?
>

Absolutely.  It is exactly the same except for the order of the arguments.  
We could, and perhaps should, add to the comment of df-ltxr something like, 
""In theorem descriptions, we occasionally use the terminology 'A is 
greater than B' to mean 'B is less than A'."  We haven't done so because 
literally no one has expressed confusion being able to figure out what 'A 
is greater than B' means in the comment when the theorem itself says 'B < 
A'.  Elementary school children are expected to understand that; their 
books have word problems that interchange terminology like that all the 
time.
 

>
>
>> As for the 2nd rule, there are often different choices that can be made 
>> in the details of how something is defined.  It is sometimes hard or 
>> impossible to anticipate what choice is optimal until we actually start 
>> using the definition for serious work.  There is also the question of 
>> whether the definition is even necessary.  More than once I've found that a 
>> definition I thought might be needed (for example was used in the 
>> literature proof) actually wasn't.  Sometimes, like the "e." vs.  "<" case 
>> above, the definition was even a hindrance and made proofs longer.  Other 
>> times an intermediate definition of my own invention that isn't in the 
>> literature turned out to be advantageous for shorter proofs.  It can be 
>> best to let the work drive the need for the definition and its precise 
>> details.
>>
>> Another example of a literature definition we purposely don't use and 
>> illustrates the 3rd rule is Takeuti/Zaring's Russell class with symbol 
>> "Ru", which they formally define as "{ x | x e/ x }".  The only theorem 
>> that uses it (in set.mm and in T/Z) is ~ ru.  For the purposes of set.mm, 
>> it is wasteful and pointless since we can just define the Russell class 
>> verbally in the comment of ~ ru.
>>
>> A problem that has arisen in the past is where a person has added a set 
>> of definitions from the literature, proved some simple property theorems, 
>> then is disappointed that the work isn't imported into the main set.mm.  
>> Sometimes we do and sometimes we don't, but the main issue is whether the 
>> 2nd rule above is being followed.  Without knowing exactly how the 
>> definition is going to be applied, very often it won't be optimal and will 
>> have to be adjusted, and sometimes it is more efficient just to start from 
>> scratch with a definition in the form that is needed.  I prefer to see some 
>> "serious" theorems proved from a definition before importing it.
>>
> Here you use again criteria which could be interpreted differently: what 
> is "serious"?
>

This is also subjective, but I would say "serious" means that the 
definition leads to something other than just restatements of its 
properties i.e. something other than what I would call "shallow" theorems.  
For example, assLaw can't prove anything deeper than the associative law 
AFAIK, which is an immediately obvious consequence.  Something like 
uniqueness of the identity element in Grp might be an example that is 
slightly beyond "shallow", since it isn't immediately obvious to everyone.
 

> Is it the number of theorems using (directly or indirectly) a definition,
>

Not in itself.  We could add a theorem for ">" reproducing every theorem 
now using "<", and many more with all possible combinations of the two, but 
that wouldn't justify introducing a definition for it.
 

> or must it be a theorem from the 100 theorem list (or the list of theorems 
> in Wikipedia) using the definition?
>
 
No, it doesn't have to.
 

> Or is it sufficient that the theorem using the definition is mentioned in 
> a book, maybe labelled as theorem (or proposition etc.)?
>

No, that isn't sufficient (or necessary).

I would say the most important thing is that it involves material that is 
actively being worked on, and its introduction is driven by that work and 
practically essential for its continued practical development.  Sometimes 
this may even mean that we introduce definitions that aren't in any books.
 

>
>> As a somewhat crude example of why a definition in isolation may not be 
>> optimal, suppose someone defined sine as "$a |- sin A = ..."  (not a straw 
>> man; this kind of thing has been attempted in the past) i.e. a 1-place 
>> function-like "sin A" rather than a stand-alone object "sin".  While this 
>> can be used to prove a few things, it is very limiting since the symbol 
>> "sin" in isolation has no meaning, and we can't prove e.g.  "sin :  CC --> 
>> CC".  It is an easy mistake to make if you don't understand set theoretical 
>> picture but are blindly copying a textbook definition.  Here it is pretty 
>> obvious, but in general such issues can be subtle and may depend on the 
>> application.  Also, there are often multiple ways to define something, some 
>> of which are easy to start from and others which would require a lot of 
>> background development.  The best choice can't always be anticipated 
>> without knowing what background will be available.
>>
> Absolutely correct, but here we have rules how (new) definitions should 
> look like: they usually should be binary relations or functions/operations, 
> with exceptions only in very specially justified cases - I think these are 
> not documented yet (in section "Conventions"), but these conventions for 
> definitions came up as I started with Graph Theory, in a discussion (in 
> this Google group) mainly with Mario. By such a conventioned, such crude 
> definitions as descibed can be avoided.
>
>>
>>
>> *3.* Let me comment on the recent proposal to import df-mgm, df-asslaw, 
>> df-sgrp.  These are my opinions, and correct me if I am wrong.
>>
>> df-mgm:  A magma is nothing but an empty shell with a binary operation.  
>> There are no significant theorems that follow from it.  All the properties 
>> of binary operations that we have needed have already been proven.  If 
>> additional properties are needed in the future it seems it would be better 
>> to state them directly because they will be more useful generally.  Just 
>> because it appears in Bourbaki or other texts is not to me a 
>> justification:  it also needs to be useful and be driven by an actual need 
>> for it.  There are many definitions in textbooks that have turned out to be 
>> unnecessary.
>>
> We have already several theorems valid for magmas, which are proven for 
> monoids, or at least in the context of monoids (because we have nothing 
> else). Proving them for monoids, however, could confuse a reader, because 
> he does not see if an identity is required for the proof or not. 
> Additionally, ther term "magma" is used 20 times in main set.mm, so it 
> has some significance. Furthermore, the definition of a monoid could be 
> based on the definition of a magma (or semigroup, see below) as it is done 
> for the definition of groups based on the definition of monoids, e.g.
> MndALT1 = { g e. Mgm | [. ( Base ` g ) / b ]. [. ( +g ` g ) / p ]. ( A. x 
> e. b A. y e. b A. z e. b ( ( x p y ) p z ) = ( x p ( y p z ) ) /\ E. e e. b 
> A. x e. b ( ( e p x ) = x /\ ( x p e ) = x ) ) }
> or using the definition of a semigroup
> MndALT2 = { g e. SGrp | [. ( Base ` g ) / b ]. [. ( +g ` g ) / p ]. E. e 
> e. b A. x e. b ( ( e p x ) = x /\ ( x p e ) = x ) }
>

Unless they are used for many things other than monoid, I don't see the 
advantage of either of these over just df-mnd.  Neither magma nor semigroup 
have any "serious" (non-shallow) consequences AFAIK.  The overhead of 
introducing them just to define 2 layers under monoid is far more than the 
overhead of just df-mnd by itself.

Finally, df-mnd is self-contained allowing the reader to grasp it 
immediately, and the comment in df-mnd explains it very clearly, so I think 
df-mnd is much less of a burden on the reader, especially one who has never 
heard of magma and semigroup and has no reason to care about them.


> For me, these are already serious reasons to have these definitions. To be 
> concrete, however, would it be sufficient to prove the statement in ~ 
> xrs1mnd that RR*s is neither a monoid nor a semigroup (it is "only" a 
> magma) to justify the definition of a magma?
>
>
>> df-asslaw:  All this is doing is encapsulating the class of associative 
>> operations.  The only thing that can be derived from it AFAIK is the 
>> associative law.  I don't see how it would benefit proofs since additional 
>> work is needed to convert to and from it.  I don't see how it helps human 
>> understanding because it is easier for a human to see "show the associative 
>> law" in the description of a theorem than be asked to learn yet another 
>> definition (an example of the 3rd rule).
>>
> The main reason why I (still want to) introduce these definitions, is that 
> they can be used as building blocks for very different structures.
>

Then I think we should wait until we have a need for those structures.  We 
don't right now.  If and when it makes sense to change df-mnd to 
incorporate them, we can do it then.  This was done with df-grp to 
incorporate monoid.  It's a relatively trivial retrofit.
 

> If you look at Wikipedia (e.g. the article "Magma"), you can find a table 
> classifying all "Group-like structures" - the "law"-theorems correspond to 
> the columns of this table.
>

The article also mentions about 25 variations of magmas obtained by adding 
various properties.  Many of them don't even link to Wikipedia pages, 
suggesting they are probably not widely used if used at all.  I don't think 
they should be added to set.mm until a need for them arises, driven by work 
in progress.

 

> By these, you could characterize a monoid by three of these laws. Instead 
> of defining a monoid as it is done today by
> df-mnd $a |- Mnd = { g | [. ( Base ` g ) / b ]. [. ( +g ` g ) / p ]. ( A. 
> x e. b A. y e. b A. z e. b ( ( x p y ) e. b
>             /\ ( ( x p y ) p z ) = ( x p ( y p z ) ) ) /\ E. e e. b A. x 
> e. b ( ( e p x ) = x /\ ( x p e ) = x ) ) }
> you do not see immediately that it contains closure, associativity and 
> having an identity! This would be different by a definition like
>

The user knows from the description and from the property theorems that are 
referenced.  But even without that, I do see all of these properties pretty 
immediately - they're right there in the definition: ( x p y ) e. b for 
closure,  ( ( x p y ) p z ) = ( x p ( y p z ) ) for associativity, and E. e 
... ( ( e p x ) = x /\ ( x p e ) = x ) for left and right identity.
 

> df-mndalt $a |- MndALT = { g | [. ( Base ` g ) / b ]. [. ( +g ` g ) / p ]. 
> ( p clLaw b /\ p assLaw b /\ p idLaw b ) }
>

The user would not know what this means without looking up all the 
auxiliary definitions.  It isn't obvious that idLaw means both left and 
right identity.  If avoiding the quantifiers that clutter df-mnd is the 
goal, they are going to encounter those anyway when drilling down to learn 
the extra definitions.

To me, these kinds of definitions are better explained in the English 
description.  They are more things to burden the reader with learning.  The 
description of df-mnd very clearly states these properties.
 

>
> Concerning the human understanding, we have (and we keep) corresponding 
> theorems, e.g. 
> mndass $p |- ( ( G e. Mnd /\ ( X e. B /\ Y e. B /\ Z e. B ) ) -> ( ( X .+ 
> Y ) .+ Z ) = ( X .+ ( Y .+ Z ) ) ) $=
> But in parallel, we could have theorems in the short form, e.g.:
> mndasslaw $p |- ( ( G e. MndALT -> .+ assLaw B ) $=
>
>
> df-sgrp:  This is basically df-asslaw converted to an extensible 
>> structure.  So in essence it duplicates df-asslaw (or vice versa depending 
>> on your point of view), violating the first rule (no redundancy).  More 
>> importantly, AFAIK there is nothing that can be derived from it except the 
>> associative law, unlike say df-grp where deep and non-obvious theorems are 
>> possible.  Yes, it can be used as a "layer" on top of which we can slightly 
>> simplify df-mnd and df-rng, but it is really worth having an entire, 
>> otherwise useless definitional layer just to avoid having to state 
>> "(x+y)+z=x+(y+z)"?  The fact that it simplifies two definitions is a slight 
>> positive, but I'm still torn about it.  It doesn't quite pay for itself in 
>> terms of reducing set.mm size, and it increases the burden on the reader 
>> who has to learn another definition and drill down another layer to 
>> understand df-mnd and df-rng.  Personally I'd prefer to see 
>> "(x+y)+z=x+(y+z)" directly because it is immediately obvious without having 
>> to look at a deeper layer.  In the case of the description in df-rng0, it 
>> would be more direct and less obscure to state "associative operation" 
>> instead of "semigroup operation", since most people (at the level of 
>> studying that definition) would know what "associative" means, but fewer 
>> would would know what "semigroup" means.
>>
> In my opinion, this is not a duplication: df-asslaw would be part of 
> df-sgrp, characterizing the operation of the extensible structure. 
> Concerning the reader: yes, for some it is a burden, for others it is a 
> pleasure to have a stringent hierarchy of definitions.
>

This is a good point.  How do we choose which reader to satisfy?  
Mathematicians who already know the stuff probably would find the hierarchy 
more pleasing, and readers who are trying to learn unfamiliar material 
would find more definitions to learn a burden.

 

> Finally, I think there will be a lot of serious theorems which could be 
> proven (there is a complete "Semigroup Theory"). To wait until such 
> theorems are formally proven within set.mm is not necessary, in this 
> case, because basic theorems can already be provided - and they cannot be 
> wrong, because they are also valid for monoids in its currently defined 
> manner.
>

>> I can't say that these will never be imported.  Possibly something will 
>> drive a need for them, maybe something related to category theory where 
>> even a magma might be a useful object.  But I would want to see the 
>> definitions be driven by that need when it arises; we don't even know yet 
>> whether the present extensible structure form can be adapted for that 
>> purpose.  It is fine to store them in a mathbox indefinitely, but I'd like 
>> to see a true need driving their use before importing.
>>
>>
>> *4.* As for adding an exhaustive list of all possible definitions one 
>> can find in Bourbaki or whatever, as someone suggested, I don't think 
>> something like that belongs in set.mm 
>> <http://www.google.com/url?q=http%3A%2F%2Fset.mm&sa=D&sntz=1&usg=AFQjCNGyXl3zrxstB4U2tIe0Dll1hr7fxA>,
>>  
>> for all the reasons above.  There are other resources that already list 
>> these (Wikipedia, Planetmath, nLab).  Their precise formalization will 
>> depend on the context in which they are needed in set.mm, which is 
>> unknown until they are used.  In isolation (perhaps with a couple of 
>> property theorems that basically check syntax) there is no guarantee that 
>> they are correct, even in set.mm.  To restate, I think the philosophy 
>> should be that definitions should be added based on need, not as busywork.  
>> Adding a definition when needed is one of the smallest parts of building a 
>> series of significant proofs.  It doesn't have to be done in advance, and I 
>> don't think it is generally productive or useful to do so.
>>
> This is also something we can agree on immediately: Defnitions without 
> usage (i.e. without theorems or only with theorems for syntax checks) are 
> not useful (within set.mm) and should not be contained in the main body 
> of set.mm.
>
> In summary, we agree in most cases, and differences are only in the 
> interpretation of the rules. Regarding the "law" definitions and 
> definitions for magmas, semigroups and (non-unital) rings, I still would 
> like to have them in the main body of set.mm, making it
> more concise and consistent (especially if you look at the theorems with 
> labels containing "rng" which are valid for unital rings only, which might 
> be confusing). From my point of view, the benefits for a user are greater 
> than the disadvantages.
>

"rng" was chosen for the label abbreviation because I wasn't aware that 
non-unital rings had their own symbol.  It is a simple substitution to 
change the names.  We still have no application driving the introduction of 
Rng.

Norm 

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/metamath/eb33b563-c1dc-48d5-92eb-37f808faa66c%40googlegroups.com.

[Metamath] Re: Philosophy and goals for set.mm

Reply via email to