Re: [sage-devel] Re: Code duplication and aliases in methods

Nathann Cohen Fri, 06 Jul 2012 05:06:06 -0700

Helloooooooooooooooooooooooooooo Birk !!!

> Why does the graph complement G make a copy the graph itself and then
> changes the edges? I think "G = copy(self)" is the root of problem. The
> graph and its complement are too different: in general, properties like
> bipartitness, connectedness,
> coloring, shortest paths, ... are destroyed as well as all labelings on
> edges.


Well, I would say developper lazyness (and I would probably have done the
same myself). The common points between a graph and its complement are many
-- vertex set, which includes vertex labels, perhaps the 2D layout too, the
fact that they accept loops or not, whether they are directed or not... To
name the hidden fields of a Graph object which would have to be copied
manually otherwise :-)

As to whether a graph and its complement are very different, it depends on
the point of view. "My graphs" usually have nothing in common with their
complement -- and I would not compute their complements anyway, my graphs
are usually very sparse -- but I talked recently with people from group
theory who constantly confused a graph and its complement -- it made no
difference at all in their claims.
Of course, colorings, connectedness and friends are immediately destroyed
when you replace a graph by their complement, but qs we do not store them
anyway...

> Instead the function complement has to be defined in Graph and in DiGraph
> and needs to create a new Graph or DiGraph object... For the moment, I
will
> not suggest to change it because there are other fundamental things to
> discuss... :-)

I would say that compared to the implementation of complement(), the copy
makes very little difference... Copying something is pretty quick. The
implementation of complement(), though, scares me since the first time I
saw it. Just look at it :

---
G = copy(self)
G.delete_edges(G.edges())
G.name('complement(%s)'%self.name())
for u in self:
   for v in self:
        if not self.has_edge(u,v):
             G.add_edge(u,v)
return G
---

There are no functions to remove all edges at once, so the delete_edge
commands looks iteratively for each edge in the list, removes it, and takes
the next. Hence Manyyyyyy adjacency lookups. Then, n^2 times, all possibles
adjacencies are tested in self (and if the vertices are complicated objects
instead of integers these operations are not necessarily very cheap [1]),
and an edge is added if necessary. I would say that the best is to build a
dense copy of self for a quick adjacency test, THEN to build G from this
complement of self.
Of course, taking the complement of a dense graph is *MUCH* easier (just
complement every bit, you do not even need to know where you are in the
matrix).

The point is that I wanted to implement such methods directly inside of the
graph backends, and that I still do not understand their code very well.
>From time to time I have to change something there, so I add some
documentation to it in the same patch, but I still do not get it
completely.I would like to write a new backend at some point and see if I
can do better than what Sage already has.

>>> BipartiteGraph(graphs.CompleteBipartiteGraph(3,3)).independent_set()
>
> There should real be no error with this!!

There should not be any BipartiteGraph class :-PPP

> No, there should be absolutely no need to rewrite a function to run on a
> subclass of the class Graph. Is there something wrong with the design???

Well, unless there is a way to know automatically which are the functions
that work well with BipartiteGraph, and which ones do not. And of course
this list of methods would be different for any other property you would
like to preserve.
I actually like to copy stuff around. People usually prefer to totally
destroy a class a rebuild it with many inheritances instead of copying
stuff around, and then call it "the correct implementation".
I just see that afterwards there are ten classes instead of only one, and
that the only justification for that is the existence of a class doing
something unrelated, and that because of that other class working on the
old one becomes hell.

> To me, BipartiteGraph(n) does not make sense.

+1 :-p

> It is a graph without edges.

Yep. I do not know how it is implemented, but I guess that it is only "call
the Graph constructor with the same input, check that the graph is
bipartite, and store any bipartition of the vertex set that you find"

> Trying to find out to which partition class vertex 0 belongs to
> (G=BipartiteGraph(3); G.<tab>) I was struck by this immense list of more
> than 270 functions.

Yeah, Sage graphs are very knowledgeable :-D

> I found that G.Bipartition() will do the job.
> Surprisingly (or not) all vertices belong to one class and the other is
> empty. For me, BipartiteGraph(k,l) would be more natural (maybe with the
> convertion that the first partition class contains vertices 0,1,...,k-1,
and
> the second class k,k+1,...,k+l-1).

Ahaahah. So no labels anymore ? :-p

I thought about it for a while and did not find any way to implement it
correctly.... But as I do not even see the point in having a BipartiteGraph
class in the first place anyway ^^;;;

> It just needs to be well documented that adding an edge is only possible
> between different partition class. I agree with how it is now. If you want
> to add random edges then you need to use the class Graph() and check with
> is_bipartite every time you add an edge.

Well, at least "BipartiteGraph" would ensure that the graph is Bipartite,
and not "preserves the first bipartition it found" :-p

Documentation is very nice sometimes, but you expect something from
BipartiteGraph when you read the class' name that really is *not* what the
class does. At the very least -- with such a name -- I think that the
BipartiteGraph constructor should take a mandatory "bipartition" argument,
so that the user necessarily knows that the class is "bipartition-based",
and that there is something more to BipartiteGraph objects than a bipartite
graph.
Even if we let this "bipartition" argument be set to "auto" (nondefault
value) so that the output of bipartite_sets is automatically computed.

> Can you or someone else explain the details? What is exactly the problem?
Is
> it
> about storing the large amount of graphs somewhere?

Nono, it happens when you want to use graphs as dictionary keys. I mean --
stupid case -- you have a graph G , several graphs h_list = [h1, h2, h3,
...] and you would like to store how many times h_list[i] occurs in G.
Well, having a dictionary such that occurrences[hi] =
numer_of_occurrence_of_hi_in_G makes sense, but that would only work if hi
is hashable (and hence immutable), otherwise Python will refuse to work for
you.
In Sage-combinat they have fancier situations -- they want to define graphs
whose vertices are graphs. Edge labels too. Crazy guys ! :-p

>>> I also wrote some kind of static graph structure myself -- low level
ones
>
> In dead, different graph implementations should be possible...

Yep, though those ones inherit absolutely nothing and do not support
labels, ... They are meant to be used when you have real graph work going
on, and that usually does not depend on labels.

> I would not care so much about the size of the files if the code in them
is
> in a good shape.... ;-)

Yeah, but Jeroen sometimes complain that files are too big. I do not mind
myself at all, but when Jeroen expresses a wish I try to make it happen, no
question asked :-)

Actually, I do mind a bit that G.<tab> fills the screen with weird stuff.
But GOD, tab completion is good to have around. No good answer to that yet
:-p

> For the long term plan, my major critics is about the violation of the
> "single responsibility principle" of object oriented programming (OOP).
This
> important principle of object oriented programming (OOP) states that a
class
> should have only one responsibility (see also
> https://en.wikipedia.org/wiki/Single_responsibility_principle).

I hate principles, on principle.
I also deny that the design of our graph classes is object oriented. I hate
objects.
I have traveled a long road and met many people who work in Java. I have
seen wonders. I have seen them store objects for which the memory cost of
pointers and object stucture was around 3 times the amount of data stored
by the object.
I have seen them sort a loooong (several GB) list of objects, and at the
end of the procedure the *pointers* toward the objects were contiguous in
memory, but of course *not the objects themselves*.
In this world any graph algorithm has its own class. Its output has its own
class. A shortest path is not a list, it is a ShortestPath object. An edge
is not an pair of values, it is an Edge object. Or perhaps even an
Edge<EdgeFactory> object, I do not remember.
Anyway I saw all that, and I felt scared, threatened, and insecure O_O;;;

> I want to give you some analogy with the real world. If one had a serious
> problem with his eyes, would he like to go to a doctor who is a dentist
and
> a surgeon and an eye specialist at the same time? - No. He probably would
> like to seek advice from someone who focuses only on eye problems because
of
> deeper knowledge about this problem and better chances of cure.

I feel insulted at your hint that I have a terrible eyesight.
Would I admit it (and it is very hard to make me admit anything), I would
indeed go see an ophthalmologist, but I would obviously steer away from a
schizophrenic madman (Graph class) that believes he is sometimes an
ophtalmologist (backend), a chiropractor (GraphAlgorithms), a Zoo Guard
(GraphExport) or an indian sorcerer (GraphBasicOperations)

> Similar, it is for classes. If one class focuses only one responsibility,
> the functionality of that class is much easier to understand.

Well, the Graph class is not very complicated. It just has one thousand
methods that apply to graphs. But at least you never wonder in which file
they may be (except for the GenericGraph stuff. except for the backends
stuff....), you can call any function from wherever you are without
worrying of anything.
You do not have to know a class design with 10 files to contribute either.
I mean... Classes are sometimes useful, but if we just create 10 of them
that all inherit each other we just have the same as before, written in a
more complicated way. Importing from modules looks easier to me, whether it
is done manually or automatically [2]

> Now let us look at the classes of graph theory. How many responsibilities
> does the class Graph() have?
> 1) representing graphs in files, e.g. sparse6_string, write_to_eps
> 2) running algorithm on graphs, e.g. vertex_cover, coloring, is_bipartite,
> 3) transformation to other classes, e.g. to_directed
> 4) drawing/layout, e.g. show3d
> 5) creating new graph objects, e.g. union, kronecker_product
> 6) storing and manipulating graphs, e.g. add_edge, delete_vertex
>
> Six responsibilities?!?! O_o

Come oooon. You can dig as many "responsibilities" as you may like,
depending on what you do with them !
- Checking properties
- Solving optimization problems
- Compute graph decompositions
- Compute statistics

But more importantly, if you can achieve to build boxes that would split
all the current graph functions into subgroups, it is much harder to change
that later on when you add stuff to existing boxes. That's the cost of
abstraction, you must be ready to change the view you have on things
anytime, and that's not easily done when it comes to code and changing
those boxes breaks Sage in many places.
I like not to have any boxes. Of course when you do that you look like a
retarded peasant, of course everything is pretty basic and obvious and
everything.
But everybody understand how it works, you easily know what you deal with,
and you do not get bitten later on by the abstractions you built.

> So let me turn my previous experience into some advice and with some
> discussion we can make it even better!! I will break down things into
steps
> and post them separate.
> - import / export of graphs (modifications in graph theory, Step 1)

Yep, by the way users often complain that we have no easy way to
import/export graphs, and with all NetworkX methods lying around all we
have to do is to call it to do the job in our stead. I think I wrote that
patch a looong time ago, but it must have been lost since :-p

> - algorithms on graphs (modifications in graph theory, Step 2)
> (- and probably more are coming...)
>
> Any comments are welcome... or wait for the next posts ;-)

Well. I personnally very much like the current "C-style" of the Graph
classes, and I think it does not need to be more complicated than that. The
main problem of Sage's graphs is that they can do an awful lot of stuff,
and I would almost prefer to remove features from Sage than to split the
classes according to some classification.
I have no technical argument to support this. I just do not like that.
Everything I like eventually gets improved until there's nothing I like in
it anymore. That always happens when people begin to think that it is
serious stuff, and to make it even more serious they define things that did
not need to be defined in the first place. Then everything has a purpose,
and if you come up later with something that does not fit with somebody's
well-defined role you find yourself throwing it away because nobody will
accept it.

One week ago I was at the tribunal because my former real estate agency
still owes me 600 euros that I should have got back last october. After 10
months of struggle the judge told me that the opponent's lawyer had just
obtained the documents from my opponent, and that because of that the trial
would be moved to next october.
Yesterday I was at the post office, and a guy who wanted to get a parcel
from a friend heard that the parcel had been opened by the border guards,
that they had "guessed" its value to be 400 euros, and that he had to pay a
100 tax to get it.

So to me, we are not doing very complicated stuff. We just want to find out
who stole who's money, how and why. We just want to send and receive
parcels, and you should not have to open law books for that.
We also play with graphs, and Sage's graph library is 'a bunch of
functions'. I like to move code around to gain speed, when I need to write
a lot of doc, or when I think that the stuff only interests me and that
nobody should have to see it unless they explicitely ask for it. But I
don't like at all when we have to create *structure* to deal with file
length.

Of course, that's only my advice. Nobody should ever take it seriously, for
if there exists a great way to split all our files in an object-oriented
way I will be saying two days later that C is nothing but old garbage, and
that OOP is the only proper way to write code.

And by the way for the next two months all graph patches will come and go
without any interruption from me, and a lot can happen during that time :-)

Thank you for your post Birk ! It's nice to talk about these things
together. See you perhaps in Singapore eventually !

Have fuuuuuuuuuuuuuuuuuuun !!

Nathann

[1] http://trac.sagemath.org/sage_trac/ticket/12933
[2] http://trac.sagemath.org/sage_trac/ticket/13188

-- 
-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org

Re: [sage-devel] Re: Code duplication and aliases in methods

Reply via email to