Re: [sage-devel] Re: Code duplication and aliases in methods

Dima Pasechnik Mon, 09 Jul 2012 18:36:44 -0700


On Tuesday, 10 July 2012 03:19:52 UTC+8, Robert Bradshaw wrote:
>
> The problem with Graph vs. BipartiteGraph is because the one extends 
> from the other. Instead, both should descend from a common base class 
> that implements the common functionality. Something like add_edge 
> would not belong in this baseclass, and algorithms that take advantage 
> of special graph functionality would override the relevant methods. 
>
> As for the issues of mutability, I think the builder pattern could 
> come in handy here. This could either be explicit or perhaps even 
> cleverly done implicitly with copy-on-write semantics. Properties like 
> bipartiteness could be checked on creation, once, after all edges are 
> added. For example 
>
>     g = 
> GraphBuilder(n).add_edge(...).add_edge(...).asBipartiteGraph(vertex_partition)
>  
>
>
> At the very least a set_immutable() method could be provided to make 
> them hashable and perhaps even transform into an optimized 
> representation. 
>
> Another idea I haven't seen floated is storing the properties of a 
> graph as a set of flags, asserted at creation (or cleared when the 
> graph is modified).



I mentioned this possibility some time ago, saying that's the way GRAPE 
package in GAP
works.
 

> You could then dynamically look up the best method 
> to use for any algorithm by consulting this flag list (and perhaps 
> some standard naming scheme for automatic lookup). 
>
> and this sounds like GAP's type system :–)
One should give GAP's people a credit here for designing a system 
well-suited
for the quite arcane situations often appearing in algebra.

Dima


 

> - Robert 
>
>
> On Tue, Jun 19, 2012 at 2:06 PM, Nathann Cohen <nathann.co...@gmail.com> 
> wrote: 
> > Hellooooooo everybody !! 
> > 
> > As in the best times, I tried 5 times to answer all who answered this 
> > thread, got angry, erased the message, and began again. But Robert 
> Miller 
> > has a nice saying when we get to such situations : "shut the fuck up and 
> > show me the code". So that's what I will try to do. Give examples 
> instead of 
> > wondering aloud whether the world has gone mad. 
> > 
> > Be warned, for it is a long mail (I --love-- long emails).  It also 
> gives 
> > some idea of what is happening in Sage's graph library these days, if 
> > anybody (but me) cares :-) 
> > 
> > 1) Some history : the BipartiteGraph class. 
> > ----------------------------------------------------------- 
> > The purpose of this class was to *at long last* have Graph classes that 
> > represent graph properties. There are two possibilities when writing 
> such a 
> > thing : 
> > a) make it inherit all the Graph functions 
> > b) make it a new class 
> > The problem when picking b) is that this class gets totally useless as 
> it 
> > has none of the interesting methods of Graph. So a) was picked, and of 
> > course most if not all methods of BipartiteGraph break when called on a 
> > BipartiteGraph object. 
> > The reason why is easy : some of the most fundamental graph operations 
> are 
> > invalid. Like add_edge. 
> > I assure you that not being able to add an edge in a graph is *very* 
> > problematic. 
> > The code ? 
> > 
> > Try the following : 
> > sage: BipartiteGraph(graphs.CompleteBipartiteGraph(3,3)).complement() 
> > 
> > Of course, the complement of a bipartite graph needs not be bipartite, 
> right 
> > ? Yeah, but the graph methods do not know that, nor should they mind 
> most of 
> > the time. Try that : 
> > 
> > sage: 
> BipartiteGraph(graphs.CompleteBipartiteGraph(3,3)).independent_set() 
> > 
> > Basically all graphs functions would need to be rewritten to work with 
> > BipartiteGraph. The only sane design choice would be for BipartiteGraph 
> > *not* to inherit Graph methods, but of course it would be pretty empty 
> and 
> > all methods would need to be copied anyway. 
> > 
> > There is another problem with Bipartite Graph. The add_edge function 
> needs 
> > to be wrapped so that no edge is added to the graph that creates an odd 
> > cycle. This is a costly operation. *VERY* costly operation (compared to 
> > add_edge, of course). And so it was not done this way, for efficiency. 
> Hence 
> > instead of checking whether the graph "plus the new edge" stays 
> bipartite, 
> > the vertices are bicolored from the start, and an exception is raised 
> when 
> > add_edge tries to add an edge that does not fit *the former bicoloring*. 
> > That's mostly incorrect, does not correspond to the user's intent, but 
> it 
> > is....faster >_< 
> > 
> > sage: g = BipartiteGraph(5) 
> > sage: g.add_edge(0,1) 
> > 
> > This could could be fixed by checking that the graph remains bipartite 
> each 
> > time an edge is added, but of course that comes with a cost, and a cost 
> that 
> > it is not healthy to pay when you *just* want to add an edge. The most 
> > natural explanation behind is that you usually know what you are doing 
> when 
> > you add an edge to a graph, and that you should know for yourself 
> whether 
> > you graph remains bipartite or not. If not, please check it with 
> > is_bipartite when you need it, but *do not make everybody's code 10000 
> times 
> > slower out of sheer laziness*. 
> > 
> > At some point somebody proposed to override the complement() function so 
> > that the complement of a graph stays bipartite : this can be achieved by 
> > switching edges and nonedges *between the two sets* and nowhere else. 
> Though 
> > this operation makes perfect sense in graph theory, it would of course 
> be 
> > very dangereous to replace the .complement() function by doing that, for 
> all 
> > algorithms that take a BipartiteGraph as an instance and compute graph 
> > complements would then return wrong result *and not know it*. 
> > 
> > Sooooooo, these are the problems created by a property we want to keep 
> on a 
> > graph. It is not *yet* about immutable graphs, because NONE of these 
> methods 
> > would be available at all. This would also prevent you from running any 
> > method that think they can use the other ones, even when they do not 
> > actually change the graph itself (like independent_set()). 
> > 
> > I said it several times, but for me this class should be removed. It is 
> not 
> > just bad from the computational point of view, but it also creates 
> problems 
> > that new users have no idea how to cope with, because what the methods 
> do is 
> > not what they have in mind. 
> > 
> > 2) Immutable graphs : storing the graph's properties 
> > ----------------------------------------------------------------------- 
> > 
> > Recently in Sage was merged a patch which is an interface with the graph 
> > class database ISGCI [1,2]. There are many "graph classes" (let us 
> > understand this as 'graph property' for the length of this discussion), 
> and 
> > this database can tell us to some extend which properties are implied by 
> > others. When I first wrote this patch, what I had in mind is precisely 
> what 
> > Dima mentionned : some algorithms work for general graphs and have a bad 
> > running time, some other algorithms have a better running time because 
> they 
> > are only meant to be run on graphs with a specific property [3]. 
> > This is tempting, because the graph theory litterature is full of this 
> kind 
> > of algorithms "which work only if my graph has this and this and that". 
> > Hence the goal was to run on each graph the implementation most adapted 
> to 
> > it. 
> > 
> > ~~~~~~ 
> > By the way Dima, there are "very sane ways" to write 
> perfect-graph-specific 
> > algorithms in Graph.py, hence that can be accessed for all graphs. What 
> > would you think of : 
> > Graph.independent_set(algorithm = "perfect_graph") ? 
> > It would not be the default algorithm, it would not be run by mistake, 
> and 
> > because perfect independent set are not only interesting for perfect 
> graphs, 
> > this is actually what would make the most sense. No new class, no 
> overhead 
> > over the most important graph functions... 
> > ~~~~~~ 
> > 
> > Of course, we cannot attempt, each time independent_set is run, to check 
> all 
> > graphs properties for which we have a specific algorithm. Unless the 
> current 
> > implementation (very bad, but -correct-) is changed, it is very costly 
> to 
> > check whether a graph is perfect. Much more than to compute an 
> independent 
> > set, actually. 
> > 
> > Hence it would be nice to have a class of immutable graphs, to which 
> could 
> > be associated (cached) some properties that would not have to be 
> computed 
> > over and over afterwards. The most natural question, now : what about 
> the a) 
> > and b) options from the BipartiteGraph class ? Should they inherit all 
> > methods, or none ? 
> > My personal advice is clear --> absolutely none. Unless somebody can 
> think 
> > of a way to avoid all the problems we had with BipartiteGraph. Hence 
> > immutable graphs would only be used for "storage purposes", and the 
> combinat 
> > guys already (repeatedly) asked for such a thing, as they sometimes need 
> > their graphs to be hashable (and to store them in sets, or as dictionary 
> > keys). It also happened to me recently, and I had to store spar6_string 
> > instead. Not so bad, but not a good answer either. 
> > I also wrote some kind of static graph structure myself -- low level 
> ones -- 
> > because you can perform some optimizations when you know from the start 
> that 
> > the graph will not be modified. Having a static graph structure thus is 
> a 
> > way to have a faster data structure, which we use for instance to 
> compute 
> > distances between all pairs of vertices [4]. 
> > In particular, the purpose of these speedups is to save the time it 
> takes to 
> > call has_edge, or to iterate over the neighbors of a graph. 
> > In particular, it is totally unthinkable that graphs you actually want 
> to 
> > compute things on could be immutable. We already lose time compared to 
> > networkX for stupid things like add/remove edges, there is no way on 
> earth 
> > that add_edge would copy the whole thing. 
> > 
> > Dima, I have been working with Thomas on groups recently, with GAP and 
> > Magma, which was pretty new to me. Of course their objects are usually 
> > immutable, of course you create them once and never touch them, but the 
> way 
> > graph theory works is totally different. We spend our lifetimes adding 
> and 
> > removing edges, and vertices, and building a complete graph on n 
> vertices 
> > edge by edge *cannot* take n^4 time, which is what happens if you copy 
> the 
> > graph each time you want to add an edge. 
> > Honestly. All algorithms are built like that. Any greedy algorithm works 
> > like that, and graph theory has a lot of them. If you want to compute 
> > matchings, if you want to compute decompositions, god, everything is 
> being 
> > done step by step, corrected 10 times, and you just cannot afford to 
> copy 
> > your whole data each time. 
> > 
> > 3) Splitting the files 
> > --------------------------- 
> > 
> > We have something like "three files" for graph methods (data structures 
> > excluded). generic_graph, graph, and digraph. The graph one is pretty 
> thick, 
> > and should be even thicker considering that some methods stored in 
> > generic_graph should be moved to graph.py (old mistakes, most probably 
> > mine). I do not see among the methods any clear way to split them (I 
> already 
> > had to list all graphs methods thematically twice, each time I reached 
> the 
> > same conclusion -- that it is totally arbitrary and unnatural), so I 
> will 
> > refrain from doing it myself. Of course anybody else can write a patch 
> doing 
> > that, which somebody else will review, Sage is a free software so my 
> > personal advice does not matter much. But I do not see how it could be 
> done 
> > smartly. 
> > Some recent additions to Sage's graph library were done in modules. 
> These 
> > functions are not necessarily made available through functions from the 
> > Graph class, but can be imported. 
> > I think that it is the best way to procede now that the algorithms we 
> add 
> > get more and more specific, somehow "research-level", at least in the 
> view I 
> > have of graphs ("topological things", like connectivity, distances, 
> sparse 
> > graphs, that stuff). 
> > 
> > Having a large amount of graph function available in modules only is 
> also a 
> > way to get the users used to query the documentation when they wonder 
> what 
> > Sage can do, and not only the automatic completion of "g.<tab>". Many 
> graph 
> > methods are not that fundamental anyway, and the same way that it would 
> not 
> > make sense to import them systematically to Graph, there is no sense in 
> > creating a "GraphThatYouWantToColor" class whose methods would be the 
> > coloring-related stuff (currently stored in graph_coloring.py). 
> > 
> > Hence, as this thread originally suggested, I think the best is to work 
> with 
> > modules, and to lessen the load of Graph.py by "importing" the useful 
> > functions there, which just requires one line (thank you Florent for 
> your 
> > trick !!!). 
> > I am sorry to only think of the matrix0, matrix1, matrix2 and matrix3 
> files 
> > as a joke. The only reason I see to keep them this way is that "we 
> always 
> > kept them this way", and that people are usually scared of backward 
> > compatibility. Of course, I do not know this code at all, so I will not 
> come 
> > and dirty it with my hands. 
> > Jason suggested that it originally came from a technical problem in 
> Cython, 
> > and that makes sense. Honestly this is just a terrible design, and 
> unless 
> > you are forced to do it, you just don't. That's common sense. You do not 
> use 
> > object-oriented programming to create files matrix0...matrix3. That's 
> > confusing, that's... 
> > Come on, that's just a mistake. 
> > It shows. 
> > 
> > I mean.... The thing is that for me importing stuff from module to 
> classes 
> > just solves everything at no cost. Of course I may have missed things, 
> in 
> > which case somebody will probably tell me what eventually, but why look 
> > anywhere else if we have an answer ? Is there a bad side to this way of 
> > doing ? I work with group theory these days, so I am used to missing 
> obvious 
> > things :-) 
> > 
> > ------------- 
> > 
> > Well. By the academic world's standard this mail is long only forgot 
> > unimportant things, so I better resume filling boxes as I will be 
> leaving 
> > Belgium the day after tomorrow. Gosh, my appartment is a mess ^^; 
> > 
> > Oh, and by the way I expect to be very close to totally unreachable for 
> a 
> > long period, probably two months and a half, from early july to end of 
> > september. If anybody around happens to be in southeast Asia during that 
> > time (Thailand/Vietnam/Cambodge/Laos), I will be backpacking there, 
> let's 
> > meet somewhere !!! 
> > 
> > Haaaaaaaaaaaaaaaaaaaaaaaaaaave 
> > fuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuun ! :-) 
> > 
> > Nathann 
> > 
> > [1] http://www.graphclasses.org/ 
> > [2] http://combinat.sagemath.org/doc/reference/sage/graphs/isgci.html 
> > [3] Of course Dima I only talked about methods *implemented* in Sage 
> > [4] 
> > 
> http://www.sagemath.org/doc/reference/sage/graphs/distances_all_pairs.html 
> > [5] 
> > 
> http://www.sagemath.org/doc/reference/sage/graphs/base/static_sparse_graph.html
>  
> > 
> > -- 
> > To post to this group, send an email to sage-devel@googlegroups.com 
> > To unsubscribe from this group, send an email to 
> > sage-devel+unsubscr...@googlegroups.com 
> > For more options, visit this group at 
> > http://groups.google.com/group/sage-devel 
> > URL: http://www.sagemath.org 
>

-- 
-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org

Re: [sage-devel] Re: Code duplication and aliases in methods

Reply via email to