Re: [sage-devel] Re: Code duplication and aliases in methods

Robert Bradshaw Mon, 09 Jul 2012 12:20:26 -0700

The problem with Graph vs. BipartiteGraph is because the one extends
from the other. Instead, both should descend from a common base class
that implements the common functionality. Something like add_edge
would not belong in this baseclass, and algorithms that take advantage
of special graph functionality would override the relevant methods.


As for the issues of mutability, I think the builder pattern could
come in handy here. This could either be explicit or perhaps even
cleverly done implicitly with copy-on-write semantics. Properties like
bipartiteness could be checked on creation, once, after all edges are
added. For example

    g = 
GraphBuilder(n).add_edge(...).add_edge(...).asBipartiteGraph(vertex_partition)

At the very least a set_immutable() method could be provided to make
them hashable and perhaps even transform into an optimized
representation.

Another idea I haven't seen floated is storing the properties of a
graph as a set of flags, asserted at creation (or cleared when the
graph is modified). You could then dynamically look up the best method
to use for any algorithm by consulting this flag list (and perhaps
some standard naming scheme for automatic lookup).

- Robert


On Tue, Jun 19, 2012 at 2:06 PM, Nathann Cohen <nathann.co...@gmail.com> wrote:
> Hellooooooo everybody !!
>
> As in the best times, I tried 5 times to answer all who answered this
> thread, got angry, erased the message, and began again. But Robert Miller
> has a nice saying when we get to such situations : "shut the fuck up and
> show me the code". So that's what I will try to do. Give examples instead of
> wondering aloud whether the world has gone mad.
>
> Be warned, for it is a long mail (I --love-- long emails).  It also gives
> some idea of what is happening in Sage's graph library these days, if
> anybody (but me) cares :-)
>
> 1) Some history : the BipartiteGraph class.
> -----------------------------------------------------------
> The purpose of this class was to *at long last* have Graph classes that
> represent graph properties. There are two possibilities when writing such a
> thing :
> a) make it inherit all the Graph functions
> b) make it a new class
> The problem when picking b) is that this class gets totally useless as it
> has none of the interesting methods of Graph. So a) was picked, and of
> course most if not all methods of BipartiteGraph break when called on a
> BipartiteGraph object.
> The reason why is easy : some of the most fundamental graph operations are
> invalid. Like add_edge.
> I assure you that not being able to add an edge in a graph is *very*
> problematic.
> The code ?
>
> Try the following :
> sage: BipartiteGraph(graphs.CompleteBipartiteGraph(3,3)).complement()
>
> Of course, the complement of a bipartite graph needs not be bipartite, right
> ? Yeah, but the graph methods do not know that, nor should they mind most of
> the time. Try that :
>
> sage: BipartiteGraph(graphs.CompleteBipartiteGraph(3,3)).independent_set()
>
> Basically all graphs functions would need to be rewritten to work with
> BipartiteGraph. The only sane design choice would be for BipartiteGraph
> *not* to inherit Graph methods, but of course it would be pretty empty and
> all methods would need to be copied anyway.
>
> There is another problem with Bipartite Graph. The add_edge function needs
> to be wrapped so that no edge is added to the graph that creates an odd
> cycle. This is a costly operation. *VERY* costly operation (compared to
> add_edge, of course). And so it was not done this way, for efficiency. Hence
> instead of checking whether the graph "plus the new edge" stays bipartite,
> the vertices are bicolored from the start, and an exception is raised when
> add_edge tries to add an edge that does not fit *the former bicoloring*.
> That's mostly incorrect, does not correspond to the user's intent, but it
> is....faster >_<
>
> sage: g = BipartiteGraph(5)
> sage: g.add_edge(0,1)
>
> This could could be fixed by checking that the graph remains bipartite each
> time an edge is added, but of course that comes with a cost, and a cost that
> it is not healthy to pay when you *just* want to add an edge. The most
> natural explanation behind is that you usually know what you are doing when
> you add an edge to a graph, and that you should know for yourself whether
> you graph remains bipartite or not. If not, please check it with
> is_bipartite when you need it, but *do not make everybody's code 10000 times
> slower out of sheer laziness*.
>
> At some point somebody proposed to override the complement() function so
> that the complement of a graph stays bipartite : this can be achieved by
> switching edges and nonedges *between the two sets* and nowhere else. Though
> this operation makes perfect sense in graph theory, it would of course be
> very dangereous to replace the .complement() function by doing that, for all
> algorithms that take a BipartiteGraph as an instance and compute graph
> complements would then return wrong result *and not know it*.
>
> Sooooooo, these are the problems created by a property we want to keep on a
> graph. It is not *yet* about immutable graphs, because NONE of these methods
> would be available at all. This would also prevent you from running any
> method that think they can use the other ones, even when they do not
> actually change the graph itself (like independent_set()).
>
> I said it several times, but for me this class should be removed. It is not
> just bad from the computational point of view, but it also creates problems
> that new users have no idea how to cope with, because what the methods do is
> not what they have in mind.
>
> 2) Immutable graphs : storing the graph's properties
> -----------------------------------------------------------------------
>
> Recently in Sage was merged a patch which is an interface with the graph
> class database ISGCI [1,2]. There are many "graph classes" (let us
> understand this as 'graph property' for the length of this discussion), and
> this database can tell us to some extend which properties are implied by
> others. When I first wrote this patch, what I had in mind is precisely what
> Dima mentionned : some algorithms work for general graphs and have a bad
> running time, some other algorithms have a better running time because they
> are only meant to be run on graphs with a specific property [3].
> This is tempting, because the graph theory litterature is full of this kind
> of algorithms "which work only if my graph has this and this and that".
> Hence the goal was to run on each graph the implementation most adapted to
> it.
>
> ~~~~~~
> By the way Dima, there are "very sane ways" to write perfect-graph-specific
> algorithms in Graph.py, hence that can be accessed for all graphs. What
> would you think of :
> Graph.independent_set(algorithm = "perfect_graph") ?
> It would not be the default algorithm, it would not be run by mistake, and
> because perfect independent set are not only interesting for perfect graphs,
> this is actually what would make the most sense. No new class, no overhead
> over the most important graph functions...
> ~~~~~~
>
> Of course, we cannot attempt, each time independent_set is run, to check all
> graphs properties for which we have a specific algorithm. Unless the current
> implementation (very bad, but -correct-) is changed, it is very costly to
> check whether a graph is perfect. Much more than to compute an independent
> set, actually.
>
> Hence it would be nice to have a class of immutable graphs, to which could
> be associated (cached) some properties that would not have to be computed
> over and over afterwards. The most natural question, now : what about the a)
> and b) options from the BipartiteGraph class ? Should they inherit all
> methods, or none ?
> My personal advice is clear --> absolutely none. Unless somebody can think
> of a way to avoid all the problems we had with BipartiteGraph. Hence
> immutable graphs would only be used for "storage purposes", and the combinat
> guys already (repeatedly) asked for such a thing, as they sometimes need
> their graphs to be hashable (and to store them in sets, or as dictionary
> keys). It also happened to me recently, and I had to store spar6_string
> instead. Not so bad, but not a good answer either.
> I also wrote some kind of static graph structure myself -- low level ones --
> because you can perform some optimizations when you know from the start that
> the graph will not be modified. Having a static graph structure thus is a
> way to have a faster data structure, which we use for instance to compute
> distances between all pairs of vertices [4].
> In particular, the purpose of these speedups is to save the time it takes to
> call has_edge, or to iterate over the neighbors of a graph.
> In particular, it is totally unthinkable that graphs you actually want to
> compute things on could be immutable. We already lose time compared to
> networkX for stupid things like add/remove edges, there is no way on earth
> that add_edge would copy the whole thing.
>
> Dima, I have been working with Thomas on groups recently, with GAP and
> Magma, which was pretty new to me. Of course their objects are usually
> immutable, of course you create them once and never touch them, but the way
> graph theory works is totally different. We spend our lifetimes adding and
> removing edges, and vertices, and building a complete graph on n vertices
> edge by edge *cannot* take n^4 time, which is what happens if you copy the
> graph each time you want to add an edge.
> Honestly. All algorithms are built like that. Any greedy algorithm works
> like that, and graph theory has a lot of them. If you want to compute
> matchings, if you want to compute decompositions, god, everything is being
> done step by step, corrected 10 times, and you just cannot afford to copy
> your whole data each time.
>
> 3) Splitting the files
> ---------------------------
>
> We have something like "three files" for graph methods (data structures
> excluded). generic_graph, graph, and digraph. The graph one is pretty thick,
> and should be even thicker considering that some methods stored in
> generic_graph should be moved to graph.py (old mistakes, most probably
> mine). I do not see among the methods any clear way to split them (I already
> had to list all graphs methods thematically twice, each time I reached the
> same conclusion -- that it is totally arbitrary and unnatural), so I will
> refrain from doing it myself. Of course anybody else can write a patch doing
> that, which somebody else will review, Sage is a free software so my
> personal advice does not matter much. But I do not see how it could be done
> smartly.
> Some recent additions to Sage's graph library were done in modules. These
> functions are not necessarily made available through functions from the
> Graph class, but can be imported.
> I think that it is the best way to procede now that the algorithms we add
> get more and more specific, somehow "research-level", at least in the view I
> have of graphs ("topological things", like connectivity, distances, sparse
> graphs, that stuff).
>
> Having a large amount of graph function available in modules only is also a
> way to get the users used to query the documentation when they wonder what
> Sage can do, and not only the automatic completion of "g.<tab>". Many graph
> methods are not that fundamental anyway, and the same way that it would not
> make sense to import them systematically to Graph, there is no sense in
> creating a "GraphThatYouWantToColor" class whose methods would be the
> coloring-related stuff (currently stored in graph_coloring.py).
>
> Hence, as this thread originally suggested, I think the best is to work with
> modules, and to lessen the load of Graph.py by "importing" the useful
> functions there, which just requires one line (thank you Florent for your
> trick !!!).
> I am sorry to only think of the matrix0, matrix1, matrix2 and matrix3 files
> as a joke. The only reason I see to keep them this way is that "we always
> kept them this way", and that people are usually scared of backward
> compatibility. Of course, I do not know this code at all, so I will not come
> and dirty it with my hands.
> Jason suggested that it originally came from a technical problem in Cython,
> and that makes sense. Honestly this is just a terrible design, and unless
> you are forced to do it, you just don't. That's common sense. You do not use
> object-oriented programming to create files matrix0...matrix3. That's
> confusing, that's...
> Come on, that's just a mistake.
> It shows.
>
> I mean.... The thing is that for me importing stuff from module to classes
> just solves everything at no cost. Of course I may have missed things, in
> which case somebody will probably tell me what eventually, but why look
> anywhere else if we have an answer ? Is there a bad side to this way of
> doing ? I work with group theory these days, so I am used to missing obvious
> things :-)
>
> -------------
>
> Well. By the academic world's standard this mail is long only forgot
> unimportant things, so I better resume filling boxes as I will be leaving
> Belgium the day after tomorrow. Gosh, my appartment is a mess ^^;
>
> Oh, and by the way I expect to be very close to totally unreachable for a
> long period, probably two months and a half, from early july to end of
> september. If anybody around happens to be in southeast Asia during that
> time (Thailand/Vietnam/Cambodge/Laos), I will be backpacking there, let's
> meet somewhere !!!
>
> Haaaaaaaaaaaaaaaaaaaaaaaaaaave
> fuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuun ! :-)
>
> Nathann
>
> [1] http://www.graphclasses.org/
> [2] http://combinat.sagemath.org/doc/reference/sage/graphs/isgci.html
> [3] Of course Dima I only talked about methods *implemented* in Sage
> [4]
> http://www.sagemath.org/doc/reference/sage/graphs/distances_all_pairs.html
> [5]
> http://www.sagemath.org/doc/reference/sage/graphs/base/static_sparse_graph.html
>
> --
> To post to this group, send an email to sage-devel@googlegroups.com
> To unsubscribe from this group, send an email to
> sage-devel+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/sage-devel
> URL: http://www.sagemath.org

-- 
-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org

Re: [sage-devel] Re: Code duplication and aliases in methods

Reply via email to