On 8/12/13 10:55 AM, Evan Ward wrote: > On 08/12/2013 12:19 PM, Phil Steitz wrote: >> On 8/12/13 9:04 AM, Evan Ward wrote: >>> On 08/09/2013 06:55 PM, Gilles wrote: >>>>>> [...] >>>>>> >>>>>>> This does make it important to decide on a well written and >>>>>>> complete API before releasing it. >>>>>> When the scope of the software is well circumscribed, that would be >>>>>> possible. With the whole of [Math]ematics, much less so. :-} >>>>>> And state-of-the-art in Java is a moving target, aimed at by changing >>>>>> CM contributors with differing needs and tastes; this adds to the >>>>>> unstable mix. >>>>> That's a good point. I still prefer the interface design (though I may >>>>> be in the minority) for two reasons. First, if a concrete class only >>>>> publicly exposes the methods defined in an interface it encourages >>>>> polymorphism. User code that uses one implementation can be easily >>>>> switched to another and new implementations are less constrained. >>>>> Second, it encourages composition over inheritance. I agree with Josh >>>>> Bloch that composition produces more maintainable code. Adding new >>>>> methods to an existing interface/class breaks composition. >>>> The "problem" is to get the interface right. As it happens, at some point >>>> we discover something that was not foreseen; and to correct/improve the >>>> design, compatibility must be broken. >>>> [But refactoring is not a failure of development; it's part of it.] >>>> >>>>> I think the interface/abstract class discussion is partially separable >>>>> from the immutable/mutable discussion. I see the algorithm as the part >>>>> that could really benefit from the polymorphism. Perhaps separating the >>>>> problem definition (data) from the algorithm will improve the >>>>> flexibility of the API. For example, >>>>> >>>>> PointVectorValuePair solveMyNLLSProblem(NLLSOptimizer opt){ >>>>> //define problem to solve in an independent object >>>>> NLLSProblem p = new NLLSProblem(/*model functions, weights, >>>>> convergence checker, ...*/); >>>>> >>>>> //provide algorithm with the data it needs >>>>> //algorithm has no problem specific state >>>>> return opt.optimize(p); >>>>> } >>>> I may be missing something, but how much better is it to store >>>> everything the optimizer needs in yet another class? >>>> [Then, that's a possible approach, but it's not what we started >>>> from in Commons Math, and when trying to fix some inconsistency >>>> or removing duplicate code, I tried to retain what could be from >>>> the existing design.] >>> I've looked at the implementations of GN and LM and it seems that the >>> abstract classes are primarily concerned with evaluating the model, and >>> the concrete classes use the evaluation to compute the next step. I >>> think separating those two concerns could simplify the implementation of >>> the optimizers. The NLLSProblem class would provide methods to evaluate >>> the weighted Jacobian, residuals, etc. The concrete class would look >>> almost the same as they do today, except calls to the parent class would >>> be replaced with calls to the NLLSProblem class. >>> >>> The benefit is that the optimizer implementation/hierarchy would be >>> simpler and other classes could use the methods for model evaluation. I >>> don't think keeping the optimization algorithm and data in the same >>> class gains much since all the scratch space is reallocated on each call >>> to optimize. (Performance will be about the same either way.) >>> >>>>>>> [...] Thread safety is a tricky beast. I think we agree that the only >>>>>>> way to guarantee thread safety is to only depend on final concrete >>>>>>> classes that are thread safe themselves. >>>>>> I don't think so. When objects are immutable, thread-safety follows >>>>> It is somewhat off topic, but a counter example would be Vector3D. Since >>>>> the class is not final, a user could extend it and override all the >>>>> methods and add some set{X,Y,Z} methods to make it mutable. Even though >>>>> Vector3D is immutable, there is no _guarantee_ that every instanceof >>>>> Vector3D is immutable unless it is final. This is why String is final. >>>> I think I don't get your point: If someone extends a class that is safe >>>> in a way that the extension is unsafe, that's his problem. ;-) >>>> >>>>>>> [...] copying any large matrices or arrays is prohibitively >>>>>>> expensive. For the NLLS package we would be copying a pointer to a >>>>>>> function that can generate a large matrix. I think adding some >>>>>>> documentation that functions should be thread safe if you want to use >>>>>>> them from multiple threads would be sufficient. >>>>>> I you pass a "pointer" (i.e. a "reference" in Java), all bets are off: >>>>>> the >>>>>> class is not inherently thread-safe. That's why I suggested to >>>>>> mandate a >>>>>> _deep_ "copy" method (with a stringent contract that should allow a >>>>>> caller >>>>>> to be sure that all objects owned by an instance are disconnected from >>>>>> any >>>>>> other objects). >>>>> As someone who has designed a thread safe application based on deep >>>>> copying I don't think this is route to follow. A deep copy means you >>>>> have to be able to copy an arbitrary (possibly cyclical) reference >>>>> graph. Without the graph copy there are many subtle bugs. (References to >>>>> the same object are now references to different objects.) With the graph >>>>> copy the implementation is very complex. This is the reason >>>>> Serialization has a separate "patch up" step after object creation, >>>>> which leads to some nasty tricks/bugs. Similarly, Cloneable only >>>>> produces a shallow copy. Opinions may vary, but in my experience >>>>> immutability is an easier approach to thread safety, especially when you >>>>> have to depend on user code. >>>> I agree that using immutability is easier, but my point all along is that >>>> it is at odds with simplicity (which is aimed at with "fluent API"). >>>> And since >>>> 1. the internals of the optimizers are not thread-safe yet (see e.g. >>>> LevenbergMarquardtOptimizer"), and >>> Though the LM implementation looks like a mess in Java, the Fortran it >>> was translated from (thanks Luc) is purely procedural. The required >>> change would be allocating the arrays in doOptimize() as locals instead >>> of as instance variables. >>> >>> Phil Steitz wrote: >>>>> In my use case I have a class that solves several related, but >>>>> different, NLLS problems concurrently. I would like the optimization >>>>> algorithm to be a configurable dependency (GN or LM). Currently I have a >>>>> custom (thread safe) interface with two implementations that wrap the >>>>> commons math optimizers, in order to provide thread safety and >>>>> polymorphic access to all the options that both optimizers support. >>>> Can you explain this a little more? I am still not getting the use >>>> case requiring concurrent access to a single optimizer instance. It >>>> appears that you have one. It would be good to understand it better. >>> Sure. I have class that has to solve different NLLS problems >>> concurrently. The problems share the same convergence criteria, but use >>> different model functions. With the current implementation that means I >>> need one optimizer per thread. >> How bad is that, actually? That is what I am not getting. Why do >> you need a thread-safe facade when all you are doing is creating new >> instances per thread? Sorry if I am missing something simple here. >> >> <side rant>Historically, we did not care about thread-safety at all >> in [math], assuming the standard use case was *always* going to be >> one instance per thread. The statistics aggregators are an example >> where multithreaded access makes sense, but this is much more the >> exception than the rule in [math]. I would really like to get clear >> on which classes really need to be threadsafe themselves rather than >> blindly assuming that all do and insisting that everything be >> immutable so that we don't have to think about how to make things >> threadsafe.</side rant> >> >> Phil > How bad is it to use a library though wrapper classes? Well it is > annoying to have to design my own optimization API that delegates to the > C-M implementations.
Sorry, by "bad" I didn't mean the wrappers, I meant having to have a separate instance per thread. Rereading above, I think I get it now. Apart from API instability (which I agree is a problem) the use case is actually driven by the "common config" part - i.e., you need to separately encapsulate the common config somehow. For that, your suggestion to actually do this explicitly - i.e., create an optimization problem config object - makes sense to me. > From the posts on the users list I think other > people were confused by the old API and probably developed their own > wrappers to insulate themselves from it. When I saw the optimization > package was being redesigned in an backward incompatible way, I thought > it was an opportunity to address a range of concerns that are easiest to > address early in the design process. Luc's original fluent + immutable > proposal would have allowed thread safety. (If not initially, it could > be implemented through patches with no change to the API.) The > separation of algorithm and data would allow each to be much simpler > while still allowing for thread safety and a fluent API. (I.e. GN would > not need 5 superclasses and fields for storing upper and lower bounds.) > > With the API re-design you have the opportunity to design for thread > safety, which can be hard to add on to an existing API. > > Thanks for the whole discussion; It has given me some new ideas. :) Thank *you* for sharing your experience and ideas. Phil > > Regards, > Evan > >>> Since I don't want to hard code the >>> optimizer as a dependency (GN and LM both have their advantages) I >>> created an optimizer interface that was thread safe. Under the hood, the >>> thread safe optimize method just creates a new instance of GN or LM. >>> >>> Immutability, a builder API, or the copy method would all allow direct >>> use of the optimizer. I've been pushing the immutable method because I >>> think it is simpler to understand its behavior. With the "separation of >>> concerns" approach described above I think the optimizer could be easily >>> made immutable. The NLLSProblem class would be easy to make immutable + >>> fluent as well since it is a concrete class. (though it could have a >>> builder or a mutable + fluent API) >>> >>> Best Regards, >>> Evan >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>> For additional commands, e-mail: dev-h...@commons.apache.org >>> >>> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org