Re: [Math] Fluent API, inheritance and immutability

Phil Steitz Mon, 12 Aug 2013 12:26:33 -0700

On 8/12/13 10:55 AM, Evan Ward wrote:
> On 08/12/2013 12:19 PM, Phil Steitz wrote:
>> On 8/12/13 9:04 AM, Evan Ward wrote:
>>> On 08/09/2013 06:55 PM, Gilles wrote:
>>>>>> [...]
>>>>>>
>>>>>>> This does make it important to decide on a well written and
>>>>>>> complete API before releasing it.
>>>>>> When the scope of the software is well circumscribed, that would be
>>>>>> possible. With the whole of [Math]ematics, much less so. :-}
>>>>>> And state-of-the-art in Java is a moving target, aimed at by changing
>>>>>> CM contributors with differing needs and tastes; this adds to the
>>>>>> unstable mix.
>>>>> That's a good point. I still prefer the interface design (though I may
>>>>> be in the minority) for two reasons. First, if a concrete class only
>>>>> publicly exposes the methods defined in an interface it encourages
>>>>> polymorphism. User code that uses one implementation can be easily
>>>>> switched to another and new implementations are less constrained.
>>>>> Second, it encourages composition over inheritance. I agree with Josh
>>>>> Bloch that composition produces more maintainable code. Adding new
>>>>> methods to an existing interface/class breaks composition.
>>>> The "problem" is to get the interface right. As it happens, at some point
>>>> we discover something that was not foreseen; and to correct/improve the
>>>> design, compatibility must be broken.
>>>> [But refactoring is not a failure of development; it's part of it.]
>>>>
>>>>> I think the interface/abstract class discussion is partially separable
>>>>> from the immutable/mutable discussion. I see the algorithm as the part
>>>>> that could really benefit from the polymorphism. Perhaps separating the
>>>>> problem definition (data) from the algorithm will improve the
>>>>> flexibility of the API. For example,
>>>>>
>>>>> PointVectorValuePair solveMyNLLSProblem(NLLSOptimizer opt){
>>>>>     //define problem to solve in an independent object
>>>>>     NLLSProblem p = new NLLSProblem(/*model functions, weights,
>>>>> convergence checker, ...*/);
>>>>>
>>>>>     //provide algorithm with the data it needs
>>>>>     //algorithm has no problem specific state
>>>>>     return opt.optimize(p);
>>>>> }
>>>> I may be missing something, but how much better is it to store
>>>> everything the optimizer needs in yet another class?
>>>> [Then, that's a possible approach, but it's not what we started
>>>> from in Commons Math, and when trying to fix some inconsistency
>>>> or removing duplicate code, I tried to retain what could be from
>>>> the existing design.]
>>> I've looked at the implementations of GN and LM and it seems that the
>>> abstract classes are primarily concerned with evaluating the model, and
>>> the concrete classes use the evaluation to compute the next step. I
>>> think separating those two concerns could simplify the implementation of
>>> the optimizers. The NLLSProblem class would provide methods to evaluate
>>> the weighted Jacobian, residuals, etc. The concrete class would look
>>> almost the same as they do today, except calls to the parent class would
>>> be replaced with calls to the NLLSProblem class.
>>>
>>> The benefit is that the optimizer implementation/hierarchy would be
>>> simpler and other classes could use the methods for model evaluation. I
>>> don't think keeping the optimization algorithm and data in the same
>>> class gains much since all the scratch space is reallocated on each call
>>> to optimize. (Performance will be about the same either way.)
>>>
>>>>>>> [...] Thread safety is a tricky beast. I think we agree that the only
>>>>>>> way to guarantee thread safety is to only depend on final concrete
>>>>>>> classes that are thread safe themselves.
>>>>>> I don't think so. When objects are immutable, thread-safety follows
>>>>> It is somewhat off topic, but a counter example would be Vector3D. Since
>>>>> the class is not final, a user could extend it and override all the
>>>>> methods and add some set{X,Y,Z} methods to make it mutable. Even though
>>>>> Vector3D is immutable, there is no _guarantee_ that every instanceof
>>>>> Vector3D is immutable unless it is final. This is why String is final.
>>>> I think I don't get your point: If someone extends a class that is safe
>>>> in a way that the extension is unsafe, that's his problem. ;-)
>>>>
>>>>>>> [...] copying any large matrices or arrays is prohibitively
>>>>>>> expensive. For the NLLS package we would be copying a pointer to a
>>>>>>> function that can generate a large matrix. I think adding some
>>>>>>> documentation that functions should be thread safe if you want to use
>>>>>>> them from multiple threads would be sufficient.
>>>>>> I you pass a "pointer" (i.e. a "reference" in Java), all bets are off:
>>>>>> the
>>>>>> class is not inherently thread-safe. That's why I suggested to
>>>>>> mandate a
>>>>>> _deep_ "copy" method (with a stringent contract that should allow a
>>>>>> caller
>>>>>> to be sure that all objects owned by an instance are disconnected from
>>>>>> any
>>>>>> other objects).
>>>>> As someone who has designed a thread safe application based on deep
>>>>> copying I don't think this is route to follow. A deep copy means you
>>>>> have to be able to copy an arbitrary (possibly cyclical) reference
>>>>> graph. Without the graph copy there are many subtle bugs. (References to
>>>>> the same object are now references to different objects.) With the graph
>>>>> copy the implementation is very complex. This is the reason
>>>>> Serialization has a separate "patch up" step after object creation,
>>>>> which leads to some nasty tricks/bugs. Similarly, Cloneable only
>>>>> produces a shallow copy. Opinions may vary, but in my experience
>>>>> immutability is an easier approach to thread safety, especially when you
>>>>> have to depend on user code.
>>>> I agree that using immutability is easier, but my point all along is that
>>>> it is at odds with simplicity (which is aimed at with "fluent API").
>>>> And since
>>>> 1. the internals of the optimizers are not thread-safe yet (see e.g.
>>>>    LevenbergMarquardtOptimizer"), and
>>> Though the LM implementation looks like a mess in Java, the Fortran it
>>> was translated from (thanks Luc) is purely procedural. The required
>>> change would be allocating the arrays in doOptimize() as locals instead
>>> of as instance variables.
>>>
>>> Phil Steitz wrote:
>>>>> In my use case I have a class that solves several related, but
>>>>> different, NLLS problems concurrently. I would like the optimization
>>>>> algorithm to be a configurable dependency (GN or LM). Currently I have a
>>>>> custom (thread safe) interface with two implementations that wrap the
>>>>> commons math optimizers, in order to provide thread safety and
>>>>> polymorphic access to all the options that both optimizers support.
>>>> Can you explain this a little more?  I am still not getting the use
>>>> case requiring concurrent access to a single optimizer instance.  It
>>>> appears that you have one.  It would be good to understand it better.
>>> Sure. I have class that has to solve different NLLS problems
>>> concurrently. The problems share the same convergence criteria, but use
>>> different model functions. With the current implementation that means I
>>> need one optimizer per thread.
>> How bad is that, actually?  That is what I am not getting.  Why do
>> you need a thread-safe facade when all you are doing is creating new
>> instances per thread?  Sorry if I am missing something simple here.
>>
>> <side rant>Historically, we did not care about thread-safety at all
>> in [math], assuming the standard use case was *always* going to be
>> one instance per thread.  The statistics aggregators are an example
>> where multithreaded access makes sense, but this is much more the
>> exception than the rule in [math].  I would really like to get clear
>> on which classes really need to be threadsafe themselves rather than
>> blindly assuming that all do and insisting that everything be
>> immutable so that we don't have to think about how to make things
>> threadsafe.</side rant>
>>
>> Phil
> How bad is it to use a library though wrapper classes? Well it is
> annoying to have to design my own optimization API that delegates to the
> C-M implementations.


Sorry, by "bad" I didn't mean the wrappers, I meant having to have a
separate instance per thread.  Rereading above, I think I get it
now.   Apart from API instability (which I agree is a problem) the
use case is actually driven by the "common config" part - i.e., you
need to separately encapsulate the common config somehow.  For that,
your suggestion to actually do this explicitly - i.e., create an
optimization problem config object - makes sense to me.
>  From the posts on the users list I think other
> people were confused by the old API and probably developed their own
> wrappers to insulate themselves from it. When I saw the optimization
> package was being redesigned in an backward incompatible way, I thought
> it was an opportunity to address a range of concerns that are easiest to
> address early in the design process. Luc's original fluent + immutable
> proposal would have allowed thread safety. (If not initially, it could
> be implemented through patches with no change to the API.) The
> separation of algorithm and data would allow each to be much simpler
> while still allowing for thread safety and a fluent API. (I.e. GN would
> not need 5 superclasses and fields for storing upper and lower bounds.)
>
> With the API re-design you have the opportunity to design for thread
> safety, which can be hard to add on to an existing API.
>
> Thanks for the whole discussion; It has given me some new ideas. :)

Thank *you* for sharing your experience and ideas.

Phil
>
> Regards,
> Evan
>
>>>  Since I don't want to hard code the
>>> optimizer as a dependency (GN and LM both have their advantages) I
>>> created an optimizer interface that was thread safe. Under the hood, the
>>> thread safe optimize method just creates a new instance of GN or LM.
>>>
>>> Immutability, a builder API, or the copy method would all allow direct
>>> use of the optimizer. I've been pushing the immutable method because I
>>> think it is simpler to understand its behavior. With the "separation of
>>> concerns" approach described above I think the optimizer could be easily
>>> made immutable. The NLLSProblem class would be easy to make immutable +
>>> fluent as well since it is a concrete class. (though it could have a
>>> builder or a mutable + fluent API)
>>>
>>> Best Regards,
>>> Evan
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [Math] Fluent API, inheritance and immutability

Reply via email to