Re: [math] Should this throw a NO_DATA exception?

Ole Ersoy Sat, 09 Jan 2016 10:54:13 -0800

Hi,

On 01/09/2016 06:21 AM, Gilles wrote:
[...]

But we should know the target of the improvement.
I mean, is it a drop-in replacement of the current "RealVector"?

OK - I think it's probably confusing because I posted JDK8 examples earlier.  
I'm just wondering whether the current RealVector norm methods should throw a 
no data exception?  I think they should.


If so, how can it happen before we agree that Java 8 JDK can be
used in the next major release of CM?

At some point I'm sure CM will switch over, so we can start experimenting with 
features now.


If it's a redesign, maybe we should define a "wish list" of what
properties should belong to which concept.

I think that is a good inclusive approach for a community.  My primary wishes 
are:
- Remove inheritance when possible in order to keep it simple (Possibly at the 
expense of generic use)
- Design classes that are focused on doing small simple things
- Modularize (I could list all the benefits, but I think we know them).  The
longer CM takes to do this the harder it will be.  Every single time someone
sprinkles in FastMath it gets a little harder...

So in general just keep it simple.  If it needs to support other requirements 
then:

Reuse operations from a FunctionXXX class.  Support new forms of state in a new 
module.

E.g. for a "matrix" it might be useful to be mutable (as per
previous discussions on the subject),


I think the approach here should be very strict.  For example ArrayRealVector 
has almost half the code dedicated to mutation that can easily be done 
elsewhere.  I think this was done because CM is not modular.  We can't defer to 
an array module for the manipulations so they had to be baked into 
ArrayRealVector.

but for a (geometrical)
vector it might be interesting to not be (as in the case for
"Vector3D" in the "geometry" package).

The "matrix" concept probably requires a more advanced interface
in order to allow efficient implementations of basic operations
like multiplication...

Yes - For example when multiplying a sparse matrix times a sparse vector?  Or a
normal vector times a sparse matrix?  Etc.  I'm hoping there's a very simple
way to accomplish this outside of using inheritance.


There is an issue on the bug-tracking system that started to
collect many of the various problems (specific and general)
of data containers ("RealVector", "RealMatrix", etc.) of the
"o.a.c.m.linear" package.


Perhaps it would be more useful, for future reference, to list
everything in one place.

Sure - I think in this case though we can knock it out fast.
Sometimes when we list everything in one place people look at it, get
a headache, and start drinking :).  To me it seems like a vector that
is empty (no elements) is different from having a vector with 1 or
more 0d entries.  In the latter case, according to the formula, the
norm is zero, but in the first case, is it?


To be on the safe side, it should be an error, but I've just had
to let this kind of condition pass (cf. MATH-1300 and related on
the implementation of "nextBytes(byte[],int,int)" feature).


On Fri, 8 Jan 2016 18:41:27 -0600, Ole Ersoy wrote:

    public double getLInfNorm() {
        double norm = 0;
        Iterator<Entry> it = iterator();
        while (it.hasNext()) {
            final Entry e = it.next();
            norm = FastMath.max(norm, FastMath.abs(e.getValue()));
        }
        return norm;
    }


The main problem with the above is that it assumes that the elements
of a "RealVector" are Cartesian coordinates.
There is no provision that it must be the case, and assuming it is
then in contradiction with other methods like "append".


While experimenting with the design of the current implementation I
ended up throwing the exception.  I think it's the right thing to do.
The net effect is that if someone creates a new ArrayVector(new
double[]{}), then the exception is thrown, so if they don't want it
thrown then they should create a new ArrayVector(new double[]{0}).  More
explanations of this design below ...


I don't know at this point (not knowing the intended usage).

One way to look at it is to say "Conceptually it is not correct, but we are using it 
in a way that eliminates this flaw, so it's OK". Which I don't think is OK, unless 
we can say conclusively and globally that it's OK for all users in all cases.  In this 
case I think returning a zero norm when there is no data is wrong, and can potentially 
lead to wrong results.
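
A minimal sketch of the distinction, assuming a plain double[] in place of a vector class.  NoDataSketchException and NormSketch are hypothetical names used only for illustration; they are not CM classes.  The lenient method mirrors the loop quoted above, the strict one refuses to answer when there is no data:

```java
import java.util.stream.DoubleStream;

// Hypothetical stand-in for a NO_DATA-style exception (not the CM class).
class NoDataSketchException extends RuntimeException {
    NoDataSketchException(String message) {
        super(message);
    }
}

class NormSketch {
    // Mirrors the loop quoted above: an empty array silently yields 0,
    // which is indistinguishable from the norm of an all-zero vector.
    static double lenientLInfNorm(double[] data) {
        double norm = 0;
        for (double x : data) {
            norm = Math.max(norm, Math.abs(x));
        }
        return norm;
    }

    // Strict variant: refuses to compute a norm when there are no entries.
    static double strictLInfNorm(double[] data) {
        if (data.length == 0) {
            throw new NoDataSketchException("vector has no entries");
        }
        // max() is present here because the array is known to be non-empty.
        return DoubleStream.of(data).map(Math::abs).max().getAsDouble();
    }
}
```

With the lenient version, new double[]{} and new double[]{0} both report a norm of 0, so a caller cannot tell "no data" from "all zeros"; the strict version makes that difference visible.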


[I think this is a low-level discussion that does not impact the
design but would fix an API at too early a stage.]

Yes I see your point there.  Why patch the roof if the house is getting
demolished in two weeks?  CM seems to be really nice to all the interested
parties with respect to this though.  Ubuntu provides long term supported
releases.  Fedora releases every six months and discontinues updates for the
previous releases, but CentOS picks up the slack there.


At first (and second and third) sight, I think that these container
classes should be abandoned and replaced by specific ones.
For example:
* Single "matrix" abstract type or interface for computations in
  the "linear" package (rather than "vector" and "matrix" types)
* Perhaps a "DoubleArray" (for such things as "append", etc.).
  And by the way, there already exists "ResizableDoubleArray" which
  could be a start.
* Geometrical vectors (that can perhaps support various coordinate
  systems)
* ...


I think we are thinking along the same lines here.  So far I have the
following:
A Vector interface with only these methods:
- getDimension()
- getEntry()
- setEntry()


And what is the concept that is being represented by this interface?

It's just a simple vector (The type we see in Wolfram documentation) [x1, x2, 
x3,.....xN].  Ideally it would just be an array, but it needs to throw a 
MathException.OUT_OF_RANGE when attempts are made to get or alter non existing 
entries.


I think that it is necessary to list use-cases so that we don't again
come up with a design that may prove not specific enough to satisfy
some requirements of the purported audience.

We can do that.  With the above I'm just starting with the smallest building 
block possible.  We can add things if needed.  I'm hoping it's not needed, and 
that we can support additional requirements in another small class.

An ArrayVector implementation of Vector where the one and
only constructor takes a double[] array argument.  The vector length
cannot be mutated.  If someone wants to do that they have to create a
new one.
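
Roughly, the two building blocks described above could look like the sketch below.  OutOfRangeSketchException is a hypothetical stand-in for MathException.OUT_OF_RANGE, and copying the array in the constructor is an assumption made for the sketch, not something settled in this thread:

```java
// Hypothetical stand-in for MathException.OUT_OF_RANGE.
class OutOfRangeSketchException extends RuntimeException {
    OutOfRangeSketchException(int index, int dimension) {
        super("index " + index + " is outside [0, " + dimension + ")");
    }
}

// The three-method interface listed above.
interface Vector {
    int getDimension();

    double getEntry(int index);

    void setEntry(int index, double value);
}

// Fixed-length, array-backed vector with a single constructor.
final class ArrayVector implements Vector {
    private final double[] data;

    ArrayVector(double[] data) {
        // Assumption: copy the input so later changes to the caller's
        // array do not leak into the vector.
        this.data = data.clone();
    }

    @Override
    public int getDimension() {
        return data.length;
    }

    @Override
    public double getEntry(int index) {
        checkIndex(index);
        return data[index];
    }

    @Override
    public void setEntry(int index, double value) {
        checkIndex(index);
        data[index] = value;
    }

    // Entries can change, the length cannot; bad indices are rejected.
    private void checkIndex(int index) {
        if (index < 0 || index >= data.length) {
            throw new OutOfRangeSketchException(index, data.length);
        }
    }
}
```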


Assuming we explore the 3 concepts I had listed above
* it cannot be "matrix" (since I supposed that a row or column matrix
  could be of type "matrix" not "vector")

So I think in general I prefer if one thing cannot be another.  It's simpler 
when the thing is just the thing.  With the latter it's easy to start getting 
convoluted.  Sometimes it's worth it ...but I think it's easier on designers, 
maintainers, and users most of the time when things are distinct.

So say we discover later that we think the Vector really should be a one 
dimensional Matrix.  That might be worth it, but ATM I don't see how to do it 
without making the function and vector interface more complex.

* it cannot be an appendable sequence, since the size is fixed.

That's fine (I think...maybe there's a case that shows that this adds a lot of 
overhead?...) because it is an Array based vector. The size is inherently 
fixed.  If we want to change the size grab the underlying array and change it.  
If the data structure is different, then still do the same thing.
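
One possible reading of "grab the underlying array and change it", as a sketch only: the resizing lives in a small array helper (ArrayAppendSketch is a hypothetical name), and the caller wraps the result in a new vector.

```java
import java.util.Arrays;

// Hypothetical helper: resizing is an array concern, not a vector concern.
final class ArrayAppendSketch {
    private ArrayAppendSketch() {
    }

    // Returns a new array one element longer; the input is left untouched.
    static double[] append(double[] data, double value) {
        double[] grown = Arrays.copyOf(data, data.length + 1);
        grown[data.length] = value;
        return grown;
    }
}
```

The appended data would then be handed to the vector constructor, e.g. new ArrayVector(ArrayAppendSketch.append(data, 4.0)), so the vector itself never needs an "append" method.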


* it cannot be a geometrical vector since "getEntry(int)" and
  "setEntry(int, double)" are too low level to ensure consistency
  under transformations (since we cannot assume that the entries
  would always be Cartesian coordinates).

Which I think is good.  I looked at OJAlgo and I found the use of generics a 
bit extreme.  Very little code is documented and the lack of simple examples 
suggests to me that it could be a lot simpler.


A VectorFunctionFactory class containing methods that return Function
and BiFunction instances that can be used to perform vector mapping
and reduction.  For example:

    // Imports assumed by this excerpt:
    //   import java.util.function.Function;
    //   import java.util.stream.IntStream;
    //   import static java.util.stream.IntStream.range;

    /**
     * Returns a {@link Function} that produces the lInfNorm of the vector
     * {@code v}.
     *
     * Example {@code lInfNorm().apply(v);}
     *
     * @throws MathException
     *             Of type {@code NO_DATA} if {@code v.getDimension() == 0}.
     */
    public static Function<Vector, Double> lInfNorm() {
        return lInfNorm(false);
    }

    /**
     * Returns a {@link Function} that produces the lInfNorm of the vector
     * {@code v}.
     *
     * Example {@code lInfNorm(true).apply(v);}
     *
     * @param parallel
     *            Whether to perform the operation in parallel.
     * @throws MathException
     *             Of type {@code NO_DATA} if {@code v.getDimension() == 0}.
     */
    public static Function<Vector, Double> lInfNorm(boolean parallel) {
        return (v) -> {
            // Reject empty vectors before reducing over the entries.
            LinearExceptionFactory.checkNoVectorData(v.getDimension());
            IntStream stream = range(0, v.getDimension());
            stream = parallel ? stream.parallel() : stream;
            return stream.mapToDouble(i -> Math.abs(v.getEntry(i))).max().getAsDouble();
        };
    }


This is a nice possibility, but without a purpose, it could seem that
you just move the "operations" from the container class to another one.

The primary purpose is that we can use any of those operations without needing 
an instance of a class or inheriting it.
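
A self-contained sketch of that usage, under stated assumptions: the Vector here is a cut-down read-only version of the interface above, VectorFunctionSketch and its of(...) and dotProduct() members are hypothetical names, and IllegalStateException stands in for the NO_DATA MathException.

```java
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.IntStream;

// Cut-down, read-only view used only for this sketch.
interface Vector {
    int getDimension();

    double getEntry(int index);
}

final class VectorFunctionSketch {
    private VectorFunctionSketch() {
    }

    // Reduction: largest absolute entry; empty vectors are rejected.
    static Function<Vector, Double> lInfNorm() {
        return v -> {
            if (v.getDimension() == 0) {
                throw new IllegalStateException("NO_DATA"); // stand-in exception
            }
            return IntStream.range(0, v.getDimension())
                    .mapToDouble(i -> Math.abs(v.getEntry(i)))
                    .max()
                    .getAsDouble();
        };
    }

    // Hypothetical BiFunction example: dot product of two vectors.
    static BiFunction<Vector, Vector, Double> dotProduct() {
        return (a, b) -> {
            if (a.getDimension() != b.getDimension()) {
                throw new IllegalArgumentException("dimension mismatch");
            }
            return IntStream.range(0, a.getDimension())
                    .mapToDouble(i -> a.getEntry(i) * b.getEntry(i))
                    .sum();
        };
    }

    // Hypothetical helper: wraps a copy of an array as a Vector.
    static Vector of(double... data) {
        final double[] copy = data.clone();
        return new Vector() {
            @Override
            public int getDimension() {
                return copy.length;
            }

            @Override
            public double getEntry(int index) {
                return copy[index];
            }
        };
    }
}
```

The caller only needs something that implements Vector, for example VectorFunctionSketch.lInfNorm().apply(v); nothing has to extend a base class to pick up the operation.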

It's cleaner, certainly, but could it be that the factory will end up
with as many conceptually incompatible operations as the current design?

Maybe?  My first goal is to be able to provide simple examples.  If there's 
something that can't be done, then I'll first design a simple API example that 
gets it done, and then consider how the implementation for that should be done. 
 Maybe that leads to some additional work, but I prefer that over cluttering 
classes or making them overly generic (Unless there's a really good strong 
reason).

So the design leaves more specialized structures like sparse matrices
to a different module.  I'm not sure if this is the best design, but
so far I'm feeling pretty good about it.  WDYT?


So you were really working on the "matrix" design?

I'm looking at the whole linear package in general.


Did you look at what the requirements are for these structures
(e.g. for fast multiplication) and how they achieve it in other
packages (e.g. "ojalgo")?

Yes I did have a look and I will look more.  The lack of simple examples was a
bit of a deal-breaker for me.


If it's not about "matrix" but about blocks of (possibly multi-dimensional)
data that can be "mapped" and "reduced", perhaps that the one-dimensional
version (which seems what your new "Vector" is) should just be a special
case of an interface for this kind of structure (?).

Maybe.  My hunch is that, as is, it is very easy to work with and understand, and
that if something more complex is needed then it should be built in another
module.  I'm willing to change the whole approach if someone can demonstrate an
example / concept that shows that the bulk of use cases cannot be satisfied
without a lot of rework.

[In this latter case, the CM "MultidimensionalCounter" (in package "util")
might be something that can be reused (?).]

Does it fit with streams?  I'm seeing how far I can go with these ATM.  I need
to get more educated on the MultidimensionalCounter.

Cheers,
Ole

