Robin:

The key to victory was pointer arithmetic, as I noticed that I had used it in the C++ implementation, too. xD

Perhaps with LDC2 it's not necessary.


I will just post the whole code again so that you can see what I have changed.

The code looks better.

There's no need to put Dimension in another module. In D, modules
contain related things, unlike in Java.

Also feel free to use some free-standing functions. With UFCS
they get called in the same way, and they help keep
classes/structs simpler.
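For instance (a hypothetical sketch; sumAll is an invented name, not from your code), a free-standing function is callable with member syntax thanks to UFCS:

```d
import std.stdio;

// Free-standing function; with UFCS it can be called
// with member syntax on its first argument.
double sumAll(in double[] items) pure nothrow {
    double total = 0.0;
    foreach (immutable x; items)
        total += x;
    return total;
}

void main() {
    immutable data = [1.0, 2.0, 3.0];
    writeln(sumAll(data)); // Normal call.
    writeln(data.sumAll);  // Same call via UFCS.
}
```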

Some of your imports could be moved to more local scopes, instead
of being all at module-level.
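A small untested sketch of what I mean (printTiming is an invented name):

```d
void printTiming(in long secs, in long msecs) {
    // This import is visible only inside this function,
    // instead of polluting the whole module namespace.
    import std.stdio : writeln;
    writeln("\tTime required: ", secs, " secs, ", msecs, " msecs");
}

void main() {
    printTiming(3, 250);
}
```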


               result ~= to!string(this[r, c]);

=>

                 result ~= this[r, c].text;


writeln("\tTime required: " ~ to!string(secs) ~ " secs, " ~ to!string(msecs) ~ " msecs");

=>

writeln("\tTime required: ", secs, " secs, ", msecs, " msecs");


ref Matrix opOpAssign(string s)(in T scalar) pure nothrow if (s == "*") {

=>

     ref Matrix opOpAssign(string op)(in T scalar) pure nothrow if (op == "*") {


Also you have two functions with code like:

this.data[] op= scalar;

You can define a single template (untested):


     /**
      * Adds or subtracts all entries of this matrix by the
      * given other matrix.
      */
     ref Matrix opOpAssign(string op)(const ref Matrix other) pure nothrow
     if (op == "+" || op == "-")
     in
     {
         assert(this.dim == other.dim);
     }
     body
     {
         mixin("this.data[] " ~ op ~ "= other.data[];");
         return this;
     }



Matrix opBinary(string s)(const ref Matrix other) const pure if (s == "*")

Given that on a 32-bit system a Matrix is just 16 bytes, it may
be better not to accept the argument by ref, avoiding one more
level of indirection:

     Matrix opBinary(string s)(in Matrix other) const pure if (op == "*")


However, there were some strange things about which I am very confused ...

void allocationTest() {
        writeln("allocationTest ...");
        sw.start();
        auto m1 = Matrix!double(10000, 10000);
        { auto m2 = Matrix!double(10000, 10000); }
        { auto m2 = Matrix!double(10000, 10000); }
        { auto m2 = Matrix!double(10000, 10000); }
        //{ auto m2 = Matrix!double(10000, 10000); }
        sw.stop();
        printBenchmarks();
}

This is the most confusing code snippet. I have just changed the allocation of all m1 and m2 from new Matrix!double (on the heap) to Matrix!double (on the stack).

The matrix data is always on the heap. What ends on the stack is
a very limited amount of stuff.


This is extremely confusing, as I allocate these matrices on the stack, and since I have allocated them within their own scoped blocks, they should instantly release their memory.

You are mistaken. minimallyInitializedArray allocates memory on
the GC heap (and there's not enough space on the stack to
allocate 10000^2 doubles). In both D and Java the deallocation of
memory managed by the GC is not deterministic, so it's not
immediately released at scope exit. Also unlike the Java GC, the
D GC is less refined, and by design it is currently not precise.
With such large arrays often there are _inbound_ pointers that
keep the memory alive, especially on 32 bit systems. So perhaps
your problems are caused by the GC.
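To illustrate (an untested sketch): even inside a scoped block the array data goes on the GC heap, and at scope exit you can only request a collection, not force a deterministic free:

```d
import core.memory : GC;
import std.array : minimallyInitializedArray;

void main() {
    {
        // The array lives on the GC heap, regardless of
        // what value type holds the slice.
        auto data = minimallyInitializedArray!(double[])(1_000_000);
        data[0] = 1.0;
    }
    // Scope exit has not freed it; this only *requests* a
    // collection, and false pointers may still pin the block.
    GC.collect();
}
```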

You can have deterministic memory management in your struct-based
matrices, but you have to allocate the memory manually (from the
GC or probably better from the C heap) and free it in the struct
destructor using RAII.
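An untested sketch of the idea (RawMatrix is an invented name; your real matrix would also have dimensions, indexing, etc.):

```d
struct RawMatrix {
    double* data;
    size_t length;

    this(in size_t rows, in size_t cols) {
        import core.stdc.stdlib : calloc;
        length = rows * cols;
        data = cast(double*)calloc(length, double.sizeof);
    }

    // Disable copies to avoid a double free in this sketch.
    @disable this(this);

    ~this() {
        import core.stdc.stdlib : free;
        free(data); // Deterministic release at scope exit.
        data = null;
    }
}

void main() {
    {
        auto m = RawMatrix(10_000, 10_000);
    } // The C-heap memory is freed right here.
}
```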


Another strange thing was that the new opEquals implementation:

        bool opEquals(const ref Matrix other) const pure nothrow {
                if (this.dim != other.dim) {
                        return false;
                }
                foreach (immutable i; 0 .. this.dim.size) {
                        if (this.data[i] != other.data[i]) return false;
                }
                return true;
        }

is actually about 20% faster than the one you suggested, the single line `return this.dim == other.dim && this.data[] == other.data[];`.

I think this small performance bug is fixed in dmd 2.065, which
is currently in beta 3.


The last thing I haven't quite understood is that I tried to replace

auto t = Matrix(other).transposeAssign();

in the matrix multiplication algorithm with its shorter and clearer form

auto t = other.transpose(); // sorry for the nasty '()', but I like them! :/

This however gave me wonderful segmentation faults at runtime when using the matrix multiplication ...

This looks like a null-related bug.

I'll benchmark your code a little, but I think I don't have as
much RAM as you.

Bye,
bearophile
