Where is the time spent in the algorithm? I assume that most of it is in the matrix manipulation work, not in the orchestration of finding a pivot and reducing based on it; i.e. `elim-rows!` is the expensive part. Given that you only specialized the outer part of the loop, I wouldn't expect huge performance changes.
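For what it's worth, here is a rough sketch of what a specialized inner loop might look like if the rows were kept in flvectors instead of vectors of boxed Flonums (in the spirit of Neil's FlArray suggestion below). The name `fl-elim-rows!` and the (Vectorof FlVector) representation are made up for illustration, not part of math/matrix, and I haven't benchmarked it; the point is just that every multiply and subtract in the hot loop stays unboxed, so it allocates nothing:

#lang typed/racket
(require racket/flonum racket/fixnum)

;; Hypothetical flonum-specialized row elimination: for each row l in
;; [start, m) except the pivot row i, subtract c * (row i) from row l,
;; where c = rows[l][j] / pivot, so that column j becomes zero in row l.
(: fl-elim-rows! ((Vectorof FlVector) Index Index Index Flonum Index -> Void))
(define (fl-elim-rows! rows m i j pivot start)
  (define pivot-row (vector-ref rows i))
  (define n (flvector-length pivot-row))
  (let row-loop ([#{l : Nonnegative-Fixnum} start])
    (when (l . fx< . m)
      (unless (fx= l i)
        (let* ([row (vector-ref rows l)]
               [c   (fl/ (flvector-ref row j) pivot)])
          ;; The O(n) inner loop: all arithmetic stays on unboxed flonums
          (let col-loop ([#{k : Nonnegative-Fixnum} j])
            (when (k . fx< . n)
              (flvector-set! row k (fl- (flvector-ref row k)
                                        (fl* c (flvector-ref pivot-row k))))
              (col-loop (fx+ k 1))))))
      (row-loop (fx+ l 1))))
  (void))

Converting a (Matrix Flonum) to a vector of flvectors and back is only O(n^2) work, which is negligible next to the O(n^3) elimination itself, so the conversion at the boundary shouldn't hide any gain from unboxing the inner loop.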
On Sun, May 11, 2014 at 2:13 PM, Jens Axel Søgaard <jensa...@soegaard.net> wrote:
> I tried restricting matrix-solve and matrix-gauss-elim to (Matrix Flonum). I can't observe a change in the timings.
>
> #lang typed/racket
> (require math/matrix
>          math/array
>          math/private/matrix/utils
>          math/private/vector/vector-mutate
>          math/private/unsafe
>          (only-in racket/unsafe/ops unsafe-fl/)
>          racket/fixnum
>          racket/list)
>
> (define-type Pivoting (U 'first 'partial))
>
> (: flonum-matrix-gauss-elim
>    (case-> ((Matrix Flonum) -> (Values (Matrix Flonum) (Listof Index)))
>            ((Matrix Flonum) Any -> (Values (Matrix Flonum) (Listof Index)))
>            ((Matrix Flonum) Any Any -> (Values (Matrix Flonum) (Listof Index)))
>            ((Matrix Flonum) Any Any Pivoting -> (Values (Matrix Flonum) (Listof Index)))))
> (define (flonum-matrix-gauss-elim M [jordan? #f] [unitize-pivot? #f] [pivoting 'partial])
>   (define-values (m n) (matrix-shape M))
>   (define rows (matrix->vector* M))
>   (let loop ([#{i : Nonnegative-Fixnum} 0]
>              [#{j : Nonnegative-Fixnum} 0]
>              [#{without-pivot : (Listof Index)} empty])
>     (cond
>       [(j . fx>= . n)
>        (values (vector*->matrix rows)
>                (reverse without-pivot))]
>       [(i . fx>= . m)
>        (values (vector*->matrix rows)
>                ;; None of the rest of the columns can have pivots
>                (let loop ([#{j : Nonnegative-Fixnum} j] [without-pivot without-pivot])
>                  (cond [(j . fx< . n)  (loop (fx+ j 1) (cons j without-pivot))]
>                        [else  (reverse without-pivot)])))]
>       [else
>        (define-values (p pivot)
>          (case pivoting
>            [(partial)  (find-partial-pivot rows m i j)]
>            [(first)    (find-first-pivot rows m i j)]))
>        (cond
>          [(zero? pivot)  (loop i (fx+ j 1) (cons j without-pivot))]
>          [else
>           ;; Swap pivot row with current
>           (vector-swap! rows i p)
>           ;; Possibly unitize the new current row
>           (let ([pivot  (if unitize-pivot?
>                             (begin (vector-scale! (unsafe-vector-ref rows i)
>                                                   (unsafe-fl/ 1. pivot))
>                                    (unsafe-fl/ pivot pivot))
>                             pivot)])
>             (elim-rows! rows m i j pivot (if jordan? 0 (fx+ i 1)))
>             (loop (fx+ i 1) (fx+ j 1) without-pivot))])])))
>
> (: flonum-matrix-solve
>    (All (A) (case->
>              ((Matrix Flonum) (Matrix Flonum)        -> (Matrix Flonum))
>              ((Matrix Flonum) (Matrix Flonum) (-> A) -> (U A (Matrix Flonum))))))
> (define flonum-matrix-solve
>   (case-lambda
>     [(M B)  (flonum-matrix-solve
>              M B (λ () (raise-argument-error 'flonum-matrix-solve
>                                              "matrix-invertible?" 0 M B)))]
>     [(M B fail)
>      (define m (square-matrix-size M))
>      (define-values (s t) (matrix-shape B))
>      (cond [(= m s)
>             (define-values (IX wps)
>               (parameterize ([array-strictness #f])
>                 (flonum-matrix-gauss-elim (matrix-augment (list M B)) #t #t)))
>             (cond [(and (not (empty? wps)) (= (first wps) m))
>                    (submatrix IX (::) (:: m #f))]
>                   [else  (fail)])]
>            [else
>             (error 'flonum-matrix-solve
>                    "matrices must have the same number of rows; given ~e and ~e"
>                    M B)])]))
>
> (: mx Index)
> (define mx 600)
>
> (: r (Index Index -> Flonum))
> (define (r i j) (random))
>
> (: A : (Matrix Flonum))
> (define A (build-matrix mx mx r))
>
> (: sum : Integer Integer -> Flonum)
> (define (sum i n)
>   (let loop ((j 0) (acc 0.0))
>     (if (>= j mx)
>         acc
>         (loop (+ j 1) (+ acc (matrix-ref A i j))))))
>
> (: b : (Matrix Flonum))
> (define b (build-matrix mx 1 sum))
>
> (time
>  (let [(m (flonum-matrix-solve A b))]
>    (matrix-ref m 0 0)))
> (time
>  (let [(m (matrix-solve A b))]
>    (matrix-ref m 0 0)))
>
> (time
>  (let [(m (flonum-matrix-solve A b))]
>    (matrix-ref m 0 0)))
> (time
>  (let [(m (matrix-solve A b))]
>    (matrix-ref m 0 0)))
>
> (time
>  (let [(m (flonum-matrix-solve A b))]
>    (matrix-ref m 0 0)))
> (time
>  (let [(m (matrix-solve A b))]
>    (matrix-ref m 0 0)))
>
> 2014-05-11 21:48 GMT+02:00 Neil Toronto <neil.toro...@gmail.com>:
>> The garbage collection time is probably from cleaning up boxed flonums, and possibly intermediate vectors. If so, a separate implementation of Gaussian elimination for the FlArray type would cut the GC time to nearly zero.
>>
>> Neil ⊥
>>
>> On 05/11/2014 01:36 PM, Jens Axel Søgaard wrote:
>>> Or ... you could take a look at
>>>
>>> https://github.com/plt/racket/blob/master/pkgs/math-pkgs/math-lib/math/private/matrix/matrix-gauss-elim.rkt
>>>
>>> and see if something can be improved.
>>>
>>> /Jens Axel
>>>
>>> 2014-05-11 21:30 GMT+02:00 Jens Axel Søgaard <jensa...@soegaard.net>:
>>>> Hi Eduardo,
>>>>
>>>> The math/matrix library uses the arrays from math/array to represent matrices.
>>>>
>>>> If you want to try the same representation as Bigloo, you could try Will Farr's matrix library:
>>>>
>>>> http://planet.racket-lang.org/package-source/wmfarr/simple-matrix.plt/1/1/planet-docs/simple-matrix/index.html
>>>>
>>>> I am interested in hearing the results.
>>>>
>>>> /Jens Axel
>>>>
>>>> 2014-05-11 21:18 GMT+02:00 Eduardo Costa <edu50...@gmail.com>:
>>>>> What is bothering me is the time Racket is spending in garbage collection.
>>>>>
>>>>> ~/wrk/scm/rkt/matrix$ racket matrix.rkt
>>>>> 0.9999999999967226
>>>>> cpu time: 61416 real time: 61214 gc time: 32164
>>>>>
>>>>> If I am reading the output correctly, Racket is spending 32 seconds out of 61 seconds in garbage collection.
>>>>>
>>>>> I am following Junia Magellan's computer language comparison and I cannot understand why Racket needs the garbage collector for doing Gaussian elimination. On a slow Compaq/HP machine, solving a system of 800 linear equations takes 17.3 seconds in Bigloo, but requires 58 seconds in Racket, even after removing the building of the linear system from consideration. Common Lisp is also much faster than Racket in processing arrays. I would like to point out that Racket is very fast in general. The only occasion where it lags badly behind Common Lisp and Bigloo is when one needs to deal with arrays.
>>>>>
>>>>> Basically, Junia is using the Rasch method to measure certain latent traits of computer languages, like productivity and coaching time. In any case, she needs to do a lot of matrix calculations to invert the Rasch model.
>>>>> Since Bigloo works with homogeneous vectors, she wrote a few macros to access the elements of a matrix:
>>>>>
>>>>> (define (mkv n) (make-f64vector n))
>>>>> (define $ f64vector-ref)
>>>>> (define $! f64vector-set!)
>>>>> (define len f64vector-length)
>>>>>
>>>>> (define-syntax $$
>>>>>   (syntax-rules ()
>>>>>     (($$ m i j) (f64vector-ref (vector-ref m i) j))))
>>>>>
>>>>> (define-syntax $$!
>>>>>   (syntax-rules ()
>>>>>     (($$! matrix row column value)
>>>>>      ($! (vector-ref matrix row) column value))))
>>>>>
>>>>> I wonder whether homogeneous vectors would speed up Racket. On the same computer where Racket takes 80 seconds to build and invert a system of equations, Bigloo takes 17.3 seconds, as I said before. Common Lisp is even faster. However, if one subtracts the gc time from Racket's total time, the result comes quite close to Common Lisp or Bigloo.
>>>>>
>>>>> ~/wrk/bgl$ bigloo -Obench bigmat.scm -o big
>>>>> ~/wrk/bgl$ time ./big
>>>>> 0.9999999999965746 1.000000000000774 0.9999999999993039 0.9999999999982576 1.000000000007648 0.999999999996588
>>>>>
>>>>> real    0m17.423s
>>>>> user    0m17.384s
>>>>> sys     0m0.032s
>>>>> ~/wrk/bgl$
>>>>>
>>>>> Well, Bigloo may perform global optimizations, but Common Lisp doesn't. When one is not dealing with matrices, Racket is faster than Common Lisp. I hope you can tell me how to rewrite the program in order to avoid garbage collection.
>>>>>
>>>>> By the way, you may want to know why we don't use Bigloo or Common Lisp to invert the Rasch model. The problem is that Junia and her co-workers are using hosting services that do not give access to the server or to the jailshell. Since Bigloo requires gcc-based compilation, Junia discarded it right away. Not long ago, the hosting service stopped responding to the sbcl Common Lisp compiler for reasons that I cannot fathom. Although Racket 6.0 stopped working too, Racket 6.0.1 is working fine. This left Junia, her co-workers and students with Racket as their sole option. As for myself, I am just curious.
>>>>>
>>>>> 2014-05-11 6:23 GMT-03:00 Jens Axel Søgaard <jensa...@soegaard.net>:
>>>>>> 2014-05-11 6:09 GMT+02:00 Eduardo Costa <edu50...@gmail.com>:
>>>>>>> The documentation says that one should expect typed/racket to be faster than racket. I tested the math/matrix library and it seems to be almost as slow in typed/racket as in racket.
>>>>>>
>>>>>> What was (is?) slow was a call from an untyped module A to a function exported from a typed module B. The functions in B must check at runtime that the values coming from A are of the correct type. If A were written in Typed Racket, the types would be known at compile time.
>>>>>>
>>>>>> Here math/matrix is written in Typed Racket, so if you are writing an untyped module, you will in general want to minimize the use of, say, matrix-ref. Instead, operations that work on entire matrices or rows/columns are preferred.
>>>>>>
>>>>>>> (: sum : Integer Integer -> Flonum)
>>>>>>> (define (sum i n)
>>>>>>>   (let loop ((j 0) (acc 0.0))
>>>>>>>     (if (>= j mx)
>>>>>>>         acc
>>>>>>>         (loop (+ j 1) (+ acc (matrix-ref A i j))))))
>>>>>>>
>>>>>>> (: b : (Matrix Flonum))
>>>>>>> (define b (build-matrix mx 1 sum))
>>>>>>
>>>>>> The matrix b contains the sums of each row in the matrix.
>>>>>> Since matrices are a subset of arrays, you can use array-axis-sum, which computes the sum along a given axis (i.e. a row or a column when speaking of matrices).
>>>>>>
>>>>>> (define A (matrix [[0. 1. 2.]
>>>>>>                    [3. 4. 5.]
>>>>>>                    [6. 7. 8.]]))
>>>>>>
>>>>>> > (array-axis-sum A 1)
>>>>>> - : (Array Flonum)
>>>>>> (array #[3.0 12.0 21.0])
>>>>>>
>>>>>> However, as Eric points out, matrix-solve is an O(n^3) algorithm, so the majority of the time is spent in matrix-solve.
>>>>>>
>>>>>> Apart from finding a way to exploit the relationship between your matrix A and the column vector b, I see no obvious way of speeding up the code.
>>>>>>
>>>>>> Note that when you benchmark with
>>>>>>
>>>>>>     time racket matrix.rkt
>>>>>>
>>>>>> you will include startup and compilation time. Therefore, if you want to time the matrix code, insert a literal (time ...) call.
>>>>>>
>>>>>> --
>>>>>> Jens Axel Søgaard
>>>>
>>>> --
>>>> Jens Axel Søgaard
>
> --
> Jens Axel Søgaard

____________________
  Racket Users list:
  http://lists.racket-lang.org/users