Ah, I see that I was mistaken about the timing. Sorry about that.
After a lot of fiddling around, I came up with this faster form:
(defn countnl-lite
  [#^bytes buf]
  (areduce buf idx count (int 0)
    (if (= (clojure.lang.RT/aget buf idx) 10)
      (unchecked-add count 1)
      count)))
The key points are initializing count to a primitive integer and calling
Clojure's aget (clojure.lang.RT/aget) directly to avoid an unnecessary
integer cast.
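
A rough Java analogy of the first point, as a sketch only: the assumption
here is that an accumulator started from a plain 0 literal ends up boxed,
while (int 0) keeps it primitive. The class and method names below are
hypothetical and nothing in this block was measured in this thread.

```java
import java.nio.charset.StandardCharsets;

class BoxedVsPrimitive {
    // Boxed accumulator: each increment unboxes, adds, and re-boxes,
    // unless the JIT manages to optimize the boxing away.
    static int countBoxed(byte[] buf) {
        Integer count = 0;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == '\n') {
                count = count + 1;
            }
        }
        return count;
    }

    // Primitive accumulator: the increment is a plain int add.
    static int countPrimitive(byte[] buf) {
        int count = 0;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] buf = "a\nb\nc\n".getBytes(StandardCharsets.US_ASCII);
        // Both variants return the same count; only the accumulator type differs.
        System.out.println(countBoxed(buf) + " " + countPrimitive(buf));
    }
}
```

Both methods compute the same result; any speed difference would come only
from how the accumulator is represented, which is the effect the (int 0)
initializer is aiming at.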
On my system:
  The unmodified countnl function takes ~180 msecs.
  Without AOT compilation, countnl-lite takes ~66 msecs.
  With AOT compilation, countnl-lite takes ~46 msecs.
  The Java method takes ~19 msecs.
I've lost a factor of about 2.4 (46 vs. 19 msecs) somewhere, and it makes me sad that I can't find it.
I would be very interested if anyone could improve countnl-lite.
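
When comparing variants, per-buffer figures like the ones above depend on
JIT warm-up and on timer resolution. The following is a minimal sketch of a
timing harness, not the code used for the numbers in this thread. It assumes
a buffer filled from a seeded java.util.Random rather than /dev/urandom, and
the names TimingSketch, bestNanos and sink are hypothetical.

```java
import java.util.Random;

class TimingSketch {
    // Written to after every call so the JIT cannot discard the counting loop.
    static volatile int sink;

    static int countnl(byte[] buf) {
        int count = 0;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == '\n') {
                count++;
            }
        }
        return count;
    }

    // Runs `warmups` untimed calls, then returns the smallest elapsed time
    // in nanoseconds observed over `runs` timed calls.
    static long bestNanos(byte[] buf, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink = countnl(buf);
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = countnl(buf);
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[16 * 1024 * 1024];
        new Random(42).nextBytes(buf); // fixed seed: same data on every run
        long best = bestNanos(buf, 10, 10);
        System.out.println("best of 10: " + (best / 1e6) + " msec");
    }
}
```

System.nanoTime is used instead of System.currentTimeMillis because it is
meant for measuring elapsed intervals; taking the best of several runs is
one common choice, and a mean or median would be equally defensible.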
--Robert McIntyre
On Mon, Aug 30, 2010 at 8:41 PM, Alan <[email protected]> wrote:
> I think this misses the point. Of course java, c, and clojure will all
> have roughly the same wall-clock time for this program, since it is
> dominated by the I/O. You can even see that in the output from $ time
> java Iterate: less than 0.5s was spent in user space, the rest was
> spent in system code - that is, mostly doing I/O.
>
> The java version is a second faster as counted by the wall clock, and
> this is unlikely to be a coincidence: tsuraan's timing data suggests
> that the clojure program takes 80ms longer in each loop, and loops 10
> times. That comes out to 0.8 seconds, which is quite close to the
> differential you observed when timing from the command line.
>
> On Aug 30, 1:38 pm, Robert McIntyre <[email protected]> wrote:
>> I don't know what the heck is going on here, but ignore the time the
>> program is reporting and just
>> pay attention to how long it actually takes wall-clock style and
>> you'll see that your clojure and
>> java programs already take the same time.
>>
>> Here are my findings:
>>
>> I saved Iterate.java into my rlm package and ran:
>> time java -server rlm.Iterate
>>
>> results:
>> time java -server rlm.Iterate
>> Wanted 16777216 got 16777216 bytes
>> counted 65341 nls in 27 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65310 nls in 27 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 66026 nls in 21 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65473 nls in 19 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65679 nls in 19 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65739 nls in 19 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65310 nls in 21 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65810 nls in 18 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65531 nls in 21 msec
>> Wanted 16777216 got 16777216 bytes
>> counted 65418 nls in 21 msec
>>
>> real 0m27.469s
>> user 0m0.472s
>> sys 0m26.638s
>>
>> I wrapped the last bunch of commands in your clojure script into a
>> (run) function:
>> (defn run []
>>   (let [ifs (FileInputStream. "/dev/urandom")
>>         buf (make-array Byte/TYPE *numbytes*)]
>>     (dotimes [_ 10]
>>       (let [sz (.read ifs buf)]
>>         (println "Wanted" *numbytes* "got" sz "bytes")
>>         (let [count (time (countnl buf))]
>>           (println "Got" count "nls"))))))
>>
>> and ran
>> (time (run)) at the repl:
>>
>> (time (run))
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.081975 msecs"
>> Got 65894 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.001814 msecs"
>> Got 65949 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.061934 msecs"
>> Got 65603 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.031131 msecs"
>> Got 65563 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.122567 msecs"
>> Got 65696 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 182.968066 msecs"
>> Got 65546 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.058508 msecs"
>> Got 65468 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 182.932395 msecs"
>> Got 65872 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 183.074646 msecs"
>> Got 65498 nls
>> Wanted 16777216 got 16777216 bytes
>> "Elapsed time: 187.733636 msecs"
>> Got 65434 nls
>> "Elapsed time: 28510.331507 msecs"
>> nil
>>
>> Total running time for both programs is around 28 seconds.
>> The java program seems to be incorrectly reporting its time.
>>
>> --Robert McIntyre
>>
>> On Mon, Aug 30, 2010 at 4:03 PM, tsuraan <[email protected]> wrote:
>> > Just to try to see if clojure is a practical language for doing
>> > byte-level work (parsing files, network streams, etc), I wrote a
>> > trivial function to iterate through a buffer of bytes and count all
>> > the newlines that it sees. For my testing, I've written a C version,
>> > a Java version, and a Clojure version. I'm running each routine 10
>> > times over a 16MB buffer read from /dev/urandom (the buffer is
>> > refreshed between each call to the newline counting function). With
>> > gcc -O0, I get about 80ms per 16MB buffer. With gcc -O3, I get ~14ms
>> > per buffer. With javac (and java -server) I get 20ms per 16MB buffer.
>> > With clojure, I get 105ms per buffer (after the jvm warms up). I'm
>> > guessing that the huge boost that java and gcc -O3 get is from
>> > converting per-byte operations to per-int ops; at least that ~4x boost
>> > looks like it would come from something like that. Is that an
>> > optimization that is unavailable to clojure? The java_interop doc
>> > makes it sound like java and clojure get the exact same bytecode when
>> > using areduce correctly, so maybe there's something I could be doing
>> > better. Here are my small programs; if somebody could suggest
>> > improvements, I'd appreciate them.
>>
>> > iterate.clj:
>>
>> > (set! *warn-on-reflection* true)
>> > (import java.io.FileInputStream)
>>
>> > (def *numbytes* (* 16 1024 1024))
>>
>> > (defn countnl
>> >   [#^bytes buf]
>> >   (let [nl (byte 10)]
>> >     (areduce buf idx count 0
>> >       (if (= (aget buf idx) nl)
>> >         (inc count)
>> >         count))))
>>
>> > (let [ifs (FileInputStream. "/dev/urandom")
>> >       buf (make-array Byte/TYPE *numbytes*)]
>> >   (dotimes [_ 10]
>> >     (let [sz (.read ifs buf)]
>> >       (println "Wanted" *numbytes* "got" sz "bytes")
>> >       (let [count (time (countnl buf))]
>> >         (println "Got" count "nls")))))
>>
>> > Iterate.java:
>>
>> > import java.io.FileInputStream;
>>
>> > class Iterate
>> > {
>> >     static final int NUMBYTES = 16*1024*1024;
>>
>> >     static int countnl(byte[] buf)
>> >     {
>> >         int count = 0;
>> >         for(int i = 0; i < buf.length; i++) {
>> >             if(buf[i] == '\n') {
>> >                 count++;
>> >             }
>> >         }
>> >         return count;
>> >     }
>>
>> >     public static final void main(String[] args)
>> >         throws Throwable
>> >     {
>> >         FileInputStream input = new FileInputStream("/dev/urandom");
>> >         byte[] buf = new byte[NUMBYTES];
>> >         int sz;
>> >         long start, end;
>>
>> >         for(int i = 0; i < 10; i++) {
>> >             sz = input.read(buf);
>> >             System.out.println("Wanted " + NUMBYTES + " got " + sz + " bytes");
>> >             start = System.currentTimeMillis();
>> >             int count = countnl(buf);
>> >             end = System.currentTimeMillis();
>> >             System.out.println("counted " + count + " nls in " +
>> >                                (end-start) + " msec");
>> >         }
>>
>> >         input.close();
>> >     }
>> > }
>>
>> > iterate.c:
>>
>> > #include<sys/types.h>
>> > #include<sys/stat.h>
>> > #include<sys/time.h>
>> > #include<stdlib.h>
>> > #include<unistd.h>
>> > #include<stdio.h>
>> > #include<fcntl.h>
>>
>> > int countnl(char *buf, int sz)
>> > {
>> >     int i;
>> >     int count = 0;
>> >     for(i = 0; i < sz; i++) {
>> >         if(buf[i] == '\n') {
>> >             count++;
>> >         }
>> >     }
>> >     return count;
>> > }
>>
>> > int main()
>> > {
>> >     int fd = open("/dev/urandom", O_RDONLY);
>> >     const int NUMBYTES = 16*1024*1024;
>> >     char *buf = (char*)malloc(NUMBYTES);
>>
>> >     int sz;
>> >     struct timeval start, end;
>>
>> >     int i;
>> >     for(i = 0; i < 10; i++) {
>> >         sz = read(fd, buf, NUMBYTES);
>> >         printf("Wanted %d bytes, got %d bytes\n", NUMBYTES, sz);
>> >         gettimeofday(&start, 0);
>> >         int count = countnl(buf, sz);
>> >         gettimeofday(&end, 0);
>> >         printf("counted %d nls in %f msec\n", count,
>> >                (float)(end.tv_sec-start.tv_sec)*1e3 +
>> >                (end.tv_usec-start.tv_usec)/1e3);
>> >     }
>>
>> >     free(buf);
>> >     close(fd);
>> >     return 0;
>> > }
>>
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "Clojure" group.
>> > To post to this group, send email to [email protected]
>> > Note that posts from new members are moderated - please be patient with
>> > your first post.
>> > To unsubscribe from this group, send email to
>> > [email protected]
>> > For more options, visit this group at
>> > http://groups.google.com/group/clojure?hl=en
>>
>>
>