To see whether Clojure is a practical language for byte-level work
(parsing files, network streams, etc.), I wrote a trivial function
that iterates through a buffer of bytes and counts the newlines it
sees. For testing, I wrote a C version, a Java version, and a Clojure
version. Each routine runs 10 times over a 16MB buffer read from
/dev/urandom (the buffer is refilled before each call to the
newline-counting function).

Timings per 16MB buffer:

  gcc -O0:                     about 80ms
  gcc -O3:                     ~14ms
  javac (run as java -server): 20ms
  Clojure:                     105ms (after the JVM warms up)

I'm guessing that the huge boost that Java and gcc -O3 get comes from
converting per-byte operations to per-int operations; at least that
~4x boost looks like it would come from something like that. Is that
an optimization that is unavailable to Clojure? The java_interop doc
makes it sound like Java and Clojure get the exact same bytecode when
areduce is used correctly, so maybe there's something I could be
doing better.

Here are my small programs; if somebody could suggest improvements,
I'd appreciate them.
iterate.clj:
(set! *warn-on-reflection* true)
(import java.io.FileInputStream)

(def *numbytes* (* 16 1024 1024))

(defn countnl
  [#^bytes buf]
  (let [nl (byte 10)]
    (areduce buf idx count 0
             (if (= (aget buf idx) nl)
               (inc count)
               count))))

(let [ifs (FileInputStream. "/dev/urandom")
      buf (make-array Byte/TYPE *numbytes*)]
  (dotimes [_ 10]
    (let [sz (.read ifs buf)]
      (println "Wanted" *numbytes* "got" sz "bytes")
      (let [count (time (countnl buf))]
        (println "Got" count "nls")))))
Iterate.java:
import java.io.FileInputStream;

class Iterate
{
    static final int NUMBYTES = 16*1024*1024;

    static int countnl(byte[] buf)
    {
        int count = 0;
        for(int i = 0; i < buf.length; i++) {
            if(buf[i] == '\n') {
                count++;
            }
        }
        return count;
    }

    public static final void main(String[] args)
        throws Throwable
    {
        FileInputStream input = new FileInputStream("/dev/urandom");
        byte[] buf = new byte[NUMBYTES];
        int sz;
        long start, end;

        for(int i = 0; i < 10; i++) {
            sz = input.read(buf);
            System.out.println("Wanted " + NUMBYTES + " got " + sz + " bytes");
            start = System.currentTimeMillis();
            int count = countnl(buf);
            end = System.currentTimeMillis();
            System.out.println("counted " + count + " nls in " +
                               (end-start) + " msec");
        }
        input.close();
    }
}
iterate.c:
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>

int countnl(char *buf, int sz)
{
    int i;
    int count = 0;
    for(i = 0; i < sz; i++) {
        if(buf[i] == '\n') {
            count++;
        }
    }
    return count;
}

int main()
{
    int fd = open("/dev/urandom", O_RDONLY);
    const int NUMBYTES = 16*1024*1024;
    char *buf = (char*)malloc(NUMBYTES);
    int sz;
    struct timeval start, end;
    int i;

    for(i = 0; i < 10; i++) {
        sz = read(fd, buf, NUMBYTES);
        printf("Wanted %d bytes, got %d bytes\n", NUMBYTES, sz);
        gettimeofday(&start, 0);
        int count = countnl(buf, sz);
        gettimeofday(&end, 0);
        printf("counted %d nls in %f msec\n", count,
               (float)(end.tv_sec-start.tv_sec)*1e3 + (end.tv_usec-start.tv_usec)/1e3);
    }
    free(buf);
    close(fd);
    return 0;
}
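
P.S. To make my guess about per-int operations concrete, here is a
rough Java sketch of counting newlines eight bytes at a time instead
of one. It is only an illustration: I have not benchmarked it, it is
not one of the programs timed above, and I don't know whether HotSpot
or gcc -O3 actually does anything like this. The names WordCountNl,
countnlBytes and countnlWords are made up for the example.

```java
import java.nio.ByteBuffer;
import java.util.Random;

class WordCountNl
{
    // Every byte lane of the long set to '\n' (0x0A) or to 0x7F.
    static final long NL_LANES   = 0x0A0A0A0A0A0A0A0AL;
    static final long LOW7_LANES = 0x7F7F7F7F7F7F7F7FL;

    // Reference version: one comparison per byte, like Iterate.countnl.
    static int countnlBytes(byte[] buf)
    {
        int count = 0;
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == '\n') {
                count++;
            }
        }
        return count;
    }

    // Hypothetical word-at-a-time version: 8 bytes per loop iteration.
    static int countnlWords(byte[] buf)
    {
        ByteBuffer bb = ByteBuffer.wrap(buf);
        int count = 0;
        int i = 0;
        for (; i + 8 <= buf.length; i += 8) {
            // XOR turns every newline byte into 0x00.
            long x = bb.getLong(i) ^ NL_LANES;
            // High bit of each lane is set iff that lane is nonzero;
            // adding 0x7F to a 7-bit value never carries across lanes.
            long nonzero = ((x & LOW7_LANES) + LOW7_LANES) | x;
            // Leave 0x80 only in the lanes that were zero (newlines).
            long zeros = ~(nonzero | LOW7_LANES);
            count += Long.bitCount(zeros);
        }
        // Leftover bytes when the length is not a multiple of 8.
        for (; i < buf.length; i++) {
            if (buf[i] == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args)
    {
        // Fixed seed so both versions see the same pseudo-random data.
        byte[] buf = new byte[1024 * 1024 + 3];
        new Random(42).nextBytes(buf);
        System.out.println("per-byte: " + countnlBytes(buf) + " nls");
        System.out.println("per-word: " + countnlWords(buf) + " nls");
    }
}
```

Both methods should always report the same count; the only difference
is how many bytes each loop iteration touches.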