FYI: IEEE doubles are typically 64-bit; IEEE floats are typically 32-bit.

The Wikipedia article is good: http://en.wikipedia.org/wiki/IEEE_754-2008
The IEEE standard (requires login): http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4610935

I'm not sure how the JVM implements them precisely.

Hey, this is a *really* stupid question, but... is everyone using 64-bit hardware?
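For what it's worth, you can ask the JVM directly what widths it uses for its primitive types. A quick REPL check, just as an illustration:

    ;; Bit widths of the JVM primitive types, read from the static SIZE
    ;; fields on the boxed wrapper classes.
    user=> Double/SIZE
    64
    user=> Float/SIZE
    32
    user=> Long/SIZE
    64
    user=> Integer/SIZE
    32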
On Aug 6, 9:32 am, Nicolas Oury <nicolas.o...@gmail.com> wrote:
> Hello,
>
> I will try to have a guess. If 98% of the time is spent allocating
> Doubles, the program is loading new lines of memory into cache every
> n Doubles. At some point down the different levels of cache, you have
> a common cache/main memory for both cores, and the bus to this memory
> has to be shared in some way.
>
> So maybe you are measuring the throughput of your memory, and not the
> computational speed.
>
> To confirm or rule out the guess, it would be interesting to
> - see where the int version spends its time and compare the hprof output.
> - replace doubles with floats. (I assume floats are smaller than
>   doubles, but I have no clue whether that is the case.)
> - replace ints with longs (corresponding assumption).
> - try the JVM option I read about somewhere for better cache-aware
>   allocation. I don't remember what it was, or whether it is in 6 or
>   will be in 7.
>
> Cheers,
>
> Nicolas.
>
> On Thu, Aug 6, 2009 at 11:07 AM, Andy Fingerhut
> <andy_finger...@alum.wustl.edu> wrote:
>
> > On Aug 5, 6:09 am, Rich Hickey <richhic...@gmail.com> wrote:
> > > On Wed, Aug 5, 2009 at 8:29 AM, Johann Kraus <johann.kr...@gmail.com> wrote:
>
> > > >> Could it be that your CPU has a single floating-point unit shared
> > > >> by 4 cores on a single die, and thus only 2 floating-point units
> > > >> total for all 8 of your cores? If so, then that fact, plus the
> > > >> fact that each core has its own separate ALU for integer
> > > >> operations, would seem to explain the results you are seeing.
>
> > > > Exactly, this would explain the behaviour. But unfortunately it is
> > > > not the case. I implemented a small example using Java (Java Threads)
> > > > and C (PThreads), and both times I get a linear speedup. See the
> > > > attached code below. The cores only share 12 MB of cache, but this
> > > > should be enough memory for my micro-benchmark. Seeing the linear
> > > > speedup in Java and C, I would rule out a hardware limitation.
>
> > > > Johann
>
> > > I looked briefly at your problem and don't see anything right off the
> > > bat. Do you have a profiler and could you try that out? I'm
> > > interested.
>
> > > Rich
>
> > I ran these tests on my iMac with a 2.16 GHz Intel Core 2 Duo (2 cores)
> > using the latest Clojure and clojure-contrib from git as of some time
> > on Aug 4, 2009. The Java implementation is from Apple, version 1.6.0_13.
>
> > ----------------------------------------------------------------------
> > For int, there are 64 "jobs" run, each of which consists of doing
> > (inc 0) 1,000,000,000 times. See pmap-batch.sh and pmap-testing.clj
> > for details.
>
> > http://github.com/jafingerhut/clojure-benchmarks/blob/398688c71525964...
> > http://github.com/jafingerhut/clojure-benchmarks/blob/398688c71525964...
>
> > Yes, yes, I know. I should really use a library for command line
> > argument parsing to avoid so much repetitive code. I may do that some
> > day.
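For context, a minimal sketch of what one such "job" might look like. This is illustrative only; the real code is in pmap-testing.clj at the links above, and these function names are made up:

    ;; Illustrative sketch only; the real code is in pmap-testing.clj
    ;; (linked above).  Each job does roughly 10^9 increments and
    ;; returns the final value.
    (defn int-job []
      (loop [i 0]
        (if (< i 1000000000)
          (recur (inc i))
          i)))

    (defn double-job []
      (loop [i 0, x 0.1]
        (if (< i 1000000000)
          (recur (inc i) (inc x))
          x)))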
> > Results for int 1 thread - jobs run sequentially:
>
> > "Elapsed time: 267547.789 msecs"
> > real 269.22
> > user 268.61
> > sys 1.79
>
> > int 2 threads - jobs run in 2 threads using modified-pmap, which
> > limits the number of futures running jobs to at most 2 at a time:
>
> > "Elapsed time: 177428.626 msecs"
> > real 179.14
> > user 330.30
> > sys 15.46
>
> > Comment: Elapsed time with 2 threads is about 2/3 of the elapsed time
> > with 1 thread. Not as good as the 1/2 we'd like on a 2-core machine,
> > but better than not being faster at all.
>
> > ----------------------------------------------------------------------
> > For double, there are 16 "jobs" run, each of which consists of doing
> > (inc 0.1) 1,000,000,000 times.
>
> > double 1 thread:
>
> > "Elapsed time: 258659.424 msecs"
> > real 263.28
> > user 247.29
> > sys 12.17
>
> > double 2 threads:
>
> > "Elapsed time: 229382.68 msecs"
> > Dumping CPU usage by sampling running threads ... done.
> > real 231.05
> > user 380.79
> > sys 11.49
>
> > Comment: Elapsed time with 2 threads is about 7/8 of the elapsed time
> > with 1 thread. Hardly any improvement at all for something that should
> > be "embarrassingly parallel", and the user time reported by Mac OS X's
> > /usr/bin/time increased by a factor of about 1.5. That seems like way
> > too much overhead for thread coordination.
>
> > Here are hprof output files for the "double 1 thread" and "double 2
> > threads" tests:
>
> > http://github.com/jafingerhut/clojure-benchmarks/blob/51d499c2679c2d5...
> > http://github.com/jafingerhut/clojure-benchmarks/blob/51d499c2679c2d5...
>
> > In both cases, over 98% of the time is spent in
> > java.lang.Double.valueOf(double d). See the files for the full stack
> > backtraces if you are curious.
>
> > I don't see any reason why that method should have any kind of
> > contention or worse performance when running on 2 cores vs. 1 core,
> > but I don't know the guts of how it is implemented. At least in
> > OpenJDK, all it does is "return new Double(d)", where d is the double
> > arg to valueOf(). Is there any reason why "new" might exhibit
> > contention between parallel threads?
>
> > Andy
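The boxing Andy describes is easy to see from a REPL. A quick illustrative check, not taken from the hprof output, just a sketch:

    ;; Every (inc 0.1) yields a freshly boxed java.lang.Double, so the
    ;; double benchmark allocates a new object on each iteration.
    user=> (class (inc 0.1))
    java.lang.Double
    user=> (identical? (inc 0.1) (inc 0.1))
    false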