FYI: IEEE doubles are typically 64-bit; IEEE floats are typically 32-bit.

The Wikipedia article is good: http://en.wikipedia.org/wiki/IEEE_754-2008
The IEEE standard (requires login): http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4610935

I'm not sure how the JVM implements them precisely.

Hey, this is a *really* stupid question, but... is everyone using 64-bit hardware?
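For what it's worth, you can ask the JVM directly what widths it uses for its primitive types. A quick REPL check, just as an illustration:

    ;; Bit widths of the JVM primitive types, read from the static SIZE
    ;; fields on the boxed wrapper classes.
    user=> Double/SIZE
    64
    user=> Float/SIZE
    32
    user=> Long/SIZE
    64
    user=> Integer/SIZE
    32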
On Aug 6, 9:32 am, Nicolas Oury <nicolas.o...@gmail.com> wrote:
> Hello,
>
> I will try to have a guess. If 98% of the time is spent allocating
> Doubles, the program is loading new lines of memory into cache every
> n Doubles. At some point down the different levels of cache, you have
> a common cache/main memory for both cores, and the bus to this memory
> has to be shared in some way.
>
> So maybe you are measuring the throughput of your memory, and not the
> computational speed.
>
> To confirm or rule out the guess, it would be interesting to
> - see where the int version spends its time and compare the hprof output.
> - replace doubles with floats. (I assume floats are smaller than
>   doubles, but I have no clue whether that is the case.)
> - replace ints with longs (corresponding assumption).
> - try the JVM option I read about somewhere for better cache-aware
>   allocation. I don't remember what it was, or whether it is in 6 or
>   will be in 7.
>
> Cheers,
>
> Nicolas.
>
> On Thu, Aug 6, 2009 at 11:07 AM, Andy Fingerhut
> <andy_finger...@alum.wustl.edu> wrote:
>
> > On Aug 5, 6:09 am, Rich Hickey <richhic...@gmail.com> wrote:
> > > On Wed, Aug 5, 2009 at 8:29 AM, Johann Kraus <johann.kr...@gmail.com> wrote:
>
> > > >> Could it be that your CPU has a single floating-point unit shared
> > > >> by 4 cores on a single die, and thus only 2 floating-point units
> > > >> total for all 8 of your cores? If so, then that fact, plus the
> > > >> fact that each core has its own separate ALU for integer
> > > >> operations, would seem to explain the results you are seeing.
>
> > > > Exactly, this would explain the behaviour. But unfortunately it is
> > > > not the case. I implemented a small example using Java (Java Threads)
> > > > and C (PThreads), and both times I get a linear speedup. See the
> > > > attached code below. The cores only share 12 MB of cache, but this
> > > > should be enough memory for my micro-benchmark. Seeing the linear
> > > > speedup in Java and C, I would rule out a hardware limitation.
>
> > > > Johann
>
> > > I looked briefly at your problem and don't see anything right off the
> > > bat. Do you have a profiler and could you try that out? I'm
> > > interested.
>
> > > Rich
>
> > I ran these tests on my iMac with a 2.16 GHz Intel Core 2 Duo (2 cores)
> > using the latest Clojure and clojure-contrib from git as of some time
> > on Aug 4, 2009. The Java implementation is from Apple, version 1.6.0_13.
>
> > ----------------------------------------------------------------------
> > For int, there are 64 "jobs" run, each of which consists of doing
> > (inc 0) 1,000,000,000 times. See pmap-batch.sh and pmap-testing.clj
> > for details.
>
> > http://github.com/jafingerhut/clojure-benchmarks/blob/398688c71525964...
> > http://github.com/jafingerhut/clojure-benchmarks/blob/398688c71525964...
>
> > Yes, yes, I know. I should really use a library for command line
> > argument parsing to avoid so much repetitive code. I may do that some
> > day.
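For context, a minimal sketch of what one such "job" might look like. This is illustrative only; the real code is in pmap-testing.clj at the links above, and these function names are made up:

    ;; Illustrative sketch only; the real code is in pmap-testing.clj
    ;; (linked above).  Each job does roughly 10^9 increments and
    ;; returns the final value.
    (defn int-job []
      (loop [i 0]
        (if (< i 1000000000)
          (recur (inc i))
          i)))

    (defn double-job []
      (loop [i 0, x 0.1]
        (if (< i 1000000000)
          (recur (inc i) (inc x))
          x)))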
> > Results for int 1 thread - jobs run sequentially:
>
> > "Elapsed time: 267547.789 msecs"
> > real 269.22
> > user 268.61
> > sys 1.79
>
> > int 2 threads - jobs run in 2 threads using modified-pmap, which
> > limits the number of futures running jobs to at most 2 at a time:
>
> > "Elapsed time: 177428.626 msecs"
> > real 179.14
> > user 330.30
> > sys 15.46
>
> > Comment: Elapsed time with 2 threads is about 2/3 of the elapsed time
> > with 1 thread. Not as good as the 1/2 we'd like on a 2-core machine,
> > but better than not being faster at all.
>
> > ----------------------------------------------------------------------
> > For double, there are 16 "jobs" run, each of which consists of doing
> > (inc 0.1) 1,000,000,000 times.
>
> > double 1 thread:
>
> > "Elapsed time: 258659.424 msecs"
> > real 263.28
> > user 247.29
> > sys 12.17
>
> > double 2 threads:
>
> > "Elapsed time: 229382.68 msecs"
> > Dumping CPU usage by sampling running threads ... done.
> > real 231.05
> > user 380.79
> > sys 11.49
>
> > Comment: Elapsed time with 2 threads is about 7/8 of the elapsed time
> > with 1 thread. Hardly any improvement at all for something that should
> > be "embarrassingly parallel", and the user time reported by Mac OS X's
> > /usr/bin/time increased by a factor of about 1.5. That seems like way
> > too much overhead for thread coordination.
>
> > Here are hprof output files for the "double 1 thread" and "double 2
> > threads" tests:
>
> > http://github.com/jafingerhut/clojure-benchmarks/blob/51d499c2679c2d5...
> > http://github.com/jafingerhut/clojure-benchmarks/blob/51d499c2679c2d5...
>
> > In both cases, over 98% of the time is spent in
> > java.lang.Double.valueOf(double d). See the files for the full stack
> > backtraces if you are curious.
>
> > I don't see any reason why that method should have any kind of
> > contention or worse performance when running on 2 cores vs. 1 core,
> > but I don't know the guts of how it is implemented. At least in
> > OpenJDK, all it does is "return new Double(d)", where d is the double
> > arg to valueOf(). Is there any reason why "new" might exhibit
> > contention between parallel threads?
>
> > Andy
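The boxing Andy describes is easy to see from a REPL. A quick illustrative check, not taken from the hprof output, just a sketch:

    ;; Every (inc 0.1) yields a freshly boxed java.lang.Double, so the
    ;; double benchmark allocates a new object on each iteration.
    user=> (class (inc 0.1))
    java.lang.Double
    user=> (identical? (inc 0.1) (inc 0.1))
    false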