On 10/5/2018 10:32 AM, Matthew Flatt wrote:
> At Fri, 5 Oct 2018 15:36:04 +0200, Paulo Matos wrote:
> > Again, I am really surprised that you mention that places are not
> > separate processes. The documentation does say they are separate
> > Racket virtual machines, so how is this accomplished if not by using
> > separate processes?
>
> Each place is an OS thread within the Racket process. The virtual
> machine is essentially instantiated once in each thread, where things
> that look like global variables at the C level are actually
> thread-local variables to make them place-specific. Still, there is
> some sharing among the threads.
>
> > My workers are really doing Z3-style work: number crunching and lots
> > of searching. No IO (writing to disk) or communication, so I would
> > expect them to really max out all CPUs.
>
> My best guess is that it's memory-allocation bottlenecks, probably at
> the point of using mmap() and mprotect(). Maybe things don't scale
> well beyond the 4-core machines that I use.
>
> On my machines, the enclosed program can max out CPU use with system
> time being a small fraction. It scales ok from 1 to 4 places (i.e.,
> real time increased only some). The machine's cores are hyperthreaded,
> and the example maxes out CPU utilization at 8, but it takes twice as
> long in real time, so the hardware threads don't help much in this
> case. Running two processes with 4 places takes about the same real
> time as running one process with 8 places, as does 2 processes with 2
> places.
>
> Do you see similar effects, or does this little example stop scaling
> before the number of processes matches the number of cores?
As Matthew said, this may be a case where multiple processes are better.

One thing that is likely very different between your two systems is the
memory architecture. On Paulo's many-core machine, each group of
[probably] 6 CPUs will have its own physical bank of memory which is
close to it and which it uses preferentially (a NUMA design). Access to
a different bank may be very costly. Paulo's machine may be spending a
much greater percentage of time moving data between VM instances that
are located in different memory regions, something Matthew can't see on
his quad-core.
Paulo, you might take a look at how memory is being allocated [not sure
what tools you have for this] and see what happens if you restrict the
process to running on various groups of CPUs. It may be that some banks
of your memory are "closer" to a given group of CPUs than others.
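One way to try this together with the multiple-processes idea, assuming Linux with `numactl` installed: `numactl --hardware` lists the NUMA nodes and their CPUs, and `numactl --cpunodebind=0 --membind=0 racket worker.rkt` runs one copy bound to node 0's CPUs and memory from a shell. The sketch below does the same from a small Racket driver, starting one pinned Racket process per node. `worker.rkt` and the node list `'(0 1)` are hypothetical placeholders for your own program and the node numbers your machine reports.

```racket
#lang racket/base
;; Sketch: run one independent Racket process per NUMA node, each bound to that
;; node's CPUs (--cpunodebind) and memory (--membind) via numactl, so that no
;; process allocates from a remote memory bank. Assumes Linux with numactl on PATH.
;; "worker.rkt" and the node list below are hypothetical placeholders.
(require racket/system)

;; numactl arguments that pin one worker to `node` and then run `script`.
(define (pinned-args node script)
  (define n (number->string node))
  (list (string-append "--cpunodebind=" n)
        (string-append "--membind=" n)
        "racket" script))

(module+ main
  (define numactl (find-executable-path "numactl"))
  (define nodes '(0 1)) ; replace with the nodes listed by `numactl --hardware`
  (if (not numactl)
      (eprintf "numactl not found on PATH; nothing started\n")
      ;; One Racket thread per worker: system* blocks only the thread that
      ;; calls it, so the pinned processes run concurrently.
      (for-each
       thread-wait
       (for/list ([node (in-list nodes)])
         (thread
          (lambda ()
            (unless (apply system* numactl (pinned-args node "worker.rkt"))
              (eprintf "worker on node ~a exited with an error\n" node))))))))
```

Comparing timings with each worker pinned to its own node against the same workers left unpinned (or deliberately bound to a remote node's memory) should show whether cross-bank access is a significant part of the slowdown.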
Hope this helps,
George