Hi Ralph, I understand what you meant.
I often use float in our application.
float c = (float)(unsigned int a - unsigned int b) could
be a very large number if a < b. So I always carefully cast from
unsigned int to int when I subtract them. I didn't know/mind
int d = (unsigned int a - unsigned in
Yes, indeed. In the future, when we have many, many cores
in the machine, we will have to take care of overrun of
num_procs.
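Just to make the pitfall concrete, here is a tiny standalone C sketch (hypothetical variables a, b, c_bad, c_ok, not the actual mapper code): the unsigned subtraction wraps before the float conversion, while casting the operands to int first keeps the expected negative result.

#include <stdio.h>

int main(void)
{
    unsigned int a = 2, b = 5;

    /* a - b wraps around to a value near UINT_MAX, so the float becomes huge. */
    float c_bad = (float)(a - b);

    /* Casting the operands to int before subtracting keeps the sign. */
    float c_ok = (float)((int)a - (int)b);

    printf("c_bad = %e, c_ok = %f\n", c_bad, c_ok);
    return 0;
}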
Tetsuya
> Cool - easily modified. Thanks!
>
> Of course, you understand (I'm sure) that the cast does nothing to
> protect the code from blowing up if we overrun the var. I
Cool - easily modified. Thanks!
Of course, you understand (I'm sure) that the cast does nothing to protect the
code from blowing up if we overrun the var. In other words, if the unsigned var
has wrapped, then casting it to int won't help - you'll still get a negative
integer, and the code will
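A minimal sketch of that failure mode, using a hypothetical counter n rather than the actual variable in the mapper: once the unsigned value has grown past INT_MAX, the cast to int just reinterprets the bits and, on typical two's-complement platforms, yields a negative count that downstream code will choke on.

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* Suppose the unsigned counter has overrun past INT_MAX. */
    unsigned int n = (unsigned int)INT_MAX + 1000u;

    /* The cast doesn't "fix" anything; it typically produces a large
     * negative int, so the code still misbehaves later. */
    int as_int = (int)n;

    printf("unsigned: %u  cast to int: %d\n", n, as_int);
    return 0;
}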
Hi Ralph, I'm a little bit late to your release.
I found a minor mistake in byobj_span - an integer casting problem.
--- rmaps_rr_mappers.30892.c  2014-03-01 08:31:50 +0900
+++ rmaps_rr_mappers.c 2014-03-01 08:33:22 +0900
@@ -689,7 +689,7 @@
}
/* compute how many objs need an extra pro
Hi Beichuan
To add to what Ralph said,
the RHEL OpenMPI package probably wasn't built
with PBS Pro support either.
Besides, OMPI 1.5.4 (RHEL version) is old.
You will save yourself time and grief if you read the installation FAQs
before you install from the source tarball:
http://www.
On 28/02/2014 21:30, Gus Correa wrote:
> Hi Brice
>
> The (pdf) output of lstopo shows one L1d (16k) for each core,
> and one L1i (64k) for each *pair* of cores.
> Is this wrong?
It's correct. AMD uses this "dual-core compute unit" where L2 and L1i
are shared but L1d isn't.
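For anyone who wants to cross-check what lstopo draws, here is a minimal sketch against the hwloc C API (assuming hwloc >= 2.0, where L1 data and instruction caches are distinct object types): on these AMD compute-unit machines one would expect one L1d per core and one L1i per pair of cores.

#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    int ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
    int nl1d   = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_L1CACHE);
    int nl1i   = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_L1ICACHE);

    /* On the dual-core compute units: nl1d == ncores, nl1i == ncores / 2. */
    printf("cores=%d  L1d=%d  L1i=%d\n", ncores, nl1d, nl1i);

    hwloc_topology_destroy(topo);
    return 0;
}

(Compile with -lhwloc.)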
> BTW, if there are
On 28.02.2014 at 21:23, Brice Goglin wrote:
> OK, the problem is that node14's BIOS reports invalid NUMA info. It properly
> detects 2 sockets with 16-cores each. But it reports 2 NUMA nodes total,
> instead of 2 per socket (4 total). And hwloc warns because the cores
> contained in these NUMA
On 02/28/2014 03:32 AM, Brice Goglin wrote:
On 28/02/2014 02:48, Ralph Castain wrote:
Remember, hwloc doesn't actually "sense" hardware - it just parses files in the
/proc area. So if something is garbled in those files, hwloc will report errors. Doesn't
mean anything is wrong with the hard
OK, the problem is that node14's BIOS reports invalid NUMA info. It
properly detects 2 sockets with 16-cores each. But it reports 2 NUMA
nodes total, instead of 2 per socket (4 total). And hwloc warns because
the cores contained in these NUMA nodes are incompatible with sockets:
socket0 contains 0-
Did you see the note I forwarded to you about SLES issues? Not sure if that is
on your side or ours
On Feb 28, 2014, at 12:05 PM, Latham, Robert J. wrote:
> On Wed, 2014-02-26 at 17:14 -0600, Edgar Gabriel wrote:
>> that was my fault, I did not follow up the time, got probably side
>> tracked b
On Wed, 2014-02-26 at 17:14 -0600, Edgar Gabriel wrote:
> that was my fault, I did not follow up the time, got probably side
> tracked by something. Anyway, I suspect that you actually have the
> patch, otherwise the current Open MPI trunk and the 1.7 release series
> would not have the patch after
You might also want to check the BIOS rev level on node14, Gus - as Brice
suggested, it could be that the board came with the wrong firmware.
On Feb 28, 2014, at 11:55 AM, Gus Correa wrote:
> Hi Brice and Ralph
>
> Many thanks for helping out with this!
>
> Yes, you are right about node15 bei
Hi Brice and Ralph
Many thanks for helping out with this!
Yes, you are right about node15 being OK.
Node15 was a red herring; it was part of the same failed job
as node14.
However, after a closer look, I noticed that the failure reported
by hwloc was indeed on node14.
I attach both
Almost certainly, the redhat package wasn't built with matching infiniband
support and so we aren't picking it up. I'd suggest downloading the latest
1.7.4 or 1.7.5 nightly tarball, or even the latest 1.6 tarball if you want the
stable release, and build it yourself so you *know* it was built fo
Hi there,
I am running jobs on clusters with an Infiniband connection. They installed
OpenMPI v1.5.4 via the REDHAT 6 yum package. My problem is that although my jobs
get queued and started by PBS PRO quickly, most of the time they don't really
run (occasionally they really run) and give error info
On Feb 28, 2014, at 12:32 AM, Brice Goglin wrote:
> On 28/02/2014 02:48, Ralph Castain wrote:
>> Remember, hwloc doesn't actually "sense" hardware - it just parses files in
>> the /proc area. So if something is garbled in those files, hwloc will report
>> errors. Doesn't mean anything is wr
On 28/02/2014 02:48, Ralph Castain wrote:
> Remember, hwloc doesn't actually "sense" hardware - it just parses files in
> the /proc area. So if something is garbled in those files, hwloc will report
> errors. Doesn't mean anything is wrong with the hardware at all.
For the record, that's not
Hello Gus,
I'll need the tarball generated by gather-topology on node14 to debug
this. node15 doesn't have any issue.
We've seen issues on AMD machines because of buggy BIOS reporting
incompatible Socket and NUMA info. If node14 doesn't have the same BIOS
version as other nodes, that could explain
On 2/27/14 14:06, Noam Bernstein wrote:
On Feb 27, 2014, at 2:36 AM, Patrick Begou wrote:
Bernd Dammann wrote:
Using the workaround '--bind-to-core' only makes sense for those jobs that
allocate full nodes, but the majority of our jobs don't do that.
Why ?
We still use this option
On 2/27/14 16:47, Dave Love wrote:
Bernd Dammann writes:
Hi,
I found this thread from before Christmas, and I wondered what the
status of this problem is. We have been experiencing the same problems since our
upgrade to Scientific Linux 6.4, kernel 2.6.32-431.1.2.el6.x86_64, and
OpenMPI 1.6.5.
Users
Please take a look at https://svn.open-mpi.org/trac/ompi/ticket/4317
On Feb 27, 2014, at 8:13 PM, tmish...@jcity.maeda.co.jp wrote:
>
>
> Hi Ralph, I can't operate our cluster for a few days, sorry.
>
> But now, I'm narrowing down the cause by browsing the source code.
>
> My best guess is