Rafael Vanoni wrote:
> Roland Mainz wrote:
> > Rafael Vanoni wrote:
> >> Roland Mainz wrote:
> >>> Jonathan Chew wrote:
> >>>> Rafael Vanoni wrote:
> > [snip]
> >>> BTW: How does the topology code handle DR ?
> >> Take a look at
> >> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/lgrp.c#522.
> >> That's the platform independent portion of the code called when there's
> >> a DR event.
> >
> > Mhhh... how heavyweight is |lgrp_latency_change()| ?
>
> lgrp_latency_change() ->
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/lgrp.c#498

I'm asking because it contains a loop which uses |lgrp_alloc_max| - how
large will this value be on a machine like the SF25k ?

> > A crude hack may
> > be to use something similar to the |LGRP_CONFIG_LAT_CHANGE|-codepath
> > when a link becomes saturated (e.g. bump the latency incrementally for
> > anything beyond a certain link usage (maybe 90% as initial threshold ?
> > And something would have to lower the latency value again if the link
> > usage drops below the threshold) ... I doubt the hack will work as
> > expected but the experiment may give some hints what may need to be
> > done for a real implementation...
>
> I don't want to speculate on how simple or complex adding support for
> link usage is because that involves supporting hardware performance
> counters - and that's very platform dependent.
>
> But increasing latency values wouldn't be enough because the scheduler
> and the VM systems use the lgroup topology to decide how to dispatch
> threads and allocate resources. The topology is created after the
> latency values have been discovered, so that's early boot time - and
> again, modified when a DR event takes place.
>
> Changing the entire topology whenever a link saturates is obviously not
> the way to go, so we'd need to raise a flag somewhere indicating that
> situation.

BTW: Maybe it's better to use an integer value to store the link
saturation level... using a boolean flag somehow "feels" (e.g. I don't
have a justification... it's just a feeling) like an invitation for
oscillation issues.
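To illustrate what I mean, here is a small userland sketch - none of the
names below (|link_sat_update()|, |LINK_SAT_RAISE_PCT| etc.) exist in
ONNV, and a real implementation would have to read the link usage from
the platform's performance counters - which keeps an integer saturation
level with separate raise/lower watermarks instead of a single boolean
"link saturated" flag:

-- snip --
/*
 * Rough sketch (userland simulation, NOT ONNV code): an integer
 * "saturation level" with separate raise/lower watermarks instead of
 * a single boolean flag. All identifiers are made up for illustration.
 */
#include <stdio.h>

#define	LINK_SAT_RAISE_PCT	90	/* raise the level above this usage */
#define	LINK_SAT_LOWER_PCT	70	/* lower the level below this usage */
#define	LINK_SAT_MAX		4	/* clamp: how far latency may be bumped */

static int link_sat_level = 0;		/* 0 == link not saturated */

/*
 * Called periodically with the measured link usage in percent.
 * Returns the new saturation level; a caller could then bump the
 * lgroup latency value by it (the |LGRP_CONFIG_LAT_CHANGE|-style
 * hack discussed above).
 */
static int
link_sat_update(int usage_pct)
{
	if (usage_pct > LINK_SAT_RAISE_PCT && link_sat_level < LINK_SAT_MAX)
		link_sat_level++;
	else if (usage_pct < LINK_SAT_LOWER_PCT && link_sat_level > 0)
		link_sat_level--;
	/* usage between the two watermarks: keep the current level */
	return (link_sat_level);
}

int
main(void)
{
	/* link usage hovering around a single 90% threshold ... */
	int samples[] = { 50, 92, 89, 93, 88, 95, 60, 91, 40 };
	size_t i;

	for (i = 0; i < sizeof (samples) / sizeof (samples[0]); i++) {
		int lvl = link_sat_update(samples[i]);
		printf("usage %3d%% -> saturation level %d "
		    "(boolean flag would be %d)\n",
		    samples[i], lvl, samples[i] > LINK_SAT_RAISE_PCT);
	}
	return (0);
}
-- snip --

With a single 90% threshold the flag would flip on almost every sample
once the usage hovers around 90%, and each flip would trigger another
latency update; the two watermarks (plus the clamp) should dampen that
somewhat.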
> However, that fix would require changing the decision making
> process, places like lgrp_mem_choose(), and I'm not sure how far that
> could take us.

Another, half-unrelated issue: Has anyone looked into supporting things
like "code+data[1] separation" (e.g. allow code and data to be handled
separately, even over different links) and "multiple redundant links
between CPUs" yet ?

[1]="code" and "data" are just examples in this case, maybe there are
other data types which may be useful to separate

> I'm not 100% sure these usage readings should be stored within the
> lgroup topology to be honest.

That's why I called it a "crude hack"+"experiment" ... :-) ...but AFAIK
it may still be useful to see what may happen.

> I know there's some work around hardware
> telemetry, and that may very well be the right place for it.

Is there an OpenSolaris.org project for this work ?

> > BTW: Does Solaris/x86 support something like marking certain pages as
> > non-cacheable (for example in some cases it may be nice to turn caching
> > off if the traffic caused by cache coherency (or even the plain usage of
> > cache lines for data which are only read or written once) causes more
> > trouble than just turning the cache off for such data) ?
>
> Yes, see
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/vm/page.h#485

Is this somehow accessible from userland applications ?

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org