>>> On 22.07.15 at 13:22, <andrew.coop...@citrix.com> wrote:
> On 22/07/15 11:04, Jan Beulich wrote:
>>>>> On 22.07.15 at 10:42, <andrew.coop...@citrix.com> wrote:
>>> In the case of having aligned source and destination on a 16-byte
>>> boundary (which we can trivially arrange), then ERMSB (to give it its
>>> Intel name) and rep stosl differ only in the setup cost; they still
>>> scale at the same rate for changes in length.
>>>
>>> Therefore, assuming we arrange for 16-byte alignment, using rep stosl
>>> would appear to be a single 60ish cycle hit over using ERMSB, but would
>>> be substantially more efficient than using rep stosb on a non-ERMSB system.
>>>
>>> Overall, I think 16 byte alignment and rep stosl is the best compromise.
>> Or leaving such code alone, with the assumption that over time the
>> setup cost (on a growing number of systems) outweighs the benefits
>> (on a shrinking set).
> 
> The BSS is large - 295k on the last compile I have from staging.  The
> setup cost is lost in the noise compared to the elapsed time to write
> that many zeroes to memory.
> 
> Therefore, on an ERMSB-capable system, the two options will complete in
> the same amount of time.
> 
> However, on all AMD hardware and Intel hardware older than IvyBridge,
> rep stosl is 4 times faster than rep stosb.

Well, okay then.

Jan
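
For scale, the arithmetic behind the "lost in the noise" argument can be
sketched roughly as below. This is only a back-of-the-envelope model, not
anything from the Xen tree: it assumes "295k" means 295 KiB, takes the
"60ish cycle" setup figure from the thread, and uses a hypothetical
per-iteration cost that the caller supplies. The names BSS_BYTES,
rep_iterations and model_cycles are made up for illustration.

```c
#include <stdint.h>

/* Assumption: the "295k" BSS quoted in the thread is 295 KiB. */
#define BSS_BYTES (295u * 1024u)

/* Number of store iterations needed to cover `bytes` when each
 * iteration writes `width` bytes (1 for stosb, 4 for stosl). */
static uint64_t rep_iterations(uint64_t bytes, unsigned width)
{
    return (bytes + width - 1) / width;
}

/* Hypothetical cost model: a fixed setup cost plus a constant cost per
 * iteration. Real hardware is more complicated; this only shows how
 * small a ~60 cycle setup is next to the per-iteration work. */
static uint64_t model_cycles(uint64_t bytes, unsigned width,
                             uint64_t setup, uint64_t per_iter)
{
    return setup + rep_iterations(bytes, width) * per_iter;
}
```

With a placeholder cost of 1 cycle per iteration,
model_cycles(BSS_BYTES, 4, 60, 1) gives 75580, so the 60 cycle setup is
under 0.1% of the modelled total. On the same model, stosb needs four
times as many iterations as stosl (302080 against 75520), which is in
line with the "4 times faster" figure quoted above for hardware without
ERMSB.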


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
