On 7/6/2023 1:47 PM, Khan Shaikhul Hadi via gem5-users wrote:
In my configuration I used CPUTypes.O3 and PrivateL1SharedL2CacheHierarchy to check how clflush and fence impact the timing of a workload. In my workload I run 10,000 iterations that update array values, 200 updates per thread. In the workload, I have:
    for ( ; index < end_index - 1; index++) {
        ARR[index] = thread_ID;
        ARR[index + 1] = thread_ID;
        FENCE;
    }
to simulate two consecutive localized write operations and see the impact of the fence. Inserting FENCE (a macro that inserts mfence) increases execution time by 24%. In the second scenario, I have:

    for ( ; index < end_index - 1; index++) {
        ARR[index] = thread_ID;
        FLUSH(&ARR[index]);
        FENCE;
    }


Here FLUSH (a macro for _mm_clflush) should take more time to complete than ARR[index+1] = thread_ID, since that store is highly localized while the flush needs acknowledgement from all levels of the cache before it completes. So FENCE should carry a much larger penalty after a flush than after a write, and I was expecting a large increase in execution time from inserting fences in the second scenario. Instead, inserting the fence increases execution time by only 2%, which is counterintuitive. Can anyone explain why I'm seeing this behaviour? As far as I understand, a memory fence should let the following instructions execute only after all previous instructions have completed and been drained from the store buffer, in which case clflush should take more time than a regular write.
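
For reference, the FENCE and FLUSH macros expand roughly as below; this is a sketch of one plausible definition, not necessarily the exact macros used in the workload:

    #include <emmintrin.h>   /* _mm_clflush (SSE2) */

    /* Assumed definitions, for illustration only. */
    #define FENCE        __asm__ __volatile__("mfence" ::: "memory")
    #define FLUSH(addr)  _mm_clflush((const void *)(addr))

The inline asm emits an mfence and also acts as a compiler barrier; _mm_clflush issues a clflush for the cache line containing addr.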

Sorry I am only now seeing this ...

IIRC from my work on improving cache write back / flush behavior,
the gem5 implementation considers the flush complete when the
operation reaches the L1 cache - similar to what happens with
stores.  I agree that from a timing standpoint this is wrong,
which is why I undertook some substantial surgery.  I need to
forward-port it to more recent releases, do testing, etc., but in
principle I have a solution that:

- Gives line flush instructions timing where they are not complete
  until any write back makes it to the memory bus.

- Deals with the weaker ordering of clwb and clflushopt (which
  required retooling the store unit queue processing order); see
  the sketch after this list.

- Supports invd, wbinvd, and wbnoinvd in addition to the line
  flush operations.
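
To make the ordering difference concrete: clflush is ordered with respect to stores, while clflushopt and clwb are weakly ordered and are normally paired with an explicit sfence. The sketch below uses the standard intrinsics; it illustrates the architectural semantics and is not code from the gem5 patches:

    #include <immintrin.h>   /* _mm_clflush, _mm_clwb, _mm_sfence */

    /* clflush is ordered with respect to the preceding store, so no
       explicit fence is required between the store and the flush. */
    static void store_and_clflush(int *p, int v)
    {
        *p = v;
        _mm_clflush(p);
    }

    /* clwb (and clflushopt) are weakly ordered: they can be reordered
       with respect to later stores and flushes to other lines, so an
       sfence is used to order the write back before whatever follows.
       Requires CLWB support (e.g. compile with -mclwb). */
    static void store_and_clwb(int *p, int v)
    {
        *p = v;
        _mm_clwb(p);
        _mm_sfence();
    }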

Not sure when I will be able to put these together as patches
for the powers that be to review ...

Regards - Eliot Moss