On Fri, Sep 29, 2017 at 08:19:22AM -0700, Mike Travis wrote:

> > So I would still like to get clarification on how ART works (or likely
> > doesn't) on your systems. I think for now its fairly prudent to kill
> > detect_art() on UV.
>
> I tested with both detect_art enabled and disabled and didn't notice a
> difference though I wasn't sure what test to run to verify whether it was
> being used or not. (I'd be glad to run some specific test if one can be
> suggested?) The num/denom setting for a 2100Mhz CPU was 168/2 if that
> information helps?

While ART has a ratio to TSC, it also has an absolute relation to it.
Given an ART time stamp we can compute a TSC value and vice versa; this
allows correlating device timestamps (Network, Audio/Video etc.) with
CPU time stamps.

Per detect_art() we have a single system wide offset, namely:

	rdmsrl(MSR_IA32_TSC_ADJUST, art_to_tsc_offset);

But you use TSC_ADJUST to sync between your cabinets, so this cannot ever
be right. The ART clock of the other cabinets (those that did not run
detect_art()) will have a different offset.

Currently there are only two device drivers that use ART:

	drivers/net/ethernet/intel/e1000e/ptp.c: *system = convert_art_to_tsc(sys_cycles);
	sound/pci/hda/hda_controller.c: *system = convert_art_to_tsc(tsc_counter);

Outside of that nobody cares, _for_now_.

I'm not sure if there's a means for the CPU to read ART in order to test
this correlation.

Intel SDM Vol 3B 17.17.4 speaks of 'K' with a footnote about TSC_ADJUST
and the VMCS TSC fields. But basically both TSC and ART start at 0 on
power on, and given the frequency ratio, 'K' is a known for native system
agents.

Again, I would suggest killing detect_art() (and the setting of
X86_FEATURE_ART) on UV systems until things are worked out.

Also, given you have your own distributed clock, I'm thinking you use
that on your own devices, obviating the immediate need for ART.

> > Also, while indeed not strictly required, that TSC_ADJUST==0 test on
> > bootcpu is nice for consumer systems, BIOS did something 'weird' if that
> > is not true. Is something like is_uv_system() available early enough?
>
> My previous version of the patches had me setting a flag that could be
> checked by the tsc_sanitize_first_cpu() function and disable the requirement
> of "TSC == 0 on socket 0" for any arch that specified it.
> (And UV did set that flag.)
>
> But Thomas said it was "hackery" and that TSC being 0 on socket 0 was no
> longer a requirement. So I took it out for this version and made the "TSC
> == 0 on socket 0" no longer the default for any arch.

That's where it comes from. But normal systems really _should_ have it
at 0 and it's a useful sanity check IMO. We really want to know when the
BIOS does a funny behind our backs.
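
To make the offset problem concrete, here is a rough userspace model of
the ART to TSC relation described earlier in this mail. It is only a
sketch and not the kernel's convert_art_to_tsc(): struct art_params and
art_to_tsc() are hypothetical names, and the offset of 500 used for a
second cabinet is made up. The 168/2 ratio is the one reported above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the values detect_art() collects. */
struct art_params {
	uint32_t numerator;	/* TSC/ART ratio numerator, e.g. 168 */
	uint32_t denominator;	/* TSC/ART ratio denominator, e.g. 2 */
	uint64_t offset;	/* single system wide offset (TSC_ADJUST) */
};

/*
 * Sketch of TSC = ART * num / denom + offset. The quotient and the
 * remainder are scaled separately so that art * numerator does not
 * overflow for large ART values.
 */
static uint64_t art_to_tsc(const struct art_params *p, uint64_t art)
{
	uint64_t quot, rem;

	assert(p->denominator != 0);

	quot = art / p->denominator;
	rem = art % p->denominator;

	return quot * p->numerator +
	       (rem * p->numerator) / p->denominator +
	       p->offset;
}
```

With two parameter sets that differ only in offset (say 0 on the cabinet
that ran detect_art() and 500 on another one), the same ART value maps to
two different TSC values, which is why one global art_to_tsc_offset
cannot be right when TSC_ADJUST differs per cabinet.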