Kevin’s observation about floating-point rounding and runtime dispatch is an 
excellent one in general.

Those two CPUs should, as far as I can tell, be dispatched to the same SIMD 
implementations in this case.

Skimming 
https://github.com/qt/qtbase/blob/v6.8.0/src/gui/painting/qimagescale_sse4.cpp, 
it looks like a fixed-point implementation that entirely avoids floating-point 
operations. If there are no bugs, and if I’m not missing something, it should 
be possible to get identical results regardless of ISA extensions since no 
floating-point rounding is involved.
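
To illustrate what I mean by fixed-point, here is a minimal sketch in Python 
rather than Qt's C++ intrinsics. The 8 fractional bits and the name 
blend_fixed are my own assumptions for illustration, not taken from the Qt 
source. Every step is an integer multiply, add, or shift, each of which has 
exactly one defined result, so any CPU should agree on the output:

```python
FRAC_BITS = 8          # assumed number of fractional bits (not from Qt)
ONE = 1 << FRAC_BITS   # the fixed-point representation of 1.0


def blend_fixed(a: int, b: int, weight: int) -> int:
    """Blend two 8-bit channel values using integer arithmetic only.

    weight is the fixed-point weight of b, in the range [0, ONE].
    The final shift truncates toward zero for these non-negative inputs.
    """
    return (a * (ONE - weight) + b * weight) >> FRAC_BITS


# A quarter of the way from 10 to 200: (10*192 + 200*64) >> 8
print(blend_fixed(10, 200, 64))  # 57
```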

The fact that the scaling algorithm appears to be integer-based also makes the 
following sources of irreproducibility less likely, but maybe not impossible:

- Some algorithms compute “left-over” leading and/or trailing data with a 
scalar algorithm, and in some cases this could make the results depend on 
alignment of buffers in memory. Besides the fact that this is an integer 
implementation, at a glance, Qt doesn’t appear to be doing this. It looks like 
QImage must be aligned and (over-)allocated to allow everything to be done in 
SIMD, processing some extra pixels outside the image as necessary to make 
complete vectors.

- SIMD algorithms might operate on input values and combine pixels in a 
different order than scalar ones, which could result in different rounding for 
floating-point operations. That shouldn’t matter for an integer algorithm like 
this, except maybe in cases of wrapping/overflow – which might perhaps be in 
play here.
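
The ordering point can be sketched in a few lines of Python. This is only an 
illustration under my own assumptions: the helper names are made up, and I 
have not checked whether Qt's scaler uses wrapping or saturating instructions. 
Floating-point addition gives different answers in different orders; wrapping 
integer addition does not; saturating integer addition (which SIMD instruction 
sets also offer) can:

```python
def wrap_i16(x: int) -> int:
    """Reduce x to a signed 16-bit value with two's-complement wraparound."""
    return (x + 0x8000) % 0x10000 - 0x8000


def sat_i16(x: int) -> int:
    """Clamp x to the signed 16-bit range (saturating arithmetic)."""
    return max(-0x8000, min(0x7FFF, x))


def accumulate(values, fold):
    """Add values left to right, applying fold after every addition."""
    acc = 0
    for v in values:
        acc = fold(acc + v)
    return acc


# Floating point: the same three terms, combined in two different orders.
float_a = (1e16 + 1.0) - 1e16   # the 1.0 is lost to rounding
float_b = (1e16 - 1e16) + 1.0   # the 1.0 survives

# Integers: the same three terms in two different orders.
order_a = [30000, 30000, -30000]
order_b = [30000, -30000, 30000]

wrap_a = accumulate(order_a, wrap_i16)
wrap_b = accumulate(order_b, wrap_i16)
sat_a = accumulate(order_a, sat_i16)
sat_b = accumulate(order_b, sat_i16)

print(float_a, float_b)  # 0.0 1.0
print(wrap_a, wrap_b)    # 30000 30000
print(sat_a, sat_b)      # 2767 30000
```

So plain wraparound on its own would not make the result order-dependent, but 
saturation, or a shift applied between partial sums, could.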

Another relevant fact is that the implementation is multi-threaded using a 
thread pool. If there is anything that depends on the order in which 
pixels/blocks are computed and combined, this could also result in different 
outputs, even in different runs on the same machine, and especially on machines 
with different numbers of cores.
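
For contrast, here is a sketch of the harmless case, again in Python and 
again with made-up names (halve_row, scale) rather than anything from Qt. If 
each task reads its own input rows and produces only its own output rows, the 
number of worker threads and the order in which they finish should not change 
the result:

```python
from concurrent.futures import ThreadPoolExecutor


def halve_row(row):
    """Halve a row's width by averaging adjacent pairs with integer math."""
    return [(row[i] + row[i + 1]) // 2 for i in range(0, len(row) - 1, 2)]


def scale(rows, workers):
    """Scale every row on a thread pool with the given number of workers.

    pool.map returns results in input order, regardless of which worker
    finishes first, and no task touches another task's data.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(halve_row, rows))


# Deterministic test image: 8 rows of 16 channel values each.
rows = [[(x * 7 + y * 13) % 256 for x in range(16)] for y in range(8)]

print(scale(rows, 1) == scale(rows, 4))  # True
```

Differences between runs would need something shared between tasks, such as 
an accumulator that several threads update.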

All of this is written on a phone, without digging very deeply into the source 
or doing any practical experiments.

On Sun, Nov 3, 2024, at 7:38 AM, Zbigniew Jędrzejewski-Szmek wrote:
> On Sun, Nov 03, 2024 at 04:08:38AM +0100, Kevin Kofler via devel wrote:
>> Zbigniew Jędrzejewski-Szmek wrote:
>> > With python3-pyqt6-6.8.0-0.1.fc42.x86_64, we get a difference in how the
>> > icons are rendered:
>> > 
>> >     calibre-7.20.0-1.fc42.x86_64
>> >         modified-S.5........
>> >         /usr/share/icons/hicolor/16x16/apps/calibre-gui.png
>> >         modified-S.5........
>> >         /usr/share/icons/hicolor/32x32/apps/calibre-ebook-edit.png
>> >         modified-S.5........
>> >         /usr/share/icons/hicolor/32x32/apps/calibre-gui.png
>> >         modified-S.5........
>> >         /usr/share/icons/hicolor/32x32/apps/calibre-viewer.png ...
>> > 
>> > There are some tiny differences in shading of some pixels. The difference
>> > is not discernible visually for me. [1] has example icons attached.
>> > 
>> > Is this a bug in Qt and implementation of QImage.scaled [3] ?
>> 
>> As I understand the Qt source code, QImage.scaled with the 
>> Qt.TransformationMode.SmoothTransformation flag ends up calling 
>> QImage.smoothScaled (QImage.scaled calls the general QImage.transformed, 
>> which then detects the special case and calls QImage.smoothScaled), which in 
>> turn calls the private qSmoothScaleImage. And that one uses a different 
>> algorithm based on whether the CPU is runtime-detected to support SSE 4.1 or 
>> not. (For non-x86, there are also optimized implementations for ARM NEON and 
>> Loongson LSX, also with runtime detection, otherwise the generic C 
>> implementation is used, as on pre-SSE-4.1 x86.) See 
>> https://code.qt.io/cgit/qt/qtbase.git/tree/src/gui/painting/qimagescale.cpp 
>> and 
>> https://code.qt.io/cgit/qt/qtbase.git/tree/src/gui/painting/qimagescale_sse4.cpp
>>  
>> . It is likely that the vectorized implementation rounds slightly 
>> differently. So you then end up with different results when building on non-
>> identical builder hardware.
>
> Wow, thank you, that is a great find.
>
> The koji build used GenuineIntel Intel Xeon Processor (Cascadelake), while
> my rebuilder used AuthenticAMD AMD EPYC 9R14. They both have SSE 4.1 (1,2),
> so theoretically qt_qimageScaleAARGBA_down_x_up_y_sse4() would be used in
> both cases. But those are significantly different CPUs, so it seems possible
> that the difference is caused by the optimized vector implementations.
> I'm not sure though: could the exact same code deliver non-bit-identical
> results on different CPUs when processing 128-bit ints?
>
> (1) fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
> pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm 
> constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq 
> vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt 
> tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 
> 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow 
> flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 
> erms invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd 
> avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat vnmi umip pku 
> ospke avx512_vnni md_clear flush_l1d arch_capabilities
>
> (2) fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
> pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
> rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid 
> aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid 
> sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor 
> lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch 
> topoext perfctr_core ssbd perfmon_v2 ibrs ibpb stibp ibrs_enhanced 
> vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid avx512f avx512dq rdseed 
> adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl 
> xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr rdpru 
> wbnoinvd arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq 
> avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid flush_l1d
>
> Zbyszek
> -- 
> _______________________________________________
> devel mailing list -- devel@lists.fedoraproject.org
> To unsubscribe send an email to devel-le...@lists.fedoraproject.org
> Fedora Code of Conduct: 
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: 
> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
> Do not reply to spam, report it: 
> https://pagure.io/fedora-infrastructure/new_issue
