 .gitignore                            |   46
 ChangeLog                             | 1955 +++++++++++++++++++++++
 configure.ac                          |    5
 debian/changelog                      |   17
 debian/control                        |    9
 debian/patches/ppc64el.diff           |   14
 debian/patches/series                 |    1
 debian/rules                          |    5
 pixman/Makefile.am                    |    2
 pixman/pixman-arm-asm.h               |   37
 pixman/pixman-arm-common.h            |   11
 pixman/pixman-arm-neon-asm-bilinear.S |   12
 pixman/pixman-arm-neon-asm.S          |   12
 pixman/pixman-arm-neon-asm.h          |   20
 pixman/pixman-arm-neon.c              |   24
 pixman/pixman-arm-simd-asm-scaled.S   |   11
 pixman/pixman-arm-simd-asm.S          |  525 ++++++
 pixman/pixman-arm-simd-asm.h          |  116 +
 pixman/pixman-arm-simd.c              |   44
 pixman/pixman-combine-float.c         |  338 ++--
 pixman/pixman-combine32.c             | 1686 +-------------------
 pixman/pixman-fast-path.c             |    2
 pixman/pixman-general.c               |   27
 pixman/pixman-gradient-walker.c       |    2
 pixman/pixman-inlines.h               |    3
 pixman/pixman-mips-dspr2-asm.S        |    2
 pixman/pixman-mips-dspr2-asm.h        |    4
 pixman/pixman-mips-dspr2.c            |   10
 pixman/pixman-mips-dspr2.h            |    8
 pixman/pixman-mmx.c                   |  109 +
 pixman/pixman-private.h               |    6
 pixman/pixman-sse2.c                  |   24
 pixman/pixman-vmx.c                   | 1315 +++++++++++++++-
 pixman/pixman.c                       |   18
 test/Makefile.sources                 |   60
 test/affine-bench.c                   |  436 +++++
 test/blitters-test.c                  |   20
 test/check-formats.c                  |  176 --
 test/composite.c                      |   11
 test/lowlevel-blt-bench.c             |  507 +++++-
 test/pixel-test.c                     | 2780 +++++++++++++++++++++++++++++++++-
 test/radial-invalid.c                 |   54
 test/solid-test.c                     |  353 ++++
 test/thread-test.c                    |   29
 test/tolerance-test.c                 |  360 ++++
 test/utils.c                          |  653 ++++++-
 test/utils.h                          |   13
 47 files changed, 9417 insertions(+), 2455 deletions(-)
New commits: commit 42fab57651e2ebdde5d260ae76809a2500086839 Author: Andreas Boll <andreas.boll....@gmail.com> Date: Fri Sep 4 13:40:42 2015 +0200 Bump standards version to 3.9.6. diff --git a/debian/changelog b/debian/changelog index 245fb5c..e73a52d 100644 --- a/debian/changelog +++ b/debian/changelog @@ -6,6 +6,7 @@ pixman (0.33.2-1) UNRELEASED; urgency=medium * Update Vcs-* fields. * Add upstream url. * Drop XC- prefix from Package-Type field. + * Bump standards version to 3.9.6. [ intrigeri ] * Simplify hardening build flags handling (closes: #760100). diff --git a/debian/control b/debian/control index c78d8b6..6188e41 100644 --- a/debian/control +++ b/debian/control @@ -7,7 +7,7 @@ Build-Depends: dh-autoreconf, pkg-config, quilt, -Standards-Version: 3.9.2 +Standards-Version: 3.9.6 Vcs-Git: https://anonscm.debian.org/git/pkg-xorg/lib/pixman.git Vcs-Browser: https://anonscm.debian.org/cgit/pkg-xorg/lib/pixman.git Homepage: http://pixman.org/ commit 56432ef5e5a38ddd77e23d10e1e8f724afcbedd8 Author: Andreas Boll <andreas.boll....@gmail.com> Date: Fri Sep 4 13:38:49 2015 +0200 Drop XC- prefix from Package-Type field. diff --git a/debian/changelog b/debian/changelog index e6627d6..245fb5c 100644 --- a/debian/changelog +++ b/debian/changelog @@ -5,6 +5,7 @@ pixman (0.33.2-1) UNRELEASED; urgency=medium * Enable vmx on ppc64el (closes: #786345). * Update Vcs-* fields. * Add upstream url. + * Drop XC- prefix from Package-Type field. [ intrigeri ] * Simplify hardening build flags handling (closes: #760100). 
diff --git a/debian/control b/debian/control index 03277a6..c78d8b6 100644 --- a/debian/control +++ b/debian/control @@ -28,7 +28,7 @@ Description: pixel-manipulation library for X and cairo Package: libpixman-1-0-udeb Section: debian-installer -XC-Package-Type: udeb +Package-Type: udeb Architecture: any Depends: ${shlibs:Depends}, commit c0f98e1cf4fa897eb67a3ef737b24deacda5ae7e Author: Andreas Boll <andreas.boll....@gmail.com> Date: Fri Sep 4 11:47:45 2015 +0200 Add upstream url. diff --git a/debian/changelog b/debian/changelog index 05d7550..e6627d6 100644 --- a/debian/changelog +++ b/debian/changelog @@ -4,6 +4,7 @@ pixman (0.33.2-1) UNRELEASED; urgency=medium * New upstream release candidate. * Enable vmx on ppc64el (closes: #786345). * Update Vcs-* fields. + * Add upstream url. [ intrigeri ] * Simplify hardening build flags handling (closes: #760100). diff --git a/debian/control b/debian/control index a56b239..03277a6 100644 --- a/debian/control +++ b/debian/control @@ -10,6 +10,7 @@ Build-Depends: Standards-Version: 3.9.2 Vcs-Git: https://anonscm.debian.org/git/pkg-xorg/lib/pixman.git Vcs-Browser: https://anonscm.debian.org/cgit/pkg-xorg/lib/pixman.git +Homepage: http://pixman.org/ Package: libpixman-1-0 Section: libs commit 03e2d2138b1248c79658e5edeaf66b283a278ff2 Author: Andreas Boll <andreas.boll....@gmail.com> Date: Fri Sep 4 11:46:39 2015 +0200 Update Vcs-* fields. diff --git a/debian/changelog b/debian/changelog index 4cdb1aa..05d7550 100644 --- a/debian/changelog +++ b/debian/changelog @@ -3,6 +3,7 @@ pixman (0.33.2-1) UNRELEASED; urgency=medium [ Andreas Boll ] * New upstream release candidate. * Enable vmx on ppc64el (closes: #786345). + * Update Vcs-* fields. [ intrigeri ] * Simplify hardening build flags handling (closes: #760100). 
diff --git a/debian/control b/debian/control index 18a1b7f..a56b239 100644 --- a/debian/control +++ b/debian/control @@ -8,8 +8,8 @@ Build-Depends: pkg-config, quilt, Standards-Version: 3.9.2 -Vcs-Git: git://git.debian.org/git/pkg-xorg/lib/pixman -Vcs-Browser: http://git.debian.org/?p=pkg-xorg/lib/pixman.git +Vcs-Git: https://anonscm.debian.org/git/pkg-xorg/lib/pixman.git +Vcs-Browser: https://anonscm.debian.org/cgit/pkg-xorg/lib/pixman.git Package: libpixman-1-0 Section: libs commit e6fce5e4e47a7a1597defa0c8f89eba0222b8953 Author: intrigeri <intrig...@debian.org> Date: Sun Aug 31 16:56:42 2014 +0000 Update changelog. Signed-off-by: Andreas Boll <andreas.boll....@gmail.com> diff --git a/debian/changelog b/debian/changelog index 37ddf53..4cdb1aa 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,8 +1,14 @@ pixman (0.33.2-1) UNRELEASED; urgency=medium + [ Andreas Boll ] * New upstream release candidate. * Enable vmx on ppc64el (closes: #786345). + [ intrigeri ] + * Simplify hardening build flags handling (closes: #760100). + Thanks to Simon Ruderich <si...@ruderich.org> for the patch. + * Enable all hardening build flags. Thanks to Simon Ruderich too. + -- Andreas Boll <andreas.boll....@gmail.com> Fri, 04 Sep 2015 11:29:52 +0200 pixman (0.32.6-3) sid; urgency=medium commit 7bc925aa5056ea114822bd9d06d94852946ba3d4 Author: intrigeri <intrig...@debian.org> Date: Sun Aug 31 16:54:54 2014 +0000 Enable all hardening build flags. Thanks to Simon Ruderich <si...@ruderich.org> for the patch. Quoting Simon again: "It currently has the same effect as hardening=+bindnow, but will automatically enable future hardening options and in case the package will ever build binaries those are immediately protected with PIE as well." 
Signed-off-by: Andreas Boll <andreas.boll....@gmail.com> diff --git a/debian/rules b/debian/rules index 99d67fc..a0e0b9e 100755 --- a/debian/rules +++ b/debian/rules @@ -3,7 +3,7 @@ PACKAGE = libpixman-1-0 SHLIBS = 0.25.2 -export DEB_BUILD_MAINT_OPTIONS = hardening=+bindnow +export DEB_BUILD_MAINT_OPTIONS = hardening=+all # Disable Gtk+ autodetection: override_dh_auto_configure: commit 2fb4da778cc2ce30df4e1e692dc82d00c6593137 Author: intrigeri <intrig...@debian.org> Date: Sun Aug 31 16:53:25 2014 +0000 Simplify hardening build flags handling. Thanks to Simon Ruderich <si...@ruderich.org> for the patch. Quoting Simon Ruderich <si...@ruderich.org>: "There's no need to use dpkg-buildflags manually in debian/rules. Debhelper with compat=9 automatically enables the hardening flags when dh_auto_configure is used. So just by calling dh_auto_configure [...] the hardening flags get automatically passed to the build system. DEB_BUILD_MAINT_OPTIONS is also respected." Signed-off-by: Andreas Boll <andreas.boll....@gmail.com> diff --git a/debian/rules b/debian/rules index a8100d2..99d67fc 100755 --- a/debian/rules +++ b/debian/rules @@ -11,8 +11,7 @@ override_dh_auto_configure: # changelog entry: LS_CFLAGS=" " dh_auto_configure -- --disable-gtk \ --disable-silent-rules \ - --disable-arm-iwmmxt \ - $(shell dpkg-buildflags --export=configure) + --disable-arm-iwmmxt # Install in debian/tmp to retain control through dh_install: override_dh_auto_install: commit e47fb32ae3180d847a4f0e8f88f71174004b90b3 Author: Andreas Boll <andreas.boll....@gmail.com> Date: Fri Sep 4 11:34:44 2015 +0200 Enable vmx on ppc64el (closes: #786345). diff --git a/debian/changelog b/debian/changelog index 7db916f..37ddf53 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,6 +1,7 @@ pixman (0.33.2-1) UNRELEASED; urgency=medium * New upstream release candidate. + * Enable vmx on ppc64el (closes: #786345). 
  -- Andreas Boll <andreas.boll....@gmail.com>  Fri, 04 Sep 2015 11:29:52 +0200

diff --git a/debian/patches/ppc64el.diff b/debian/patches/ppc64el.diff
deleted file mode 100644
index 34a4aa0..0000000
--- a/debian/patches/ppc64el.diff
+++ /dev/null
@@ -1,14 +0,0 @@
-diff --git a/configure.ac b/configure.ac
-index dce76b3..172de8b 100644
---- a/configure.ac
-+++ b/configure.ac
-@@ -540,6 +540,9 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
- #if defined(__GNUC__) && (__GNUC__ < 3 || (__GNUC__ == 3 && __GNUC_MINOR__ < 4))
- #error "Need GCC >= 3.4 for sane altivec support"
- #endif
-+#if defined(__PPC64__) && (__BYTE_ORDER__==__ORDER_LITTLE_ENDIAN__)
-+#error VMX utilization is still not ready on ppc64el
-+#endif
- #include <altivec.h>
- int main () {
-     vector unsigned int v = vec_splat_u32 (1);
diff --git a/debian/patches/series b/debian/patches/series
index eebecc8..708b774 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1,2 +1 @@
-ppc64el.diff
 test-increase-timeout.diff

commit 18e4bdcadf77910f2e22ce66b01b5bd98006c9fa
Author: Andreas Boll <andreas.boll....@gmail.com>
Date:   Fri Sep 4 11:30:12 2015 +0200

    Bump changelogs.

diff --git a/ChangeLog b/ChangeLog
index 2f951b8..96b8c28 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,10 +1,1548 @@
-commit 87eea99e443b389c978cf37efc52788bf03a0ee0
+commit ee790044b08e3b668e6aa5d9229f46ed7295ebf0
+Author: Oded Gabbay <oded.gab...@gmail.com>
+Date:   Sat Aug 1 22:34:53 2015 +0300
+
+    Pre-release version bump to 0.33.2
+
+    Signed-off-by: Oded Gabbay <oded.gab...@gmail.com>
+
+commit 8d9be3619a906855a3e3a1e052317833cb24cabe
+Author: Oded Gabbay <oded.gab...@gmail.com>
+Date:   Wed Jul 1 14:34:07 2015 +0300
+
+    vmx: implement fast path iterator vmx_fetch_a8
+
+    no changes were observed when running cairo trimmed benchmarks.
+ + Running "lowlevel-blt-bench src_8_8888" on POWER8, 8 cores, + 3.4GHz, RHEL 7.1 ppc64le gave the following results: + + reference memcpy speed = 25197.2MB/s (6299.3MP/s for 32bpp fills) + + Before After Change + -------------------------------------------- + L1 965.34 3936 +307.73% + L2 942.99 3436.29 +264.40% + M 902.24 2757.77 +205.66% + HT 448.46 784.99 +75.04% + VT 430.05 819.78 +90.62% + R 412.9 717.04 +73.66% + RT 168.93 220.63 +30.60% + Kops/s 1025 1303 +27.12% + + It was benchmarked against commid id e2d211a from pixman/master + + Siarhei Siamashka reported that on playstation3, it shows the following + results: + + == before == + + src_8_8888 = L1: 194.37 L2: 198.46 M:155.90 (148.35%) + HT: 59.18 VT: 36.71 R: 38.93 RT: 12.79 ( 106Kops/s) + + == after == + + src_8_8888 = L1: 373.96 L2: 391.10 M:245.81 (233.88%) + HT: 80.81 VT: 44.33 R: 48.10 RT: 14.79 ( 122Kops/s) + + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Acked-by: Siarhei Siamashka <siarhei.siamas...@gmail.com> + +commit 47f74ca94637d79ee66c37a81eea0200e453fcc1 +Author: Oded Gabbay <oded.gab...@gmail.com> +Date: Mon Jun 29 15:31:02 2015 +0300 + + vmx: implement fast path iterator vmx_fetch_x8r8g8b8 + + It was benchmarked against commid id 2be523b from pixman/master + + POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le. + + cairo trimmed benchmarks : + + Speedups + ======== + t-firefox-asteroids 533.92 -> 489.94 : 1.09x + + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Acked-by: Siarhei Siamashka <siarhei.siamas...@gmail.com> + +commit fcbb97d4458d717b9c15858aedcbee2d33c8ac5a +Author: Oded Gabbay <oded.gab...@gmail.com> +Date: Sun Jun 28 23:25:24 2015 +0300 + + vmx: implement fast path scaled nearest vmx_8888_8888_OVER + + It was benchmarked against commid id 2be523b from pixman/master + + POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le. 
+ reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills) + + Before After Change + --------------------------------------------- + L1 134.36 181.68 +35.22% + L2 135.07 180.67 +33.76% + M 134.6 180.51 +34.11% + HT 121.77 128.79 +5.76% + VT 120.49 145.07 +20.40% + R 93.83 102.3 +9.03% + RT 50.82 46.93 -7.65% + Kops/s 448 422 -5.80% + + cairo trimmed benchmarks : + + Speedups + ======== + t-firefox-asteroids 533.92 -> 497.92 : 1.07x + t-midori-zoomed 692.98 -> 651.24 : 1.06x + + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Acked-by: Siarhei Siamashka <siarhei.siamas...@gmail.com> + +commit ad612c4205f0ae46fc72a50e0c90ccd05487fcba +Author: Oded Gabbay <oded.gab...@gmail.com> +Date: Sun Jun 28 22:23:44 2015 +0300 + + vmx: implement fast path vmx_composite_src_x888_8888 + + It was benchmarked against commid id 2be523b from pixman/master + + POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le. + reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills) + + Before After Change + --------------------------------------------- + L1 1115.4 5006.49 +348.85% + L2 1112.26 4338.01 +290.02% + M 1110.54 2524.15 +127.29% + HT 745.41 1140.03 +52.94% + VT 749.03 1287.13 +71.84% + R 423.91 547.6 +29.18% + RT 205.79 194.98 -5.25% + Kops/s 1414 1361 -3.75% + + cairo trimmed benchmarks : + + Speedups + ======== + t-gnome-system-monitor 1402.62 -> 1212.75 : 1.16x + t-firefox-asteroids 533.92 -> 474.50 : 1.13x + + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Acked-by: Siarhei Siamashka <siarhei.siamas...@gmail.com> + +commit fafc1d403b8405727d3918bcb605cb98044af90a +Author: Oded Gabbay <oded.gab...@gmail.com> +Date: Sun Jun 28 10:14:20 2015 +0300 + + vmx: implement fast path vmx_composite_over_n_8888_8888_ca + + It was benchmarked against commid id 2be523b from pixman/master + + POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le. 
+ + reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills) + + Before After Change + --------------------------------------------- + L1 61.92 244.91 +295.53% + L2 62.74 243.3 +287.79% + M 63.03 241.94 +283.85% + HT 59.91 144.22 +140.73% + VT 59.4 174.39 +193.59% + R 53.6 111.37 +107.78% + RT 37.99 46.38 +22.08% + Kops/s 436 506 +16.06% + + cairo trimmed benchmarks : + + Speedups + ======== + t-xfce4-terminal-a1 1540.37 -> 1226.14 : 1.26x + t-firefox-talos-gfx 1488.59 -> 1209.19 : 1.23x + + Slowdowns + ========= + t-evolution 553.88 -> 581.63 : 1.05x + t-poppler 364.99 -> 383.79 : 1.05x + t-firefox-scrolling 1223.65 -> 1304.34 : 1.07x + + The slowdowns can be explained in cases where the images are small and + un-aligned to 16-byte boundary. In that case, the function will first + work on the un-aligned area, even in operations of 1 byte. In case of + small images, the overhead of such operations can be more than the + savings we get from using the vmx instructions that are done on the + aligned part of the image. + + In the C fast-path implementation, there is no special treatment for the + un-aligned part, as it works in 4 byte quantities on the entire image. + + Because llbb is a synthetic test, I would assume it has much less + alignment issues than "real-world" scenario, such as cairo benchmarks, + which are basically recorded traces of real application activity. + + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Acked-by: Siarhei Siamashka <siarhei.siamas...@gmail.com> + +commit a3e914407e354df70b9200e263608f1fc2e686cf +Author: Oded Gabbay <oded.gab...@gmail.com> +Date: Thu Jun 18 15:05:49 2015 +0300 + + vmx: implement fast path composite_add_8888_8888 + + Copied impl. 
from sse2 file and edited to use vmx functions + + It was benchmarked against commid id 2be523b from pixman/master + + POWER8, 16 cores, 3.4GHz, ppc64le : + + reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills) + + Before After Change + --------------------------------------------- + L1 248.76 3284.48 +1220.34% + L2 264.09 2826.47 +970.27% + M 261.24 2405.06 +820.63% + HT 217.27 857.3 +294.58% + VT 213.78 980.09 +358.46% + R 176.61 442.95 +150.81% + RT 107.54 150.08 +39.56% + Kops/s 917 1125 +22.68% + + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Acked-by: Siarhei Siamashka <siarhei.siamas...@gmail.com> + +commit d5b5343c7df99082597e0c37aec937dcf5b6602d +Author: Oded Gabbay <oded.gab...@gmail.com> +Date: Thu Jun 18 14:56:47 2015 +0300 + + vmx: implement fast path composite_add_8_8 + + Copied impl. from sse2 file and edited to use vmx functions + + It was benchmarked against commid id 2be523b from pixman/master + + POWER8, 16 cores, 3.4GHz, ppc64le : + + reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills) + + Before After Change + --------------------------------------------- + L1 687.63 9140.84 +1229.33% + L2 715 7495.78 +948.36% + M 717.39 8460.14 +1079.29% + HT 569.56 1020.12 +79.11% + VT 520.3 1215.56 +133.63% + R 514.81 874.35 +69.84% + RT 341.28 305.42 -10.51% + Kops/s 1621 1579 -2.59% + + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Acked-by: Siarhei Siamashka <siarhei.siamas...@gmail.com> + +commit 339eeaf095f949694d7f79a45171ac03a3b06f90 +Author: Oded Gabbay <oded.gab...@gmail.com> +Date: Thu Jun 18 14:12:05 2015 +0300 + + vmx: implement fast path composite_over_8888_8888 + + Copied impl. 
from sse2 file and edited to use vmx functions + + It was benchmarked against commid id 2be523b from pixman/master + + POWER8, 16 cores, 3.4GHz, ppc64le : + + reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills) + + Before After Change + --------------------------------------------- + L1 129.47 1054.62 +714.57% + L2 138.31 1011.02 +630.98% + M 139.99 1008.65 +620.52% + HT 122.11 468.45 +283.63% + VT 121.06 532.21 +339.62% + R 108.48 240.5 +121.70% + RT 77.87 116.7 +49.87% + Kops/s 758 981 +29.42% + + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Acked-by: Siarhei Siamashka <siarhei.siamas...@gmail.com> + +commit 0cc8a2e9714efcb7cdd7e2a94c9cba49c3e29e00 +Author: Oded Gabbay <oded.gab...@gmail.com> +Date: Sun Jun 28 09:42:19 2015 +0300 + + vmx: implement fast path vmx_fill + + Based on sse2 impl. + + It was benchmarked against commid id e2d211a from pixman/master + + Tested cairo trimmed benchmarks on POWER8, 8 cores, 3.4GHz, + RHEL 7.1 ppc64le : + + speedups + ======== + t-swfdec-giant-steps 1383.09 -> 718.63 : 1.92x speedup + t-gnome-system-monitor 1403.53 -> 918.77 : 1.53x speedup + t-evolution 552.34 -> 415.24 : 1.33x speedup + t-xfce4-terminal-a1 1573.97 -> 1351.46 : 1.16x speedup + t-firefox-paintball 847.87 -> 734.50 : 1.15x speedup + t-firefox-asteroids 565.99 -> 492.77 : 1.15x speedup + t-firefox-canvas-swscroll 1656.87 -> 1447.48 : 1.14x speedup + t-midori-zoomed 724.73 -> 642.16 : 1.13x speedup + t-firefox-planet-gnome 975.78 -> 911.92 : 1.07x speedup + t-chromium-tabs 292.12 -> 274.74 : 1.06x speedup + t-firefox-chalkboard 690.78 -> 653.93 : 1.06x speedup + t-firefox-talos-gfx 1375.30 -> 1303.74 : 1.05x speedup + t-firefox-canvas-alpha 1016.79 -> 967.24 : 1.05x speedup + + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Acked-by: Siarhei Siamashka <siarhei.siamas...@gmail.com> + +commit c12ee95089e7d281a29a24bf56b81f5c16dec6ee +Author: Oded Gabbay <oded.gab...@gmail.com> +Date: Sun Jun 28 09:42:08 2015 +0300 + + vmx: add helper 
functions
+
+    This patch adds the following helper functions for reuse of code,
+    hiding BE/LE differences and maintainability.
+
+    All of the functions were defined as static force_inline.
+
+    Names were copied from pixman-sse2.c so conversion of fast-paths between
+    sse2 and vmx would be easier from now on. Therefore, I tried to keep the
+    input/output of the functions to be as close as possible to the sse2
+    definitions.
+
+    The functions are:
+
+    - load_128_aligned       : load 128-bit from a 16-byte aligned memory
+                               address into a vector
+
+    - load_128_unaligned     : load 128-bit from memory into a vector,
+                               without guarantee of alignment for the
+                               source pointer
+
+    - save_128_aligned       : save 128-bit vector into a 16-byte aligned
+                               memory address
+
+    - create_mask_16_128     : take a 16-bit value and fill with it
+                               a new vector
+
+    - create_mask_1x32_128   : take a 32-bit pointer and fill a new
+                               vector with the 32-bit value from that pointer
+
+    - create_mask_32_128     : take a 32-bit value and fill with it
+                               a new vector
+
+    - unpack_32_1x128        : unpack 32-bit value into a vector
+
+    - unpacklo_128_16x8      : unpack the eight low 8-bit values of a vector
+
+    - unpackhi_128_16x8      : unpack the eight high 8-bit values of a vector
+
+    - unpacklo_128_8x16      : unpack the four low 16-bit values of a vector
+
+    - unpackhi_128_8x16      : unpack the four high 16-bit values of a vector
+
+    - unpack_128_2x128       : unpack the eight low 8-bit values of a vector
+                               into one vector and the eight high 8-bit
+                               values into another vector
+
+    - unpack_128_2x128_16    : unpack the four low 16-bit values of a vector
+                               into one vector and the four high 16-bit
+                               values into another vector
+
+    - unpack_565_to_8888     : unpack an RGB_565 vector to 8888 vector
+
+    - pack_1x128_32          : pack a vector and return the LSB 32-bit of it
+
+    - pack_2x128_128         : pack two vectors into one and return it
+
+    - negate_2x128           : xor two vectors with mask_00ff (separately)
+
+    - is_opaque              : returns whether all the pixels contained in
+                               the vector are opaque
+
+    - is_zero                : returns whether the vector equals 0
+
+    - is_transparent         : returns whether all the pixels
+                               contained in the vector are transparent
+
+    - expand_pixel_8_1x128   : expand an 8-bit pixel into lower 8 bytes of a
+                               vector
+
+    - expand_alpha_1x128     : expand alpha from vector and return the new
+                               vector
+
+    - expand_alpha_2x128     : expand alpha from one vector and another alpha
+                               from a second vector
+
+    - expand_alpha_rev_2x128 : expand a reversed alpha from one vector and
+                               another reversed alpha from a second vector
+
+    - pix_multiply_2x128     : do pix_multiply for two vectors (separately)
+
+    - over_2x128             : perform over op. on two vectors
+
+    - in_over_2x128          : perform in-over op. on two vectors
+
+    v2: removed expand_pixel_32_1x128 as it was not used by any function and
+    its implementation was erroneous
+
+    Signed-off-by: Oded Gabbay <oded.gab...@gmail.com>
+    Acked-by: Siarhei Siamashka <siarhei.siamas...@gmail.com>
+
+commit 034149537be94862b43fb09699b8c2149bfe948d
+Author: Oded Gabbay <oded.gab...@gmail.com>
+Date:   Thu Jul 2 11:04:20 2015 +0300
+
+    vmx: add LOAD_VECTOR macro
+
+    This patch adds a macro for loading a single vector.
+    It also make the other LOAD_VECTORx macros use this macro as a base so
+    code would be re-used.
+
+    In addition, I fixed minor coding style issues.
+
+    Signed-off-by: Oded Gabbay <oded.gab...@gmail.com>
+    Acked-by: Siarhei Siamashka <siarhei.siamas...@gmail.com>
+
+commit 744134025609a0a5805c2d3b4d34856eb75cb711
+Author: Nemanja Lukic <nemanja.lu...@rt-rk.com>
+Date:   Fri Jun 27 18:05:39 2014 +0200
+
+    MIPS: update author's e-mail address
+
+    Signed-off-by: Oded Gabbay <oded.gab...@gmail.com>
+
+commit e2d211ac491cd9884aae7ccaf18e5b3042469cf2
+Author: Pekka Paalanen <pekka.paala...@collabora.co.uk>
+Date:   Wed Jun 10 13:54:01 2015 +0300
+
+    lowlevel-blt-bench: add option to skip memcpy measurement
+
+    The memcpy speed measurement takes several seconds.
When you are running + single tests in a harness that iterates dozens or hundreds of times, the + repeated measurements are redundant and take a lot of time. It is also + an open question whether the measured speed changes over long test runs + due to unidentified platform reasons (Raspberry Pi). + + Add a command line option to set the reference memcpy speed, skipping + the measuring. + + The speed is mainly used to compute how many iterations do run inside + the bench_*() functions, so for repeated testing on the same hardware, + it makes sense to lock that number to a constant. + + Signed-off-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + Reviewed-by: Ben Avison <bavi...@riscosopen.org> + +commit 31cb0d4267f4f358b62f75fd42c4b1ae625be7ee +Author: Pekka Paalanen <pekka.paala...@collabora.co.uk> +Date: Wed Jun 10 13:20:47 2015 +0300 + + lowlevel-blt-bench: add CSV output mode + + Add a command line option for choosing CSV output mode. + + In CSV mode, only the results in Mpixels/s are printed in an easily + machine-parseable format. All user-friendly printing is suppressed. + + This is intended for cases where you benchmark one particular operation + at a time. Running the "all" set of benchmarks will print just fine, but + you may have trouble matching rows to operations as you have to look at + the tests_tbl[] to see what row is which. + + Reviewed-by: Ben Avison <bavi...@riscosopen.org> + + v2: don't add a space after comma in CSV. + + Signed-off-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + +commit 9a7e0bc6d08c0324f09d6440270cd07201929f3f +Author: Pekka Paalanen <pekka.paala...@collabora.co.uk> +Date: Wed Jun 10 12:41:57 2015 +0300 + + lowlevel-blt-bench: refactor to Mpx_per_sec() + + Refactor the Mpixels/s computations into a function. Easier to read and + better documents what is being computed. 
+ + Signed-off-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + Reviewed-by: Ben Avison <bavi...@riscosopen.org> + +commit 6e9c48c579e3325506234fa2ee7635f08f2c5a33 +Author: Pekka Paalanen <pekka.paala...@collabora.co.uk> +Date: Wed Jun 10 12:53:09 2015 +0300 + + lowlevel-blt-bench: all bench funcs to return pix_cnt + + The bench_* functions, that did not already do it, are modified to + return the number of pixels processed during the benchmark. This moves + the computation to the site that actually determines the number, and + simplifies bench_composite() a bit. + + Signed-off-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + Reviewed-by: Ben Avison <bavi...@riscosopen.org> + +commit 9e8f2bcaf5fabd3729ee0ecc90009fd6cea9e8e9 +Author: Pekka Paalanen <pekka.paala...@collabora.co.uk> +Date: Wed Jun 10 12:02:17 2015 +0300 + + lowlevel-blt-bench: move speed and scaling printing + + Move the printing of the memory speed and scaling mode into a new + function. This will help with implementing a machine-readable output + option. + + Signed-off-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + Reviewed-by: Ben Avison <bavi...@riscosopen.org> + +commit a33c2e6853fe0a76da42a43ed7ed9095e2dbe6a2 +Author: Pekka Paalanen <pekka.paala...@collabora.co.uk> +Date: Wed Jun 10 11:56:39 2015 +0300 + + lowlevel-blt-bench: print single pattern details + + When given just a single test pattern instead of "all", print the test + details. This can be used to verify the pattern parser agrees with the + user, just like scaling settings are printed. + + Signed-off-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + Reviewed-by: Ben Avison <bavi...@riscosopen.org> + +commit 3ac7ae201758fe99627fdb2adf783be4063a9b1f +Author: Pekka Paalanen <pekka.paala...@collabora.co.uk> +Date: Wed Jun 10 11:34:45 2015 +0300 + + lowlevel-blt-bench: make test_entry::testname const + + We assign string literals to it, so it better be const. 
+ + Signed-off-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + Reviewed-by: Ben Avison <bavi...@riscosopen.org> + +commit 56d8b365f5944bf78a427ac65c5a0d0311e0da5e +Author: Pekka Paalanen <pekka.paala...@collabora.co.uk> +Date: Wed Jun 10 11:21:14 2015 +0300 + + lowlevel-blt-bench: move explanation printing + + Move explanation printing to a new function. This will help with + implementing a machine-readable output option. + + Signed-off-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + Reviewed-by: Ben Avison <bavi...@riscosopen.org> + +commit bddff993ed734f4b9030c1960bcb3ebe1caca807 +Author: Pekka Paalanen <pekka.paala...@collabora.co.uk> +Date: Wed Jun 10 11:14:38 2015 +0300 + + lowlevel-blt-bench: move usage to a function + + Move printing of usage into a new function and use argv[0] as the + program name. This will help printing usage from multiple places. + + Signed-off-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + Reviewed-by: Ben Avison <bavi...@riscosopen.org> + +commit 2be523b20402b7c9f548ac33b8c0f0ed00156c64 +Author: Oded Gabbay <oded.gab...@gmail.com> +Date: Thu Jun 25 15:59:57 2015 +0300 + + vmx: fix pix_multiply for ppc64le + + vec_mergeh/l operates differently for BE and LE, because of the order of + the vector elements (l->r in BE and r->l in LE). + To fix that, we simply need to swap between the input parameters, in case + we are working in LE. + + v2: + + - replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency + - fixed whitespaces and indentation issues + + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Reviewed-by: Adam Jackson <a...@redhat.com> + Acked-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + +commit 8d379ad88e208bed9697065f6911c9ef83d85276 +Author: Oded Gabbay <oded.gab...@gmail.com> +Date: Thu Jun 25 15:59:56 2015 +0300 + + vmx: fix unused var warnings + + v2: don't put ';' at the end of macro definition. Instead, move it to + each line the macro is used. 
+ + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Reviewed-by: Adam Jackson <a...@redhat.com> + Acked-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + +commit ff66a4a3ce95f2adcbf30b354eac60944596d6a2 +Author: Oded Gabbay <oded.gab...@gmail.com> +Date: Thu Jun 25 15:59:55 2015 +0300 + + vmx: encapsulate the temporary variables inside the macros + + v2: fixed whitespaces and indentation issues + + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Reviewed-by: Adam Jackson <a...@redhat.com> + Acked-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + +commit f6a26d09257dde9cd41144120543c8b754de515f +Author: Fernando Seiti Furusato <ferse...@linux.vnet.ibm.com> +Date: Thu Jun 25 15:59:54 2015 +0300 + + vmx: adjust macros when loading vectors on ppc64le + + Replaced usage of vec_lvsl to direct unaligned assignment + operation (=). That is because, according to Power ABI Specification, + the usage of lvsl is deprecated on ppc64le. + + Changed COMPUTE_SHIFT_{MASK,MASKS,MASKC} macro usage to no-op for powerpc + little endian since unaligned access is supported on ppc64le. + + v2: + + - replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency + - fixed whitespaces and indentation issues + + Signed-off-by: Fernando Seiti Furusato <ferse...@linux.vnet.ibm.com> + Reviewed-by: Adam Jackson <a...@redhat.com> + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Acked-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + +commit b3a61703f41c6b34ba2ec9736030e1df04f53ab4 +Author: Oded Gabbay <oded.gab...@gmail.com> +Date: Thu Jun 25 15:59:53 2015 +0300 + + vmx: fix splat_alpha for ppc64le + + The permutation vector isn't correct for LE, so correct its values + in case we are in LE mode. 
+ + v2: + + - replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency + - change #ifndef to #ifdef for readability + + Signed-off-by: Oded Gabbay <oded.gab...@gmail.com> + Reviewed-by: Adam Jackson <a...@redhat.com> + Acked-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + +commit eebc1b78200aff075dbcae9c8d00edad1f830d91 +Author: Ben Avison <bavi...@riscosopen.org> +Date: Tue May 26 23:58:29 2015 +0100 + + mmx/sse2: Use SIMPLE_NEAREST_SOLID_MASK_FAST_PATH for NORMAL repeat + + These two architectures were the only place where + SIMPLE_NEAREST_SOLID_MASK_FAST_PATH was used, and in both cases the + equivalent SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL macro was used + immediately afterwards, so including the NORMAL case in the main macro + simplifies the fast path table. + + [Pekka: removed extra comma from the end of + SIMPLE_NEAREST_SOLID_MASK_FAST_PATH] + + Reviewed-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + Acked-by: Siarhei Siamashka <siarhei.siamas...@gmail.com> + +commit 7f6692807902b840b81f860fb2196d2fb242d977 +Author: Ben Avison <bavi...@riscosopen.org> +Date: Tue May 26 23:58:28 2015 +0100 + + mmx/sse2: Use SIMPLE_NEAREST_FAST_PATH macro + + There is some reordering, but the only significant thing to ensure that + the same routine is chosen is that a COVER fast path for a given + combination of operator and source/destination pixel formats must + precede all the variants of repeated fast paths for the same + combination. This patch (and the other mmx/sse2 one) still follows that + rule. + + I believe that in every other case, the set of operations that match any + pair of fast paths that are reordered in these patches are mutually + exclusive. While there will be a very subtle timing difference due to + the distance through the table we have to search to find a match + (sometimes faster, sometime slower) there is no evidence that the tables + have been carefully ordered by frequency of occurrence - just for ease + of copy-and-pasting. 
+ + Reviewed-by: Pekka Paalanen <pekka.paala...@collabora.co.uk> + Acked-by: Siarhei Siamashka <siarhei.siamas...@gmail.com> +
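
Editor's note on the ordering rule in the commit message above: the following is a minimal sketch, not pixman's actual lookup code, of why a COVER entry has to precede the repeat variants when a table is searched first-match. All names here (fast_path_entry, lookup_fast_path, the FLAG_* values and the routine strings) are hypothetical and chosen only for illustration. The sketch assumes an entry matches when every flag it requires is set on the operation, so one operation can satisfy several entries and only the table order decides which routine is picked.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits; an operation may carry several at once. */
enum {
    FLAG_COVER         = 1 << 0,  /* samples stay inside the source image */
    FLAG_REPEAT_NONE   = 1 << 1,
    FLAG_REPEAT_PAD    = 1 << 2,
    FLAG_REPEAT_NORMAL = 1 << 3
};

/* Hypothetical, simplified table entry (not pixman's real struct). */
typedef struct {
    const char  *op;              /* compositing operator name */
    unsigned int required_flags;  /* flags the operation must have */
    const char  *routine;         /* name of the routine to run */
} fast_path_entry;

/* COVER comes first so it wins over the repeat variants that an
 * operation with both COVER and a repeat flag would also match. */
static const fast_path_entry fast_paths[] = {
    { "OVER", FLAG_COVER,         "over_cover"  },
    { "OVER", FLAG_REPEAT_NONE,   "over_none"   },
    { "OVER", FLAG_REPEAT_PAD,    "over_pad"    },
    { "OVER", FLAG_REPEAT_NORMAL, "over_normal" },
};

/* Linear first-match search: return the first entry whose operator
 * matches and whose required flags are all present; otherwise fall
 * back to a generic path. */
static const char *
lookup_fast_path (const char *op, unsigned int flags)
{
    size_t i;

    for (i = 0; i < sizeof (fast_paths) / sizeof (fast_paths[0]); i++)
    {
        const fast_path_entry *e = &fast_paths[i];

        if (strcmp (e->op, op) == 0 &&
            (e->required_flags & flags) == e->required_flags)
        {
            return e->routine;
        }
    }

    return "general";
}
```

With this table, an OVER operation flagged FLAG_COVER | FLAG_REPEAT_NONE resolves to "over_cover", because that entry is reached first. Swapping the repeat entries among themselves does not change the result for operations that carry only one repeat flag, which is the "mutually exclusive" case the commit message describes.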