Am 31.12.23 um 13:56 schrieb Tomas Härdin:
+    for (int y = 0; y < height; y++) {
+        const uint8_t *src1 = src1_data[0] + y * src1_linesize[0];
+        const uint8_t *src2 = src2_data[0] + (y + pos_y) *
src2_linesize[0] + pos_x * src2_step[0];
+        uint8_t       *dest = dest_data[0] + (y + pos_y) *
dest_linesize[0] + pos_x * sizeof(uint32_t);
+        for (int x = 0; x < width; x++) {
+            int src1_alpha = src1[0];
+            int src2_alpha = src2[0];
+
+            if (src1_alpha == 255) {
+                memcpy(dest, src1, sizeof(uint32_t));
+            } else if (src1_alpha + src2_alpha == 0) {
+                memset(dest, 0, sizeof(uint32_t));
+            } else {
+                int tmp_alpha = src2_alpha - ROUNDED_DIV(src1_alpha
* src2_alpha, 255);
+                int blend_alpha = src1_alpha + tmp_alpha;
+
+                dest[0] = blend_alpha;
+                dest[1] = ROUNDED_DIV(src1[1] * src1_alpha + src2[1]
* tmp_alpha, blend_alpha);
+                dest[2] = ROUNDED_DIV(src1[2] * src1_alpha + src2[2]
* tmp_alpha, blend_alpha);
+                dest[3] = ROUNDED_DIV(src1[3] * src1_alpha + src2[3]
* tmp_alpha, blend_alpha);
+            }

Is branching and a bunch of function calls (which I hope get optimized
out) really faster than just always doing the blending?

If I trust my START_TIMER/STOP_TIMER interpretation, I'd say so:

With branches:
253315 UNITS in blend_alpha_yuva,     128 runs,      0 skips

Always blending:
351104 UNITS in blend_alpha_yuva,     128 runs,      0 skips



It mgiht also be worthwhile to check 8 bytes at a time against
UINT64_MAX and 0. That doesn't need to hold up this patch though. Same
with the YUVA version.

+static int blend_frame_into_canvas(WebPContext *s)
+{
+    AVFrame *canvas = s->canvas_frame.f;
+    AVFrame *frame  = s->frame;
+    int width, height;
+    int pos_x, pos_y;
+
+    if ((s->anmf_flags & ANMF_BLENDING_METHOD) ==
ANMF_BLENDING_METHOD_OVERWRITE
+        || frame->format == AV_PIX_FMT_YUV420P) {
+        // do not blend, overwrite
+
+        if (canvas->format == AV_PIX_FMT_ARGB) {
+            width  = s->width;
+            height = s->height;
+            pos_x  = s->pos_x;
+            pos_y  = s->pos_y;
+
+            for (int y = 0; y < height; y++) {
+                const uint32_t *src = (uint32_t *) (frame->data[0] +
y * frame->linesize[0]);
+                uint32_t *dst = (uint32_t *) (canvas->data[0] + (y +
pos_y) * canvas->linesize[0]) + pos_x;
+                memcpy(dst, src, width * sizeof(uint32_t));
+            }

This could be reduced to a single memcpy() when linesizes are equal.
Same for the other memcpy()s

Its a subimage copied into a canvas (see pos_x and pos_y).
Has to be copied line-by-line.

Same for the other loops.

-Thilo

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Reply via email to