On Sun, 2023-12-31 at 15:54 +0100, Thilo Borgmann via ffmpeg-devel wrote:
>
> On 31.12.23 at 13:56, Tomas Härdin wrote:
> > > +    for (int y = 0; y < height; y++) {
> > > +        const uint8_t *src1 = src1_data[0] + y * src1_linesize[0];
> > > +        const uint8_t *src2 = src2_data[0] + (y + pos_y) * src2_linesize[0] + pos_x * src2_step[0];
> > > +        uint8_t *dest = dest_data[0] + (y + pos_y) * dest_linesize[0] + pos_x * sizeof(uint32_t);
> > > +        for (int x = 0; x < width; x++) {
> > > +            int src1_alpha = src1[0];
> > > +            int src2_alpha = src2[0];
> > > +
> > > +            if (src1_alpha == 255) {
> > > +                memcpy(dest, src1, sizeof(uint32_t));
> > > +            } else if (src1_alpha + src2_alpha == 0) {
> > > +                memset(dest, 0, sizeof(uint32_t));
> > > +            } else {
> > > +                int tmp_alpha = src2_alpha - ROUNDED_DIV(src1_alpha * src2_alpha, 255);
> > > +                int blend_alpha = src1_alpha + tmp_alpha;
> > > +
> > > +                dest[0] = blend_alpha;
> > > +                dest[1] = ROUNDED_DIV(src1[1] * src1_alpha + src2[1] * tmp_alpha, blend_alpha);
> > > +                dest[2] = ROUNDED_DIV(src1[2] * src1_alpha + src2[2] * tmp_alpha, blend_alpha);
> > > +                dest[3] = ROUNDED_DIV(src1[3] * src1_alpha + src2[3] * tmp_alpha, blend_alpha);
> > > +            }
> >
> > Is branching and a bunch of function calls (which I hope get optimized
> > out) really faster than just always doing the blending?
>
> If I trust my START_TIMER/STOP_TIMER interpretation, I'd say so:
>
> With branches:
> 253315 UNITS in blend_alpha_yuva, 128 runs, 0 skips
>
> Always blending:
> 351104 UNITS in blend_alpha_yuva, 128 runs, 0 skips

Alright. Still curious if it can be sped up by checking multiple pixels
at a time. But that can be done later

> > > +static int blend_frame_into_canvas(WebPContext *s)
> > > +{
> > > +    AVFrame *canvas = s->canvas_frame.f;
> > > +    AVFrame *frame = s->frame;
> > > +    int width, height;
> > > +    int pos_x, pos_y;
> > > +
> > > +    if ((s->anmf_flags & ANMF_BLENDING_METHOD) == ANMF_BLENDING_METHOD_OVERWRITE
> > > +        || frame->format == AV_PIX_FMT_YUV420P) {
> > > +        // do not blend, overwrite
> > > +
> > > +        if (canvas->format == AV_PIX_FMT_ARGB) {
> > > +            width = s->width;
> > > +            height = s->height;
> > > +            pos_x = s->pos_x;
> > > +            pos_y = s->pos_y;
> > > +
> > > +            for (int y = 0; y < height; y++) {
> > > +                const uint32_t *src = (uint32_t *) (frame->data[0] + y * frame->linesize[0]);
> > > +                uint32_t *dst = (uint32_t *) (canvas->data[0] + (y + pos_y) * canvas->linesize[0]) + pos_x;
> > > +                memcpy(dst, src, width * sizeof(uint32_t));
> > > +            }
> >
> > This could be reduced to a single memcpy() when linesizes are equal.
> > Same for the other memcpy()s
>
> Its a subimage copied into a canvas (see pos_x and pos_y).
> Has to be copied line-by-line.

Ah, I missed that

/Tomas
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
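
For reference, the "checking multiple pixels at a time" idea from the thread could look roughly like the sketch below. This is not code from the patch: blend_pixel() and blend_row() are hypothetical helper names, ROUNDED_DIV is redefined locally so the snippet stands alone, and the group size of four pixels is an arbitrary assumption. It only illustrates testing several src1 alphas at once and taking the opaque copy path for the whole group. Whether this is actually faster would have to be measured.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for FFmpeg's ROUNDED_DIV, valid for the non-negative
 * operands used here; redefined so the sketch is standalone. */
#define ROUNDED_DIV(a, b) (((a) + ((b) >> 1)) / (b))

/* Hypothetical helper: per-pixel blend following the quoted patch.
 * Pixels are 4 bytes in ARGB order, alpha in byte 0. */
static void blend_pixel(uint8_t *dest, const uint8_t *src1, const uint8_t *src2)
{
    int src1_alpha = src1[0];
    int src2_alpha = src2[0];

    if (src1_alpha == 255) {
        memcpy(dest, src1, 4);              /* opaque: plain copy */
    } else if (src1_alpha + src2_alpha == 0) {
        memset(dest, 0, 4);                 /* both fully transparent */
    } else {
        int tmp_alpha   = src2_alpha - ROUNDED_DIV(src1_alpha * src2_alpha, 255);
        int blend_alpha = src1_alpha + tmp_alpha;

        dest[0] = blend_alpha;
        for (int c = 1; c < 4; c++)
            dest[c] = ROUNDED_DIV(src1[c] * src1_alpha + src2[c] * tmp_alpha, blend_alpha);
    }
}

/* Hypothetical helper: blend one row, checking four src1 alphas at a time.
 * Assumes dest does not overlap src1. */
static void blend_row(uint8_t *dest, const uint8_t *src1, const uint8_t *src2, int width)
{
    int x = 0;

    assert(width >= 0);

    for (; x + 4 <= width; x += 4) {
        const uint8_t *s1 = src1 + 4 * x;
        /* The AND of four bytes is 255 only if every one of them is 255. */
        uint8_t all_opaque = s1[0] & s1[4] & s1[8] & s1[12];

        if (all_opaque == 255) {
            memcpy(dest + 4 * x, s1, 16);   /* copy the whole group */
        } else {
            for (int i = 0; i < 4; i++)
                blend_pixel(dest + 4 * (x + i), s1 + 4 * i, src2 + 4 * (x + i));
        }
    }

    /* Remaining pixels when width is not a multiple of four. */
    for (; x < width; x++)
        blend_pixel(dest + 4 * x, src1 + 4 * x, src2 + 4 * x);
}
```

Called once per line with pointers already offset by pos_x and pos_y, this would slot into the same line-by-line loop as in the patch, since the subimage still has to be handled one row at a time.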