Uwe Freese (2019-01-01): > > This can be optimized, and since it is the inner loop of the filter, I > > think it is worth it. You can declare a pointer that will stay the same > > for the whole y loop: > > > > double *xweight = uglarmtable + (y - logo_y1) * (logo_w - 1); > > > > and then use it in that loop: > > > > interpd += ... * xweight[abs(bx - (x - logo_x1))]; > > > > It avoids a lot of multiplications. > > I'm not sure about this point. First, I would absolutely assume from the > compiler that it optimizes expressions like "(logo_w - 1)" or a fixed offset > to access the array in the loop and that they are only calculated once.
Relying a lot on compiler optimizations is a sure way of being disappointed. But the logo_w_minus_one variable was not about optimization but about organization. Each time there is the same computation at several places, it is a sign that a specific variable should probably be used. That way, if the design changes a little, only the initialization of the variable needs to be changed. In this particular instance, logo_w_minus_one was not a good name (it was for explaining); table_stride would be better. That way, if you decide to change the structure of the table, you do not need to look at all uses of logo_w, you just need to update the initialization of table_stride. > Secondly, I don't exactly understand how *xweight in your example should be > used (and would mean smaller or easier code). XXX > > > + interp = (uint64_t)(interpd / > > > + uglarmweightsum[(x - logo_x1) - 1 + (y - logo_y1 - > > > 1) * (logo_w - 2)]); > > The cast to uint64_t seems suspicious. Can you explain? > > Every pixel value of the inner logo area is an integer. Only for the > calculation of the weighted average, all values use floats. At the end, it > is rounded (truncated) to an int. int and uint64_t are not the same thing. Why uint64_t? > > pow(x * x * aspect2 + y * y, exponent / 2); > > Hm. Again, I'm unsure if this "optimization" is worth it. I wouldn't do this > normally. In this particular instance, definitely yes. Compilers have very little latitude to optimize floating point operations, and they will certainly not optimize mathematical functions based on subtle arithmetic properties. This is a C compiler, not a TI-89. > > But I have a better suggestion: > > > > #define MAX_SUB 2 > > > > double uglarmtable[MAX_SUB + 1][MAX_SUB + 1] > > > > and use uglarmtable[hsub][vsub]. That way, for YUVA420P for example, the > > table will be computed only once for Y and A and once for U and V. > > > > The assert is still needed with that solution, though. > > I don't understand this. Please provide a complete example, or it stays as > it is. - and could of course be optimized later. I do not see how it could be more complete without including code that is irrelevant to the example. But since you insist: #define MAX_SUB 2 typedef struct DelogoContext { ... double uglarmtable[MAX_SUB + 1][MAX_SUB + 1] ... } ... av_assert0(hsub <= MAX_SUB && vsub <= MAX_SUB); if (!s->uglarmtable[hsub][vsub]) s->uglarmtable[hsub][vsub] = av_malloc_array(...); ... calc_uglarm_tables(s->uglarmtable[hsub][vsub], s->uglarmweightsum[hsub][vsub]); ... apply_delogo(..., s->uglarmtable[hsub][vsub], s->uglarmweightsum[hsub][vsub]); > But why should the for loop run in xy mode and check "planes count" times to > free the memory? The code without the "if" also looks to me more like this > is not checked by mistake. This is the usual way this project does things: free everything unconditionally. The rationale is that it is less likely to lead to leaks in case of code change. > Hope that the code is correct like this? > > s->uglarmtable[plane] = > av_malloc_array((logo_w - 1) * (logo_h - 1), > sizeof(s->uglarmtable[plane])); Sorry, no: you are taking the size of the pointer uglarmtable[plane]; you need to take the size of the pointed objects *uglarmtable[plane]. Regards, -- Nicolas George _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel