> > So in neither of those scenarios testing maxsize=minsize alone makes too
> > much sense to me... What was the original motivation for differentiating
> > between precisely known size?
There is a case where maxsize can be small. https://godbolt.org/z/489Tf7ssj

    typedef unsigned char e_u8;
    #define MAXBC 8

    void MixColumn(e_u8 a[4][MAXBC], e_u8 BC)
    {
      e_u8 b[4][MAXBC];
      int i, j;

      for (i = 0; i < 4; i++)
        for (j = 0; j < BC; j++)
          a[i][j] = b[i][j];
    }

Here BC is an unsigned char, so maxsize will be 255. If we set
stringop_alg to rep_1_byte, the code could look like this:

    movzbl  %sil, %r8d
    movq    %rdi, %rdx
    leaq    -40(%rsp), %rax
    movq    %r8, %r9
    leaq    -8(%rsp), %r10
    testb   %r9b, %r9b
    je      .L5
    movq    %rdx, %rdi
    movq    %rax, %rsi
    movq    %r8, %rcx
    rep movsb
    addq    $8, %rax
    addq    $8, %rdx
    cmpq    %r10, %rax
    jne     .L2
    ret

In our tests this was much slower than current trunk, because rep movsb
triggers machine clear events, whereas on current trunk such small sizes
are handled in the loop-based move epilogue and rep movsq is never
executed. So we disabled inlining for unknown sizes to avoid potential
issues like this.

On Thu, Apr 1, 2021 at 1:55 AM H.J. Lu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Wed, Mar 31, 2021 at 10:43 AM Jan Hubicka <hubi...@ucw.cz> wrote:
> >
> > > > Reading through the optimization manual it seems that movsb is fast for
> > > > small block no matter if the size is hard wired. In that case you
> > > > probably want to check whether max_size or expected_size is known to be
> > > > small rather than max_size == min_size and both being small.
> > > >
> > > > But it depends on what CPU really does.
> > > > Honza
> > >
> > > For small data size, rep movsb is faster only under certain conditions.
> > > We can continue fine tuning rep movsb.
> >
> > OK, I however wonder why you need condition maxsize=minsize.
> >  - If CPU is looking for movl $cst, %rcx then we probably want to be
> >    sure that it is not moved away from rep; movsb by adding fused pattern
> >  - If rep movsb is slower than loop for very small blocks then you want
> >    to set lower bound on minsize & expected size, but you do not need
> >    to require maxsize=minsize
> >  - If rep movsb is slower than sequence of moves for small blocks then
> >    one needs to tweak move by pieces
> >  - If rep movsb is slower for larger blocks then you want to test
> >    maxsize and expected size
> > So in neither of those scenarios testing maxsize=minsize alone makes too
> > much sense to me... What was the original motivation for differentiating
> > between precisely known size?
> >
> > I am mostly curious because it is not that uncommon to have small maxsize
> > because we are able to track the object size and using short sequence
> > for those would be nice.
> >
> > Having minsize non-trivial may not be that uncommon these days either
> > given that we track value ranges (and under assumption that
> > memcpy/memset expanders were updated to take these into account).
> >
>
> Hongyu has done some analysis on this. Hongyu, can you share what
> you got?
>
> Thanks.
>
> --
> H.J.

--
Regards,
Hongyu, Wang