Hi Jiu Fu,
On Mon, Jul 13, 2020 at 07:50:28PM +0800, guojiufu wrote:
> For very small loops (< 6 insns), it would be fine to unroll 4
> times to run fast with less latency and better cache usage.
> - /* TODO: This is hardcoded to 10 right now. It can be refined, for
> - example we may
Hi,
For very small loops (< 6 insns), it would be fine to unroll 4
times to run fast with less latency and better cache usage. For
example, loops like these:
  while (i) a[--i] = NULL;
  while (p < e) *d++ = *p++;
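
To illustrate what unrolling 4 times means for the second loop, here is a hand-written sketch in C. It is only an approximation of the transformation, not the code GCC actually emits. The function names copy_rolled and copy_unrolled4 are hypothetical, and the sketch assumes int elements with p and e pointing into the same array.

```c
#include <stddef.h>

/* Rolled form: one compare-and-branch per element copied. */
static void copy_rolled(int *d, const int *p, const int *e)
{
    while (p < e)
        *d++ = *p++;
}

/* Hand-unrolled by 4 (illustrative only): one compare-and-branch
   per four elements, plus a remainder loop for the leftover 0..3. */
static void copy_unrolled4(int *d, const int *p, const int *e)
{
    /* Trip count; zero when the range is empty, as in the rolled form. */
    size_t n = (p < e) ? (size_t)(e - p) : 0;

    /* Main body: four copies per iteration. */
    for (; n >= 4; n -= 4) {
        d[0] = p[0];
        d[1] = p[1];
        d[2] = p[2];
        d[3] = p[3];
        d += 4;
        p += 4;
    }

    /* Remainder: copy the last n elements (n < 4) one at a time. */
    while (n--)
        *d++ = *p++;
}
```

Both functions copy the same elements. The unrolled form trades a larger loop body for fewer branches per element, which is why the body size limit matters.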
With this patch, we could see some performance improvement
for some workloads (e.g. SPE