On 11 May 2017 at 12:27, Sekhar, Ashwin <ashwin.sek...@cavium.com> wrote:
>
> On Thu, 2017-05-11 at 04:14 +0000, Sekhar, Ashwin wrote:
> ...
>> > > Combining all the above comments, I made some changes on top of
>> > > your patch. These changes are giving 3-4% improvement over your
>> > > version.
>> > >
>> > > You may find the changes at
>> > > https://gist.github.com/ashwinyes/34cbdd999784402c859c71613587fafc
>> > >
>> > Is it correct that in lines 103/104 you only process one packet in
>> > the last FWDSTEP packets?
>> It's doing processx4_* there. So it's processing 4 packets.
>>
>> >
>> > Actually, I don't like your change in l3fwd_lpm_send_packets,
>> > making the simple logic complicated. And I don't think it can help
>> > to improve performance. :-)
>> It's not making it complicated. The number of lines of code may be
>> higher by maybe 10 lines, but the conditions of the loops are
>> simplified, which reduces the number of branch instructions and helps
>> the processor to go through them faster.

I suspect we won't get much improvement from it.

>>
>> If possible, please try it out on your machine.

OK, I'll test. If there is no performance regression, I'll adopt your
suggestion in v3.

>
> Missed out one point.
> Since the 2 loops are of the form "for (i = 0; i < FWDSTEP; i++)", i.e.
> looping for a constant number of iterations, the compiler will easily
> unroll them.
>
> Thanks
> Ashwin
>> >
>> >
>> > >
>> > >
>> > > Please check it out and let me know your comments.
>> > >
>> > > Thanks
>> > > Ashwin
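
For readers following the thread, here is a minimal sketch of the loop shape under discussion. It is not the code from the gist or from the patch. process_one() and process_burst() are hypothetical names made up for illustration, and the per-packet work is a placeholder. It only shows the idea argued above: the main loop sees full groups of FWDSTEP packets, so the inner loop has a constant bound the compiler can unroll, and the leftovers go through a separate tail loop. FWDSTEP is assumed to be 4, to match the processx4_* naming in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FWDSTEP 4 /* assumed group size; must be a power of two here */

/* Hypothetical per-packet work, standing in for the real lookup step. */
static inline uint16_t process_one(uint32_t pkt)
{
    return (uint16_t)(pkt & 0xff);
}

/*
 * Hypothetical sketch: fill dst_port[0..n-1] from pkts[0..n-1].
 * The main loop only handles full groups of FWDSTEP packets, so the
 * inner loop has a compile-time-constant trip count.
 */
static void process_burst(const uint32_t *pkts, uint16_t *dst_port, size_t n)
{
    assert(pkts != NULL && dst_port != NULL);

    /* Round n down to a multiple of FWDSTEP. */
    const size_t full = n & ~(size_t)(FWDSTEP - 1);
    size_t i, j;

    for (j = 0; j < full; j += FWDSTEP) {
        for (i = 0; i < FWDSTEP; i++) /* constant bound, easy to unroll */
            dst_port[j + i] = process_one(pkts[j + i]);
    }

    /* Tail: at most FWDSTEP - 1 leftover packets. */
    for (; j < n; j++)
        dst_port[j] = process_one(pkts[j]);
}
```

Whether this shape is actually faster than a single loop with extra conditions depends on the compiler and the target, so it would still need to be measured, as agreed above.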