On Sun, 28 Feb 2016, Alexander Duyck wrote:
> I actually found the root cause. The problem is in add32_with_carry3.
>
> > +static inline unsigned int add32_with_carry3(unsigned int a, unsigned int b,
> > +                                              unsigned int c)
> > +{
> > + asm("ad
On Sun, Feb 28, 2016 at 11:15 AM, Tom Herbert wrote:
> On Sun, Feb 28, 2016 at 10:56 AM, Alexander Duyck
> wrote:
>> On Sat, Feb 27, 2016 at 12:30 AM, Alexander Duyck
>> wrote:
+{
+ asm("lea 40f(, %[slen], 4), %%r11\n\t"
+ "clc\n\t"
+ "jmpq *%%r11\n\
I was just noticing that these two:
> +static inline unsigned long add64_with_carry(unsigned long a, unsigned long b)
> +{
> + asm("addq %2,%0\n\t"
> + "adcq $0,%0"
> + : "=r" (a)
> + : "0" (a), "rm" (b));
> + return a;
> +}
> +
> +static inline unsigned
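The second helper is cut off by the preview; its 32-bit counterpart in the existing x86 checksum code has the same shape as add64_with_carry() above, roughly:

static inline unsigned int add32_with_carry(unsigned int a, unsigned int b)
{
        /* a + b, with the carry out folded back into the low 32 bits */
        asm("addl %2,%0\n\t"
            "adcl $0,%0"
            : "=r" (a)
            : "0" (a), "rm" (b));

        return a;
}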
On Fri, 2016-02-26 at 12:03 -0800, Tom Herbert wrote:
> +
> + /*
> + * Length is greater than 64. Sum to eight byte alignment before
> + * proceeding with main loop.
> + */
> + aligned = !!((unsigned long)buff & 0x1);
> + if (aligned) {
> + unsigned int align
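The "& 0x1" test here presumably tracks whether the buffer starts on an odd address (which affects the final byte ordering of a ones' complement sum), while the comment above is about reaching 8-byte alignment for the wide loads in the main loop. The alignment arithmetic itself is just the distance to the next 8-byte boundary; a standalone, userspace illustration (names here are not from the patch):

#include <stdint.h>
#include <stdio.h>

/* number of leading bytes to consume before "p" is 8-byte aligned */
static unsigned int bytes_to_align8(const void *p)
{
        return (unsigned int)(-(uintptr_t)p & 7);
}

int main(void)
{
        _Alignas(8) char buf[16];

        for (int off = 0; off < 8; off++)
                printf("offset %d -> consume %u byte(s)\n",
                       off, bytes_to_align8(buf + off));
        return 0;
}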
On Sun, Feb 28, 2016 at 10:56 AM, Alexander Duyck
wrote:
> On Sat, Feb 27, 2016 at 12:30 AM, Alexander Duyck
> wrote:
>>> +{
>>> + asm("lea 40f(, %[slen], 4), %%r11\n\t"
>>> + "clc\n\t"
>>> + "jmpq *%%r11\n\t"
>>> + "adcq 7*8(%[src]),%[res]\n\t"
>>> +
On Sat, Feb 27, 2016 at 12:30 AM, Alexander Duyck
wrote:
>> +{
>> + asm("lea 40f(, %[slen], 4), %%r11\n\t"
>> + "clc\n\t"
>> + "jmpq *%%r11\n\t"
>> + "adcq 7*8(%[src]),%[res]\n\t"
>> + "adcq 6*8(%[src]),%[res]\n\t"
>> + "adcq 5*8(%[src]),%[re
> +{
> + asm("lea 40f(, %[slen], 4), %%r11\n\t"
> + "clc\n\t"
> + "jmpq *%%r11\n\t"
> + "adcq 7*8(%[src]),%[res]\n\t"
> + "adcq 6*8(%[src]),%[res]\n\t"
> + "adcq 5*8(%[src]),%[res]\n\t"
> + "adcq 4*8(%[src]),%[res]\n\t"
> +
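What the computed jump buys: the lea computes an entry point partway into an unrolled chain of adcq instructions (scaled by 4, the encoded size of each adcq with an 8-bit displacement), clc clears the carry, and the indirect jmp lands so that only the required number of trailing 8-byte words is summed while the carry flag stays live across the whole chain. A plain-C illustration of the same idea, using a fall-through switch and a 128-bit accumulator instead of the carry flag (a sketch, not the patch itself):

/*
 * Sketch only: sum the last "n" (0..7) 64-bit words of "src" into "sum",
 * dispatching once instead of looping, the way the jump into the adcq
 * chain does.  The carry is tracked with a 128-bit accumulator here
 * rather than the CPU carry flag.
 */
static inline unsigned long sum_tail_quads(const unsigned long *src,
                                           unsigned int n, unsigned long sum)
{
        unsigned __int128 acc = sum;

        switch (n) {
        case 7: acc += src[6];  /* fall through */
        case 6: acc += src[5];  /* fall through */
        case 5: acc += src[4];  /* fall through */
        case 4: acc += src[3];  /* fall through */
        case 3: acc += src[2];  /* fall through */
        case 2: acc += src[1];  /* fall through */
        case 1: acc += src[0];  /* fall through */
        case 0: break;
        }

        /* fold the accumulated carries back into 64 bits (end-around carry) */
        sum = (unsigned long)acc + (unsigned long)(acc >> 64);
        if (sum < (unsigned long)acc)
                sum += 1;
        return sum;
}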
On Fri, Feb 26, 2016 at 2:52 PM, Alexander Duyck
wrote:
>
> I'm still not a fan of the unaligned reads. They may be okay but it
> just seems like we are going run into corner cases all over the place
> where this ends up biting us.
No.
Unaligned reads are not just "ok".
The fact is, not doing
On Fri, Feb 26, 2016 at 7:11 PM, Tom Herbert wrote:
> On Fri, Feb 26, 2016 at 2:52 PM, Alexander Duyck
> wrote:
>> On Fri, Feb 26, 2016 at 12:03 PM, Tom Herbert wrote:
>>> This patch implements performant csum_partial for x86_64. The intent is
>> to speed up checksum calculation, particularly for smaller lengths
On Fri, Feb 26, 2016 at 2:52 PM, Alexander Duyck
wrote:
> On Fri, Feb 26, 2016 at 12:03 PM, Tom Herbert wrote:
>> This patch implements performant csum_partial for x86_64. The intent is
>> to speed up checksum calculation, particularly for smaller lengths such
>> as those that are present when doing skb_postpull_rcsum
On Fri, Feb 26, 2016 at 12:03 PM, Tom Herbert wrote:
> This patch implements performant csum_partial for x86_64. The intent is
> to speed up checksum calculation, particularly for smaller lengths such
> as those that are present when doing skb_postpull_rcsum when getting
> CHECKSUM_COMPLETE from device or after CHECKSUM_UNNECESSARY conversion.
On Fri, Feb 26, 2016 at 12:29 PM, Linus Torvalds
wrote:
> Looks ok to me.
>
> I am left wondering if the code should just do that
>
> add32_with_carry3(sum, result >> 32, result);
>
> in the caller instead - right now pretty much every return point in
> do_csum() effectively does that, with the exception of
Looks ok to me.
I am left wondering if the code should just do that
add32_with_carry3(sum, result >> 32, result);
in the caller instead - right now pretty much every return point in
do_csum() effectively does that, with the exception of
- the 0-length case, which is presumably not really
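Concretely, the suggestion amounts to having do_csum() hand back its raw 64-bit running sum and doing the three-way fold once in the caller; a rough sketch, with the signatures assumed for illustration:

__wsum csum_partial(const void *buff, int len, __wsum sum)
{
        /* assumed: do_csum() returns the raw 64-bit running sum */
        u64 result = do_csum(buff, len);

        /* fold the old sum, the high half and the low half in one step */
        return (__force __wsum)add32_with_carry3((__force u32)sum,
                                                 result >> 32, (u32)result);
}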
This patch implements performant csum_partial for x86_64. The intent is
to speed up checksum calculation, particularly for smaller lengths such
as those that are present when doing skb_postpull_rcsum when getting
CHECKSUM_COMPLETE from device or after CHECKSUM_UNNECESSARY conversion.
- v4
- wen
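On the "smaller lengths" motivation: skb_postpull_rcsum() has to checksum only the header bytes just pulled in order to keep a CHECKSUM_COMPLETE value valid, so csum_partial() is routinely called on a few dozen bytes at a time. Simplified from the skbuff helpers of that era (the CHECKSUM_PARTIAL handling is omitted here):

static inline void skb_postpull_rcsum(struct sk_buff *skb,
                                      const void *start, unsigned int len)
{
        if (skb->ip_summed == CHECKSUM_COMPLETE)
                /* subtract the checksum of the bytes that were just pulled */
                skb->csum = csum_sub(skb->csum, csum_partial(start, len, 0));
}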