2014-12-26 20:01 GMT+01:00 Michael Niedermayer <michae...@gmx.at>:
> [...]
>
>> +static uint64_t F(uint64_t F_IN, uint64_t KE)
>> +{
>> +    uint32_t Zl, Zr;
>
>> +    Zl = (F_IN >> 32) ^ (KE >> 32);
>> +    Zr = (F_IN & MASK32) ^ (KE & MASK32);
>
> KE ^= F_IN;
> Zl = KE >> 32;
> Zr = KE & MASK32;
>
>
>> +    Zl = ((SBOX1[(Zl >> 24) & MASK8] << 24) | (SBOX2[(Zl >> 16) & MASK8] << 16) |(SBOX3[(Zl >> 8) & MASK8] << 8) |(SBOX4[Zl & MASK8]));
>> +    Zr = ((SBOX2[(Zr >> 24) & MASK8] << 24) | (SBOX3[(Zr >> 16) & MASK8] << 16) |(SBOX4[(Zr >> 8) & MASK8] << 8) |(SBOX1[Zr & MASK8]));
>
> (Zl >> 24) and (Zr >> 24) are limited to 8bit they should not need
> & MASK8
>
> ((uint32_t)SBOX1[Zl >> 24]) << 24)
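As a quick sanity check of the two simplifications quoted above, here is a minimal sketch. The helper names split_then_xor and xor_then_split are made up for this mail, and MASK32 is assumed to be 0xffffffff, as its name suggests. It only shows that XORing the 64-bit words first and splitting afterwards gives the same halves, and that the top byte of a uint32_t needs no extra mask.

```c
#include <assert.h>
#include <stdint.h>

#define MASK32 0xffffffffULL /* assumed value, not taken from the patch */

/* Form used in the patch: split both operands, then XOR each half. */
static void split_then_xor(uint64_t f_in, uint64_t ke,
                           uint32_t *zl, uint32_t *zr)
{
    *zl = (uint32_t)((f_in >> 32) ^ (ke >> 32));
    *zr = (uint32_t)((f_in & MASK32) ^ (ke & MASK32));
}

/* Form suggested in the review: one 64-bit XOR, then split. */
static void xor_then_split(uint64_t f_in, uint64_t ke,
                           uint32_t *zl, uint32_t *zr)
{
    ke ^= f_in;
    *zl = (uint32_t)(ke >> 32);
    *zr = (uint32_t)(ke & MASK32);
}

/* For a 32-bit value, x >> 24 is already in the range 0..255. */
static unsigned top_byte(uint32_t x)
{
    return x >> 24;
}
```

Both helpers should produce identical (Zl, Zr) pairs for any input, so the second form saves one XOR and one shift without changing the result.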
Maybe this will be useful later: on 64-bit processors, if MASK8 is a
64-bit constant, this may be faster:

KE ^= F_IN;
Zl = ((uint32_t)SBOX1[KE >> 56] << 24) | ((uint32_t)SBOX2[(KE >> 48) & MASK8] << 16) | ...

> +    Zl ^= LR32(Zr, 8);
> +    Zr ^= LR32(Zl, 16);
> +    Zl ^= RR32(Zr, 8);
> +    Zr ^= RR32(Zl, 8);

The instructions above have a long critical path (each one depends on
the previous one), and this is probably where we lose the most speed at
the moment.

> it would also be possible to reduce the number of operations at the
> expense of larger tables but iam not sure that would be a good idea

On 64-bit processors, a big speedup can be obtained by computing the S
and P operations together, using eight 8x64-bit sboxes (a total of 16 kB
of data) that can be computed in the initialization phase from
SBOX1...SBOX4.

But all these suggestions can be implemented later. My main objection
to this patch is that it uses one big array for all subkeys.

>
> [...]
>
>> +static const int shift1[2][6] = {
>> +    {0, 15, 30, 17, 17, 17},
>> +    {0, 15, 15, 15, 34, 17}
>> +};
>> +static const int pos1[2][6] = {
>> +    {0, 4, 10, 16, 18, 22},
>> +    {2, 6, 8, 14, 20, 24}
>> +};
>> +static const int pos2[4][4]= {
>> +    {0, 12, 16, 22},
>> +    {6, 14, 24, 28},
>> +    {2, 10, 20, 32},
>> +    {4, 8, 18, 26}
>> +};
>> +static const int shift2[4][5]= {
>> +    {0, 45, 15, 17},
>> +    {15, 30, 32, 17},
>> +    {0, 30, 30, 51},
>> +    {15, 15, 30, 34}
>> +};
>
> these could be made uint8_t
>
> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Frequently ignored answer#1 FFmpeg bugs should be sent to our bugtracker. User
> questions about the command line tools should be sent to the ffmpeg-user ML.
> And questions about how to use libav* should be sent to the libav-user ML.
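To make the combined S and P tables mentioned above more concrete, here is a rough sketch under stated assumptions. toy_sbox is a placeholder substitution invented for this mail, NOT the real SBOX1...SBOX4, and the final packing of Zr and Zl into the 64-bit result is also an assumption, since that part of the patch is not quoted. The idea relies only on the rotate-and-XOR step being linear over XOR: the P output for a full word is the XOR of the P outputs of its individual substituted bytes, so each byte position can get its own precomputed 256-entry table of 64-bit values.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }
static uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Placeholder substitution for byte position pos (0 = most significant).
 * Hypothetical stand-in, not a Camellia s-box. */
static uint8_t toy_sbox(unsigned pos, uint8_t b)
{
    return (uint8_t)(b * 167u + 13u * (pos + 1));
}

/* The rotate/XOR chain quoted from the patch; output packing is assumed. */
static uint64_t p_layer(uint32_t zl, uint32_t zr)
{
    zl ^= rotl32(zr, 8);
    zr ^= rotl32(zl, 16);
    zl ^= rotr32(zr, 8);
    zr ^= rotr32(zl, 8);
    return ((uint64_t)zr << 32) | zl;
}

/* Reference: substitute all eight bytes, then run the serial P chain.
 * x stands for KE ^ F_IN. */
static uint64_t f_reference(uint64_t x)
{
    uint64_t s = 0;
    for (unsigned pos = 0; pos < 8; pos++) {
        unsigned shift = 56 - 8 * pos;
        s |= (uint64_t)toy_sbox(pos, (uint8_t)(x >> shift)) << shift;
    }
    return p_layer((uint32_t)(s >> 32), (uint32_t)s);
}

/* Eight tables of 256 64-bit entries: 8 * 256 * 8 bytes = 16 kB. */
static uint64_t SP[8][256];

/* Initialization phase: push each single substituted byte through P. */
static void init_sp(void)
{
    for (unsigned pos = 0; pos < 8; pos++)
        for (unsigned b = 0; b < 256; b++) {
            uint64_t s = (uint64_t)toy_sbox(pos, (uint8_t)b) << (56 - 8 * pos);
            SP[pos][b] = p_layer((uint32_t)(s >> 32), (uint32_t)s);
        }
}

/* Table version: eight independent lookups combined with XOR. */
static uint64_t f_table(uint64_t x)
{
    uint64_t r = 0;
    for (unsigned pos = 0; pos < 8; pos++)
        r ^= SP[pos][(uint8_t)(x >> (56 - 8 * pos))];
    return r;
}
```

After init_sp() has run once, f_table(KE ^ F_IN) should match f_reference(KE ^ F_IN) for every input. The eight lookups do not depend on each other, which is what removes the long dependency chain of the four rotate/XOR steps. Whether the 16 kB of tables pays off in practice would have to be measured.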
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel