> * Reshetova, Elena <elena.reshet...@intel.com> wrote:
>
> > > * Reshetova, Elena <elena.reshet...@intel.com> wrote:
> > >
> > > > CONFIG_PAGE_TABLE_ISOLATION=n:
> > > >
> > > > base:                                Simple syscall: 0.0510 microseconds
> > > > get_random_bytes(4096 bytes buffer): Simple syscall: 0.0597 microseconds
> > > >
> > > > So, pure speed wise get_random_bytes() with 1 page per-cpu buffer wins.
> > >
> > > It still adds +17% overhead to the system call path, which is sad.
> > > Why is it so expensive?
> >
> > I guess I can experiment further with buffer size increase and/or
> > using HW acceleration (I mostly played around different rdrand paths now).
> >
> > What would be acceptable overhead approximately (so that I know how
> > much I need to squeeze this thing)?
>
> As much as possible? No idea, I'm sad about anything that is more than
> 0%, and I'd be *really* sad about anything more than say 1-2%.

Ok, understood.

>
> I find it ridiculous that even with 4K blocked get_random_bytes(), which
> gives us 32k bits, which with 5 bits should amortize the RNG call to
> something like "once per 6553 calls", we still see 17% overhead? It's
> either a measurement artifact, or something doesn't compute.

If you check what happens underneath get_random_bytes(), there is a
fair amount going on, including reseeding the CRNG if the reseeding
interval has passed (see _extract_crng()). It even attempts to stir in
more entropy from rdrand if available.

I will now go through this whole construction step by step to
investigate. I also haven't optimized anything yet (I take 8 bits at a
time for the offset), but such small optimizations won't bring the
overhead down from 17% to 2%, so they are pointless for now; a more
radical change is needed.

Best Regards,
Elena.
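
For reference, the amortization arithmetic quoted above (4096 bytes =
32768 bits, 5 bits per call, so roughly one refill per 6553 calls) can
be sketched in userspace C. This is only an illustration under stated
assumptions, not the kernel patch: struct rnd_buf, rnd_buf_init(),
rnd_buf_bits() and fill_random() are hypothetical names, and
fill_random() uses rand() purely as a stand-in for get_random_bytes().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BUF_BYTES 4096u             /* one page, as in the benchmark */
#define BUF_BITS  (BUF_BYTES * 8u)  /* 32768 bits */

/* Hypothetical buffer of pre-generated random bits (per-CPU in the kernel). */
struct rnd_buf {
	uint8_t       data[BUF_BYTES];
	size_t        bit_pos;  /* index of the next unused bit */
	unsigned long refills;  /* how many times the expensive path ran */
};

/* Stand-in for get_random_bytes(); rand() is NOT cryptographically secure. */
static void fill_random(uint8_t *dst, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = (uint8_t)rand();
}

static void rnd_buf_init(struct rnd_buf *b)
{
	b->bit_pos = BUF_BITS;  /* empty: force a refill on first use */
	b->refills = 0;
}

/* Return nbits (1..8) random bits, refilling only when the buffer runs out. */
static unsigned int rnd_buf_bits(struct rnd_buf *b, unsigned int nbits)
{
	assert(nbits >= 1 && nbits <= 8);

	if (b->bit_pos + nbits > BUF_BITS) {
		fill_random(b->data, BUF_BYTES);  /* the expensive call */
		b->bit_pos = 0;
		b->refills++;
	}

	size_t byte = b->bit_pos / 8;
	unsigned int shift = (unsigned int)(b->bit_pos % 8);
	unsigned int v = (unsigned int)b->data[byte] >> shift;

	/* The requested bits may straddle a byte boundary. */
	if (shift + nbits > 8)
		v |= (unsigned int)b->data[byte + 1] << (8 - shift);

	b->bit_pos += nbits;
	return v & ((1u << nbits) - 1u);
}
```

With 5 bits per call the refill path runs once per 6553 calls; taking
8 bits per call, as the unoptimized version currently does, it runs
once per 4096 calls. Either way the refill is rare, which is why the
cost inside the refill itself is the thing to look at.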