It seems that the current THP implementation is much better than it was in the RHEL-based kernel.
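For anyone following along, here is a minimal sketch (my own illustration, not taken from Fabian's mail below) of what the proposed "madvise" setting means in practice: an application has to opt in per memory region with madvise(MADV_HUGEPAGE), which is what Qemu does for guest RAM, so VMs still get huge pages while unrelated processes stay on regular 4k pages. The 256 MiB region size is arbitrary.

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 256UL * 1024 * 1024; /* arbitrary region size */

    /* Anonymous mapping; with enabled=madvise this starts out as
     * regular 4k pages. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }

    /* Opt this region into transparent huge pages. With
     * enabled=madvise this is what makes the region eligible, and
     * with defrag=madvise it may also trigger compaction for it. */
    if (madvise(buf, len, MADV_HUGEPAGE) != 0)
        perror("madvise"); /* non-fatal: kernel falls back to 4k pages */

    /* Fault the pages in so the kernel can actually hand out huge
     * pages; AnonHugePages in /proc/self/smaps shows the result. */
    for (size_t i = 0; i < len; i += 4096)
        ((volatile char *)buf)[i] = 1;

    munmap(buf, len);
    return EXIT_SUCCESS;
}

Compile with plain gcc and watch AnonHugePages in /proc/<pid>/smaps: it should only grow for this mapping when enabled is "always" or "madvise", never when it is "never".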
On Thu, Jan 19, 2017 at 9:43 AM, Alexandre DERUMIER <aderum...@odiso.com> wrote:
> Hi,
>
> I have re-enabled THP (transparent_hugepage=madvise) for around a year now
> (with pve-kernel 4.2-4.4), and I no longer see the problems I had in the
> past.
>
> I'm hosting a lot of databases (mysql, sqlserver, redis, mongo, ...) and I
> haven't seen any performance impact since re-enabling THP.
>
> So I think it's pretty safe to set it by default.
>
>
> ----- Original Message -----
> From: "Fabian Grünbichler" <f.gruenbich...@proxmox.com>
> To: "pve-devel" <pve-devel@pve.proxmox.com>
> Cc: "aderumier" <aderum...@odiso.com>, "Andreas Steinel" <a.stei...@gmail.com>
> Sent: Thursday, 19 January 2017 09:35:43
> Subject: transparent huge pages support / disk passthrough corruption
>
> So it seems like the recently reported problems[1] with disk passthrough
> using virtio-scsi(-single) are caused by a combination of Qemu since 2.7
> not handling memory fragmentation (well) and our compiled-in default of
> disabling transparent huge pages on the kernel side.
>
> While I will investigate further and see whether this is not also fixable
> on the Qemu side, I think it would be a good idea to revisit the decision
> to patch this default in[2].
>
> @Andreas, Alexandre: you both were proponents of disabling THP support
> back then, but the current kernel docs[3] say (emphasis mine):
>
> -----%<-----
> Transparent Hugepage Support can be entirely disabled (*mostly for
> debugging purposes*) or only enabled inside MADV_HUGEPAGE regions (to
> avoid the risk of consuming more memory resources) or enabled system
> wide. This can be achieved with one of:
>
> echo always >/sys/kernel/mm/transparent_hugepage/enabled
> echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
> echo never >/sys/kernel/mm/transparent_hugepage/enabled
>
> It's also possible to limit defrag efforts in the VM to generate
> hugepages in case they're not immediately free to madvise regions or
> to never try to defrag memory and simply fallback to regular pages
> unless hugepages are immediately available. Clearly if we spend CPU
> time to defrag memory, we would expect to gain even more by the fact
> we use hugepages later instead of regular pages. This isn't always
> guaranteed, but it may be more likely in case the allocation is for a
> MADV_HUGEPAGE region.
>
> echo always >/sys/kernel/mm/transparent_hugepage/defrag
> echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
> echo never >/sys/kernel/mm/transparent_hugepage/defrag
> ----->%-----
>
> So I think setting both "enabled" and "defrag" to "madvise" by default
> would be advisable - the admin can still override it (permanently with a
> kernel boot parameter, or at run time via the sysfs interface) if they
> really know it causes performance issues.
>
> If you have any hard benchmark data to back up staying at "never", please
> send it soon ;) preferably both with and without a non-transparent
> hugepages setup, and with both "always" and "madvise" for enabled and
> defrag.
>
> Running a setup that is intended for debugging purposes (see above) as the
> default seems strange to me (and this was probably the reason why we
> needed to patch "never" in as the default in the first place). While I am
> not yet convinced that this solves the passthrough data corruption issue
> entirely, it is very reliably reproducible with THP disabled, and so far
> not at all on my test setup with THP enabled - so I propose switching
> with the next kernel update, unless there are (serious) objections.
>
> 1: https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/
> 2: http://pve.proxmox.com/pipermail/pve-devel/2015-September/017079.html
> 3: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Documentation/vm/transhuge.txt?h=linux-4.4.y#n95