dm-delay looks very interesting along those lines.

https://www.enodev.fr/posts/emulate-a-slow-block-device-with-dm-delay.html
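A rough sketch of how such a delayed device could be set up, following the table format described in the kernel's delay.txt (device, offset, delay in milliseconds). This is only an illustration under stated assumptions: the device path /dev/sdb, the mapping name "delayed", the 500 ms delay and the helper names delay_table and dmsetup_create_cmd are all hypothetical and not taken from the MAAS environment discussed in this bug. The code only builds the table line and the command; it does not run anything.

```python
from typing import List


def delay_table(device: str, sectors: int, delay_ms: int) -> str:
    """Build a one-line device-mapper table that maps a whole block
    device through the dm-delay target.

    `sectors` is the device size in 512-byte sectors, e.g. the output
    of `blockdev --getsz <device>`. `delay_ms` is in milliseconds.
    """
    if sectors <= 0:
        raise ValueError("sectors must be positive")
    if delay_ms < 0:
        raise ValueError("delay_ms must not be negative")
    # Layout: <start> <length> delay <device> <offset> <delay>
    return f"0 {sectors} delay {device} 0 {delay_ms}"


def dmsetup_create_cmd(name: str, table: str) -> List[str]:
    """Return the argv for creating the mapping with dmsetup.

    The command needs root; it is returned rather than executed so
    it can be reviewed first.
    """
    return ["dmsetup", "create", name, "--table", table]


# Hypothetical example: a 1 GiB device (2097152 sectors) with 500 ms delay.
table = delay_table("/dev/sdb", 2097152, 500)
command = dmsetup_create_cmd("delayed", table)
```

The resulting mapping would appear as /dev/mapper/delayed, and pointing the MAAS server's storage at it might be one way to reproduce the IO starvation without touching the production environment.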
https://www.kernel.org/doc/Documentation/device-mapper/delay.txt

On Tue, Feb 6, 2018 at 5:06 PM, Jason Hobbs <jason.ho...@canonical.com> wrote:
> On Tue, Feb 6, 2018 at 4:50 PM, Andres Rodriguez
> <andres...@ubuntu-pe.org> wrote:
>> I don't have logs anymore as I have since rebuilt my environment, but I can
>> confirm seeing improvements on a maas server running with high IO (note it
>> was a single region/rack).
>>
>> see inline:
>>
>>
>> On Tue, Feb 6, 2018 at 5:17 PM, Jason Hobbs <jason.ho...@canonical.com>
>> wrote:
>>
>>> Andres, it was a single test in both cases, and in both cases there was
>>> almost no delay from MAAS. It's not significant enough to call it
>>> positive results.
>>>
>>>
>> Comment #93 shows there are /some/ improvements when comparing those two
>> samples only, but as I have already said, we need data over time in both
>> scenarios to properly compare and determine whether the changes make any
>> material performance improvements with the current conditions of the
>> samples (both samples are with a fixed io starvation on the environment).
>>
>>
>>> Since neither of you answered yes, I'll assume the answer was no to my
>>> question of whether there was anything in my logs or data that showed
>>> reading the template from disk on the rack controller was the culprit,
>>> and that this fix just represents a guess at what might be causing the
>>> delay.
>>>
>>
>> To be fair, your logs do not provide anything concrete to determine what's
>> the culprit of the issue on the MAAS side. They provide a lot of clues, and
>> we have since then determined that those issues were a result of IO
>> starvation (from the VMs writing to disk). As such, the only way we can
>> *really* see if the patch brings any significant performance improvements
>> is to run tests in the environment where you were seeing the issues in the
>> first place.
>
> I didn't think my logs provided anything concrete!
> That's because the logging built into MAAS is not sufficient to do so.
>
> I can't break that environment to test anymore - we got it working
> thanks to you guys' help and it's a production environment that needs
> to keep running other tests.
>
> It might be possible to recreate this on another maas server, using
> 'stress' or a similar tool to cause disk contention.
>
> Jason
>
>> As such, if you are willing to test if these make any material difference,
>> I would unfix your environment and do two runs (one without the fix, and
>> one with the fix). That's the only way we can really compare and be certain
>> in *your* environment.
>>
>>>
>>> --
>>> You received this bug notification because you are subscribed to MAAS.
>>> https://bugs.launchpad.net/bugs/1743249
>>>
>>> Title:
>>>   Failed Deployment after timeout trying to retrieve grub cfg
>>>
>>> To manage notifications about this bug go to:
>>> https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions
>>>
>>> Launchpad-Notification-Type: bug
>>> Launchpad-Bug: product=maas; milestone=2.4.x; status=New;
>>> importance=Undecided; assignee=None;
>>> Launchpad-Bug: distribution=ubuntu; sourcepackage=grub2; component=main;
>>> status=Fix Released; importance=Medium; assignee=mathieu...@gmail.com;
>>> Launchpad-Bug-Tags: cdo-qa cdo-qa-blocker foundations-engine patch
>>> Launchpad-Bug-Information-Type: Public
>>> Launchpad-Bug-Private: no
>>> Launchpad-Bug-Security-Vulnerability: no
>>> Launchpad-Bug-Commenters: andreserl blake-rouse cgregan janitor
>>> jason-hobbs mpontillo vorlon
>>> Launchpad-Bug-Reporter: Jason Hobbs (jason-hobbs)
>>> Launchpad-Bug-Modifier: Jason Hobbs (jason-hobbs)
>>> Launchpad-Message-Rationale: Subscriber (MAAS)
>>> Launchpad-Message-For: andreserl
>>>
>>
>>
>> --
>> Andres Rodriguez (RoAkSoAx)
>> Ubuntu Server Developer
>> MSc. Telecom & Networking
>> Systems Engineer
>>
>> --
>> You received this bug notification because you are subscribed to the bug
>> report.
>> https://bugs.launchpad.net/bugs/1743249
>>
>> Title:
>>   Failed Deployment after timeout trying to retrieve grub cfg
>>
>> Status in MAAS:
>>   New
>> Status in grub2 package in Ubuntu:
>>   Fix Released
>>
>> Bug description:
>>   A node failed to deploy after it failed to retrieve a grub.cfg from
>>   MAAS due to a timeout. In the logs, it's clear that the server tried
>>   to retrieve the grub cfg many times, over about 30 seconds:
>>
>>   http://paste.ubuntu.com/26387256/
>>
>>   We see the same thing for other hosts around the same time:
>>
>>   http://paste.ubuntu.com/26387262/
>>
>>   It seems like MAAS is taking way too long to respond to these
>>   requests.
>>
>>   This is very similar to bug 1724677, which was happening
>>   pre-Meltdown/Spectre. The only difference is we don't see "[critical] TFTP
>>   back-end failed" in the logs anymore.
>>
>>   I connected to the console on this system and it had errors about
>>   timing out retrieving the grub cfg, then it had an error message along
>>   the lines of "error not an ip" and then "double free". After I
>>   connected but before I could get a screenshot the system rebooted and
>>   was directed by maas to power off, which it did successfully after
>>   booting to linux.
>>
>>   Full logs are available here:
>>   https://10.245.162.101/artifacts/14a34b5a-9321-4d1a-b2fa-ed277a020e7c/cpe_cloud_395/infra-logs.tar
>>
>>   This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.
>>
>> To manage notifications about this bug go to:
>> https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1743249

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs