On Tue, Feb 6, 2018 at 4:50 PM, Andres Rodriguez
<andres...@ubuntu-pe.org> wrote:
> I don't have logs anymore as I have since rebuilt my environment, but I can
> confirm seeing improvements on a maas server running with high IO (note it
> was a single region/rack).
>
> see inline:
>
>
> On Tue, Feb 6, 2018 at 5:17 PM, Jason Hobbs <jason.ho...@canonical.com>
> wrote:
>
>> Andres, it was a single test in both cases, and in both cases there was
>> almost no delay from MAAS.  It's not significant enough to call it
>> positive results.
>>
>>
> Comment #93 shows there are /some/ improvements when comparing those two
> samples only, but as I have already said, we need data over time in both
> scenarios to properly compare and determine whether the changes do make any
> material performance improvements with the current conditions of the
> samples (both samples are with a fixed io starvation on the environment).
>
>
>> Since neither of you answered yes, I'll assume the answer was no to my
>> question of whether there was anything in my logs or data that showed
>> reading the template from disk on the rack controller was the culprit,
>> and that this fix just represents a guess at what might be causing the
>> delay.
>>
>
> To be fair, your logs do not provide anything concrete to determine what's
> the culprit of the issue on the MAAS side. They provide a lot of clues, and
> we have since determined that those issues were a result of IO
> starvation (from the VMs writing to disk). As such, the only way we can
> *really* see if the patch brings any significant performance improvements
> is to run tests in the environment where you were seeing the issues in the
> first place.

I didn't think my logs provided anything concrete!  That's because the
logging built into MAAS is not sufficient to do so.

I can't break that environment to test anymore - we got it working
thanks to you guys' help and it's a production environment that needs
to keep running other tests.

It might be possible to recreate this on another maas server, using
'stress' or a similar tool to cause disk contention.
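
As a rough sketch of how the two runs could be compared (with and
without the disk contention, or with and without the fix), something
like the following would time repeated reads of a file from disk. The
function names and the path passed in are hypothetical and not part of
MAAS. Reads served from the page cache will look fast regardless, so
this only gives a crude signal:

```python
import statistics
import time


def time_reads(path, runs=20):
    """Read `path` `runs` times and return per-read latencies in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        with open(path, "rb") as f:
            f.read()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return min, median and max so two runs can be compared side by side."""
    return {
        "min": min(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }
```

Calling summarize(time_reads(path)) once on an idle server and once
while the disk is under load would give two sets of numbers to put
next to each other, which is closer to the "data over time" asked for
above than a single sample.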

Jason

> As such, if you are willing to test if these make any material difference,
> I would unfix your environment and do two runs (one without the fix, and
> one with the fix). That's the only way we can really compare and be certain
> in *your* environment.
>
>>
>> --
>> You received this bug notification because you are subscribed to MAAS.
>> https://bugs.launchpad.net/bugs/1743249
>>
>> Title:
>>   Failed Deployment after timeout trying to retrieve grub cfg
>>
>> To manage notifications about this bug go to:
>> https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions
>>
>> Launchpad-Notification-Type: bug
>> Launchpad-Bug: product=maas; milestone=2.4.x; status=New;
>> importance=Undecided; assignee=None;
>> Launchpad-Bug: distribution=ubuntu; sourcepackage=grub2; component=main;
>> status=Fix Released; importance=Medium; assignee=mathieu...@gmail.com;
>> Launchpad-Bug-Tags: cdo-qa cdo-qa-blocker foundations-engine patch
>> Launchpad-Bug-Information-Type: Public
>> Launchpad-Bug-Private: no
>> Launchpad-Bug-Security-Vulnerability: no
>> Launchpad-Bug-Commenters: andreserl blake-rouse cgregan janitor
>> jason-hobbs mpontillo vorlon
>> Launchpad-Bug-Reporter: Jason Hobbs (jason-hobbs)
>> Launchpad-Bug-Modifier: Jason Hobbs (jason-hobbs)
>> Launchpad-Message-Rationale: Subscriber (MAAS)
>> Launchpad-Message-For: andreserl
>>
>
>
> --
> Andres Rodriguez (RoAkSoAx)
> Ubuntu Server Developer
> MSc. Telecom & Networking
> Systems Engineer
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1743249
>
> Title:
>   Failed Deployment after timeout trying to retrieve grub cfg
>
> Status in MAAS:
>   New
> Status in grub2 package in Ubuntu:
>   Fix Released
>
> Bug description:
>   A node failed to deploy after it failed to retrieve a grub.cfg from
>   MAAS due to a timeout.  In the logs, it's clear that the server tried
>   to retrieve the grub cfg many times, over about 30 seconds:
>
>   http://paste.ubuntu.com/26387256/
>
>   We see the same thing for other hosts around the same time:
>
>   http://paste.ubuntu.com/26387262/
>
>   It seems like MAAS is taking way too long to respond to these
>   requests.
>
>   This is very similar to bug 1724677, which was happening
>   pre-Meltdown/Spectre. The only difference is we don't see "[critical] TFTP
>   back-end failed" in the logs anymore.
>
>   I connected to the console on this system and it had errors about
>   timing out retrieving the grub-cfg, then it had an error message along
>   the lines of "error not an ip" and then "double free".  After I
>   connected but before I could get a screenshot the system rebooted and
>   was directed by maas to power off, which it did successfully after
>   booting to linux.
>
>   Full logs are available here:
>   https://10.245.162.101/artifacts/14a34b5a-9321-4d1a-b2fa-ed277a020e7c/cpe_cloud_395/infra-logs.tar
>
>   This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1743249

Title:
  Failed Deployment after timeout trying to retrieve grub cfg

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
