LAVA scheduler spec

Paul Larson Fri, 04 Feb 2011 13:55:09 -0800

Hi Mirsad, I'm looking at the recent edits to
https://wiki.linaro.org/Platform/Validation/Specs/ValidationScheduler and
wanted to start a thread to discuss.  Would love to hear thoughts from
others as well.


The spec could use more implementation detail, but it is taking shape
pretty well. Good work. I have a few comments below:

> Admin users can also cancel any scheduled jobs.
Job submitters should be allowed to cancel their own jobs too, right?
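
To make the rule I'm suggesting concrete, here is a minimal sketch in
Python. The User and Job types and the can_cancel name are made up for
illustration; nothing like this is in the spec yet.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    is_admin: bool = False


@dataclass
class Job:
    job_id: int
    submitter: str  # name of the user who submitted the job


def can_cancel(user: User, job: Job) -> bool:
    """Admins may cancel any job; submitters may cancel their own."""
    return user.is_admin or user.name == job.submitter
```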

I think in general, the user stories need tweaking.  Many of them center
around automatic scheduling of jobs based on some event (adding a machine,
adding a test, etc).  Based on the updated design, this kind of logic would
be in the piece we were referring to as the driver.  The scheduler shouldn't
be making those decisions on its own, but it should provide an interface for
humans to schedule jobs (web, CLI) as well as an API for machines (the
driver) to do the same.

> should we avoid scheduling image tests twice because a hwpack is coming in
> after images or vv.
Is this a question?  Again, I don't think that's the scheduler's call.  The
scheduler isn't deciding what tests to run or what to run them on.  In
this case, assuming we have the resources to pull it off, running the new
image with both the old and the new hwpack would be good to do.

> Test job definition
Is this different from the job definition used by the dispatcher?  Please
tell me if I'm missing something here, but I think to schedule something,
you only really need two blobs of information:
1a. specific host to run on
   -OR-
1b. (any/every system matching given criteria)
    This one is tricky, and though it sounds really useful, my personal
feeling is that it is of questionable value.  In theory, it lets you make
more efficient use of your hardware when you have multiple identical
machines.  In practice, what I've seen on similar systems is that humans
typically know exactly which machine they want to run something on.  Where
it might really come into play is later, when we have a driver automatically
scheduling jobs for us.
2. job file - this is the piece that the job dispatcher consumes.  It could
be handwritten, machine generated, or created based on a web form where the
user selects what they want.
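
As a rough sketch of what I mean by "two blobs", assuming Python: a
request carries either a host or a set of criteria, plus the job file,
which the scheduler passes through untouched. ScheduleRequest,
candidate_hosts and the inventory layout are all hypothetical names,
not anything from the spec.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ScheduleRequest:
    job_file: str                              # 2. consumed by the dispatcher
    host: Optional[str] = None                 # 1a. specific host to run on
    criteria: Optional[Dict[str, str]] = None  # 1b. any system matching these

    def __post_init__(self) -> None:
        # Exactly one of host / criteria must be given.
        if (self.host is None) == (self.criteria is None):
            raise ValueError("give exactly one of host or criteria")


def candidate_hosts(request: ScheduleRequest,
                    inventory: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the hosts this request could run on.

    inventory maps host name -> attributes (a hypothetical layout).
    """
    if request.host is not None:
        return [request.host] if request.host in inventory else []
    return sorted(
        name for name, attrs in inventory.items()
        if all(attrs.get(key) == value
               for key, value in request.criteria.items())
    )
```

Note that the scheduler never looks inside job_file here; it only decides
where the job can run.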

> Test job status
One distinction I want to make here is job status vs. test result.  A failed
test can certainly have a "complete" job status.
Incomplete, as a job status, just means that the dispatcher was unable to
finish all the steps in the job.  For example, suppose a job requires an
image to be deployed and booted, and a test to be run on it.  If we tried
to deploy the image and hit a kernel panic on reboot, that is an incomplete
job because it never made it far enough to run the specified test.
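
Here is a small sketch of that distinction, assuming Python. The enum
values, step names and the summarize function are illustrative only;
the point is that job status and test result are tracked separately.

```python
from enum import Enum
from typing import List, Optional, Tuple


class JobStatus(Enum):
    COMPLETE = "complete"      # dispatcher finished every step
    INCOMPLETE = "incomplete"  # dispatcher stopped before the last step


class Result(Enum):
    PASS = "pass"
    FAIL = "fail"


def summarize(planned: List[str], finished: List[str],
              test_passed: Optional[bool]) -> Tuple[JobStatus, Optional[Result]]:
    """Return (job status, test result) as two independent values.

    The test result is None when the job never got far enough to run it.
    """
    if finished != planned:
        return JobStatus.INCOMPLETE, None
    if test_passed is None:
        return JobStatus.COMPLETE, None
    return JobStatus.COMPLETE, Result.PASS if test_passed else Result.FAIL
```

With steps ["deploy_image", "boot", "run_test"], a kernel panic on reboot
leaves only "deploy_image" finished, so the job is incomplete and has no
test result.  A test that runs and fails still gives a complete job.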

> Link to test results in launch-control
If we tie this closely enough with launch-control, it seems we could just
communicate the job id to the dispatcher so that it gets rolled up with the
bundle.  That way the dashboard would have a backlink to the job, and could
create the link to the bundle once it is deserialized.  Just a different
option if it's easier.  I don't see an obvious advantage to either approach.

Thanks,
Paul Larson

