Thanks Allen. I wrote this little job that does what we want:

https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-clean-h9/

The bad directory is some NN metadata dir, which could come from basically
any minicluster test.

I'll leave it there so it's ready for next time. If this keeps happening on
H9, then I'm going to ask infra to reimage it. FWIW I haven't seen this on
our internal unit test runs, so it points to an H9-specific issue.
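
For reference, here's a minimal sketch of the sort of cleanup a job like this
can run; the script, path handling, and permission details below are my own
placeholders, not the actual hadoop-clean-h9 config. The point is just that the
permission fix has to happen before the delete: restore owner rwx on every
directory top-down (the usual culprit being a non-executable dir left behind by
an aborted test), then remove the tree.

#!/usr/bin/env python
# Sketch only: placeholder paths, not the real job. Aborted tests can leave
# non-executable dirs behind, so restore owner rwx on every directory
# top-down, then remove the whole tree.
import os
import shutil
import stat
import sys

OWNER_RWX = stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR

def restore_owner_perms(root):
    # Fix the root first, then each subdirectory before os.walk descends into
    # it, so every directory is listable/enterable by the time it is scanned.
    os.chmod(root, os.stat(root).st_mode | OWNER_RWX)
    for dirpath, dirnames, _ in os.walk(root):
        for d in dirnames:
            path = os.path.join(dirpath, d)
            if os.path.islink(path):
                continue  # don't chmod through symlinks
            os.chmod(path, os.stat(path).st_mode | OWNER_RWX)

def clean(stale_dir):
    if not os.path.isdir(stale_dir):
        return
    restore_owner_perms(stale_dir)  # make the tree traversable again
    shutil.rmtree(stale_dir)        # now the delete (or git clean) can succeed

if __name__ == '__main__':
    clean(sys.argv[1])  # e.g. the other job's workspace dir on H9 (placeholder)

The real job may amount to nothing more than a chmod -R u+rwx followed by an
rm -rf on the offending dir; the ordering is the only important part.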

On Mon, Mar 6, 2017 at 1:26 PM, Allen Wittenauer <a...@effectivemachines.com>
wrote:

>
> > On Mar 6, 2017, at 1:17 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
> >
> > Do you have a link to your old job somewhere?
>
>         Nope, but it’s trivial to write.  A single job that only runs on H9
> and removes that other job’s workspace dir.  You can also try using the
> “Wipe out current workspace” button.
>
> > I'm also wondering what causes this; does this issue surface in the same
> way each time?
>
>         It’s usually a job that writes non-exec dirs and gets aborted in a
> weird way, so the chmod never triggers.  Then git can’t delete the workspace
> on the next job.  If that’s the cause, it’s fundamentally a bug in the Hadoop
> unit tests.
>
> > Also wondering, should we nuke the workspace before every run, for
> improved reliability?
>
>         It would mean a clone every time, which would put a considerable
> load on the ASF git servers on busy days.
>
>
