Running into this again, filed
https://issues.apache.org/jira/browse/INFRA-15194
Noticed some of the HDFS jobs are reported as succeeding even though the job
itself failed and says 'see HADOOP-13591' at the end. My run wasn't that lucky, though. :(
-Xiao
On Fri, Mar 10, 2017 at 10:19 AM, Sean Busbey wrote:
All the precommit builds should be doing the correct thing now for
making sure we don't render nodes useless. They don't flag the problem
yet and someone will still need to run the "cleanup" job on nodes
broken before Jenkins runs pick up the new configuration changes.
Probably best if we move to
> On Mar 9, 2017, at 2:15 PM, Andrew Wang wrote:
>
> H9 is again eating our builds.
>
H0: https://builds.apache.org/job/PreCommit-HDFS-Build/18652/console
H6: https://builds.apache.org/job/PreCommit-HDFS-Build/18646/console
H9 is again eating our builds.
I'm going to do the easy hack of removing it from HDFS precommit for now,
pending HADOOP-13951 being resolved.
On Thu, Mar 9, 2017 at 6:21 AM, Sean Busbey wrote:
On Wed, Mar 8, 2017 at 2:04 PM, Allen Wittenauer
wrote:
>
>> On Mar 8, 2017, at 9:34 AM, Sean Busbey wrote:
>>
>> Is this HADOOP-13951?
>
> Almost certainly. Here's the run that broke it again:
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/18591
>
> Likely something in t
> On Mar 8, 2017, at 2:53 PM, Anu Engineer wrote:
>
> Agreed, but I was under the impression that we would kill the container under
> OOM conditions and not the whole base machine.
We do not run our docker containers under a cgroup.
--
Agreed, but I was under the impression that we would kill the container under
OOM conditions and not the whole base machine.
Thanks
Anu
On 3/8/17, 2:41 PM, "Allen Wittenauer" wrote:
> On Mar 8, 2017, at 2:21 PM, Anu Engineer wrote:
>
> Hi Allen,
>> Likely something in the HDFS-7240 branch or with this patch that's
>> doing Bad Things (tm).
>
> Thanks for bringing this to my attention, but I am surprised that a mvn
> command is able to kill a test machine.
Hi Allen,
> Likely something in the HDFS-7240 branch or with this patch that's
> doing Bad Things (tm).
Thanks for bringing this to my attention, but I am surprised that a mvn command
is able to kill a test machine.
I have pasted the call stack from the issue that you pointed out to be th
> On Mar 8, 2017, at 12:04 PM, Allen Wittenauer
> wrote:
>
>
>> On Mar 8, 2017, at 9:34 AM, Sean Busbey wrote:
>>
>> Is this HADOOP-13951?
>
> Almost certainly. Here's the run that broke it again:
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/18591
>
> Likely somethi
> On Mar 8, 2017, at 9:34 AM, Sean Busbey wrote:
>
> Is this HADOOP-13951?
Almost certainly. Here's the run that broke it again:
https://builds.apache.org/job/PreCommit-HDFS-Build/18591
Likely something in the HDFS-7240 branch or with this patch that's
doing Bad Things (tm).
Is this HADOOP-13951?
On Tue, Mar 7, 2017 at 8:32 PM, Andrew Wang wrote:
> A little ping that H9 hit the same error again, and I'm again going to
> clean it out. One more time and I'll ask infra about either removing or
> reimaging this node.
>
> On Mon, Mar 6, 2017 at 2:12 PM, Allen Wittenauer
A little ping that H9 hit the same error again, and I'm again going to
clean it out. One more time and I'll ask infra about either removing or
reimaging this node.
On Mon, Mar 6, 2017 at 2:12 PM, Allen Wittenauer
wrote:
> On Mar 6, 2017, at 1:57 PM, Andrew Wang wrote:
>
> I'll leave it there so it's ready for next time. If this keeps happening on
> H9, then I'm going to ask infra to reimage it. FWIW I haven't seen this on
> our internal unit test runs, so it points to an H9-specific issue.
I’ve seen
Thanks Allen. I wrote this little job that does what we want:
https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-clean-h9/
The bad directory is some NN metadata dir, which could come from basically
any minicluster test.
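I'll leave it there so it's ready for next time. If this keeps happening on
H9, then I'm going to ask infra to reimage it. FWIW I haven't seen this on
our internal unit test runs, so it points to an H9-specific issue.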
I also found this older JIRA I filed for H9. Either this box is suspect, or
we have a disproportionate number of our builds running on it.
https://issues.apache.org/jira/browse/INFRA-13234
On Mon, Mar 6, 2017 at 1:17 PM, Andrew Wang
wrote:
> Do you have a link to your old job somewhere?
>
> I'm
> On Mar 6, 2017, at 1:17 PM, Andrew Wang wrote:
>
> Do you have a link to your old job somewhere?
Nope, but it’s trivial to write: a single job that only runs on H9 and
removes that other job’s workspace dir. You can also try using the “Wipe out
current workspace” button.
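
For illustration only, here is a minimal Python sketch of what the core step of such a cleanup job might look like. It is not the actual job from this thread. It assumes the workspace cannot be removed because a test left directories without owner permissions (the thread does not confirm the cause), and the function name `wipe_workspace` is hypothetical.

```python
import os
import shutil
import stat


def wipe_workspace(workspace: str) -> bool:
    """Remove a workspace tree; return True once it no longer exists."""
    if not os.path.exists(workspace):
        return True  # nothing to clean
    # Assumption: cleanup fails because of restrictive directory modes.
    # Restore owner rwx on the root so it can be listed at all.
    os.chmod(workspace, stat.S_IRWXU)
    # os.walk is top-down by default, so each child directory is fixed
    # here before the walk tries to descend into it.
    for root, dirs, _files in os.walk(workspace):
        for name in dirs:
            path = os.path.join(root, name)
            if not os.path.islink(path):  # never chmod through a symlink
                os.chmod(path, stat.S_IRWXU)
    shutil.rmtree(workspace)
    return not os.path.exists(workspace)
```

A job pinned to the affected node could call `wipe_workspace("/path/to/workspace/PreCommit-HDFS-Build")`, where the path is a placeholder for wherever the node keeps that job's workspace.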
> I'm als
Do you have a link to your old job somewhere?
I'm also wondering what causes this; does this issue surface in the same
way each time?
Also wondering, should we nuke the workspace before every run, for improved
reliability?
On Mon, Mar 6, 2017 at 1:08 PM, Allen Wittenauer
wrote:
> On Mar 6, 2017, at 11:27 AM, Andrew Wang wrote:
> Looks like H9 is having problems cleaning the workspace, leading to a lot
> of silent precommit failures. I filed this INFRA JIRA:
> https://issues.apache.org/jira/browse/INFRA-13618
Have we tried writing a job that nukes the workspace on that
Hi folks,
Looks like H9 is having problems cleaning the workspace, leading to a lot
of silent precommit failures. I filed this INFRA JIRA:
https://issues.apache.org/jira/browse/INFRA-13618
It's quite possible you'll have to retrigger pending precommit runs; the
HDFS runs are pretty red.
Best,
An