The compile stage was always passing.

The timeout makes no difference; it only affects how long we wait for the download to complete. We already had significantly more data in the cache a while ago (roughly twice as much), so I'm skeptical that the amount of cached data is the problem.

On 18/06/2019 12:47, jincheng sun wrote:
Test result:
  - The test for only the compile stage is succeeding (I deleted some old
caches); the cache size is 1146.26M. See here:
https://travis-ci.org/sunjincheng121/flink/caches
  - With the timeout set to 1200 the test still fails with the same error,
but I think it may be a storage problem, so I deleted more old caches and
restarted the CI. See here:
https://travis-ci.org/apache/flink/builds/547136163

So now it feels like the storage size of the cache is limited. If so, we
can add some cleanup logic for the old caches (I am not sure; some
validation is needed).
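The cleanup logic suggested above could look roughly like the following. This is only a sketch under stated assumptions: `cleanup_old_caches`, the cache directory layout, and the 7-day cutoff are all hypothetical illustrations, not Flink's actual Travis setup.

```shell
#!/bin/sh
# Hedged sketch of age-based cache cleanup: drop cache entries older than
# a cutoff so the (apparently limited) cache storage is not exhausted.
# cleanup_old_caches, the directory layout, and the cutoff are assumptions.
cleanup_old_caches() {
    cache_dir="$1"      # e.g. $HOME/flink_cache (hypothetical location)
    max_age_days="$2"   # delete entries not modified for this many days

    [ -d "$cache_dir" ] || return 0

    # Remove only top-level entries (one per build), keeping recent ones.
    find "$cache_dir" -mindepth 1 -maxdepth 1 -mtime "+$max_age_days" \
        -exec rm -rf {} +
}
```

Such a script would need validation against how Travis actually restores and persists the cache directory before relying on it.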

Best
Jincheng

jincheng sun <sunjincheng...@gmail.com> wrote on Tue, Jun 18, 2019 at 6:00 PM:

I agree with the explanation from @Chesnay Schepler <ches...@apache.org>. This
should be a problem with the Travis infrastructure, because we have not
made big changes to the Travis logic inside Flink recently.
At present, most of the failures occur after the compile is completed. The
cache size is only 7.7M, which means that the JARs were not successfully
uploaded.

So here is a question:
  - Where can we check the cache storage to see whether there is a
problem with it?

In order to find the cause of the CI issue, I am running the following
tests:

  - Delete the other test phases locally and test whether the cache is
uploaded normally during the compilation phase. See here:
https://travis-ci.org/sunjincheng121/flink/builds/547155029
  - Increase the Travis cache timeout to 1200 and test whether the cache
fails to download because of a timeout. (I think this test will have the
same result.) See here:
https://travis-ci.org/apache/flink/builds/547136163
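For reference, the timeout bump described above corresponds to Travis's documented `timeout` key under `cache:` in `.travis.yml` (value in seconds; the default is 180). A minimal sketch, not Flink's actual config:

```yaml
# .travis.yml (fragment) -- illustrative sketch only.
cache:
  # Raise the cache operation timeout from the 180s default.
  timeout: 1200
  directories:
    - $HOME/flink_cache   # hypothetical cached directory
```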

I will report back here after testing.

Best,
Jincheng

Chesnay Schepler <ches...@apache.org> wrote on Tue, Jun 18, 2019 at 3:53 PM:

The problem is not that bad stuff is in the cache (which is the only
thing cache cleaning solves); it is that the test stages don't download
the correct one.

Our compile stage uploads artifacts into the cache, and the subsequent test
builds download them again.

Whether the upload from the compile phase is visible to the test phase
is basically a timing thing; it depends on the visibility guarantee that
the backing infrastructure provides. So far it _usually_ worked, but
these are naturally things that may change over time.
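The failure mode Kurt reports further down ("Cached flink dir ... does not exist. Exiting build.") is consistent with a guard like the following in the test stage. This is a hedged sketch: `ensure_cached_flink_dir` is a hypothetical helper name, not the actual Flink Travis script.

```shell
#!/bin/sh
# Hedged sketch of a test-stage guard that would produce the quoted error;
# ensure_cached_flink_dir is a hypothetical name, not Flink's real script.
ensure_cached_flink_dir() {
    cache_flink_dir="$1"
    if [ ! -d "$cache_flink_dir" ]; then
        # If the compile stage's upload is not yet visible to this build,
        # the directory is simply missing and the build must abort.
        echo "Cached flink dir $cache_flink_dir does not exist. Exiting build."
        return 1
    fi
}
```

Under the timing explanation above, such a check fails whenever the backing storage has not yet made the compile stage's upload visible.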

On 18/06/2019 09:20, Jeff Zhang wrote:
If it is a Travis caching issue, we can file an Apache INFRA ticket and
ask them to clean the cache.



Chesnay Schepler <ches...@apache.org> wrote on Tue, Jun 18, 2019 at 3:18 PM:

This is (hopefully) a short-lived hiccup in the Travis caching
infrastructure.

There's nothing we can do to _fix_ it; if it persists, we'll have to
rework our Travis setup again to not rely on caching.

On 18/06/2019 08:34, Kurt Young wrote:
Hi dev,

I noticed that all the Travis tests triggered by pull requests are
failing with the same error:

"Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
Exiting build."

Anyone have a clue on what happened and how to fix this?

Best,
Kurt


