Sounds good. In the course of this, we should probably extend the IOManager
so that it keeps track of temp files and deletes them when a task is done.
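
Roughly something along these lines (just a sketch to illustrate the idea;
TempFileRegistry and its method names are made up, not the actual IOManager
API):

    import java.io.File;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Sketch only: track temp files per task and delete them when the
    // task finishes. Names are illustrative, not real Flink code.
    public class TempFileRegistry {

        private final Map<String, Set<File>> filesByTask =
                new HashMap<String, Set<File>>();

        // Remember a temp file created on behalf of a task.
        public synchronized void register(String taskId, File file) {
            Set<File> files = filesByTask.get(taskId);
            if (files == null) {
                files = new HashSet<File>();
                filesByTask.put(taskId, files);
            }
            files.add(file);
        }

        // Delete all temp files of a task once it is done.
        public synchronized void releaseTask(String taskId) {
            Set<File> files = filesByTask.remove(taskId);
            if (files == null) {
                return;
            }
            for (File f : files) {
                if (!f.delete()) {
                    f.deleteOnExit(); // best effort if the file is still open
                }
            }
        }
    }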
On Thu, Feb 5, 2015 at 4:40 PM, Ufuk Celebi wrote:
After talking to Robert and Till offline, what about the following:
- We add a shutdown hook to the blob library cache manager to shut down the
blob service (just a delete call; see the sketch below)
- As Robert pointed out, we cannot do this with the IOManager paths right
now, because they are essentially shared among…
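
For the shutdown hook, I am thinking of something like this (a minimal
sketch; storageDir and the recursive delete stand in for whatever the actual
delete call on the blob service ends up being):

    import java.io.File;

    // Sketch: register a JVM shutdown hook that removes the blob
    // service's local storage directory on a normal JVM shutdown.
    public class BlobCleanupHook {

        public static void install(final File storageDir) {
            Runtime.getRuntime().addShutdownHook(new Thread("blob-cleanup") {
                @Override
                public void run() {
                    deleteRecursively(storageDir);
                }
            });
        }

        private static void deleteRecursively(File file) {
            File[] children = file.listFiles();
            if (children != null) {
                for (File child : children) {
                    deleteRecursively(child);
                }
            }
            file.delete();
        }
    }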
I think that process killing (TERM signal) is a very typical way in Linux
to shut down processes. It is the most robust way, since it does not
require sending any custom messages to the process.
This is sort of graceful, as the JVM gets the signal and may do a lot of
things before shutting down, s…
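
This is easy to verify with a small test program: shutdown hooks run on a
plain kill (TERM), but not on kill -9 (KILL):

    // Start this, then compare 'kill <pid>' and 'kill -9 <pid>':
    // the hook fires on TERM but never on KILL.
    public class SignalTest {
        public static void main(String[] args) throws InterruptedException {
            Runtime.getRuntime().addShutdownHook(new Thread() {
                @Override
                public void run() {
                    System.out.println("shutdown hook ran");
                }
            });
            Thread.sleep(Long.MAX_VALUE); // wait to be killed
        }
    }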
Hmm, it is not very gentleman-like to terminate the Job/TaskManagers this way. I'll
check how the ActorSystem behaves when the process is killed.
Why can't we implement a more graceful termination mechanism? For example,
we could send a termination message to the JobManager and TaskManagers.
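
With Akka's Java API that could look roughly like this (a sketch only;
TerminationMessage and cleanUp() are made up for illustration, not real
Flink code):

    import akka.actor.UntypedActor;

    // Sketch of graceful termination via a message (Akka 2.x Java API).
    public class TaskManagerActor extends UntypedActor {

        public static final class TerminationMessage implements java.io.Serializable {}

        @Override
        public void onReceive(Object message) throws Exception {
            if (message instanceof TerminationMessage) {
                cleanUp();                         // release blob/temp files etc.
                getContext().system().shutdown();  // then bring the actor system down
            } else {
                unhandled(message);
            }
        }

        private void cleanUp() {
            // delete local state before the process exits
        }
    }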
Thank you very much, Robert!
The problem is that the job/task manager shutdown methods are never called:
when using the scripts, the task/job manager processes simply get killed, so
the shutdown methods never run.
@Till: Do you know whether there is a mechanism in Akka to register the
actor…
Hi Robert,
thanks for the info. If the TaskManager/JobManager does not shut down
properly, e.g. because the process is killed, then it is indeed the case that the
BlobManager cannot properly remove all stored files. I don't know whether this
was the case for you lately. Furthermore, the files are not directly…
I talked with the admins. The problem seemed to have been that the disk was
full and Flink couldn't create the directory.
Maybe the error message should reflect it if that is the cause.
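
For example, something along these lines when creating the directory fails
(a sketch, not actual Flink code):

    import java.io.File;
    import java.io.IOException;

    // Sketch: when the storage directory cannot be created, report the
    // free space of the nearest existing ancestor, so a full disk is
    // immediately visible in the error message.
    public final class StorageDirs {

        public static void ensureDir(File dir) throws IOException {
            if (dir.mkdirs() || dir.isDirectory()) {
                return;
            }
            File probe = dir;
            while (probe != null && !probe.exists()) {
                probe = probe.getParentFile();
            }
            long usable = (probe != null) ? probe.getUsableSpace() : -1L;
            throw new IOException("Could not create directory " + dir
                    + " (usable space on partition: " + usable + " bytes)");
        }
    }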
While cleaning up the disk we noticed that a lot of temporary blobStore
files were not deleted by Flink after…
On Thu, Feb 5, 2015 at 11:23 AM, Robert Waury wrote:
I've added…
Hi,
I can reproduce the error on my cluster.
Unfortunately I can't check whether the parent directories were created on
the different nodes since I have no way of accessing them. I start all the
jobs from a gateway.
Cheers,
Robert
On Thu, Feb 5, 2015 at 11:01 AM, Ufuk Celebi wrote:
Hey Robert,
is this error reproducible?
I've looked into the blob store and the error occurs when the blob cache tries
to *create* a local file before requesting it from the job manager.
I will add a check to the blob store to ensure that the parent directories have
been created. Other than th…
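
I.e., something like this before the cache opens the local file (a sketch,
not the exact BlobCache code):

    import java.io.File;
    import java.io.IOException;

    // Sketch: make sure the parent directories of the local blob file
    // exist before the file itself is created.
    final class BlobPaths {

        static void ensureParentDirs(File localFile) throws IOException {
            File parent = localFile.getParentFile();
            if (parent != null && !parent.mkdirs() && !parent.isDirectory()) {
                throw new IOException("Could not create storage directory " + parent);
            }
        }
    }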
I compiled from the release-0.8 branch.
On Thu, Feb 5, 2015 at 8:55 AM, Stephan Ewen wrote:
Hey Robert!
On which version are you? 0.8 or 0.9-SNAPSHOT?
On 04.02.2015 at 14:49, "Robert Waury" wrote:
Hi,
I'm suddenly getting FileNotFoundExceptions because the blobStore cannot
find files in /tmp
The job used to work in the exact same setup (same versions, same cluster,
same input files).
Flink version: 0.8 release
HDFS: 2.3.0-cdh5.1.2
Flink trace:
http://pastebin.com/SKdwp6Yt
Any idea what cou…