Hi again,

Angel de Vicente <angel.vicente.garr...@gmail.com> writes:
> I'm also trying to run Julia with Condor, and so far I have the same
> issue as you. I'm not sure if this applies to your case, but in my case
> the problem seems twofold:
>
> 1- I have Julia installed in a directory that is not accessible to the
> workers (for now, for testing, I just edited condor.jl by hand so that
> those files can be reached, and this is no longer a problem).
>
> 2- workers try to connect to the master via telnet (at port 8553), which
> is disabled on my workstation.

Well, looking at the code in condor.jl, the port is actually chosen at
random between 8000 and 9000. In the end, I didn't like the idea of
installing telnet just to be able to run Julia+Condor, so I tried to use
ssh instead, but I'm not getting very far. The workers get submitted and
they try to run julia --worker, but then the Condor error files contain
the following message about julia_worker:9009, and I'm not sure where it
comes from.

,----
| Pseudo-terminal will not be allocated because stdin is not a terminal.
| julia_worker:9009: Command not found.
| Master process (id 1) could not connect within 60.0 seconds.
| exiting.
`----
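
As an aside, whether a worker node can reach the master on one of those
ports can be checked without telnet. The following is only a minimal
Python sketch of that check, not what condor.jl does: pick_port and
can_connect are names made up here, and the 8000-9000 range is simply
the one mentioned above.

```python
import random
import socket


def pick_port(low=8000, high=9000):
    """Pick a random port in [low, high] (range assumed from the text above)."""
    return random.randint(low, high)


def can_connect(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds in time."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the 'with' block closes it again as soon as we know it works.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

Running can_connect("master-hostname", port) from a worker node (with
the real hostname and port filled in) would at least tell whether the
problem is the network or the command that the worker executes.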

In any case, I'm not sure how robust the Julia-Condor connection is. It
seems (correct me if I'm wrong, as I haven't been able to use it yet)
that it is based on the idea that Condor is like other workload
managers: I request a number of workers and then use them for a
parallel computation, assuming that they will be there the whole
time. But the beauty of Condor is mainly that it is an opportunistic
scheduler: if I have 10000 tasks, Condor will start executing them on
whatever resources are available, perhaps only 10 workers now and
perhaps 200 workers later. If a worker becomes unavailable in the
middle of a task, the task is automatically rescheduled to another
worker, where it starts all over again (unless checkpointing is
enabled).
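
To make the difference concrete, here is a toy Python model of the
opportunistic behaviour described above. It is only a sketch under my
own assumptions (discrete time steps, a fixed task length, a per-step
eviction probability) and has nothing to do with how HTCondor actually
schedules jobs; simulate and all its parameters are invented for
illustration.

```python
import random
from collections import deque


def simulate(num_tasks, task_length, pool_sizes, evict_prob, seed=0):
    """Toy model of an opportunistic scheduler without checkpointing.

    pool_sizes gives the number of available worker slots at each time
    step (the list is cycled). A running task is evicted with probability
    evict_prob per step, or when the pool shrinks, and then goes back to
    the queue with zero progress. Returns (steps_taken, restarts).
    """
    if max(pool_sizes) < 1:
        raise ValueError("at least one step must offer a worker slot")
    rng = random.Random(seed)
    pending = deque(range(num_tasks))
    running = {}  # task id -> steps completed so far
    done = restarts = step = 0
    while done < num_tasks:
        slots = pool_sizes[step % len(pool_sizes)]
        # The pool shrank: surplus tasks lose their worker and restart later.
        while len(running) > slots:
            task, _ = running.popitem()
            pending.append(task)
            restarts += 1
        # Fill whatever slots are free right now.
        while pending and len(running) < slots:
            running[pending.popleft()] = 0
        # Advance every running task by one step, with random evictions.
        for task in list(running):
            if rng.random() < evict_prob:
                del running[task]
                pending.append(task)
                restarts += 1
                continue
            running[task] += 1
            if running[task] == task_length:
                del running[task]
                done += 1
        step += 1
    return step, restarts
```

For example, simulate(10, 3, [2], 0.0) models a stable pool of two
workers, while simulate(10, 3, [2, 1], 0.2) lets the pool fluctuate and
evicts tasks at random, so the same work needs more steps and some
tasks are started more than once.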

If somebody could shed some light on how to get Julia+HTCondor working
properly, that would be great, as we have a large Condor pool at work
and it would be very interesting to try it.

Cheers,
-- 
Ángel de Vicente
http://www.iac.es/galeria/angelv/          
