Hello everyone,

I have successfully added all the nodes, and I can start Julia like this:

[root@hd0 ~]# julia -p 2 --machinefile Beowulf
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.11 (2015-07-27 06:18 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-unknown-linux-gnu


julia> nprocs()
22


julia> nworkers()
21


julia> 


The Beowulf file looks like this:

hd1
hd2
hd3
hd4
hd5
hd6
hd7
hd8
hd9
hd10
hd11
hd12
hd13
hd14
hd15
hd16
hd17
hd18
hd19
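
A file like this can also be generated rather than typed. Below is a minimal sketch. It assumes the nodes really are named hd1 through hd19, and `write_machinefile` is a hypothetical helper, not part of Julia. Because the manual quoted below says `--machinefile` launches "a worker for each line", the optional last argument repeats each host name to request more than one worker per node.

```shell
#!/bin/sh
# Hypothetical helper: write a machinefile with one line per worker.
# Usage: write_machinefile FILE FIRST LAST [WORKERS_PER_HOST]
# Hosts are assumed to be named hdFIRST..hdLAST, as in this cluster.
write_machinefile() {
  file=$1; first=$2; last=$3; per_host=${4:-1}
  : > "$file"                      # create or truncate the output file
  i=$first
  while [ "$i" -le "$last" ]; do
    n=1
    while [ "$n" -le "$per_host" ]; do
      echo "hd$i" >> "$file"       # one line = one worker on hd$i
      n=$((n + 1))
    done
    i=$((i + 1))
  done
}

write_machinefile Beowulf 1 19     # the same 19 lines as the file above
```

Running `write_machinefile Beowulf2 1 19 2` would list every node twice instead of prefixing it with `2 `. The claim that this yields two workers per node is inferred from the "one worker per line" wording and has not been tested here.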

If I change it to:

2 hd1
2 hd2
2 hd3
2 hd4
2 hd5
2 hd6
2 hd7
2 hd8
2 hd9
2 hd10
2 hd11
2 hd12
2 hd13
2 hd14
2 hd15
2 hd16
2 hd17
2 hd18
2 hd19



I get the same error I mentioned:

[root@hd0 ~]# julia -p 2 --machinefile Beowulf2
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
ssh: connect to host 2 port 22: Invalid argument
^CERROR: interrupt
 in match at ./regex.jl:119
 in parse_connection_info at multi.jl:1090
 in read_worker_host_port at multi.jl:1037
 in read_cb_response at multi.jl:1015
 in start_cluster_workers at multi.jl:1027
 in addprocs_internal at multi.jl:1234
 in addprocs at multi.jl:1244
 in process_options at ./client.jl:240
 in _start at ./client.jl:354
[root@hd0 ~]#
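
The repeated `ssh: connect to host 2` lines suggest that each `2 hdN` line was read with `2` as the host name. The manual paragraph quoted below gives the machine definition as `[count*][user@]host[:port] [bind_addr[:port]]`. In that form, a worker count is joined to the host with `*`, and a second whitespace-separated field is taken as a bind address. The following is a rough sketch of that split. It is illustrative only and is not Julia's own parser (that code lives in multi.jl). `split_machine_line` is a hypothetical name.

```shell
#!/bin/sh
# Illustrative split of a line of the form
#   [count*][user@]host[:port] [bind_addr[:port]]
# This is NOT Julia's parser; it only mirrors the documented grammar.
split_machine_line() {
  spec=${1%% *}                  # first whitespace-separated field
  bind=""
  case "$1" in
    *" "*) bind=${1#* } ;;       # text after the first space is the bind address
  esac
  count=1                        # documented default
  case "$spec" in
    *"*"*) count=${spec%%\**}; spec=${spec#*\*} ;;
  esac
  user=""                        # empty means "current user"
  case "$spec" in
    *@*) user=${spec%%@*}; spec=${spec#*@} ;;
  esac
  port=22                        # standard ssh port
  case "$spec" in
    *:*) port=${spec#*:}; spec=${spec%%:*} ;;
  esac
  echo "count=$count user=$user host=$spec port=$port bind=$bind"
}

split_machine_line "2 hd1"       # count=1 user= host=2 port=22 bind=hd1
split_machine_line "2*hd1"       # count=2 user= host=hd1 port=22 bind=
split_machine_line "root@hd3:22" # count=1 user=root host=hd3 port=22 bind=
```

By that grammar, `2*hd1` would be the way to ask for two workers on hd1. It is not confirmed here that the installed 0.3.11 accepts the `count*` prefix. Repeating a host on several lines is the alternative that the "one worker per line" rule implies.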



On Friday, September 25, 2015 at 16:42:59 (UTC-5), Ismael VC wrote:
>
> Hello everyone!
>
> I am trying to set up a Julia cluster with 20 nodes; this is the very 
> first time I've tried something like this. I have looked around for 
> examples, but the documentation is not very helpful to me:
>
> *Julia can be started in parallel mode with either the -p or 
> the --machinefile options. -p n will launch an additional n worker 
> processes, while --machinefile file will launch a worker for each line in 
> file file. The machines defined in file must be accessible via a 
> passwordless ssh login, with Julia installed at the same location as the 
> current host. Each machine definition takes the 
> form [count*][user@]host[:port] [bind_addr[:port]] . user defaults to 
> current user, port to the standard ssh port. count is the number of workers 
> to spawn on the node, and defaults to 1. The 
> optional bind-to bind_addr[:port] specifies the ip-address and port that 
> other workers should use to connect to this worker.*
>
> This is what I think I have understood so far:
>
> OK, I list the machines in a machine file. That's easy; I have a file like 
> this:
>
> n user@555.555.555.555
> n user@555.555.555.556
> n user@555.555.555.555
>
>
> *The machines defined in file must be accessible via a 
> passwordless ssh login,*
>
> This is the part I find most difficult: it says that machines 
> must be accessible via passwordless ssh
>
> * with Julia installed at the same location as the current host.*
>
> I understand this to mean that I need to install Julia on every node in the 
> same location, so I have 20 nodes with the same software and hardware stacks. 
> Does this mean that the nodes must run the same operating system? Only the 
> same architecture (32/64 bits)?
>
> Right now I have *20 CentOS 6.7 (64 bits)* nodes with *julia-0.3.11* 
> installed from the *generic Linux binaries (64 bits)*, all of them 
> installed at */opt/julia-0.3.11/bin* (added to the PATH and already 
> exported in /etc/profile).
>
> My plan is to use my laptop *(Windows 7 64 bits, 
> julia-0.3.11 64 bits)* as the master node and control the cluster from it. 
> As I understand it, I'll need to run (leaving the passphrase blank):
>
> ssh-keygen -t rsa
>
>
> on my Windows laptop (I plan to install Arch Linux soon) to create my ssh 
> key, and then:
>
>
> cat ~/.ssh/id_rsa.pub | ssh user@hostname 'cat >> .ssh/authorized_keys'
>
>
>
> To every node? So I have to run an ssh server on every one of them? 
> (I understand I'll need it on the master node.) This is where I get lost. 
> I haven't seen any tutorial or article about this, just that paragraph in 
> the manual. I know there is ClusterManagers.jl, but that sounds even more 
> complicated to me right now.
>
>
> I also want to help David Sanders set up another cluster (once I get this 
> figured out) in his lab at the Science Faculty, UNAM. I promise to improve 
> the documentation on this topic once I understand it.
>
>
> What do you guys think? Do I have it all wrong?
>
>
> If anyone can help me, I'll be very grateful. Thanks in advance!
>
>
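
The key-distribution step in the quoted plan can be looped over every node. The plan is to create the key once with `ssh-keygen -t rsa` and then append the public key to `~/.ssh/authorized_keys` on each node. Each node must run an ssh server, because that is what the passwordless login connects to. The sketch below is minimal and rests on several assumptions. It assumes a POSIX shell on the machine that holds the key. It assumes nodes named hd1 through hd19 and the remote user `root`, as in the session above. `push_keys`, `KEY`, `REMOTE_USER` and `DRY_RUN` are hypothetical names. By default it only prints the commands; set `DRY_RUN=0` to actually connect.

```shell
#!/bin/sh
# Hypothetical helper: append the local public key to ~/.ssh/authorized_keys
# on each host given as an argument (same effect as the quoted cat | ssh pipeline).
# Assumes the key exists (ssh-keygen -t rsa) and every host runs sshd.
# Dry run by default (prints commands); set DRY_RUN=0 to really connect.
push_keys() {
  key=${KEY:-$HOME/.ssh/id_rsa.pub}
  user=${REMOTE_USER:-root}
  # Single quotes keep ~ unexpanded here, so the remote shell expands it.
  remote_cmd='mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys'
  for host in "$@"; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "ssh $user@$host '$remote_cmd' < $key"
    else
      ssh "$user@$host" "$remote_cmd" < "$key"
    fi
  done
}

# Node names hd1..hd19 are taken from the machinefile in this thread.
push_keys $(seq 1 19 | sed 's/^/hd/')
```

After a real run, `ssh root@hd1 hostname` should complete without a password prompt. That is the condition the manual asks for before `--machinefile` can start workers.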
