Hello everyone, I have successfully added all the nodes, and I can start Julia like this:

    [root@hd0 ~]# julia -p 2 --machinefile Beowulf
                   _
       _       _ _(_)_     |  A fresh approach to technical computing
      (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
       _ _   _| |_  __ _   |  Type "help()" for help.
      | | | | | | |/ _` |  |
      | | |_| | | | (_| |  |  Version 0.3.11 (2015-07-27 06:18 UTC)
     _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
    |__/                   |  x86_64-unknown-linux-gnu

    julia> nprocs()
    22

    julia> nworkers()
    21

    julia>

where the Beowulf file looks like this:

    hd1
    hd2
    hd3
    hd4
    hd5
    hd6
    hd7
    hd8
    hd9
    hd10
    hd11
    hd12
    hd13
    hd14
    hd15
    hd16
    hd17
    hd18
    hd19

If I change it to:

    2 hd1
    2 hd2
    2 hd3
    2 hd4
    2 hd5
    2 hd6
    2 hd7
    2 hd8
    2 hd9
    2 hd10
    2 hd11
    2 hd12
    2 hd13
    2 hd14
    2 hd15
    2 hd16
    2 hd17
    2 hd18
    2 hd19

I get the same error I mentioned:

    [root@hd0 ~]# julia -p 2 --machinefile Beowulf2
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ssh: connect to host 2 port 22: Invalid argument
    ^CERROR: interrupt
     in match at ./regex.jl:119
     in parse_connection_info at multi.jl:1090
     in read_worker_host_port at multi.jl:1037
     in read_cb_response at multi.jl:1015
     in start_cluster_workers at multi.jl:1027
     in addprocs_internal at multi.jl:1234
     in addprocs at multi.jl:1244
     in process_options at ./client.jl:240
     in _start at ./client.jl:354
    [root@hd0 ~]#

On Friday, September 25, 2015, 16:42:59 (UTC-5), Ismael VC wrote:
>
> Hello everyone!
>
> I am trying to set up a Julia cluster with 20 nodes; this is the very
> first time I've tried something like this. I have looked around for
> examples, but the documentation is not very helpful for me:
>
> *Julia can be started in parallel mode with either the -p or
> the --machinefile options. -p n will launch an additional n worker
> processes, while --machinefile file will launch a worker for each line in
> file file. The machines defined in file must be accessible via a
> passwordless ssh login, with Julia installed at the same location as the
> current host. Each machine definition takes the
> form [count*][user@]host[:port] [bind_addr[:port]] . user defaults to
> current user, port to the standard ssh port. count is the number of workers
> to spawn on the node, and defaults to 1. The
> optional bind-to bind_addr[:port] specifies the ip-address and port that
> other workers should use to connect to this worker.*
>
> This is what I think I have understood so far:
>
> OK, I list the machines in a machine file, that's easy. I have a file like
> this:
>
>     n user@555.555.555.555
>     n user@555.555.555.556
>     n user@555.555.555.555
>
> *The machines defined in file must be accessible via a
> passwordless ssh login,*
>
> This is the part that is most difficult for me: it says that the machines
> must be accessible via passwordless ssh.
>
> *with Julia installed at the same location as the current host.*
>
> I understand this to mean that I need to install Julia on every node in the
> same location, so I have 20 nodes with the same software and hardware
> stacks. Does this mean that the nodes must all run the same operating
> system, or only have the same word size (32/64 bits)?
>
> Right now I have *20 CentOS 6.7 (64-bit)* nodes with *julia-0.3.11*
> installed from the *generic Linux binaries (64-bit)*, all of them
> installed at */opt/julia-0.3.11/bin* (added to the PATH and already
> exported in /etc/profile).
>
> Now the plan in my mind is to use my laptop *(Windows 7 64-bit,
> julia-0.3.11 64-bit)* as the master node and control the cluster from it,
> so according to what I understand, I'll need to run (leaving the
> passphrase blank):
>
>     ssh-keygen -t rsa
>
> from my Windows laptop (I plan to install Arch Linux soon), in order to
> create my ssh key, and then:
>
>     cat ~/.ssh/id_rsa.pub | ssh user@hostname 'cat >> .ssh/authorized_keys'
>
> for every node? So I have to be running the ssh server on every one of them?
> (I understand I'll need it on the master node.) This is where I simply don't
> understand anymore. I haven't seen any tutorial, article, or anything
> like that, just that paragraph in the manual. I know there is
> ClusterManagers.jl, but that sounds even more complicated for me right now.
>
> I also want to help David Sanders set up another cluster (once I've got this
> figured out) in his lab at the Science Faculty, UNAM. I promise to improve
> the documentation around this topic once I understand this.
>
> What do you guys think, do I have it all wrong?
>
> If anyone can help me, I'll be very grateful. Thanks in advance!
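
A note on the machine file format: going by the manual paragraph quoted above, each definition has the form [count*][user@]host[:port], so the count seems to be attached to the host with an asterisk (2*hd1) rather than separated by a space (2 hd1). With a space, the first field is probably read as the host, which would be consistent with the "connect to host 2" errors. I have not verified this on 0.3.11, so treat it as a reading of the manual. A minimal shell sketch, assuming the node names hd1 to hd19 from this thread and a hypothetical helper called make_machinefile, that prints a machine file in that form:

```shell
#!/bin/sh
# Hypothetical helper (not part of Julia): print one machine-file line per
# node in the form count*host, following the format quoted from the manual.
# Usage: make_machinefile COUNT PREFIX FIRST LAST
make_machinefile() {
    count=$1
    prefix=$2
    i=$3
    last=$4
    while [ "$i" -le "$last" ]; do
        # e.g. count=2, prefix=hd, i=1 gives the line: 2*hd1
        printf '%s*%s%s\n' "$count" "$prefix" "$i"
        i=$((i + 1))
    done
}

# Two workers on each of hd1 .. hd19 (node names assumed from the post).
make_machinefile 2 hd 1 19
```

Redirecting the output to a file (for example, make_machinefile 2 hd 1 19 > Beowulf2) would produce something that can be passed to --machinefile.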