OK, and what does `ps -e f` show on the slave node? On the master there is usually no execd (unless some cores of the master machine should be used for processing too).

-- Reuti
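For reference, the two checks discussed in this thread (is the daemon running, and is its port in the LISTEN state) can be sketched as small shell helpers. This is only a sketch and not part of Grid Engine: the function names `daemon_in_ps` and `port_listening` are made up for illustration, and it assumes the ports mentioned in the thread (6444 for the qmaster, 6445 for the execd) and the `ps -e f` and `netstat -nlt` column layouts shown in the quoted output.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Grid Engine). Both read command
# output on stdin, so they can also be tried against a saved listing.

# daemon_in_ps NAME: succeeds if a process whose executable is called NAME
# (for example sge_execd) appears in `ps -e f` output on stdin.
daemon_in_ps() {
    awk -v name="$1" '
        {
            # Column 5 starts the command; skip the tree markers of `ps f`.
            i = 5
            while (i <= NF && ($i == "|" || $i == "\\_")) i++
            n = split($i, part, "/")
            if (n > 0 && part[n] == name) found = 1
        }
        END { exit !found }'
}

# port_listening PORT: succeeds if `netstat -nlt` output on stdin shows
# a TCP socket in LISTEN state on PORT.
port_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, part, ":")
            if (part[n] == port) found = 1
        }
        END { exit !found }'
}

# Possible usage on an exec host:
#   ps -e f | daemon_in_ps sge_execd && echo "execd is running"
#   netstat -nlt | port_listening 6445 && echo "execd port is listening"
```

If the first check fails on an exec host, the second one will fail as well, since nothing can listen on the execd port while the daemon is not running.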
On 30.10.2014 at 17:47, Disny Disny wrote:

> Hello Reuti, one more thing I forgot to mention in the previous mail:
> I checked port 6445 on the execd hosts but nothing appears. They are not
> in listening mode. Is that normal? Aren't they supposed to listen?
>
> root@gcl2:~# netstat -nltp |grep 6445
> root@gcl2:~# netstat -nltp |grep 644
> root@gcl2:~# iptables -L
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
>
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> root@gcl2:~#
>
> Many regards.
>
>
> On Thursday, October 30, 2014 7:42 PM, Disny Disny <disny.wo...@yahoo.com> wrote:
>
> Hello Reuti,
> this is the output of ps -e f:
>
> master@sgemstr:~$ ps -e f
>   PID TTY      STAT   TIME COMMAND
>     2 ?        S      0:00 [kthreadd]
>     3 ?        S      0:00  \_ [ksoftirqd/0]
>     4 ?        S      0:00  \_ [kworker/0:0]
>     5 ?        S<     0:00  \_ [kworker/0:0H]
>     7 ?        S      0:00  \_ [migration/0]
>     8 ?        S      0:00  \_ [rcu_bh]
>     9 ?        S      0:00  \_ [rcuob/0]
>    10 ?        S      0:00  \_ [rcuob/1]
>    11 ?        S      0:00  \_ [rcuob/2]
>    12 ?        S      0:00  \_ [rcuob/3]
>    13 ?        S      0:00  \_ [rcuob/4]
>    14 ?        S      0:00  \_ [rcuob/5]
>    15 ?        S      0:00  \_ [rcuob/6]
>    16 ?        S      0:00  \_ [rcuob/7]
>    17 ?        S      0:00  \_ [rcu_sched]
>    18 ?        S      0:00  \_ [rcuos/0]
>    19 ?        S      0:00  \_ [rcuos/1]
>    20 ?        S      0:00  \_ [rcuos/2]
>    21 ?        S      0:00  \_ [rcuos/3]
>    22 ?        S      0:00  \_ [rcuos/4]
>    23 ?        S      0:00  \_ [rcuos/5]
>    24 ?        S      0:00  \_ [rcuos/6]
>    25 ?        S      0:00  \_ [rcuos/7]
>    26 ?        S      0:00  \_ [watchdog/0]
>    27 ?        S      0:00  \_ [watchdog/1]
>    28 ?        S      0:00  \_ [migration/1]
>    29 ?        S      0:00  \_ [ksoftirqd/1]
>    30 ?        S      0:00  \_ [kworker/1:0]
>    31 ?        S<     0:00  \_ [kworker/1:0H]
>    32 ?        S      0:00  \_ [watchdog/2]
>    33 ?        S      0:00  \_ [migration/2]
>    34 ?        S      0:00  \_ [ksoftirqd/2]
>    35 ?        S      0:00  \_ [kworker/2:0]
>    36 ?        S<     0:00  \_ [kworker/2:0H]
>    37 ?        S      0:00  \_ [watchdog/3]
>    38 ?        S      0:00  \_ [migration/3]
>    39 ?        S      0:00  \_ [ksoftirqd/3]
>    40 ?        S      0:00  \_ [kworker/3:0]
>    41 ?        S<     0:00  \_ [kworker/3:0H]
>    42 ?        S<     0:00  \_ [khelper]
>    43 ?        S      0:00  \_ [kdevtmpfs]
>    44 ?        S<     0:00  \_ [netns]
>    45 ?        S<     0:00  \_ [writeback]
>    46 ?        S<     0:00  \_ [kintegrityd]
>    47 ?        S<     0:00  \_ [bioset]
>    49 ?        S<     0:00  \_ [kblockd]
>    50 ?        S<     0:00  \_ [ata_sff]
>    51 ?        S      0:00  \_ [khubd]
>    52 ?        S<     0:00  \_ [md]
>    53 ?        S<     0:00  \_ [devfreq_wq]
>    54 ?        S      0:00  \_ [kworker/3:1]
>    55 ?        S      0:00  \_ [kworker/2:1]
>    57 ?        S      0:00  \_ [khungtaskd]
>    58 ?        S      0:00  \_ [kswapd0]
>    59 ?        SN     0:00  \_ [ksmd]
>    60 ?        SN     0:00  \_ [khugepaged]
>    61 ?        S      0:00  \_ [fsnotify_mark]
>    62 ?        S      0:00  \_ [ecryptfs-kthrea]
>    63 ?        S<     0:00  \_ [crypto]
>    75 ?        S<     0:00  \_ [kthrotld]
>    79 ?        S<     0:00  \_ [dm_bufio_cache]
>    99 ?        S<     0:00  \_ [deferwq]
>   100 ?        S<     0:00  \_ [charger_manager]
>   101 ?        S      0:00  \_ [kworker/0:1]
>   273 ?        S      0:00  \_ [scsi_eh_0]
>   274 ?        S      0:00  \_ [scsi_eh_1]
>   275 ?        S      0:00  \_ [scsi_eh_2]
>   276 ?        S      0:00  \_ [scsi_eh_3]
>   277 ?        S      0:00  \_ [scsi_eh_4]
>   278 ?        S      0:00  \_ [scsi_eh_5]
>   281 ?        S      0:00  \_ [kworker/u16:5]
>   283 ?        S      0:00  \_ [kworker/u16:7]
>   310 ?        S      0:00  \_ [jbd2/sda7-8]
>   311 ?        S<     0:00  \_ [ext4-rsv-conver]
>   312 ?        S<     0:00  \_ [ext4-unrsv-conv]
>   599 ?        S<     0:00  \_ [kmemstick]
>   601 ?        S      0:00  \_ [irq/45-mei_me]
>   607 ?        S<     0:00  \_ [kpsmoused]
>   621 ?        S<     0:00  \_ [rpciod]
>   624 ?        S      0:00  \_ [kworker/1:2]
>   659 ?        S<     0:00  \_ [ktpacpid]
>   686 ?        S<     0:00  \_ [cfg80211]
>   695 ?        S<     0:00  \_ [nfsiod]
>   806 ?        S<     0:00  \_ [kworker/u17:1]
>   809 ?        S<     0:00  \_ [hci0]
>   810 ?        S<     0:00  \_ [hci0]
>   811 ?        S<     0:00  \_ [kworker/u17:2]
>   824 ?        S<     0:00  \_ [hd-audio0]
>   888 ?        S      0:00  \_ [wl_event_handle]
>   937 ?        S<     0:00  \_ [ttm_swap]
>   968 ?        S<     0:00  \_ [krfcommd]
>  1177 ?        S<     0:00  \_ [nfsd4]
>  1178 ?        S<     0:00  \_ [nfsd4_callbacks]
>  1179 ?        S      0:00  \_ [lockd]
>  1182 ?        S      0:00  \_ [nfsd]
>  1183 ?        S      0:00  \_ [nfsd]
>  1184 ?        S      0:00  \_ [nfsd]
>  1185 ?        S      0:00  \_ [nfsd]
>  1186 ?        S      0:00  \_ [nfsd]
>  1187 ?        S      0:00  \_ [nfsd]
>  1188 ?        S      0:00  \_ [nfsd]
>  1189 ?        S      0:00  \_ [nfsd]
>     1 ?        Ss     0:00 /sbin/init
>   386 ?        S      0:00 upstart-udev-bridge --daemon
>   388 ?        Ss     0:00 /sbin/udevd --daemon
>   547 ?        S      0:00  \_ /sbin/udevd --daemon
>   548 ?        S      0:00  \_ /sbin/udevd --daemon
>   632 ?        Ss     0:00 /usr/sbin/sshd -D
>   873 ?        Sl     0:00 rsyslogd -c5
>   874 ?        Ss     0:00 rpc.idmapd
>   882 ?        S      0:00 upstart-socket-bridge --daemon
>   885 ?        Ss     0:00 dbus-daemon --system --fork --activation=upstart
>   920 ?        Ss     0:00 /usr/sbin/bluetoothd
>   922 ?        Ss     0:00 rpcbind -w
>   967 ?        Ss     0:00 /usr/sbin/modem-manager
>   983 ?        S      0:00 avahi-daemon: running [sgemstr.local]
>   985 ?        S      0:00  \_ avahi-daemon: chroot helper
>  1000 ?        Ss     0:00 /usr/sbin/cupsd -F
>  1004 ?        Ss     0:00 rpc.statd -L
>  1008 ?        Ssl    0:00 NetworkManager
>  2058 ?        S      0:00  \_ /usr/sbin/dnsmasq --no-resolv --keep-in-foregroun
>  1016 ?        Sl     0:00 /usr/lib/policykit-1/polkitd --no-debug
>  1085 tty4     Ss+    0:00 /sbin/getty -8 38400 tty4
>  1092 tty5     Ss+    0:00 /sbin/getty -8 38400 tty5
>  1094 ?        Ss     0:02 /sbin/wpa_supplicant -B -P /run/sendsigs.omit.d/wpasu
>  1113 tty2     Ss+    0:00 /sbin/getty -8 38400 tty2
>  1114 tty3     Ss+    0:00 /sbin/getty -8 38400 tty3
>  1116 tty6     Ss+    0:00 /sbin/getty -8 38400 tty6
>  1122 ?        Ss     0:00 acpid -c /etc/acpi/events -s /var/run/acpid.socket
>  1127 ?        Ss     0:00 cron
>  1128 ?        Ss     0:00 atd
>  1134 ?        Ssl    0:00 lightdm
>  1172 tty7     Ssl+   0:04  \_ /usr/bin/X :0 -auth /var/run/lightdm/root/:0 -nol
>  1599 ?        Sl     0:00  \_ lightdm --session-child 12 19
>  1852 ?        Ssl    0:00  \_ gnome-session --session=ubuntu
>  1898 ?        Ss     0:00  \_ /usr/bin/ssh-agent /usr/bin/dbus-launch -
>  1912 ?        Sl     0:00  \_ /usr/lib/gnome-settings-daemon/gnome-sett
>  1936 ?        S      0:00  |   \_ syndaemon -i 2.0 -K -R -t
>  1929 ?        Sl     0:05  \_ compiz
>  2037 ?        Ss     0:00  |   \_ /bin/sh -c /usr/bin/compiz-decorator
>  2038 ?        Sl     0:00  |   \_ /usr/bin/gtk-window-decorator
>  1959 ?        Sl     0:00  \_ nautilus -n
>  1961 ?        Sl     0:00  \_ bluetooth-applet
>  1962 ?        Sl     0:00  \_ /usr/lib/gnome-settings-daemon/gnome-fall
>  1963 ?        Sl     0:00  \_ nm-applet
>  1972 ?        Sl     0:00  \_ /usr/lib/policykit-1-gnome/polkit-gnome-a
>  2233 ?        Sl     0:00  \_ /usr/lib/gnome-disk-utility/gdu-notificat
>  2236 ?        Sl     0:00  \_ telepathy-indicator
>  2254 ?        Sl     0:00  \_ zeitgeist-datahub
>  2427 ?        Sl     0:00  \_ update-notifier
>  2482 ?        Sl     0:00  \_ /usr/lib/deja-dup/deja-dup/deja-dup-monit
>  1135 ?        Ss     0:00 /usr/sbin/irqbalance
>  1175 ?        Ssl    0:00 whoopsie
>  1193 ?        Ss     0:00 /usr/sbin/rpc.mountd --manage-gids
>  1365 ?        Sl     0:00 /opt/sge/bin/lx-amd64/sge_qmaster
>  1405 ?        Sl     0:00 /usr/lib/accountsservice/accounts-daemon
>  1432 ?        Sl     0:00 /usr/sbin/console-kit-daemon --no-daemon
>  1555 ?        Sl     0:00 /usr/lib/upower/upowerd
>  1752 ?        SNl    0:00 /usr/lib/rtkit/rtkit-daemon
>  1768 ?        Sl     0:00 /usr/lib/x86_64-linux-gnu/colord/colord
>  1841 ?        Sl     0:00 /usr/bin/gnome-keyring-daemon --daemonize --login
>  1901 ?        S      0:00 /usr/bin/dbus-launch --exit-with-session gnome-sessio
>  1902 ?        Ss     0:00 //bin/dbus-daemon --fork --print-pid 5 --print-addres
>  1920 ?        S      0:00 /usr/lib/gvfs/gvfsd
>  1922 ?        Sl     0:00 /usr/lib/gvfs//gvfs-fuse-daemon -f /home/master/.gvfs
>  1941 ?        S<l    0:00 /usr/bin/pulseaudio --start --log-target=syslog
>  1946 ?        S      0:00  \_ /usr/lib/pulseaudio/pulse/gconf-helper
>  1943 ?        S      0:00 /usr/lib/x86_64-linux-gnu/gconf/gconfd-2
>  1948 ?        S      0:00 /usr/lib/gvfs/gvfsd-metadata
>  1971 ?        S      0:00 /usr/lib/gvfs/gvfs-gdu-volume-monitor
>  1979 ?        Sl     0:00 /usr/lib/udisks/udisks-daemon
>  1980 ?        S      0:00  \_ udisks-daemon: not polling any devices
>  1984 ?        Sl     0:00 /usr/lib/gvfs/gvfs-afc-volume-monitor
>  1987 ?        S      0:00 /usr/lib/gvfs/gvfs-gphoto2-volume-monitor
>  2003 ?        Sl     0:00 /usr/lib/notify-osd/notify-osd
>  2007 ?        S      0:00 /usr/lib/gvfs/gvfsd-trash --spawner :1.5 /org/gtk/gvf
>  2011 ?        Sl     0:00 /usr/bin/gnome-screensaver --no-daemon
>  2016 ?        S      0:00 /usr/lib/gvfs/gvfsd-burn --spawner :1.5 /org/gtk/gvfs
>  2019 ?        Sl     0:00 /usr/lib/bamf/bamfdaemon
>  2043 ?        Sl     0:00 /usr/lib/unity/unity-panel-service
>  2045 ?        Sl     0:00 /usr/lib/indicator-appmenu/hud-service
>  2064 ?        Sl     0:00 /usr/lib/indicator-session/indicator-session-service
>  2066 ?        Sl     0:00 /usr/lib/indicator-datetime/indicator-datetime-servic
>  2068 ?        Sl     0:00 /usr/lib/indicator-messages/indicator-messages-servic
>  2070 ?        Sl     0:00 /usr/lib/indicator-sound/indicator-sound-service
>  2079 ?        Sl     0:00 /usr/lib/indicator-printers/indicator-printers-servic
>  2080 ?        Sl     0:00 /usr/lib/indicator-application/indicator-application-
>  2109 ?        S      0:00 /usr/lib/geoclue/geoclue-master
>  2112 ?        Sl     0:00 /usr/lib/ubuntu-geoip/ubuntu-geoip-provider
>  2190 ?        S      0:00 /opt/sge/bin/lx-amd64/sge_shadowd
>  2226 tty1     Ss+    0:00 /sbin/getty -8 38400 tty1
>  2243 ?        Sl     0:00 /usr/lib/telepathy/mission-control-5
>  2248 ?        Sl     0:00 /usr/lib/gnome-online-accounts/goa-daemon
>  2262 ?        Sl     0:00 /usr/bin/zeitgeist-daemon
>  2268 ?        Sl     0:00 /usr/lib/zeitgeist/zeitgeist-fts
>  2276 ?        S      0:00  \_ /bin/cat
>  2287 ?        Sl     0:00 /usr/lib/unity-lens-applications/unity-applications-d
>  2290 ?        Sl     0:00 /usr/bin/python /usr/lib/unity-lens-video/unity-lens-
>  2292 ?        Sl     0:00 /usr/lib/unity-lens-files/unity-files-daemon
>  2293 ?        Sl     0:00 /usr/lib/unity-lens-music/unity-music-daemon
>  2322 ?        Sl     0:01 gnome-terminal
>  2327 ?        S      0:00  \_ gnome-pty-helper
>  2329 pts/0    Ss     0:00  \_ bash
>  2616 pts/0    R+     0:00  \_ ps -e f
>  2385 ?        Sl     0:00 /usr/bin/python /usr/lib/unity-scope-video-remote/uni
>  2387 ?        Sl     0:00 /usr/lib/unity-lens-music/unity-musicstore-daemon
>  2438 ?        S      0:00 /usr/bin/python /usr/lib/system-service/system-servic
>  2442 ?        SNl    0:04 /usr/bin/python /usr/bin/update-manager --no-focus-on
>  2463 ?        SN     0:01 /usr/bin/python /usr/sbin/aptd
>
> And about the port: if it is open and in the LISTEN state, does that mean there is no
> problem with the port?
> root@sgemstr:~# netstat -nltp |grep 644
> tcp        0      0 0.0.0.0:6444            0.0.0.0:*               LISTEN      1365/sge_qmaster
>
> Finally, about the firewall: I never changed anything or added a rule, and this is
> the output:
> root@sgemstr:~# iptables -L
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
>
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
>
> Doesn't that mean I have an empty iptables?
>
> Many regards.
>
>
> On Wednesday, October 29, 2014 9:01 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>
> Please keep the list posted.
>
> On 29.10.2014 at 18:47, Disny Disny wrote:
>
> > Hello Reuti,
> > this is the output of qhost and qstat -f, but I don't know what it means, so
> > I'm hoping you can help.
> >
> > Kind regards.
> >
> > root@sgemstr:~# qhost
> > HOSTNAME   ARCH      NCPU NSOC NCOR NTHR NLOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> > ----------------------------------------------------------------------------------------------
> > global     -            -    -    -    -     -       -       -       -       -
> > gcl1       lx-amd64     4    1    4    4     -    3.8G       -    6.7G       -
> > gcl2       lx-amd64     4    1    4    4     -    3.7G       -    3.8G       -
> > gcl3       lx-amd64     4    1    4    4     -    1.9G       -    6.7G       -
> > shdwgcl4   lx-amd64     4    1    4    4     -    3.8G       -    3.8G       -
> > root@sgemstr:~# qstat -f
> > queuename         qtype resv/used/tot. np_load  arch      states
> > ---------------------------------------------------------------------------------
> > all.q@gcl1        BIP   0/0/4          -NA-     lx-amd64  au
> > ---------------------------------------------------------------------------------
> > all.q@gcl2        BIP   0/0/4          -NA-     lx-amd64  au
> > ---------------------------------------------------------------------------------
> > all.q@gcl3        BIP   0/0/4          -NA-     lx-amd64  au
> > ---------------------------------------------------------------------------------
> > all.q@shdwgcl4    BIP   0/0/4          -NA-     lx-amd64  au
>
> This looks like there is no communication between the qmaster and the execds.
> Does the output of:
>
> $ ps -e f
>
> show `sge_qmaster` and `sge_execd` running on the respective systems? Do you have a
> firewall in place? Maybe ports 6444 and 6445 need to be opened.
>
> -- Reuti
>
> >
> > ############################################################################
> >  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> > ############################################################################
> >       4 0.00000 Sleeper    root         qw    10/23/2014 09:20:09     1
> > root@sgemstr:~#
> >
> > On Thursday, October 23, 2014 6:38 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> >
> > Please check the state of the machines in `qhost` and `qstat -f`, i.e.
> > whether the execd can be reached and returns suitable values for the
> > machines. - Reuti
> >
> > On 23.10.2014 at 17:35, Disny Disny wrote:
> >
> > > Yes, during the execd installation it added a startup script, but is there
> > > any other startup entry I need to add manually?
> > >
> > > From: Reuti <re...@staff.uni-marburg.de>;
> > > To: Disny Disny <disny.wo...@yahoo.com>;
> > > Cc: grid Engine Mailing List <users@gridengine.org>;
> > > Subject: Re: Queue instances dropped
> > > Sent: Thu, Oct 23, 2014 3:29:58 PM
> > >
> > > On 23.10.2014 at 17:23, Disny Disny wrote:
> > >
> > > > I have a problem with SGE. After installing the cluster everything
> > > > worked fine, but when I shut down the PCs, started them again later
> > > > and tried to submit a job, I got this message:
> > > > queue instance "all.q@gcl2" dropped because it is temporarily not available
> > > > queue instance "all.q@gcl3" dropped because it is temporarily not available
> > > > queue instance "all.q@shdwgcl4" dropped because it is temporarily not available
> > > > queue instance "all.q@gcl1" dropped because it is temporarily not available
> > > > All queues dropped because of overload or full.
> > > > I appreciate any help.
> > >
> > > Are the execds running on the nodes? Maybe they need to be added to
> > > your startup mechanism so that they start automatically after you shut down the
> > > machines.
> > >
> > > -- Reuti
> > >

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users