OK, and what does `ps -e f` show on the slave node? On the master there is 
usually no execd (unless some cores of the master machine should be used for 
processing too). - Reuti

On 30.10.2014 at 17:47, Disny Disny wrote:

> Hello Reuti, one more thing I forgot to mention in the previous mail: 
> I checked port 6445 on the execd hosts, but nothing appears. They are not 
> in listening mode. Is that normal? Aren't they supposed to listen?
> 
> root@gcl2:~# netstat -nltp |grep 6445
> root@gcl2:~# netstat -nltp |grep 644
> root@gcl2:~# iptables -L
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination        
>  
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination        
>  
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination        
> root@gcl2:~#
> 
> many regards.. 
> 
> 
> On Thursday, October 30, 2014 7:42 PM, Disny Disny <disny.wo...@yahoo.com> 
> wrote:
> 
> 
> Hello Reuti 
> this is the output of ps -e f
> 
> master@sgemstr:~$ ps -e f
>   PID TTY      STAT   TIME COMMAND
>     2 ?        S      0:00 [kthreadd]
>     3 ?        S      0:00  \_ [ksoftirqd/0]
>     4 ?        S      0:00  \_ [kworker/0:0]
>     5 ?        S<     0:00  \_ [kworker/0:0H]
>     7 ?        S      0:00  \_ [migration/0]
>     8 ?        S      0:00  \_ [rcu_bh]
>     9 ?        S      0:00  \_ [rcuob/0]
>    10 ?        S      0:00  \_ [rcuob/1]
>    11 ?        S      0:00  \_ [rcuob/2]
>    12 ?        S      0:00  \_ [rcuob/3]
>    13 ?        S      0:00  \_ [rcuob/4]
>    14 ?        S      0:00  \_ [rcuob/5]
>    15 ?        S      0:00  \_ [rcuob/6]
>    16 ?        S      0:00  \_ [rcuob/7]
>    17 ?        S      0:00  \_ [rcu_sched]
>    18 ?        S      0:00  \_ [rcuos/0]
>    19 ?        S      0:00  \_ [rcuos/1]
>    20 ?        S      0:00  \_ [rcuos/2]
>    21 ?        S      0:00  \_ [rcuos/3]
>    22 ?        S      0:00  \_ [rcuos/4]
>    23 ?        S      0:00  \_ [rcuos/5]
>    24 ?        S      0:00  \_ [rcuos/6]
>    25 ?        S      0:00  \_ [rcuos/7]
>    26 ?        S      0:00  \_ [watchdog/0]
>    27 ?        S      0:00  \_ [watchdog/1]
>    28 ?        S      0:00  \_ [migration/1]
>    29 ?        S      0:00  \_ [ksoftirqd/1]
>    30 ?        S      0:00  \_ [kworker/1:0]
>    31 ?        S<     0:00  \_ [kworker/1:0H]
>    32 ?        S      0:00  \_ [watchdog/2]
>    33 ?        S      0:00  \_ [migration/2]
>    34 ?        S      0:00  \_ [ksoftirqd/2]
>    35 ?        S      0:00  \_ [kworker/2:0]
>    36 ?        S<     0:00  \_ [kworker/2:0H]
>    37 ?        S      0:00  \_ [watchdog/3]
>    38 ?        S      0:00  \_ [migration/3]
>    39 ?        S      0:00  \_ [ksoftirqd/3]
>    40 ?        S      0:00  \_ [kworker/3:0]
>    41 ?        S<     0:00  \_ [kworker/3:0H]
>    42 ?        S<     0:00  \_ [khelper]
>    43 ?        S      0:00  \_ [kdevtmpfs]
>    44 ?        S<     0:00  \_ [netns]
>    45 ?        S<     0:00  \_ [writeback]
>    46 ?        S<     0:00  \_ [kintegrityd]
>    47 ?        S<     0:00  \_ [bioset]
>    49 ?        S<     0:00  \_ [kblockd]
>    50 ?        S<     0:00  \_ [ata_sff]
>    51 ?        S      0:00  \_ [khubd]
>    52 ?        S<     0:00  \_ [md]
>    53 ?        S<     0:00  \_ [devfreq_wq]
>    54 ?        S      0:00  \_ [kworker/3:1]
>    55 ?        S      0:00  \_ [kworker/2:1]
>    57 ?        S      0:00  \_ [khungtaskd]
>    58 ?        S      0:00  \_ [kswapd0]
>    59 ?        SN     0:00  \_ [ksmd]
>    60 ?        SN     0:00  \_ [khugepaged]
>    61 ?        S      0:00  \_ [fsnotify_mark]
>    62 ?        S      0:00  \_ [ecryptfs-kthrea]
>    63 ?        S<     0:00  \_ [crypto]
>    75 ?        S<     0:00  \_ [kthrotld]
>    79 ?        S<     0:00  \_ [dm_bufio_cache]
>    99 ?        S<     0:00  \_ [deferwq]
>   100 ?        S<     0:00  \_ [charger_manager]
>   101 ?        S      0:00  \_ [kworker/0:1]
>   273 ?        S      0:00  \_ [scsi_eh_0]
>   274 ?        S      0:00  \_ [scsi_eh_1]
>   275 ?        S      0:00  \_ [scsi_eh_2]
>   276 ?        S      0:00  \_ [scsi_eh_3]
>   277 ?        S      0:00  \_ [scsi_eh_4]
>   278 ?        S      0:00  \_ [scsi_eh_5]
>   281 ?        S      0:00  \_ [kworker/u16:5]
>   283 ?        S      0:00  \_ [kworker/u16:7]
>   310 ?        S      0:00  \_ [jbd2/sda7-8]
>   311 ?        S<     0:00  \_ [ext4-rsv-conver]
>   312 ?        S<     0:00  \_ [ext4-unrsv-conv]
>   599 ?        S<     0:00  \_ [kmemstick]
>   601 ?        S      0:00  \_ [irq/45-mei_me]
>   607 ?        S<     0:00  \_ [kpsmoused]
>   621 ?        S<     0:00  \_ [rpciod]
>   624 ?        S      0:00  \_ [kworker/1:2]
>   659 ?        S<     0:00  \_ [ktpacpid]
>   686 ?        S<     0:00  \_ [cfg80211]
>   695 ?        S<     0:00  \_ [nfsiod]
>   806 ?        S<     0:00  \_ [kworker/u17:1]
>   809 ?        S<     0:00  \_ [hci0]
>   810 ?        S<     0:00  \_ [hci0]
>   811 ?        S<     0:00  \_ [kworker/u17:2]
>   824 ?        S<     0:00  \_ [hd-audio0]
>   888 ?        S      0:00  \_ [wl_event_handle]
>   937 ?        S<     0:00  \_ [ttm_swap]
>   968 ?        S<     0:00  \_ [krfcommd]
>  1177 ?        S<     0:00  \_ [nfsd4]
>  1178 ?        S<     0:00  \_ [nfsd4_callbacks]
>  1179 ?        S      0:00  \_ [lockd]
>  1182 ?        S      0:00  \_ [nfsd]
>  1183 ?        S      0:00  \_ [nfsd]
>  1184 ?        S      0:00  \_ [nfsd]
>  1185 ?        S      0:00  \_ [nfsd]
>  1186 ?        S      0:00  \_ [nfsd]
>  1187 ?        S      0:00  \_ [nfsd]
>  1188 ?        S      0:00  \_ [nfsd]
>  1189 ?        S      0:00  \_ [nfsd]
>     1 ?        Ss     0:00 /sbin/init
>   386 ?        S      0:00 upstart-udev-bridge --daemon
>   388 ?        Ss     0:00 /sbin/udevd --daemon
>   547 ?        S      0:00  \_ /sbin/udevd --daemon
>   548 ?        S      0:00  \_ /sbin/udevd --daemon
>   632 ?        Ss     0:00 /usr/sbin/sshd -D
>   873 ?        Sl     0:00 rsyslogd -c5
>   874 ?        Ss     0:00 rpc.idmapd
>   882 ?        S      0:00 upstart-socket-bridge --daemon
>   885 ?        Ss     0:00 dbus-daemon --system --fork --activation=upstart
>   920 ?        Ss     0:00 /usr/sbin/bluetoothd
>   922 ?        Ss     0:00 rpcbind -w
>   967 ?        Ss     0:00 /usr/sbin/modem-manager
>   983 ?        S      0:00 avahi-daemon: running [sgemstr.local]
>   985 ?        S      0:00  \_ avahi-daemon: chroot helper
>  1000 ?        Ss     0:00 /usr/sbin/cupsd -F
>  1004 ?        Ss     0:00 rpc.statd -L
>  1008 ?        Ssl    0:00 NetworkManager
>  2058 ?        S      0:00  \_ /usr/sbin/dnsmasq --no-resolv --keep-in-foregroun
>  1016 ?        Sl     0:00 /usr/lib/policykit-1/polkitd --no-debug
>  1085 tty4     Ss+    0:00 /sbin/getty -8 38400 tty4
>  1092 tty5     Ss+    0:00 /sbin/getty -8 38400 tty5
>  1094 ?        Ss     0:02 /sbin/wpa_supplicant -B -P /run/sendsigs.omit.d/wpasu
>  1113 tty2     Ss+    0:00 /sbin/getty -8 38400 tty2
>  1114 tty3     Ss+    0:00 /sbin/getty -8 38400 tty3
>  1116 tty6     Ss+    0:00 /sbin/getty -8 38400 tty6
>  1122 ?        Ss     0:00 acpid -c /etc/acpi/events -s /var/run/acpid.socket
>  1127 ?        Ss     0:00 cron
>  1128 ?        Ss     0:00 atd
>  1134 ?        Ssl    0:00 lightdm
>  1172 tty7     Ssl+   0:04  \_ /usr/bin/X :0 -auth /var/run/lightdm/root/:0 -nol
>  1599 ?        Sl     0:00  \_ lightdm --session-child 12 19
>  1852 ?        Ssl    0:00      \_ gnome-session --session=ubuntu
>  1898 ?        Ss     0:00          \_ /usr/bin/ssh-agent /usr/bin/dbus-launch -
>  1912 ?        Sl     0:00          \_ /usr/lib/gnome-settings-daemon/gnome-sett
>  1936 ?        S      0:00          |   \_ syndaemon -i 2.0 -K -R -t
>  1929 ?        Sl     0:05          \_ compiz
>  2037 ?        Ss     0:00          |   \_ /bin/sh -c /usr/bin/compiz-decorator
>  2038 ?        Sl     0:00          |       \_ /usr/bin/gtk-window-decorator
>  1959 ?        Sl     0:00          \_ nautilus -n
>  1961 ?        Sl     0:00          \_ bluetooth-applet
>  1962 ?        Sl     0:00          \_ /usr/lib/gnome-settings-daemon/gnome-fall
>  1963 ?        Sl     0:00          \_ nm-applet
>  1972 ?        Sl     0:00          \_ /usr/lib/policykit-1-gnome/polkit-gnome-a
>  2233 ?        Sl     0:00          \_ /usr/lib/gnome-disk-utility/gdu-notificat
>  2236 ?        Sl     0:00          \_ telepathy-indicator
>  2254 ?        Sl     0:00          \_ zeitgeist-datahub
>  2427 ?        Sl     0:00          \_ update-notifier
>  2482 ?        Sl     0:00          \_ /usr/lib/deja-dup/deja-dup/deja-dup-monit
>  1135 ?        Ss     0:00 /usr/sbin/irqbalance
>  1175 ?        Ssl    0:00 whoopsie
>  1193 ?        Ss     0:00 /usr/sbin/rpc.mountd --manage-gids
>  1365 ?        Sl     0:00 /opt/sge/bin/lx-amd64/sge_qmaster
>  1405 ?        Sl     0:00 /usr/lib/accountsservice/accounts-daemon
>  1432 ?        Sl     0:00 /usr/sbin/console-kit-daemon --no-daemon
>  1555 ?        Sl     0:00 /usr/lib/upower/upowerd
>  1752 ?        SNl    0:00 /usr/lib/rtkit/rtkit-daemon
>  1768 ?        Sl     0:00 /usr/lib/x86_64-linux-gnu/colord/colord
>  1841 ?        Sl     0:00 /usr/bin/gnome-keyring-daemon --daemonize --login
>  1901 ?        S      0:00 /usr/bin/dbus-launch --exit-with-session gnome-sessio
>  1902 ?        Ss     0:00 //bin/dbus-daemon --fork --print-pid 5 --print-addres
>  1920 ?        S      0:00 /usr/lib/gvfs/gvfsd
>  1922 ?        Sl     0:00 /usr/lib/gvfs//gvfs-fuse-daemon -f /home/master/.gvfs
>  1941 ?        S<l    0:00 /usr/bin/pulseaudio --start --log-target=syslog
>  1946 ?        S      0:00  \_ /usr/lib/pulseaudio/pulse/gconf-helper
>  1943 ?        S      0:00 /usr/lib/x86_64-linux-gnu/gconf/gconfd-2
>  1948 ?        S      0:00 /usr/lib/gvfs/gvfsd-metadata
>  1971 ?        S      0:00 /usr/lib/gvfs/gvfs-gdu-volume-monitor
>  1979 ?        Sl     0:00 /usr/lib/udisks/udisks-daemon
>  1980 ?        S      0:00  \_ udisks-daemon: not polling any devices
>  1984 ?        Sl     0:00 /usr/lib/gvfs/gvfs-afc-volume-monitor
>  1987 ?        S      0:00 /usr/lib/gvfs/gvfs-gphoto2-volume-monitor
>  2003 ?        Sl     0:00 /usr/lib/notify-osd/notify-osd
>  2007 ?        S      0:00 /usr/lib/gvfs/gvfsd-trash --spawner :1.5 /org/gtk/gvf
>  2011 ?        Sl     0:00 /usr/bin/gnome-screensaver --no-daemon
>  2016 ?        S      0:00 /usr/lib/gvfs/gvfsd-burn --spawner :1.5 /org/gtk/gvfs
>  2019 ?        Sl     0:00 /usr/lib/bamf/bamfdaemon
>  2043 ?        Sl     0:00 /usr/lib/unity/unity-panel-service
>  2045 ?        Sl     0:00 /usr/lib/indicator-appmenu/hud-service
>  2064 ?        Sl     0:00 /usr/lib/indicator-session/indicator-session-service
>  2066 ?        Sl     0:00 /usr/lib/indicator-datetime/indicator-datetime-servic
>  2068 ?        Sl     0:00 /usr/lib/indicator-messages/indicator-messages-servic
>  2070 ?        Sl     0:00 /usr/lib/indicator-sound/indicator-sound-service
>  2079 ?        Sl     0:00 /usr/lib/indicator-printers/indicator-printers-servic
>  2080 ?        Sl     0:00 /usr/lib/indicator-application/indicator-application-
>  2109 ?        S      0:00 /usr/lib/geoclue/geoclue-master
>  2112 ?        Sl     0:00 /usr/lib/ubuntu-geoip/ubuntu-geoip-provider
>  2190 ?        S      0:00 /opt/sge/bin/lx-amd64/sge_shadowd
>  2226 tty1     Ss+    0:00 /sbin/getty -8 38400 tty1
>  2243 ?        Sl     0:00 /usr/lib/telepathy/mission-control-5
>  2248 ?        Sl     0:00 /usr/lib/gnome-online-accounts/goa-daemon
>  2262 ?        Sl     0:00 /usr/bin/zeitgeist-daemon
>  2268 ?        Sl     0:00 /usr/lib/zeitgeist/zeitgeist-fts
>  2276 ?        S      0:00  \_ /bin/cat
>  2287 ?        Sl     0:00 /usr/lib/unity-lens-applications/unity-applications-d
>  2290 ?        Sl     0:00 /usr/bin/python /usr/lib/unity-lens-video/unity-lens-
>  2292 ?        Sl     0:00 /usr/lib/unity-lens-files/unity-files-daemon
>  2293 ?        Sl     0:00 /usr/lib/unity-lens-music/unity-music-daemon
>  2322 ?        Sl     0:01 gnome-terminal
>  2327 ?        S      0:00  \_ gnome-pty-helper
>  2329 pts/0    Ss     0:00  \_ bash
>  2616 pts/0    R+     0:00      \_ ps -e f
>  2385 ?        Sl     0:00 /usr/bin/python /usr/lib/unity-scope-video-remote/uni
>  2387 ?        Sl     0:00 /usr/lib/unity-lens-music/unity-musicstore-daemon
>  2438 ?        S      0:00 /usr/bin/python /usr/lib/system-service/system-servic
>  2442 ?        SNl    0:04 /usr/bin/python /usr/bin/update-manager --no-focus-on
>  2448 ?        Sl     0:00 /usr/lib/dconf/dconf-service
>  2463 ?        SN     0:01 /usr/bin/python /usr/sbin/aptd
> 
> 
> Regarding the port: if it is open and in the listening state, does that mean 
> there is no problem with the port?
> root@sgemstr:~# netstat -nltp |grep 644
> tcp        0      0 0.0.0.0:6444            0.0.0.0:*               LISTEN      1365/sge_qmaster
> 
> Finally, about the firewall: I never changed anything or added a rule, and 
> this is the output:
> root@sgemstr:~# iptables -L
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination        
>  
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination        
>  
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination         
> 
> Doesn't that mean I have an empty iptables ruleset?
> 
> many regards..
> 
> 
> 
> On Wednesday, October 29, 2014 9:01 PM, Reuti <re...@staff.uni-marburg.de> 
> wrote:
> 
> 
> Please keep the list posted.
> 
> On 29.10.2014 at 18:47, Disny Disny wrote:
> 
> > Hello Reuti
> > This is the output of qhost and qstat -f, but I don't know what it means, 
> > so I'm hoping you can help.
> > 
> > kind regards.. 
> > 
> > root@sgemstr:~# qhost
> > HOSTNAME                ARCH         NCPU NSOC NCOR NTHR NLOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> > ----------------------------------------------------------------------------------------------
> > global                  -               -    -    -    -     -       -       -       -       -
> > gcl1                    lx-amd64        4    1    4    4     -    3.8G       -    6.7G       -
> > gcl2                    lx-amd64        4    1    4    4     -    3.7G       -    3.8G       -
> > gcl3                    lx-amd64        4    1    4    4     -    1.9G       -    6.7G       -
> > shdwgcl4                lx-amd64        4    1    4    4     -    3.8G       -    3.8G       -
> > root@sgemstr:~# qstat -f
> > queuename                      qtype resv/used/tot. np_load  arch          states
> > ---------------------------------------------------------------------------------
> > all.q@gcl1                    BIP  0/0/4          -NA-    lx-amd64      au
> > ---------------------------------------------------------------------------------
> > all.q@gcl2                    BIP  0/0/4          -NA-    lx-amd64      au
> > ---------------------------------------------------------------------------------
> > all.q@gcl3                    BIP  0/0/4          -NA-    lx-amd64      au
> > ---------------------------------------------------------------------------------
> > all.q@shdwgcl4                BIP  0/0/4          -NA-    lx-amd64      au
> 
> This looks like there is no communication between the qmaster and the execds. 
> Does the output of:
> 
> $ ps -e f
> 
> show `sge_qmaster` and `sge_execd`, respectively, running on the systems? Do 
> you have a firewall in place? Maybe ports 6444 and 6445 need to be opened.
> 
> 
> -- Reuti
> 
> 
> >  
> > ############################################################################
> >  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> > ############################################################################
> >      4 0.00000 Sleeper    root        qw    10/23/2014 09:20:09    1      
> > root@sgemstr:~# 
> > 
> > 
> > On Thursday, October 23, 2014 6:38 PM, Reuti <re...@staff.uni-marburg.de> 
> > wrote:
> > 
> > 
> > Please check the state of the machines with `qhost` or `qstat -f`, i.e. 
> > whether each execd can be reached and returns suitable values for its 
> > machine. - Reuti
> > 
> > On 23.10.2014 at 17:35, Disny Disny wrote:
> > 
> > > Yes, during the execd installation it added a startup script, but is 
> > > there another startup entry I need to add manually?
> > > 
> > > 
> > > From: Reuti <re...@staff.uni-marburg.de>; 
> > > To: Disny Disny <disny.wo...@yahoo.com>; 
> > > Cc: grid Engine Mailing List <users@gridengine.org>; 
> > > Subject: Re: Queue instances dropped 
> > > Sent: Thu, Oct 23, 2014 3:29:58 PM 
> > > 
> > > On 23.10.2014 at 17:23, Disny Disny wrote:
> > > 
> > > 
> > > > I have a problem with SGE. After installing the cluster everything 
> > > > worked fine, but when I shut down the PCs, start them again later, 
> > > > and try to submit a job, I get this message:
> > > > queue instance "all.q@gcl2" dropped because it is temporarily not available
> > > > 
> > > > queue instance "all.q@gcl3" dropped because it is temporarily not available
> > > > 
> > > > queue instance "all.q@shdwgcl4" dropped because it is temporarily not available
> > > > 
> > > > queue instance "all.q@gcl1" dropped because it is temporarily not available
> > > > all queues are dropped because of overload or full.
> > > > I appreciate any help.
> > > 
> > > 
> > > Are the execds running on the nodes? Maybe they need to be added to 
> > > your startup mechanism so that they start automatically after you shut 
> > > down and restart the machines.
> > > 
> > > -- Reuti
> > 
> > 
> > 
> 
> 
> 
> 

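For reference, the `au` state in the quoted `qstat -f` output combines `a` (load alarm) and `u` (unknown, i.e. the qmaster cannot contact the execd on that host). A small sketch for listing such queue instances, assuming the six-column layout shown in the quoted output; the function name `unreachable_queues` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `qstat -f` output on stdin and print every queue
# instance whose states column (the sixth, last field) contains "u".
# Healthy queue instances have no states field and are skipped.
unreachable_queues() {
    awk '$1 ~ /@/ && NF >= 6 && $NF ~ /u/ { print $1 }'
}

# Typical use on the master:
#   qstat -f | unreachable_queues
```

Once the execds are running again and report to the qmaster, this should print nothing.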

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
