Hi,

For some reason, "kill -3 <jboss-pid>" produced NO output in console.log.
After downgrading vdsm with the backport fix and restarting the oVirt engine,
the problem was fixed. I restarted the node several times, and everything
seems to be fine for now.
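
For reference, the thread dump procedure quoted below (three dumps, about 30 seconds apart) could be scripted roughly like this. It is only a sketch: the function names count_dumps and take_dumps are made up for illustration, and it assumes a HotSpot/OpenJDK JVM, whose dumps begin with a "Full thread dump" line.

```shell
#!/usr/bin/env bash
# Sketch only: count_dumps and take_dumps are hypothetical helper names.

# count_dumps FILE: print the number of "Full thread dump" headers in FILE
# (0 if the file is missing or contains none).
count_dumps() {
    local n
    n=$(grep -c '^Full thread dump' "$1" 2>/dev/null) || true
    echo "${n:-0}"
}

# take_dumps PID [COUNT] [INTERVAL_SECONDS] [LOG]
# Sends SIGQUIT (kill -3) COUNT times and reports how many new dumps
# appeared in LOG. The default log path is the one named in this thread.
take_dumps() {
    local pid=$1 count=${2:-3} interval=${3:-30}
    local log=${4:-/var/log/ovirt-engine/console.log}
    local before after i
    before=$(count_dumps "$log")
    for ((i = 1; i <= count; i++)); do
        kill -3 "$pid" || return 1
        if ((i < count)); then sleep "$interval"; fi
    done
    sleep 1   # give the JVM a moment to finish writing the last dump
    after=$(count_dumps "$log")
    echo "$((after - before)) of $count thread dumps written to $log"
}
```

It would be called as take_dumps <jboss-pid>, as root or as the user that owns the JVM. A result of "0 of 3" would match the symptom described above.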

Current content of console.log:
WARNING: -jaxpmodule is deprecated and may be removed in a future release
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.codehaus.jackson.map.util.ClassUtil (jar:file:/usr/share/ovirt-engine-wildfly/modules/system/layers/base/org/codehaus/jackson/jackson-mapper-asl/main/jackson-mapper-asl-1.9.13.redhat-00007.jar!/) to constructor java.util.Collections$EmptyMap()
WARNING: Please consider reporting this to the maintainers of org.codehaus.jackson.map.util.ClassUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
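
To double-check that console.log holds no dump hidden among the warnings, something like the following could print the most recent dump, if there is one. Again a sketch only: last_dump is a hypothetical helper, and it assumes the "Full thread dump" header that HotSpot/OpenJDK JVMs print.

```shell
#!/usr/bin/env bash
# last_dump FILE: print everything from the last "Full thread dump" header
# to the end of FILE; prints nothing if FILE contains no such header.
last_dump() {
    awk '
        /^Full thread dump/ { buf = ""; found = 1 }   # new dump: start over
        found               { buf = buf $0 "\n" }     # collect dump lines
        END                 { printf "%s", buf }
    ' "$1"
}
```

For example: last_dump /var/log/ovirt-engine/console.log. With the console.log content shown above, it would print nothing.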



> On 9 Aug 2021, at 18:28, Artur Socha <[email protected]> wrote:
> 
> You can use that one or this 'simplified' short version:
> https://access.redhat.com/solutions/3227681
> 
> Artur
> 
> On Mon, Aug 9, 2021 at 5:01 PM Andrei Verovski <[email protected]> wrote:
> Hi
> 
> 
> Should I use threaddump_linux.sh.tar.gz from the following solution?
> 
> https://access.redhat.com/solutions/18178
> 
> 
> > On 9 Aug 2021, at 17:56, Artur Socha <[email protected]> wrote:
> > 
> > Actually, you could even take 3 thread dumps at 30-second intervals.
> > Artur
> > 
> > On Mon, Aug 9, 2021 at 4:53 PM Artur Socha <[email protected]> wrote:
> > Unfortunately, I don't see anything wrong in either the engine or the vdsm
> > logs. There is one last thing that comes to my mind that you could try:
> > restarting the engine service. That is exactly the case I have been
> > investigating.
> > But before restarting, I would like to ask you, if possible, for a Java
> > (JVM) thread dump.
> > The procedure is as follows:
> > 1) find the jboss pid, i.e.
> > $ ps -ef | grep jboss | grep -v grep | awk '{ print $2 }'
> > 2) trigger thread dump
> > $ kill -3 <jboss-pid>
> > 3)  thread dump logs can be found at /var/log/ovirt-engine/console.log
> > 
> > And then restart engine service to check if that helps.
> > 
> > Artur
> > 
> > 
> > On Mon, Aug 9, 2021 at 2:19 PM Andrei Verovski <[email protected]> wrote:
> > Hi, Artur,
> > 
> > A small update on the vdsm status, which I forgot to include in my
> > previous post.
> > 
> > I partially fixed the problem with the VDSM start.
> > 
> > The bug "Failed to create session: Start job for unit user-0.slice failed
> > with 'canceled'" is described here:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1967962
> > A fix seems to be available here, so I have downgraded systemd with the
> > backport fix:
> > http://people.redhat.com/dtardon/systemd/bz1642460-backport-UserStopDelaySec=/
> > 
> > Now the vdsmd service starts successfully, but node14 still cannot be
> > activated because of the same error. This is quite strange: before the
> > restart on Friday the node just worked. There were no upgrades, nothing,
> > just a restart.
> > 
> > [root@node14 ~]# service vdsmd status
> > Redirecting to /bin/systemctl status vdsmd.service
> > ● vdsmd.service - Virtual Desktop Server Manager
> >    Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
> >    Active: active (running) since Mon 2021-08-09 15:12:59 EEST; 4min 20s ago
> >   Process: 4066 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
> >  Main PID: 4130 (vdsmd)
> >     Tasks: 41 (limit: 615525)
> >    Memory: 59.5M
> >    CGroup: /system.slice/vdsmd.service
> >            └─4130 /usr/bin/python3 /usr/share/vdsm/vdsmd
> > 
> > Aug 09 15:12:55 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running prepare_transient_repository
> > Aug 09 15:12:57 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running syslog_available
> > Aug 09 15:12:57 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running nwfilter
> > Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running dummybr
> > Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running tune_system
> > Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running test_space
> > Aug 09 15:12:59 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running test_lo
> > Aug 09 15:12:59 node14.***.lv systemd[1]: Started Virtual Desktop Server Manager.
> > Aug 09 15:13:00 node14.***.lv vdsm[4130]: WARN MOM not available. Error: [Errno 111] Connection refused
> > Aug 09 15:13:00 node14.***.lv vdsm[4130]: WARN MOM not available, KSM stats will be missing. Error:
> > 
> > 
> > [root@node14]# firewall-cmd --list-all
> > public (active)
> >   target: default
> >   icmp-block-inversion: no
> >   interfaces: DMZ_node14 eno1 eno2 ovirtmgmt
> >   sources: 
> >   services: cockpit dhcpv6-client libvirt-tls mountd nfs ovirt-imageio ovirt-vmconsole rpc-bind snmp ssh vdsm
> >   ports: 2301/tcp 2381/tcp 22/tcp 6081/udp
> >   protocols: 
> >   forward: no
> >   masquerade: no
> >   forward-ports: 
> >   source-ports: 
> >   icmp-blocks: 
> >   rich rules: 
> > [root@node14 andrei]# 
> > 
> > 
> > Output of vdsm-client Host getStats and vdsm-client Host getCapabilities is attached.
> > 
> > 
> > 
> > 
> >> On 9 Aug 2021, at 13:18, Artur Socha <[email protected]> wrote:
> >> 
> >> Thanks for the logs. I am checking them at the moment. I have noticed so
> >> far that node14 is serving an NFS share which had been marked as
> >> problematic (probably because of the downtime during the migration), but
> >> it has recovered.
> >> 
> >> In the meantime, is it possible to get some meaningful results when
> >> calling:
> >> $ vdsm-client Host getStats
> >> and
> >> $ vdsm-client Host getCapabilities
> >> on node14?
> >> 
> >> What is the state of the vdsmd service when running systemctl status vdsmd?
> >> One other thing to rule out is the networking/firewall. Here is the list
> >> of the ports to be open for the host (the documentation is for hosted
> >> engine, but it applies to a standalone setup as well):
> >> https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine_using_the_command_line/index.html#host-firewall-requirements_SHE_cli_deploy
> >> 
> >> By the way, I have been hunting for a rare and hard-to-reproduce bug for
> >> quite a long time (without success yet), so any reported connectivity
> >> issues between the manager and hosts are super interesting to me.
> >> 
> >> Artur
> >> 
> >> On Mon, Aug 9, 2021 at 11:44 AM Andrei Verovski <andreil1@***.lv> wrote:
> >> Hi, Artur,
> >> 
> >> 
> >> Thanks for the assistance. Zipped engine logs, starting from the day of
> >> the upgrade, are attached.
> >> Restart via SSH from the oVirt Web GUI works.
> >> The oVirt engine runs on a dedicated server, not as a hosted engine.
> >> 
> >> 
> >> 
> >> 
> >>> On 9 Aug 2021, at 11:24, Artur Socha <[email protected]> wrote:
> >>> 
> >>> Hi Andrei,
> >>> Could you also post the relevant piece of engine.log? I don't have high
> >>> expectations of finding the answer there, but I just want to be sure.
> >>> vdsm.log does not show any trace of an error from the vdsm point of view.
> >>> For example, it looks like vdsm started correctly and subscribed to
> >>> receiving commands from the engine (yet that does not mean it is
> >>> connected to it, only that it is in listening mode).
> >>> 
> >>> Can you confirm that 'SSH restart' from the UI works? By 'works' I mean
> >>> the host is actually restarted after a few minutes and there are no
> >>> SSH-related (public key etc.) errors in engine.log.
> >>> 
> >>> Artur
> >>> 
> >>> On Mon, Aug 9, 2021 at 9:55 AM Andrei Verovski <andreil1@***.lv> wrote:
> >>> Hi,
> >>> 
> >>> I have oVirt 4.4.7.6-1.el8 and one problematic node (HP ProLiant with
> >>> CentOS 8 Stream).
> >>> After replacing the server rack router switch and restarting, I got this
> >>> error, which I can't recover from:
> >>> 
> >>> VDSM node14 command Get Host Capabilities failed: Message timeout which
> >>> can be caused by communication issues
> >>> 
> >>> vdsm-network is running fine, but vdsmd can't start on node14 for
> >>> whatever reason. All other nodes are running fine.
> >>> 
> >>> Aug 09 10:24:12 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running dummybr
> >>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running tune_system
> >>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_space
> >>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_lo
> >>> Aug 09 10:24:13 node14.mydomain.lv systemd[1]: Started Virtual Desktop Server Manager.
> >>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
> >>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session opened for user root by (uid=0)
> >>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session closed for user root
> >>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available. Error: [Errno 2] No such file or directory
> >>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available, KSM stats will be missing. Error:
> >>> 
> >>> 
> >>> In the web GUI -> Management I can't do anything with the host except
> >>> restart. Stop aborts with an error; all other commands are grayed out.
> >>> The status is "Unassigned". The host answers pings as usual.
> >>> vdsm.log (from node14) is attached.
> >>> 
> >>> Thanks in advance for any help.
> >>> 
> >>> 
> >>> _______________________________________________
> >>> Users mailing list -- [email protected]
> >>> To unsubscribe send an email to [email protected]
> >>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> >>> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/55M65W57Z43ZVPOARDTK7HKHCAMAUGO5/
> >>> 
> >>> 
> >>> -- 
> >>> Artur Socha
> >>> Senior Software Engineer, RHV
> >>> Red Hat

_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/5CXP4SWTMAMHJAS6X56GSVBULGVR23D7/
