On 03/03/2014 05:27 PM, Martin Sivak wrote:
The agent is required. In fact it contains all the logic.

Thanks a lot for the information.
I patched the agent and now it is running fine:

# service ovirt-ha-agent status
ovirt-ha-agent (pid 12385) is running...

Btw, after starting the ha-agent the broken lock-file was fixed, too.
So hosted engine is working fine now.
Martin, Didi, thanks a lot for your help!


Regards,
René



--
Martin Sivák
[email protected]
Red Hat Czech
RHEV-M SLA / Brno, CZ

----- Original Message -----
On 03/03/2014 02:05 PM, Martin Sivak wrote:
Hi René,

# python --version
Python 2.6.6

Then I guess the traceback is my fault...

See http://gerrit.ovirt.org/#/c/25269/ for the fix. I will try to get it
into the soonest release possible.
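For anyone who needs to patch by hand before that release ships: in Python 2.6, logging.LoggerAdapter is still an old-style class, which is why super() rejects it. A minimal sketch of the usual workaround (the actual gerrit patch may differ) is to call the base class explicitly instead of via super():

```python
import logging

class FSMLoggerAdapter(logging.LoggerAdapter):
    def __init__(self, logger, fsm):
        # An explicit base-class call instead of super() works whether
        # LoggerAdapter is old-style (Python 2.6) or new-style (2.7+):
        logging.LoggerAdapter.__init__(self, logger, None)
        self.fsm = fsm
```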


Thanks. Do I have to patch the files manually, or is ovirt-ha-agent not
strictly required for hosted engine? I guess some features, like restarting
the engine on the 2nd node, won't work if ovirt-ha-agent isn't running.


I can't see a full filesystem here:


Me neither. Is everything read-write? A read-only FS might report "no space
left" as well in some cases. Other than that, I do not know.
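To rule out a silently read-only mount, a small check like the following can help (a sketch, not part of any oVirt tooling; the mount points to test are whatever your "df -h" lists):

```python
import os
import tempfile

def mount_is_writable(mount_point):
    """Try to create and delete a file on the mount point; a read-only
    filesystem raises OSError (EROFS) even when df shows free space."""
    try:
        fd, path = tempfile.mkstemp(dir=mount_point)
        os.close(fd)
        os.remove(path)
        return True
    except OSError:
        return False
```

For example, `mount_is_writable("/var")` returning False despite free space in "df -h" would explain a spurious "no space left" error.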

No, I can write to all disks.
Btw, the same error message occurs on both nodes...


Regards,
René



Regards
--
Martin Sivák
[email protected]
Red Hat Czech
RHEV-M SLA / Brno, CZ

----- Original Message -----
On 03/03/2014 12:05 PM, Martin Sivak wrote:
Hi René,

thanks for the report.

TypeError: super() argument 1 must be type, not classobj
What Python version are you using?

# python --version
Python 2.6.6


You can debug a crash of this version of ha-agent using:

/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon --pdb

This gives me the same information as in vdsm.log


But this exception is trying to tell you that
FSMLoggerAdapter(logging.LoggerAdapter) does not have object in the
ancestor list. And that is very weird.

It can be related to the disk space issues.

libvirtError: Failed to acquire lock: No space left on device

Check the free space on all your devices, including /tmp and /var. Or post
the output of the "df -h" command here.

I can't see a full filesystem here:

# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/vg0-lv_root  5.0G  1.1G  3.6G  24% /
tmpfs                     16G     0   16G   0% /dev/shm
/dev/sda1                243M   45M  185M  20% /boot
/dev/mapper/vg0-lv_data  281G   21G  261G   8% /data
/dev/mapper/vg0-lv_tmp   2.0G   69M  1.9G   4% /tmp
/dev/mapper/vg0-lv_var   5.0G  384M  4.3G   9% /var
ovirt-host01:/engine     281G   21G  261G   8%
/rhev/data-center/mnt/ovirt-host01:_engine


Thanks,
René



Regards

--
Martin Sivák
[email protected]
Red Hat Czech
RHEV-M SLA / Brno, CZ

----- Original Message -----
Il 03/03/2014 11:33, René Koch ha scritto:
Hi,

I have some issues with hosted engine (oVirt 3.4 prerelease repo on
CentOS
6.5).
My setup is the following:
2 hosts (will be 4 in the future) with 4 GlusterFS shares:
- engine (for hosted engine)
- iso (for ISO domain)
- ovirt (oVirt storage domain)

I had a split-brain situation today (after rebooting both nodes) on the
hosted-engine.lockspace file on the engine GlusterFS volume, which I
resolved.

How did you solve it? By switching to NFS only?


Hosted engine used the engine share via NFS (TCP), as GlusterFS isn't
supported for oVirt hosted engine yet. I'll switch to GlusterFS as soon as
oVirt supports it (I hope this will be soon, as RHEV 3.3 already supports
GlusterFS for hosted engine).


First of all ovirt-ha-agent fails to start on both nodes:

# service ovirt-ha-agent start
Starting ovirt-ha-agent:                                   [  OK  ]
# service ovirt-ha-agent status
ovirt-ha-agent dead but subsys locked


MainThread::INFO::2014-03-03
11:20:39,539::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
ovirt-hosted-engine-ha agent 1.1.0 started
MainThread::INFO::2014-03-03
11:20:39,590::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
Found
certificate common name: 10.0.200.101
MainThread::CRITICAL::2014-03-03
11:20:39,590::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
Could not start ha-agent
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
    self._run_agent()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
    hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 152, in __init__
    "STOP_VM": self._stop_engine_vm
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 56, in __init__
    logger, actions)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 93, in __init__
    self._logger = FSMLoggerAdapter(logger, self)
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 16, in __init__
    super(FSMLoggerAdapter, self).__init__(logger, None)
TypeError: super() argument 1 must be type, not classobj



If I try to start my hosted engine, I receive the following error in the
vdsm logs, which makes absolutely no sense to me, as there is plenty of
disk space available:

Thread-62::DEBUG::2014-03-03
11:24:46,282::libvirtconnection::124::root::(wrapper) Unknown
libvirterror: ecode: 38 edom: 42 level: 2 message: Failed
to acquire lock: No space left on device

Seems like a vdsm failure in starting to monitor the hosted engine storage
domain.
Can you attach the vdsm logs?



Thread-62::DEBUG::2014-03-03
11:24:46,282::vm::2252::vm.Vm::(_startUnderlyingVm)
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
Thread-62::ERROR::2014-03-03
11:24:46,283::vm::2278::vm.Vm::(_startUnderlyingVm)
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 2238, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/vm.py", line 3159, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Failed to acquire lock: No space left on device
Thread-62::DEBUG::2014-03-03
11:24:46,286::vm::2720::vm.Vm::(setDownStatus)
vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down:
Failed
to acquire lock: No space left on device

# df -h | grep engine
ovirt-host01:/engine     281G   21G  261G   8%
/rhev/data-center/mnt/ovirt-host01:_engine

# sudo -u vdsm dd if=/dev/zero
of=/rhev/data-center/mnt/ovirt-host01:_engine/2851af27-8744-445d-9fb1-a0d083c8dc82/images/0e4d270f-2f7e-4b2b-847f-f114a4ba9bdc/test
bs=512 count=100
100+0 records in
100+0 records out
51200 bytes (51 kB) copied, 0.0230566 s, 2.2 MB/s
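Since the dd write succeeds, the "No space left on device" most likely comes from sanlock complaining about its lockspace rather than from the filesystem; a missing or zero-length hosted-engine.lockspace (e.g. after a split-brain repair) can surface exactly this way. A hedged sanity check (the 1 MiB minimum is an assumption about sanlock's fixed-size lockspace layout, and the commented path is hypothetical):

```python
import os

def lockspace_looks_sane(path, min_size=1024 * 1024):
    """Return True if the lockspace file exists and is at least
    min_size bytes; a zero-length file left behind by a split-brain
    repair would make sanlock fail even though the disk has space."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_size

# Hypothetical path; substitute your storage domain UUID:
# lockspace_looks_sane("/rhev/data-center/mnt/ovirt-host01:_engine/"
#                      "<sd_uuid>/ha_agent/hosted-engine.lockspace")
```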


Could you give me some information on how to fix the ovirt-ha-agent and
then hosted-engine storage issue? Thanks a lot.

Btw, I had some issues during installation which I will explain in
separate
emails.




--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community
collaboration.
See how it works at redhat.com



_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users
