Thanks for the advice. I found the problem and got it resolved. While the
agent was starting (with debug enabled per your suggestion) I did a tail/grep
using the UUID of primary storage and discovered that during the
mount/add-to-libvirt process it was getting an I/O error on the UUID of a
QCOW2 volume. Below is a snippet from the tail/grep. So I stopped the agent,
mounted primary storage manually and tried to copy the file named in the log.
Sure enough I got an I/O error. I then copied some other random small files
and they were OK, so it appeared that this one volume was corrupt.
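For the archives, the manual check was basically this (the host, Gluster path
and file name are the ones from the ModifyStoragePoolCommand and error in the
log below; /mnt/primary-test is just a throwaway mount point for illustration):
----
# Mount the Gluster volume by hand, outside of the agent/libvirt.
mkdir -p /mnt/primary-test
mount -t glusterfs gv0cl1.pod1.aus1.centex.rsitex.com:/gv0cl1 /mnt/primary-test

# Try to read the QCOW2 file libvirt complained about; on the bad
# volume this fails with "Input/output error".
cp /mnt/primary-test/52a6130f-266e-4667-8ca2-89f932a0b254 /tmp/

# Copying a few other random files works fine, which points at just
# this one volume being corrupt.
----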
I looked up the volume UUID in the volumes table and found the instance it
belonged to, which was a stopped VR. I destroyed the VR and started the agent.
I still got the I/O error because the volume file was still there (it probably
hadn't gone through the expunge process yet). I stopped the agent, manually
moved the file to a temp directory and then started the agent. Everything
worked normally after that: it added the primary storage and started to bring
up VRs. I then restarted the agents on all hosts and everything started
working again.
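The lookup and the workaround looked roughly like this (column names are from
memory, so treat it as a sketch and double-check against your schema before
running anything):
----
# Find which instance the bad volume belongs to (cloud database; on
# KVM the file name on primary storage matches the volume's uuid/path).
mysql -u cloud -p cloud -e \
  "SELECT id, name, instance_id, state FROM volumes
   WHERE uuid = '52a6130f-266e-4667-8ca2-89f932a0b254'
      OR path = '52a6130f-266e-4667-8ca2-89f932a0b254';"

# With the agent stopped and the pool mounted manually, move the
# corrupt file out of the way so libvirt can build the pool.
mkdir -p /root/corrupt-volumes
mv /mnt/primary-test/52a6130f-266e-4667-8ca2-89f932a0b254 /root/corrupt-volumes/
----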
It behaved as if, during the process of adding the pool to libvirt, all of
the volumes are examined to get information about them, I suppose. Because
this one volume was corrupt, the pool could not be added. At least that is
my theory.
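If that theory holds, something like this against the manually mounted pool
should flag a bad volume before the agent ever tries to add it (assuming
qemu-img is installed on the host, and using the same example mount point as
above):
----
# Try to read every volume header in the pool; a corrupt file shows
# up here as a read error instead of blocking the whole pool.
for f in /mnt/primary-test/*; do
  qemu-img info "$f" > /dev/null 2>&1 || echo "unreadable: $f"
done
----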
I do still have one problem: the system VMs are stuck in a Starting state, I
think due to the timing of the agent restarts. When I look on the host they
are "Starting" on, I don't see them with the "virsh list" command. I am going
to give them some time in case it's just a workload issue, but if they are
still Starting after an hour or so I will probably change their database
status to Stopped and then recreate them; the rough plan is below.
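If I do end up forcing them, my plan is roughly the following (table and
column names are from memory, and this is the kind of direct database edit
that should be double-checked and only done once you're sure the VMs aren't
actually running anywhere):
----
# Confirm the stuck system VMs really aren't running on the host
# (system VM domains are normally named v-<id>-VM and s-<id>-VM).
virsh list --all

# Flip the stuck rows to Stopped so they can be destroyed and
# recreated from the UI/API.
mysql -u cloud -p cloud -e \
  "UPDATE vm_instance SET state = 'Stopped'
   WHERE state = 'Starting'
     AND type IN ('ConsoleProxy', 'SecondaryStorageVm');"
----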
Thanks for the help!
Here is the agent log snippet:
----
tail -f /var/log/cloudstack/agent/agent.log | grep
"c3991ea2\-b702\-3b1b\-bfc5\-69cb7d928554"
2016-04-16 10:43:00,245 DEBUG [cloud.agent.Agent] (agentRequest-Handler-1:null)
(logid:30562dd3) Request:Seq 46-5281314988022038529: { Cmd , MgmtId:
345049993464, via: 46, Ver: v1, Flags: 100011,
[{"com.cloud.agent.api.ModifyStoragePoolCommand":{"add":true,"pool":{"id":5,"uuid":"c3991ea2-b702-3b1b-bfc5-69cb7d928554","host":"gv0cl1.pod1.aus1.centex.rsitex.com","path":"/gv0cl1","port":24007,"type":"Gluster"},"localPath":"/mnt//c3991ea2-b702-3b1b-bfc5-69cb7d928554","wait":0}}]
}
2016-04-16 10:43:00,318 INFO [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-1:null) (logid:30562dd3) Attempting to create storage
pool c3991ea2-b702-3b1b-bfc5-69cb7d928554 (Gluster) in libvirt
2016-04-16 10:43:00,322 WARN [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-1:null) (logid:30562dd3) Storage pool
c3991ea2-b702-3b1b-bfc5-69cb7d928554 was not found running in libvirt. Need to
create it.
2016-04-16 10:43:00,322 INFO [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-1:null) (logid:30562dd3) Didn't find an existing storage
pool c3991ea2-b702-3b1b-bfc5-69cb7d928554 by UUID, checking for pools with
duplicate paths
2016-04-16 10:43:00,325 DEBUG [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-1:null) (logid:30562dd3) Attempting to create storage
pool c3991ea2-b702-3b1b-bfc5-69cb7d928554
<name>c3991ea2-b702-3b1b-bfc5-69cb7d928554</name>
<uuid>c3991ea2-b702-3b1b-bfc5-69cb7d928554</uuid>
<path>/mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554</path>
2016-04-16 10:43:00,775 ERROR [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-1:null) (logid:30562dd3) org.libvirt.LibvirtException:
cannot read header
'/mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554/52a6130f-266e-4667-8ca2-89f932a0b254':
Input/output error
org.libvirt.LibvirtException: cannot read header
'/mnt/c3991ea2-b702-3b1b-bfc5-69cb7d928554/52a6130f-266e-4667-8ca2-89f932a0b254':
Input/output error
----
Richard Klein <[email protected]>
RSI
5426 Guadalupe, Suite 100
Austin TX 78751
RSI Help Desk: (512) 334-3334
Phone: (512) 275-0358
Fax: (512) 328-3410
> -----Original Message-----
> From: Simon Weller [mailto:[email protected]]
> Sent: Friday, April 15, 2016 8:47 PM
> To: [email protected]
> Subject: Re: Primary storage not mounted on hosts?
>
> Richard,
>
> The Cloudstack-agent should populate the libvirt pool-list when it starts up.
> Have you tried restarting libvirtd and then restarting the Cloudstack-agent?
>
> You may want to turn up debugging on the agent so you get some more detail
> on what's going on.
> You can do this by modifying /etc/cloudstack/agent/log4j-cloud.xml
> See this wiki article for more details:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+agent+debug
>
> - Si
>
> ________________________________________
> From: Richard Klein (RSI) <[email protected]>
> Sent: Friday, April 15, 2016 6:54 PM
> To: [email protected]
> Subject: Primary storage not mounted on hosts?
>
> I am not sure what happened, but our primary storage, which is Gluster, on
> all our hosts is not mounted anymore. When I do "virsh pool-list" on any
> host I only see the local pool. Gluster is working fine and there are no
> problems with it, because I can mount the Gluster volume manually on any of
> the hosts and see the primary storage. Instances that are running can write
> data to the local volume and pull data from it. But if a VM is stopped it
> can't start again. I get the "Unable to create a New VM - Error message:
> Unable to start instance due to Unable to get answer that is of class
> com.cloud.agent.api.StartAnswer" error that I have seen in a thread on this
> mailing list, and I am sure it's primary storage related.
>
> The agent logs on the hosts are issuing the following log snippets, which
> confirm it's looking for primary storage:
>
> 2016-04-15 18:42:34,838 INFO [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-3:null) (logid:ad8ec05a) Trying to fetch storage pool
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 from libvirt
> 2016-04-15 18:45:19,006 INFO [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-1:null) (logid:4c396753) Trying to fetch storage pool
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 from libvirt
> 2016-04-15 18:45:49,010 INFO [kvm.storage.LibvirtStorageAdaptor]
> (agentRequest-Handler-1:null) (logid:4c396753) Trying to fetch storage pool
> c3991ea2-b702-3b1b-bfc5-69cb7d928554 from libvirt
>
> The c3991ea2-b702-3b1b-bfc5-69cb7d928554 is the UUID of our primary
> storage.
>
> We did have some secondary storage issues (NFS) that caused some NFS
> mounts to secondary storage to hang. The only way to recover was to reboot
> the host. There were 2 hosts affected, so I put each host in maintenance
> mode, rebooted and then canceled maintenance mode. I did this one host at a
> time. It seems like ever since this happened I have had issues.
>
> Is there a way to get the primary storage remounted and added to the libvirt
> pool-list while keeping the VMs up and running? At this point the only idea
> I have to recover is to power off all VMs, disable primary storage and then
> enable it again. This is a little extreme and is a last resort, but I don't
> know what other options I have.
>
> Any suggestions?
>
>
> Richard Klein <[email protected]>
> RSI
> 5426 Guadalupe, Suite 100
> Austin TX 78751
>