Re: Ubuntu 14.04 + qemu 2.5 vs. ACS 4.8 advice needed

2017-07-21 Thread Dmytro Shevchenko

Here is the error that we caught with Libvirt 1.3.1:

2017-07-21 11:46:27,501 WARN  [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-1:null) Plug Nic failed due to org.libvirt.LibvirtException: internal error: unable to execute QEMU command 'device_add': Hot-plugged device without ROM bar can't have an option ROM
org.libvirt.LibvirtException: internal error: unable to execute QEMU command 'device_add': Hot-plugged device without ROM bar can't have an option ROM
        at org.libvirt.ErrorHandler.processError(Unknown Source)
        at org.libvirt.Connect.processError(Unknown Source)


On 21.07.17 16:18, Andrija Panic wrote:

Hi all,

we are using ACS 4.8 on Ubuntu 14.04 (stock qemu/libvirt versions -
2.0.0/1.2.2) and we have issues during live migration of "busy" VMs (high
RAM change rate), so we plan to introduce an additional flag + global parameter
in ACS, which will pass the auto-converge flag to libvirt to enable auto convergence
(thanks Mike T. :) )

qemu 1.6+ supports auto convergence, but before 2.5 it's mostly useless
(based on the docs and my own testing).

Since we can't upgrade to Ubuntu 16.04 before the ACS 4.9 or 4.10 release, we
are forced to play with qemu/libvirt versions, so I used the Ubuntu OpenStack
repo for the "MITAKA" release, which provides QEMU 2.5 / LIBVIRT 1.3.1 (auto
convergence works like a charm).
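
For reference, a minimal sketch (not the actual ACS change, just an illustration assuming the org.libvirt Java bindings the KVM agent already uses) of how such an auto-converge flag could be passed down to libvirt during migration; the flag values mirror libvirt's libvirt-domain.h:

import org.libvirt.Connect;
import org.libvirt.Domain;
import org.libvirt.LibvirtException;

public class AutoConvergeMigrationSketch {
    // Raw libvirt migration flag bits (see include/libvirt/libvirt-domain.h).
    private static final long VIR_MIGRATE_LIVE = 1L;             // (1 << 0)
    private static final long VIR_MIGRATE_AUTO_CONVERGE = 8192L; // (1 << 13), needs QEMU 1.6+ / libvirt 1.2.3+

    // 'autoConverge' would be driven by the proposed global setting in ACS.
    public static void liveMigrate(Domain vm, Connect destination, boolean autoConverge)
            throws LibvirtException {
        long flags = VIR_MIGRATE_LIVE;
        if (autoConverge) {
            // QEMU throttles the guest CPU when page dirtying outpaces the transfer rate.
            flags |= VIR_MIGRATE_AUTO_CONVERGE;
        }
        // dname/uri/bandwidth left as defaults; the real agent builds these per migrate command.
        vm.migrate(destination, flags, null, null, 0);
    }
}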


Now we are facing different issues where ACS sends a command to libvirt, but
libvirt basically does some silly things and can't start some VMs at all, etc.

Anyone running ACS 4.8 with qemu 2.5+ and libvirt 1.3 - any advice on how to
proceed (our developers are already debugging things, but this takes time)?

Any info is really appreciated.

Thanks !





Re: Ubuntu 14.04 + qemu 2.5 vs. ACS 4.8 advice needed

2017-07-24 Thread Dmytro Shevchenko

Hello Wido,

we found the source of this bug and also found that a patch has already been implemented:

https://github.com/apache/cloudstack/commit/9dcfbceae71865b4de1a4744ceac8f48255733a2

Sorry for the inaccuracy - initially we tested on the 4.5 release.


On 24.07.17 14:56, Wido den Hollander wrote:

On 21 July 2017 at 17:57, Dmytro Shevchenko wrote:


Here is the error that we caught with Libvirt 1.3.1:

2017-07-21 11:46:27,501 WARN  [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-1:null) Plug Nic failed due to org.libvirt.LibvirtException: internal error: unable to execute QEMU command 'device_add': Hot-plugged device without ROM bar can't have an option ROM
org.libvirt.LibvirtException: internal error: unable to execute QEMU command 'device_add': Hot-plugged device without ROM bar can't have an option ROM
        at org.libvirt.ErrorHandler.processError(Unknown Source)
        at org.libvirt.Connect.processError(Unknown Source)


Can you show where this happens? Is that on VM start? If so, do you see the XML in
the agent.log and then the start fails?

This seems like a libvirt 1.3.1 thing we might need to take a look at.

Wido


On 21.07.17 16:18, Andrija Panic wrote:

Hi all,

we are using ACS 4.8 on Ubuntu 14.04 (stock qemu/libvirt versions -
2.0.0/1.2.2) and we have issues during live migration of "busy" VMs (high
RAM change rate), so we plan to introduce an additional flag + global parameter
in ACS, which will pass the auto-converge flag to libvirt to enable auto convergence
(thanks Mike T. :) )

qemu 1.6+ supports auto convergence, but before 2.5 it's mostly useless
(based on the docs and my own testing).

Since we can't upgrade to Ubuntu 16.04 before the ACS 4.9 or 4.10 release, we
are forced to play with qemu/libvirt versions, so I used the Ubuntu OpenStack
repo for the "MITAKA" release, which provides QEMU 2.5 / LIBVIRT 1.3.1 (auto
convergence works like a charm).


Now we are facing different issues where ACS sends a command to libvirt, but
libvirt basically does some silly things and can't start some VMs at all, etc.

Anyone running ACS 4.8 with qemu 2.5+ and libvirt 1.3 - any advice on how to
proceed (our developers are already debugging things, but this takes time)?

Any info is really appreciated.

Thanks !





Re: Marvin Install Issue

2017-07-27 Thread Dmytro Shevchenko

I hit the same issue. I'm using virtualenv, and here is my requirements.txt:

https://dev.mysql.com/get/Downloads/Connector-Python/mysql-connector-python-2.1.6.tar.gz
  Marvin
  nose-timer

It works fine.

On 27/07/17 05:16, Tutkowski, Mike wrote:

Hi everyone,

I am having trouble installing Marvin on Ubuntu 14.04 from master.

It’s complaining that it’s having trouble with mysql-connector-python.

mtutkowski@mike-ubuntu:~/cloudstack/cloudstack$ sudo pip install --upgrade tools/marvin/dist/Marvin-*.tar.gz
Unpacking ./tools/marvin/dist/Marvin-4.11.0.0-SNAPSHOT.tar.gz
  Running setup.py (path:/tmp/pip-5URDXT-build/setup.py) egg_info for package from file:///home/mtutkowski/cloudstack/cloudstack/tools/marvin/dist/Marvin-4.11.0.0-SNAPSHOT.tar.gz
    /usr/local/lib/python2.7/dist-packages/setuptools/dist.py:340: UserWarning: The version specified ('4.11.0.0-SNAPSHOT') is an invalid version, this may not work as expected with newer versions of setuptools, pip, and PyPI. Please see PEP 440 for more details.
      "details." % self.metadata.version

    warning: no files found matching '*.txt' under directory 'docs'
Could not find any downloads that satisfy the requirement mysql-connector-python>=1.1.6 in /usr/lib/python2.7/dist-packages (from Marvin==4.11.0.0-SNAPSHOT)
Downloading/unpacking mysql-connector-python>=1.1.6 (from Marvin==4.11.0.0-SNAPSHOT)
Cleaning up...
No distributions at all found for mysql-connector-python>=1.1.6 in /usr/lib/python2.7/dist-packages (from Marvin==4.11.0.0-SNAPSHOT)
Storing debug log for failure in /home/mtutkowski/.pip/pip.log

But it seems to be installed:

python-mysql.connector/trusty,now 1.1.6-1 all [installed]

Thoughts?

Thanks!
Mike


--
Best regards
Dmytro Shevchenko
dshevchenko.m...@gmail.com
skype: demonsh_mk
+380(66)2426648



[GitHub] cloudstack pull request: CLOUDSTACK-8302: Removing snapshots on RB...

2015-12-12 Thread dmytro-shevchenko
GitHub user dmytro-shevchenko opened a pull request:

https://github.com/apache/cloudstack/pull/1230

CLOUDSTACK-8302: Removing snapshots on RBD

Snapshot removal implemented when the primary data store is RBD
https://issues.apache.org/jira/browse/CLOUDSTACK-8302

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SafeSwissCloud/cloudstack CLOUDSTACK-8302

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cloudstack/pull/1230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1230


commit 32df243422d5985c8631686a0b2a29835f804079
Author: Amin 
Date:   2015-11-30T13:39:50Z

Merge pull request #1 from apache/master

Volume snapshot lifecycle patch on RBD as primary storage

commit c4d749f3b41090c47d98431aef67fa769642c798
Author: Dmytro Shevchenko 
Date:   2015-12-10T15:07:58Z

Merge from apache/master

commit a47b9300e414039734017fde0a41d162b54891f2
Author: Dmytro Shevchenko 
Date:   2015-12-12T14:44:09Z

https://issues.apache.org/jira/browse/CLOUDSTACK-8302
Snapshot removing implemented on RBD




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] cloudstack pull request: CLOUDSTACK-8302: Removing snapshots on RB...

2015-12-14 Thread dmytro-shevchenko
Github user dmytro-shevchenko commented on the pull request:

https://github.com/apache/cloudstack/pull/1230#issuecomment-164459723
  
Commits are squashed and message updated.




[GitHub] cloudstack pull request: CLOUDSTACK-8302: Removing snapshots on RB...

2015-12-19 Thread dmytro-shevchenko
Github user dmytro-shevchenko commented on a diff in the pull request:

https://github.com/apache/cloudstack/pull/1230#discussion_r48097934
  
--- Diff: plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/storage/KVMStorageProcessor.java ---
@@ -1274,7 +1274,45 @@ public Answer createVolumeFromSnapshot(final CopyCommand cmd) {

     @Override
     public Answer deleteSnapshot(final DeleteCommand cmd) {
-        return new Answer(cmd);
+        try {
+            SnapshotObjectTO snapshotTO = (SnapshotObjectTO) cmd.getData();
+            PrimaryDataStoreTO primaryStore = (PrimaryDataStoreTO) snapshotTO.getDataStore();
+            VolumeObjectTO volume = snapshotTO.getVolume();
+            String snapshotFullPath = snapshotTO.getPath();
+            String snapshotName = snapshotFullPath.substring(snapshotFullPath.lastIndexOf("/") + 1);
+            KVMStoragePool primaryPool = storagePoolMgr.getStoragePool(primaryStore.getPoolType(), primaryStore.getUuid());
+            KVMPhysicalDisk disk = storagePoolMgr.getPhysicalDisk(primaryStore.getPoolType(), primaryStore.getUuid(), volume.getPath());
+            if (primaryPool.getType() == StoragePoolType.RBD) {
+                Rados r = new Rados(primaryPool.getAuthUserName());
+                r.confSet("mon_host", primaryPool.getSourceHost() + ":" + primaryPool.getSourcePort());
+                r.confSet("key", primaryPool.getAuthSecret());
+                r.confSet("client_mount_timeout", "30");
+                r.connect();
+                s_logger.debug("Succesfully connected to Ceph cluster at " + r.confGet("mon_host"));
+                IoCTX io = r.ioCtxCreate(primaryPool.getSourceDir());
+                Rbd rbd = new Rbd(io);
+                RbdImage image = rbd.open(disk.getName());
+                try {
+                    s_logger.info("Attempting to remove RBD snapshot " + disk.getName() + "@" + snapshotName);
+                    if (image.snapIsProtected(snapshotName)) {
+                        s_logger.debug("Unprotecting snapshot " + snapshotFullPath);
+                        image.snapUnprotect(snapshotName);
+                    }
+                    image.snapRemove(snapshotName);
+                    s_logger.info("Snapshot " + snapshotFullPath + " successfully removed.");
+                } finally {
+                    rbd.close(image);
+                    r.ioCtxDestroy(io);
+                }
+            } else {
+                s_logger.warn("Operation not implemented!");
+                throw new InternalErrorException("Operation not implemented!");
--- End diff --

Storage pool type added to messages.




[GitHub] cloudstack pull request: CLOUDSTACK-8302: Removing snapshots on RB...

2015-12-21 Thread dmytro-shevchenko
Github user dmytro-shevchenko commented on the pull request:

https://github.com/apache/cloudstack/pull/1230#issuecomment-166261539
  
Full snapshot path added to the log messages.




[GitHub] cloudstack pull request: CLOUDSTACK-8302: Removing snapshots on RB...

2016-02-02 Thread dmytro-shevchenko
Github user dmytro-shevchenko commented on the pull request:

https://github.com/apache/cloudstack/pull/1230#issuecomment-178890524
  
Done, rebased with 4.9xx master.




Re: ACS 4.5 - volume snapshots NOT removed from CEPH (only from Secondary NFS and DB)

2015-09-10 Thread Dmytro Shevchenko
>
>>>> BTW why is the 16 snapshot limit hardcoded - any reason for that?
>>>>
>>>> Not cleaning snapshots on CEPH and trying to delete a volume after having
>>>> more than 16 snapshots in CEPH = Agent crashing on the KVM side... and some VMs
>>>> being rebooted etc. - which means downtime :|
>>>>
>>>> Thanks,
>>>>
>>>> On 9 September 2015 at 22:05, Simon Weller <swel...@ena.com> wrote:
>>>>
>>>>> Andrija,
>>>>>
>>>>> The Ceph snapshot deletion is not currently implemented.
>>>>>
>>>>> See: https://issues.apache.org/jira/browse/CLOUDSTACK-8302
>>>>>
>>>>> - Si
>>>>>
>>>>> From: Andrija Panic <andrija.pa...@gmail.com>
>>>>> Sent: Wednesday, September 9, 2015 3:03 PM
>>>>> To: dev@cloudstack.apache.org; us...@cloudstack.apache.org
>>>>> Subject: ACS 4.5 - volume snapshots NOT removed from CEPH (only from
>>>>> Secondary NFS and DB)
>>>>>
>>>>> Hi folks,
>>>>>
>>>>> we encountered an issue in ACS 4.5.1 (perhaps other versions are also affected) -
>>>>> when we delete some snapshot (volume snapshot) in ACS, ACS marks it as
>>>>> deleted in the DB and deletes it from NFS Secondary Storage, but fails to
>>>>> delete the snapshot on CEPH primary storage (it doesn't even try to delete
>>>>> it AFAIK).
>>>>>
>>>>> So we end up having 5 live snapshots in the DB (just an example) but actually in
>>>>> CEPH there are more than i.e. 16 snapshots.
>>>>>
>>>>> More of an issue, when the ACS agent tries to obtain the list of snapshots from
>>>>> CEPH for some volume or so - if the number of snapshots is over 16, it raises an
>>>>> exception (and perhaps this is the reason the Agent crashed for us - need to
>>>>> check with my colleagues who are investigating this in detail). This number
>>>>> 16 is for whatever reason hardcoded in the ACS code.
>>>>>
>>>>> Wondering if anyone has experienced this, or has any info - we plan to try to
>>>>> fix this, and I will include my dev colleagues here, but we might need some
>>>>> help, at least for guidance.
>>>>>
>>>>> Any help is really appreciated, or at least confirmation that this is a known
>>>>> issue etc.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>>
>>>>> Andrija Panić
>>>>
>>>>
>>>> --
>>>>
>>>> Andrija Panić




--

Andrija Panić


--
---
Best regards
Dmytro Shevchenko
dshevchenko.m...@gmail.com
skype: demonsh_mk
+380(66)2426648





Re: ACS 4.5 - volume snapshots NOT removed from CEPH (only from Secondary NFS and DB)

2015-09-11 Thread Dmytro Shevchenko
Thanks a lot Wido! Any chance to find out why the management server decided
that it lost the connection to the agent after those exceptions? It's not as
critical as this bug with 16 snapshots, but during the last week we caught a
situation where the Agent failed to unprotect a snapshot and raised an exception,
and that was the reason for a disconnection a bit later. (It is not clear why CS
decided to remove that volume; it was a template with one 'gold' snapshot with
several active clones.)


On 09/11/2015 03:20 PM, Wido den Hollander wrote:


On 11-09-15 10:19, Wido den Hollander wrote:


On 10-09-15 23:15, Andrija Panic wrote:

Wido,

could you maybe follow what my colleague Dmytro just sent?


Yes, seems logical.


It's not only a matter of fixing rados-java (the 16 snaps limit) - it
seems that for any RBD exception, ACS will freak out...


No, a RbdException will be caught, but the Rados Bindings shouldn't
throw NegativeArraySizeException in any case.

That's the main problem.
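
A small self-contained illustration of that distinction (hypothetical stand-in classes, not the actual CloudStack or rados-java code): a handler that only catches the checked RbdException lets a NegativeArraySizeException thrown by the bindings escape, which is exactly what surfaces in the agent log further down.

public class ExceptionEscapeSketch {
    // Stand-in for com.ceph.rbd.RbdException (a checked exception in the bindings).
    static class RbdException extends Exception {
    }

    // Stand-in for RbdImage.snapList(); the old bindings effectively ended up with a
    // negative snapshot count once more than 16 snapshots existed.
    static int[] snapList() throws RbdException {
        int reportedCount = -1;
        return new int[reportedCount];   // throws java.lang.NegativeArraySizeException
    }

    public static void main(String[] args) {
        try {
            snapList();
        } catch (RbdException e) {
            // Only the expected RBD error is handled here.
            System.err.println("handled RBD error: " + e.getMessage());
        }
        // NegativeArraySizeException is a RuntimeException, not an RbdException,
        // so it is never caught above and propagates to the caller - in the agent
        // that shows up as the WARN "Caught: java.lang.NegativeArraySizeException".
    }
}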


Seems to be fixed with this commit:
https://github.com/ceph/rados-java/commit/5584f3961c95d998d2a9eff947a5b7b4d4ba0b64

Just tested it with 256 snapshots:

---
  T E S T S
---
Running com.ceph.rbd.TestRbd
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 521.014 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

The bindings should now be capable of listing more than 16 snapshots.

You can build the bindings manually and replace rados.jar on your
running systems.

For 4.6 I'll try to get the updated rados-java included.

Wido


Wido


THx

On 10 September 2015 at 17:06, Dmytro Shevchenko <
dmytro.shevche...@safeswisscloud.com> wrote:


Hello everyone, some clarification about this. Configuration:
CS: 4.5.1
Primary storage: Ceph

Actually we have 2 separate bugs:

1. When you remove a volume with more than 16 snapshots (it doesn't matter whether
they are destroyed or active - they are always present on Ceph), the next storage
garbage collector cycle invokes 'deletePhysicalDisk' from LibvirtStorageAdaptor.java.
On line 854 we call the snapshot listing in the external rados-java library and get
an exception.

https://github.com/apache/cloudstack/blob/4.5.1/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/storage/LibvirtStorageAdaptor.java#L854
This exception is not caught in the current function, but the Agent DOES NOT CRASH
at this moment and continues working fine. The Agent forms a proper answer to the
server and sends it; the text in the answer is a Java stack trace. Log from the
Agent side:

2015-09-10 02:32:35,312 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-4:null) Trying to fetch storage pool 33ebaf83-5d09-3038-b63b-742e759a992e from libvirt
2015-09-10 02:32:35,431 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-4:null) Attempting to remove volume 4c6a2092-056c-4446-a2ca-d6bba9f7f7f8 from pool 33ebaf83-5d09-3038-b63b-742e759a992e
2015-09-10 02:32:35,431 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-4:null) Unprotecting and Removing RBD snapshots of image cloudstack-storage/4c6a2092-056c-4446-a2ca-d6bba9f7f7f8 prior to removing the image
2015-09-10 02:32:35,436 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-4:null) Succesfully connected to Ceph cluster at 10.10.1.26:6789
2015-09-10 02:32:35,454 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-4:null) Fetching list of snapshots of RBD image cloudstack-storage/4c6a2092-056c-4446-a2ca-d6bba9f7f7f8
2015-09-10 02:32:35,457 WARN  [cloud.agent.Agent] (agentRequest-Handler-4:null) Caught:
java.lang.NegativeArraySizeException
        at com.ceph.rbd.RbdImage.snapList(Unknown Source)
        at com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.deletePhysicalDisk(LibvirtStorageAdaptor.java:854)
        at com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.deletePhysicalDisk(LibvirtStoragePool.java:175)
        at com.cloud.hypervisor.kvm.storage.KVMStorageProcessor.deleteVolume(KVMStorageProcessor.java:1206)
2015-09-10 02:32:35,458 DEBUG [cloud.agent.Agent] (agentRequest-Handler-4:null) Seq 1-1743737480722513946:  { Ans: , MgmtId: 90520739779588, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"java.lang.NegativeArraySizeException\n\tat com.ceph.rbd.RbdImage.snapList(Unknown Source)\n\tat com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.deletePhysicalDisk(LibvirtStorageAdaptor.java:854)\n\tat com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.deletePhysicalDisk(LibvirtStoragePool.java:175)\n\tat com.cloud.hypervisor.kvm.storage.KVMStorageProcessor.deleteVolume(KVMStorageProcessor.java:1206)\n\tat com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:124)\n\tat com.cloud.storage.re.

so this volume and its snapshots will never be removed.


2. Second

Re: ACS 4.5 - volume snapshots NOT removed from CEPH (only from Secondary NFS and DB)

2015-09-17 Thread Dmytro Shevchenko
Nice work. I compiled and installed the new version into the local maven
repository, but now I can't compile CloudStack with this library. I changed
the dependency version in the pom file to the new one, but got the following
exception while compiling 'cloud-plugin-hypervisor-kvm':


[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project cloud-plugin-hypervisor-kvm: Compilation failure: Compilation failure:
[ERROR] Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
[ERROR] /home/dmytro.shevchenko/test/cloudstack/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java:[80,21] error: cannot find symbol

[ERROR] symbol:   class RadosException
[ERROR] location: package com.ceph.rados

After investigation I found that the class RadosException was moved to an
'exceptions' subpackage, but in LibvirtComputingResource.java it is imported as
"import com.ceph.rados.RadosException;". The question is: if I want to compile
some release version with these new changes, which way would be preferred?
Change the import path in LibvirtComputingResource.java and the other places
where this class is used to "com.ceph.rados.exceptions.RadosException"?

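For illustration, the change would only touch the import line; call sites stay as they are (the wrapper class below is hypothetical, not CloudStack code):

// Old import (rados-java 0.1.4), as it currently appears in LibvirtComputingResource.java:
//     import com.ceph.rados.RadosException;
// New import after the class was moved into the 'exceptions' subpackage:
import com.ceph.rados.exceptions.RadosException;

class RadosErrorHandlingSketch {
    // Hypothetical helper just to show that only the import changes, not the usage.
    void report(RadosException e) {
        System.err.println("RADOS error: " + e.getMessage());
    }
}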


--
Best regards
Dmytro Shevchenko
dmytro.shevche...@safeswisscloud.com
skype: demonsh_mk



On 09/15/2015 03:11 PM, Wido den Hollander wrote:


On 15-09-15 13:56, Dmytro Shevchenko wrote:

Hello Wido, I saw you updated this code again. Maybe you know, what
procedure for rebuilding this library in Apache maven repository?
Because here http://repo.maven.apache.org/maven2/com/ceph/rados/ still
present only old 0.1.4 version and it's impossible recompile Cloudstack
with new patches.  Of cause we can download source code from Github,
compile it and replace 'jar' file on production, but this is dirty hack
and not acceptable for 'continues integration'.


It's up to me to do a new release of rados-java and I haven't done that
yet since I wanted to know for sure if the code works.

While writing some code for libvirt yesterday I came up with a better
solution for rados-java as well.

https://www.redhat.com/archives/libvir-list/2015-September/msg00458.html

For now you can replace 'rados.jar' on the production systems, but for
4.6 I want to make sure we depend on a new, to be released, version of
rados-java.

Wido


---
Best regards
Dmytro Shevchenko
dshevchenko.m...@gmail.com
skype: demonsh_mk


On 09/12/2015 06:16 PM, Wido den Hollander wrote:

On 09/11/2015 05:08 PM, Andrija Panic wrote:

Thx a lot Wido!!! - we will patch this. For my understanding - is this a
"temporary" solution, since it raises the limit to 256 snaps? Or am I wrong? I
mean, since we still don't have proper snapshot removal etc., after i.e.
3-6 months we will again have 256 snapshots of a single volume on CEPH?


No, it will also work with >256 snapshots. I've tested it with 256 and
that worked fine. I see no reason why it won't work with 1024 or 2048
for example.


BTW we also have another exception that causes the same consequences - the agent
disconnecting and VMs going down...
As Dmytro explained, unprotecting a snapshot causes the same consequence...

From my understanding, any RBD exception might cause the Agent to disconnect
(or actually the mgmt server to disconnect the agent)...

Any clue on this, any recommendation?


No, I don't have a clue. It could be that the job hangs somewhere inside
the Agent due to an uncaught exception though.


Thx a lot for fixing rados-java stuff !


You're welcome!

Wido


Andrija

On 11 September 2015 at 15:28, Wido den Hollander
wrote:


On 11-09-15 14:43, Dmytro Shevchenko wrote:

Thanks a lot Wido! Any chance to find out why the management server decided
that it lost the connection to the agent after those exceptions? It's not as
critical as this bug with 16 snapshots, but during the last week we caught a
situation where the Agent failed to unprotect a snapshot and raised an
exception, and that was the reason for a disconnection a bit later.
(It is not clear why CS decided to remove that volume; it was a template with
one 'gold' snapshot with several active clones.)


No, I didn't look at CS at all. I just spent the day improving the RADOS
bindings.

Wido


On 09/11/2015 03:20 PM, Wido den Hollander wrote:

On 11-09-15 10:19, Wido den Hollander wrote:

On 10-09-15 23:15, Andrija Panic wrote:

Wido,

could you maybe follow what my colleague Dmytro just sent?


Yes, seems logical.


It's not only a matter of fixing rados-java (the 16 snaps limit) - it
seems that for any RBD exception, ACS will freak out...


No, a RbdException will be caught, but the Rados Bindings shouldn't
throw NegativeArraySizeException in any case.

That's the main problem.


Seems to be fixed with this commit:


https://github.com/ceph/rados-java/co

[GitHub] cloudstack pull request: CLOUDSTACK-8302: Removing snapshots on RB...

2016-04-25 Thread dmytro-shevchenko
Github user dmytro-shevchenko commented on the pull request:

https://github.com/apache/cloudstack/pull/1230#issuecomment-214497874
  
Rebase with master done, pom.xml file updated.
I also made a small modification to the code. During testing I found one issue:
in the 'snapshot_store_ref' table, all snapshots of one volume were linked to each
other as Parent->Child via the 'parent_snapshot_id' field. If you remove one of the
previous snapshots and wait for the 'storage.cleanup.interval' period, it leads to a
NullPointerException when you create a new snapshot, because CloudStack tries to
rebuild all these snapshot relations first. Before this patch the field was always
set to '0' (no parent). From CloudStack's point of view the snapshots on Ceph are
not connected (Ceph takes care of that at its own level).
So, in the file
engine/storage/snapshot/src/org/apache/cloudstack/storage/snapshot/XenserverSnapshotStrategy.java
I moved this block:

`SnapshotDataStoreVO snapshotDataStoreVO = snapshotStoreDao.findByStoreSnapshot(primaryStore.getRole(), primaryStore.getId(), snapshot.getId());
if (snapshotDataStoreVO != null) {
    snapshotDataStoreVO.setParentSnapshotId(0L);
    snapshotStoreDao.update(snapshotDataStoreVO.getId(), snapshotDataStoreVO);
}`

out of the condition: ...primaryStore).getPoolType() != StoragePoolType.RBD
so that it is executed in any case, as previously. Please review this part of the
code and whether this is a good solution.




Domain listing takes too much time

2017-01-27 Thread Dmytro Shevchenko

Hi All,

I've noticed that domain listing has become too slow in the newer releases, at
least with the "listDomains" command. Comparing 4.5 against the 4.8 and 4.9
releases (about 200 domains in the list):



4.5.1:
# time cloudmonkey list domains filter=name | grep name | wc -l
212
real    0m0.804s
user    0m0.257s
sys     0m0.059s


4.8.0.1 and 4.9:
# time cloudmonkey list domains filter=name | grep name | wc -l
203
real    0m11.478s
user    0m0.203s
sys     0m0.047s

Less than 1 second in 4.5 and nearly 10 seconds since 4.8. This is not a
Cloudmonkey issue; the same bug can be reproduced via the GUI: when you open the
search form in Instances, the domain list stays empty for ~10 sec, which is
confusing users. Screenshot: https://s29.postimg.org/66d7le9gn/domain_bug.png.
After a small investigation I found that the DB query finishes almost
immediately, but the problem is in this loop and the call to
"ApiDBUtils.newDomainResponse", which consumes too much time for each domain
(4.9.2.0 version):
https://github.com/apache/cloudstack/blob/4.9.2.0/server/src/com/cloud/api/query/ViewResponseHelper.java#L353
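
To make the shape of the problem concrete, here is a self-contained sketch (hypothetical names and timings, not the actual CloudStack classes) of the per-row pattern described above: the view query is a single round trip, but each response object triggers extra work, so the total time grows linearly with the number of domains.

import java.util.ArrayList;
import java.util.List;

public class PerRowResponseCost {
    static class DomainRow { final long id; DomainRow(long id) { this.id = id; } }
    static class DomainResponse { final long id; DomainResponse(long id) { this.id = id; } }

    // Stand-in for ApiDBUtils.newDomainResponse(): conversion plus extra per-domain lookups.
    static DomainResponse buildResponse(DomainRow row) throws InterruptedException {
        Thread.sleep(50);                       // ~50 ms of extra work per domain (illustrative)
        return new DomainResponse(row.id);
    }

    public static void main(String[] args) throws InterruptedException {
        List<DomainRow> rows = new ArrayList<>();
        for (long i = 0; i < 200; i++) {
            rows.add(new DomainRow(i));         // the view query itself returns ~200 rows quickly
        }

        long start = System.nanoTime();
        List<DomainResponse> responses = new ArrayList<>();
        for (DomainRow row : rows) {
            responses.add(buildResponse(row));  // the loop that dominates the listDomains time
        }
        System.out.printf("built %d responses in %.1f s%n",
                responses.size(), (System.nanoTime() - start) / 1e9);
        // ~200 x 50 ms is roughly 10 s, matching the observed 4.8/4.9 timings above.
    }
}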


Any ideas how to fix this correctly?


Best regards,
Dmytro



No removed date for templates after account cleanup

2017-02-18 Thread Dmytro Shevchenko

Hi all,

Is it a bug that when you remove an account, all its templates are marked as
Inactive, but the 'removed' column stays NULL in the vm_template table?


CS version: 4.8.0.1

--
Best regards
Dmytro Shevchenko