Hi Sahina,

Many thanks for your response.

I have now raised a bug against this issue.  For your reference it is bug 
#1578257 - https://bugzilla.redhat.com/show_bug.cgi?id=1578257
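For the bug report I am also capturing the scheduler state from the CLI on one of the storage nodes.  This is a rough sketch of the checks, assuming the stock Gluster snapshot tooling is installed (glustervol0 is the volume name from the engine log below):

```shell
# Check whether the CLI-based snapshot scheduler is enabled, since the
# UI warns that the two scheduling mechanisms can conflict.
snap_scheduler.py status

# List any snapshots that actually exist for the volume.
gluster snapshot list glustervol0

# Show the snapshot limits and auto-delete settings in effect.
gluster snapshot config
```

I will include the output of these alongside the debug logs.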

I will enable debugging today as requested and attach the logs to the bug report.
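In the meantime, this is a rough sketch of the filter I am using to pull just the scheduler-related WARN/ERROR lines out of engine.log before attaching it (the helper and keyword list are my own, not part of oVirt; the timestamp format is taken from the engine log excerpts quoted below):

```python
import re

# Engine log lines start with a timestamp like "2018-05-14 04:30:00,018Z"
# followed by the severity (format taken from the log excerpts below).
LINE_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}Z\s+(WARN|ERROR)\b")

def scheduler_errors(log_lines, keywords=("Snapshot", "JobWrapper", "Quartz")):
    """Return the WARN/ERROR lines mentioning any of the given keywords."""
    return [line for line in log_lines
            if LINE_RE.match(line) and any(k in line for k in keywords)]
```

Running this over the full engine.log should leave just the scheduler warnings and the 'Failed to invoke scheduled method onTimer: null' errors.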

Many thanks,

Mark Betham


> On 14 May 2018, at 12:34, Sahina Bose <[email protected]> wrote:
> 
> On Mon, May 14, 2018 at 4:07 PM, Mark Betham <[email protected]> wrote:
> Hi Sahina,
> 
> Many thanks for your response and apologies for my delay in getting back to 
> you.
> 
> 
>> How was the schedule created - is this using the Remote Data Sync Setup 
>> under Storage domain?
> 
> 
> oVirt is configured in ‘Gluster’ mode, no VM support.  When snapshotting we 
> are taking a snapshot of the full Gluster volume.
> 
> To configure the snapshot schedule I did the following:
> 1. Log in to the oVirt WebUI.
> 2. From the left-hand menu, select ‘Storage’ and then ‘Volumes’.
> 3. Select the volume to snapshot by clicking the link in the ‘Name’ column.
> 4. From there, select the ‘Snapshots’ tab.
> 5. From the top menu options, select the ‘Snapshot’ drop-down and choose ‘New’.
> 6. A new window appears titled ‘Create/Schedule Snapshot’.
> 7. Enter a snapshot prefix and description, then select the ‘Schedule’ page.
> 8. On the schedule page, select ‘Minute’ from the ‘Recurrence’ drop-down.
> 9. Set ‘Interval’ to every ’30’ minutes.
> 10. Change the timezone to ‘Europe/London=(GMT+00:00) London Standard Time’.
> 11. Leave ‘Start Schedule by’ at its default value.
> 12. Set the schedule to ‘No End Date’.
> 13. Click ‘OK’.
> 
> Interestingly, I get the following message on the ‘Create/Schedule Snapshot’ 
> page before clicking OK:
> Frequent creation of snapshots would overload the cluster
> Gluster CLI based snapshot scheduling is enabled. It would be disabled once 
> volume snapshots scheduled from UI.
> 
> What is interesting is that I have not enabled 'Gluster CLI based snapshot 
> scheduling’.
> 
> After clicking OK I am returned to the Volume Snapshots tab.
> 
> From this point I get no snapshots created according to the schedule set.
> 
> At the time of clicking OK in the WebUI to enable the schedule, I get the 
> following in the engine log:
> 2018-05-14 09:24:11,068Z WARN  
> [org.ovirt.engine.core.dal.job.ExecutionMessageDirector] (default task-128) 
> [85d0b16f-2c0c-464f-bbf1-682c062a4871] The message key 
> 'ScheduleGlusterVolumeSnapshot' is missing from 'bundles/ExecutionMessages'
> 2018-05-14 09:24:11,090Z INFO  
> [org.ovirt.engine.core.bll.gluster.ScheduleGlusterVolumeSnapshotCommand] 
> (default task-128) [85d0b16f-2c0c-464f-bbf1-682c062a4871] Before acquiring 
> and wait lock 
> 'EngineLock:{exclusiveLocks='[712da1df-4c11-405a-8fb6-f99aebc185c1=GLUSTER_SNAPSHOT]',
>  sharedLocks=''}'
> 2018-05-14 09:24:11,090Z INFO  
> [org.ovirt.engine.core.bll.gluster.ScheduleGlusterVolumeSnapshotCommand] 
> (default task-128) [85d0b16f-2c0c-464f-bbf1-682c062a4871] Lock-wait acquired 
> to object 
> 'EngineLock:{exclusiveLocks='[712da1df-4c11-405a-8fb6-f99aebc185c1=GLUSTER_SNAPSHOT]',
>  sharedLocks=''}'
> 2018-05-14 09:24:11,111Z INFO  
> [org.ovirt.engine.core.bll.gluster.ScheduleGlusterVolumeSnapshotCommand] 
> (default task-128) [85d0b16f-2c0c-464f-bbf1-682c062a4871] Running command: 
> ScheduleGlusterVolumeSnapshotCommand internal: false. Entities affected :  
> ID: 712da1df-4c11-405a-8fb6-f99aebc185c1 Type: GlusterVolumeAction group 
> MANIPULATE_GLUSTER_VOLUME with role type ADMIN
> 2018-05-14 09:24:11,148Z INFO  
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
> (default task-128) [85d0b16f-2c0c-464f-bbf1-682c062a4871] EVENT_ID: 
> GLUSTER_VOLUME_SNAPSHOT_SCHEDULED(4,134), Snapshots scheduled on volume 
> glustervol0 of cluster NOSS-LD5.
> 2018-05-14 09:24:11,156Z INFO  
> [org.ovirt.engine.core.bll.gluster.ScheduleGlusterVolumeSnapshotCommand] 
> (default task-128) [85d0b16f-2c0c-464f-bbf1-682c062a4871] Lock freed to 
> object 
> 'EngineLock:{exclusiveLocks='[712da1df-4c11-405a-8fb6-f99aebc185c1=GLUSTER_SNAPSHOT]',
>  sharedLocks=''}'
> 
>> Could you please provide the engine.log from the time the schedule was setup 
>> and including the time the schedule was supposed to run?
> 
> 
> The original log file is no longer present, so I removed the old schedule and 
> created a new schedule, as per the instructions above, earlier today.  I have 
> therefore attached the engine log from today.  The new schedule, which was set 
> to run every 30 minutes, has not produced any snapshots after around 2 hours.
> 
> Please let me know if you require any further information.
> 
> 
> I see the following messages in logs: 
> 2018-05-14 04:30:00,018Z ERROR [org.ovirt.engine.core.utils.timer.JobWrapper] 
> (QuartzOvirtDBScheduler9) [d0c31a9] Failed to invoke scheduled method 
> onTimer: null
> 
> Can you log a bug - and we will dig into this further.
> 
> To speed things up, if you could enable debug logs (I think using 
> https://www.ovirt.org/develop/developer-guide/engine/engine-development-environment/#enable-debug-log---restart-required) 
> and attach the exception, that would help a lot.
> 
> 
> Many thanks,
> 
> Mark Betham.
> 
> 
>> 
>> On Thu, May 3, 2018 at 4:37 PM, Mark Betham <[email protected]> wrote:
>> Hi Ovirt community,
>> 
>> I am hoping you will be able to help with a problem I am experiencing when 
>> trying to schedule a snapshot of my Gluster volumes using the oVirt portal.
>> 
>> Below is an overview of the environment;
>> 
>> I have an oVirt instance running which is managing our Gluster storage.  We 
>> are running oVirt version "4.2.2.6-1.el7.centos", Gluster version 
>> "glusterfs-3.13.2-2.el7" on a base OS of "CentOS Linux release 7.4.1708 
>> (Core)", Kernel "3.10.0 - 693.21.1.el7.x86_64", VDSM version 
>> "vdsm-4.20.23-1.el7.centos".  All of the versions of software are the latest 
>> release and have been fully patched where necessary.
>> 
>> oVirt has been installed and configured in "Gluster" mode only, no 
>> virtualisation.  The oVirt platform runs from one of the Gluster storage 
>> nodes.
>> 
>> Gluster runs with 2 clusters, each located at a different physical site (UK 
>> and DE).  Each storage cluster contains 3 storage nodes and a single Gluster 
>> volume.  The Gluster volume is replica 3, running on top of an LVM thin 
>> volume which has been provisioned with an XFS filesystem.  The system runs 
>> Geo-replication between the 2 geo-diverse clusters.
>> 
>> The host servers running at the primary site are of specification 1 * 
>> Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz (8 core with HT), 64GB Ram, LSI 
>> MegaRAID SAS 9271 with bbu and cache, 8 * SAS 10K 2.5" 1.8TB enterprise 
>> drives configured in a RAID 10 array to give 6.52TB of usable space.  The 
>> host servers running at the secondary site are of specification 1 * Intel(R) 
>> Xeon(R) CPU E3-1271 v3 @ 3.60GHz (8 core with HT), 32GB Ram, LSI MegaRAID 
>> SAS 9260 with bbu and cache, 8 * SAS 10K 2.5" 1.8TB enterprise drives 
>> configured in a RAID 10 array to give 6.52TB of usable space.  The 
>> secondary site is for DR use only.
>> 
>> When I first started experiencing the issue and was unable to resolve it, I 
>> carried out a full rebuild from scratch across the two storage clusters.  I 
>> had spent some time troubleshooting the issue but felt it worthwhile to 
>> ensure I had a clean platform, void of any potential issues which may be 
>> there due to some of the previous work carried out.  The platform was 
>> rebuilt and data re-ingested.  It is probably worth mentioning that this 
>> environment will become our new production platform, we will be migrating 
>> data and services to this new platform from our existing Gluster storage 
>> cluster.  The date for the migration activity is getting closer, so the 
>> available time will not permit another full rebuild of the platform without 
>> impacting the delivery date.
>> 
>> After the rebuild with both storage clusters online, available and managed 
>> within the oVirt platform I conducted some basic commissioning checks and I 
>> found no issues.  The next step I took at this point was to setup the 
>> Geo-replication.  This was brought online with no issues and data was seen 
>> to be synchronised without any problems.  At this point the data 
>> re-ingestion was started and the new data was synchronised by the 
>> Geo-replication.
>> 
>> The first step in bringing the snapshot schedule online was to validate that 
>> snapshots could be taken outside of the scheduler.  Taking a manual snapshot 
>> via the oVirt portal worked without issue.  Several were taken on both 
>> primary and secondary clusters.  At this point a schedule was created on the 
>> primary site cluster via the oVirt portal to create a snapshot of the 
>> storage at hourly intervals.  The schedule was created successfully; however, 
>> no snapshots were ever created.  Examining the logs did not show anything 
>> which I believed was a direct result of the faulty schedule but it is quite 
>> possible I missed something.
>> 
>> How was the schedule created - is this using the Remote Data Sync Setup 
>> under Storage domain?
>> 
>> 
>> I reviewed many online articles, bug reports and application manuals in 
>> relation to snapshotting.  There were several loosely related support 
>> articles, but none of their recommendations worked, and the manuals were no 
>> more helpful.  What I did find were several references to running snapshots 
>> alongside geo-replication, suggesting that geo-replication should be paused 
>> when creating snapshots.  So I removed all existing references to any 
>> snapshot schedule, paused the Geo-repl and recreated the snapshot schedule.  
>> The schedule was never actioned and no snapshots were created.  I then 
>> removed Geo-repl entirely, removed all schedules and rebooted the entire 
>> platform.  Once the system was fully back online with no pending heal 
>> operations, the schedule was re-added for the primary site only.  There was 
>> no difference in the results and no snapshots were created from the schedule.
>> 
>> I have now reached the point where I feel I require assistance and hence 
>> this email request.
>> 
>> If you require any further data then please let me know and I will do my 
>> best to get it for you.
>> 
>> Could you please provide the engine.log from the time the schedule was setup 
>> and including the time the schedule was supposed to run?
>> 
>> 
>> 
>> Any help you can give would be greatly appreciated.
>> 
>> Many thanks,
>> 
>> Mark Betham
>> 
>> _______________________________________________
>> Users mailing list
>> [email protected]
>> http://lists.ovirt.org/mailman/listinfo/users

_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/