Hi Kern,

Thank you for the info!  We're using MySQL 5.6 Percona Server, Release
68.0, Revision 656.

Would this setting cause the problem?
innodb_lock_wait_timeout = 100

Is it too high, too low, or does it have no bearing on the problem?
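
In case it helps with debugging, here is a minimal sketch (Python, not part of Bacula or MySQL) that pulls the "waiting for" lock out of InnoDB monitor text like the excerpt quoted further down in this thread. The function name is made up for illustration, and the regex assumes the line layout seen in that excerpt; other server versions may format these lines differently.

```python
import re

# Hypothetical helper: the name and the regex are assumptions based only on
# the monitor excerpt quoted in this thread.
_WAITING = re.compile(
    r"WAITING FOR THIS LOCK TO BE GRANTED: "
    r"(TABLE|RECORD) LOCKS? .*?"
    r"table `([^`]+)`\.`([^`]+)` "
    r".*?lock[ _]mode (\S+)"
)


def summarize_waiting_locks(monitor_text):
    """Return (lock_kind, database, table, lock_mode) for each waited-on lock."""
    # Mail clients wrap long lines, so collapse all whitespace before matching.
    flat = " ".join(monitor_text.split())
    return [m.groups() for m in _WAITING.finditer(flat)]


if __name__ == "__main__":
    sample = """
    WAITING FOR THIS LOCK TO BE GRANTED:
    TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC
    waiting
    WE ROLL BACK TRANSACTION (2)
    """
    for kind, db, table, mode in summarize_waiting_locks(sample):
        print(kind, db, table, mode)
```

Run against a saved copy of the monitor output, this only lists which table and lock mode each transaction was waiting on. It does not say why the deadlock happened.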

Thanks again,
-craig


On Thu, Aug 6, 2015 at 9:26 AM, Kern Sibbald <k...@sibbald.com> wrote:

> On 06.08.2015 18:46, Bryn Hughes wrote:
>
> I think what Kern is getting at is that your database is what threw the
> error, not Bacula.  Whatever DB you are using is what is having the issue.
>
>
> Yes.  That is exactly what I was implying.
>
> The rest of this is directed to Craig:
> If you are using MariaDB (I have no indication that you are), please be
> aware that it may be a very good database, maybe even better than MySQL,
> but Bacula is built and tested against MySQL, and if you use binaries that
> were built for MySQL, you could run into problems by using MariaDB.  Even
> if your binaries were explicitly built with MariaDB, it may not be
> compatible with the way Bacula works.  Bacula has a tendency to push
> databases to the extreme, and it works well with MySQL and PostgreSQL, but
> possibly not with other databases.  I bring up MariaDB because it has been
> mentioned in another posting to this list.
>
> I would be very surprised if your problem has anything to do with Accurate
> -- the database routines know nothing about accurate and none of the data
> is different.  It is more likely due to the VM environment or to some build
> or version problem with MySQL (or MariaDB).
>
> Best regards,
> Kern
>
>
> Bryn
>
> On 2015-08-06 09:11 AM, Craig Shiroma wrote:
>
> Hi Kern,
>
> Thank you very much for the reply!  Would you have any suggestions on what
> may be causing this problem or how I can debug it?  Obviously, I'm
> encountering deadlocks when accurate backup runs on some of our hosts and
> we want to use accurate backup on all of our hosts if possible.
>
> Warmest regards,
> -craig
>
> On Thu, Aug 6, 2015 at 12:11 AM, Kern Sibbald <k...@sibbald.com> wrote:
>
>> On 06.08.2015 10:15, Craig Shiroma wrote:
>>
>> Hello again,
>>
>> I just thought I'd update this post with more information in hopes of
>> getting some explanation for the deadlocks.
>>
>> I ran with Accurate backup on our test VMs (RHEL) for a couple of days
>> and got the same errors on some VMs that were running accurate and some
>> that were not.  These hosts were running concurrently.  I would say 90% of
>> the hosts that were configured to use Accurate finished successfully.
>> However, there were a few that failed with the deadlock error -- some that
>> were configured to use accurate and some that were not configured to use
>> accurate.  Also, on all of these, a second job started for each of the
>> affected hosts right after Bacula detected the deadlock even though it said
>> a reschedule would happen 3600 seconds later (the 3600 seconds is correct).
>>
>> Tonight, I disabled accurate on all hosts and the deadlocks did not
>> happen.  No errors were detected and all the backups finished successfully.
>>
>> Some questions...
>> 1.  Can I back up multiple hosts concurrently with some hosts configured
>> to use accurate and some configured not to use accurate?  Or, is it an all
>> or none thing, meaning all hosts that run concurrently must either be using
>> accurate backup or not using accurate backup (cannot mix the two)?
>>
>> 2. It seems like the hosts that get out of the starting gate first are
>> the ones affected.  I am configured to run 50 jobs concurrently.  Again, no
>> problems with accurate turned off on all hosts for months now.
>>
>> 3. Why is Bacula spinning off a new job right away after it detects the
>> deadlock for each affected job instead of waiting until the rescheduled job
>> runs?  I verified that there were no duplicate jobs in the queue before the
>> backups started running, no jobs were running before the start of the
>> backups, and I did not start any of these backups manually to cause a
>> second job to appear.
>>
>>
>> Bacula is not aware of any SQL internal deadlocks.
>>
>>
>> From the INNODB Monitor output:
>>
>> TRANSACTION:
>> TRANSACTION 208788977, ACTIVE 1 sec setting auto-inc lock
>> mysql tables in use 4, locked 4
>> 9 lock struct(s), heap size 1184, 5 row lock(s)
>> MySQL thread id 50808, OS thread handle 0x7f8f2c3b4700, query id 29558637
>> <host> 192.168.10.99 bacula Sending data
>> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
>> DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
>> Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN
>> Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
>> Filename.Name)
>> WAITING FOR THIS LOCK TO BE GRANTED:
>> TABLE LOCK table `bacula`.`File` trx id 208788977 lock mode AUTO-INC
>> waiting
>> WE ROLL BACK TRANSACTION (2)
>>
>> I am running Bacula 7.0.5 on RHEL 6.6 x64 with Director, Storage and
>> Catalog running on separate RHEL 6.6 hosts.  Our clients are RHEL 6's, 5's
>> and Windows Servers 2008 and 2012R2.
>>
>> Any help would be much appreciated.
>>
>> Warmest regards,
>> -craig
>>
>> On Tue, Aug 4, 2015 at 1:56 PM, Craig Shiroma <shiroma.crai...@gmail.com>
>> wrote:
>>
>>> BTW, I suppose there could've been two jobs for the host(s) in the
>>> scheduling queue.  If this was the case, is there a way to find out after
>>> the fact?  If this did actually happen, what could cause duplicate jobs to
>>> be scheduled on the same day at the same time?  I know no one manually ran
>>> the jobs in question.  Again, this was only a problem for a few of the jobs
>>> that ran last night, not all of them, and some were set to do accurate
>>> backup and some were not.
>>>
>>> Regards,
>>> -craig
>>>
>>> On Tue, Aug 4, 2015 at 9:27 AM, Craig Shiroma <shiroma.crai...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I had a few backups fail last night with the following error:
>>>>
>>>> 2015-08-03 18:02:46 bacula-dir JobId 123984: b INTO File (FileIndex,
>>>> JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex,
>>>> batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5,
>>>> batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN
>>>> Filename ON (batch.Name = Filename.Name): ERR=Deadlock found when trying to
>>>> get lock; try restarting transaction
>>>>
>>>> The only thing I did yesterday was switch a bunch of backups to use
>>>> Accurate backup and restart bacula-dir and bacula-sd after that.  However,
>>>> the above problem also occurred on some hosts that were not set to use
>>>> Accurate backup.  From the log, it seems like two jobs for this host were
>>>> scheduled to run at 18:00 because the second job started and found a
>>>> duplicate job (job 123984) and canceled the backup.  I know there were no
>>>> jobs running before 18:00 so 123984 was not an old job still running.  Same
>>>> with the other jobs that were canceled because of the above situation.
>>>>
>>>> Anyway, does anyone have an idea what would cause this, especially how
>>>> the second job got shot into the system?  After the deadlock error, Bacula
>>>> said it would reschedule the job.  However, the second job started right
>>>> after the deadlock error instead of one hour later, which makes me think
>>>> that there were two jobs for this host scheduled to run at 18:00.
>>>>
>>>> Thank you in advance,
>>>> -craig
>>>>
>>>
>>>
>>
>>
>> ------------------------------------------------------------------------------
>>
>>
>>
>> _______________________________________________
>> Bacula-users mailing list
>> Bacula-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>>
>>
>>
>
