Re: [Bacula-users] Doubts about Bacula

Radosław Korzeniewski Tue, 23 Apr 2019 00:28:32 -0700

Hello,

On Fri, 19 Apr 2019 at 20:25, Heitor Faria <hei...@bacula.com.br> wrote:


> Hello Radoslaw,
>
> Hello,
>
> On Fri, 19 Apr 2019 at 13:28, Heitor Faria <hei...@bacula.com.br> wrote:
>
>> Hello Radoslaw,
>>
>> Speaking of Bacula HA, I've been deploying a scenario with relative
>>> success.
>>> Primary Director & SD have copy jobs routines to a Secondary Remote SD
>>> that also has an independent working Director.
>>
>>
>> It sounds to me like a Disaster Recovery solution and absolutely not High
>> Availability.
>>
>> Is there any difference?
>>
>
> The difference is HUGE!!!!
>
>
>> For me there are two Disaster Recovery categories, Backup and
>> Replication. HA falls in the second category.
>>
>
> Disaster Recovery is a part of a more general Business Continuity Plan. A
> BCP describes what to do when something goes wrong with our business and
> consists of a number of procedures and actions executed in hard times.
> DR focuses on recovery only.
> What is a disaster? Is a single disk failure a disaster? Are single
> network adapter, single server or single rack failures disasters? Is
> a single datacenter failure a disaster? And what are the availability
> levels? How do they compare?
>
> We were discussing concepts used by the Dell/EMC Certification and the best
> scientific literature on the topic.
>

I'm using concepts from Veritas (i.e. Resilience Enterprise) and its
Certification, so .... :)
Update: I checked the linked paper, and it describes concepts that I regard
as a disaster recovery solution and, what a surprise, it calls them Disaster
Recovery... (see below)


> I don't see how policies, use cases or plans affect that.
> Anyway, having director redundancy, as in the original proposal, allows
> Backup and Restore Services HA,
>

Yes, HA is different from DR. Thank you.


> since both would be almost always online (even lacking the failed running
> jobs redistribution, as pointed by Dimitri).
>
> First of all, backup is one of the services managed by any IT
> department. As a service, it should run without problems and maintain a
> good availability level. Just take a look at maintaining an Oracle RDBMS
> with the best backup and recovery solution using the Bacula Oracle SBT
> Plugin. With this plugin you can set up two kinds of backups: online
> database file backups and archived log backups. Together they allow for
> perfect Point-In-Time-Recovery. The first one can be executed once a day,
> once a week, etc., but the second one should be executed as frequently as
> possible to maintain the best possible RPO.
>
> I see this as the Disaster Recovery levels or dimensions [T. Wood, E.
> Cecchet, K. K. Ramakrishnan, P. J. Shenoy, J. E. van der Merwe, and A.
> Venkataramani, “Disaster Recovery as a Cloud Service: Economic Benefits &
> Deployment Challenges.,” *HotCloud*, vol. 10, pp. 8–15, 2010.]:
>

I checked this paper and it proves my point of view on what DR is and what
HA is... in every single word. For a few minutes I thought that all I had
learned about High Availability and Disaster Recovery in my >20 years of
Enterprise experience had been turned upside down. :) I see, not yet.

What I see in your post: every time you describe a great DR solution you
do not call it DR but HA, which is not correct.

"Speaking of Bacula HA, I've been deploying a scenario with relative
success.
Primary Director & SD have copy jobs routines to a Secondary Remote SD..."


>
> Data level: Security of application data
> System level: Reducing recovery time as short as possible
> Application level: Application continuity
>


> To achieve this you have to keep the backup service as highly available
> as possible by eliminating every SPOF (single point of failure). For the
> breakdowns above you have to duplicate components, i.e. install two network
> adapters, create a RAID, create a cluster, put every cluster node in a
> separate rack, etc. All this allows you to achieve a High Availability
> service with zero data loss in case of failover. For a datacenter it is
> always a different story! If you need to fail over a datacenter then you
> always lose data! This is because Bacula replication is asynchronous,
> so it is not possible to have up-to-date archives on both sides at any
> given time.
>
> You will always have a lag. On the other hand, you can implement block
> level replication, which could be synchronous, but this kind of solution
> does not work with tapes, and when synchronous it has a huge impact on
> performance. In most cases synchronous block level replication on a large
> scale and over long distances requires a lot of cash! Synchronous block
> level replication should never be used as part of a Backup DR solution,
> because a single block corruption can lead to whole filesystem corruption
> and the loss of archive volumes! So, back to asynchronous Bacula
> replication - did I mention it will create a lag, so your RPO > 0. :)
>
> This is true for most recent backups, but there are ways of mitigating
> this (redundant jobs, simultaneous backup to two different jobs (if ever
> developed)).
> Synchronous or asynchronous replication will always have = 1 RPO; the only
> difference is how outdated the data is.
>

I see we have very different dictionaries here, so we cannot reach the same
conclusion. In my dictionary RPO = Recovery Point Objective means the
Point-In-Time to which I can recover my data. It is a shame that in such a
strict science as IT, in two different parts of the world using the same
language, we attach such different meanings to the same words and statements.

I can cite the RPO definition used in the linked paper:

*"Recovery Point Objective (RPO): The RPO of a DR system represents the
point in time of the most recent backup prior to any failure. The necessary
RPO is generally a business decision—for some applications absolutely no
data can be lost (RPO=0), requiring continuous synchronous replication to
be used, while for other applications, the acceptable data loss could range
from a few seconds to hours or even days."*

As I understand it: RPO defines the acceptable data loss (data outdating)
in any DR system, and it can range from RPO=0 for a continuous synchronous
replication solution up to a few seconds, hours or even days for others.
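
To make this reading of RPO concrete, here is a minimal Python sketch. It is
my own illustration, not code from the paper or from Bacula: the function
names and the timestamps are hypothetical. It assumes that everything written
after the last backup replicated to the secondary site is lost at failover.

```python
from datetime import datetime, timedelta


def data_loss_window(last_replicated: datetime, failure: datetime) -> timedelta:
    """Data written after the last replicated backup is lost at failover."""
    if failure < last_replicated:
        raise ValueError("failure time precedes the last replicated backup")
    return failure - last_replicated


def meets_rpo(last_replicated: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True when the data loss stays within the agreed RPO."""
    return data_loss_window(last_replicated, failure) <= rpo


# Hypothetical example: the nightly copy job finished at 02:00 and the
# primary site was lost at 14:30 on the same day.
last_copy = datetime(2019, 4, 22, 2, 0)
outage = datetime(2019, 4, 22, 14, 30)

print(data_loss_window(last_copy, outage))              # size of the loss window
print(meets_rpo(last_copy, outage, timedelta(hours=24)))  # daily RPO
print(meets_rpo(last_copy, outage, timedelta(0)))         # RPO=0
```

With an asynchronous copy schedule the loss window is greater than zero for
any failure that happens after the last copy, so an RPO=0 requirement cannot
be met this way.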

> In any HA solution you ensure that your services are running with the
>> highest uptime possible, and this kind of solution is in most cases
>> implemented with clusters. In this case you can lose currently uncommitted
>> data (running jobs) but your services are ready to process the next jobs as
>> soon as possible.
>>
>> I disagree a little bit. The purpose of replication is to provide the best
>> possible RTO.
>>
>
> So, let's compare:
> Shared storage Cluster HA: RPO - no data loss; RTO - automatic failover,
> seconds from failure detection to recovery;
> Asynchronous Replication in Bacula: RPO - hours, minutes at best, in most
> cases a single day; RTO - manual switchover - hours;
>
> Disagree. Director redundancy provides near zero RTO to the backup and
> restore service.
>

Why do you disagree with a statement that says the same thing?
- RTO - automatic failover, seconds from failure detection to recovery;
vs.
- ... near zero RTO to the backup and restore service;

In my opinion, near zero falls within a timescale of seconds, no?

I can cite the RTO definition used in the linked paper:
*"Recovery Time Objective (RTO): The RTO is an orthogonal business decision
that specifies a bound on how long it can take for an application to come
back online after a failure occurs."*


> High Availability solutions focus on Service levels and are not designed
> to handle disasters.
>
> A power supply failure is a disaster. =)
>

SPOF? If you design a critical service then you have to follow the path of
avoiding SPOFs. A single power supply failed? Not a problem, as we have a
redundant power supply. So it is not a disaster at all. You can handle it
transparently. The same applies to other hardware components.
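
The effect of removing a SPOF can be estimated with simple arithmetic. Below
is a minimal Python sketch, assuming that component failures are independent
and that one working copy is enough to keep the service running. The function
name and the availability figure are made up for illustration.

```python
def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components where one is sufficient.

    The group is down only when every copy is down at the same time, so
    with independent failures: A = 1 - (1 - single) ** copies.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("at least one component is required")
    return 1.0 - (1.0 - single) ** copies


# Hypothetical power supply that is available 99% of the time.
print(redundant_availability(0.99, 1))  # single supply, a SPOF
print(redundant_availability(0.99, 2))  # redundant pair
```

Under these assumptions a second power supply turns the failure of one unit
into an event the service rides through, which is why I do not count it as a
disaster.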

> Disaster Recovery solutions focus on disasters and are not designed for
> fast and easy backup service switchover. Different solutions for different
> purposes. The Enterprise wants them both!
>
> For obvious reasons, Bacula cannot re-distribute a failed backup job yet
>> (perhaps it never will), but I don't think it is necessarily a problem for
>> Replication.
>>
>> HA implementation in Bacula is extremely straightforward when using a
>> shared storage clustering solution.
>>
>>
>>> Both Directors can access the Secondary SD.
>>> An Admin Job with a Shell Script daily bscans all volumes into the
>>> Secondary Director and its catalog.
>>> All bscanned volumes come with the Archived status, so they are
>>> basically Read-Only.
>>> Advantage: you can restore jobs from both environments, any time. =>
>>> http://bacula.us/bacula-server-and-backups-replication-for-high-availability/
>>> Perhaps a "bscan all" bconsole command would be a nice feature to sync
>>> all disk based volumes to the catalog and improve the process a little
>>> bit more.
>>>
>>
>> This is a Disaster Recovery solution. A One-Way Failover. :)
>>
>> Pot8to, potato. =)
>> From Bacula perspective that's what we have today.
>>
>
> What?
> When you implement Bacula in a shared storage cluster, you can fail over
> the backup service from node to node in any direction in just seconds.
>
> You will have running backups outdate anyway.
>

Yes! But backups scheduled even a few minutes later can run without a
problem. You only have to restart the backups that were running during the
failure. A lot of enterprise customers value this kind of solution.


>
> Radoslaw: of course my proposal doesn't work for all scenarios - far
> from that. It is conceptual and provocative.
>
> Bscan needs to be improved with an option to skip already synced
> volumes (perhaps a volume metadata hash comparison? don't know).
> Also, volume name wildcards or any way to easily select multiple volumes,
> maybe even allowing bscan to be called from bconsole.
>

Heitor - I'm not criticizing your proposal. I am only pointing out your
mistake about what DR is and what HA is. That's all.

best regards
-- 
Radosław Korzeniewski
rados...@korzeniewski.net

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
