Firstly, thanks for the response. I appreciate you taking the time to give
me feedback.

On Tue, 2011-02-08 at 18:44 +0100, Eric Bollengier wrote:
> Hello Matthew,
> 
> On Thursday 20 January 2011 15:23:21, Matthew Ife wrote:
> > The patch below adds native support for quotas in bacula 5.0.3.
> 
> Thanks for your interest, this feature is really interesting for big hosting 
> providers.
> 
> > We resell bacula and require a method to prevent resource abuse of the
> > service we provide. We have used another method to do this in the past
> > (Max Volume Jobs and Max Volume Bytes). However, this solution does not
> > scale to our needs, leads to avoidable delays in backups due to waiting
> > for volumes to become free and is difficult to administratively manage.
> 
> Another quick way to handle this is to have a RunScript that checks the
> quota when the job starts and decides whether or not the job may run. It is
> perhaps more flexible, but not as well integrated.

That might have saved me a couple of weekends of coding if I had thought
of it :). I suspect that, because I was focused on what to do while a job
is running, I never considered this a palatable option.

> 
> > Personally I believe volume management should not be responsible for this
> > type of work, and that it belongs in job management. Thus the patch below
> > provides this functionality.
> > 
> > Quotas attempt to mostly emulate the functionality of filesystem quotas,
> > providing soft and hard limits with grace times.
> > 
> > I've tried to document where I think necessary. I would like the patch to be
> > looked into and criticized; I am not a developer by trade and I imagine
> > there are some things that could be done better! I am open to suggestions.
> 
> No problem, your code is clean (next time, I would appreciate a git diff or
> diff -Naur, which are easier to read).
> 

I'll bear that in mind when I look at changes.

> > A "Quota" is determined by the sum total of all JobBytes values within the
> > JobRetention period. Thus increasing your job retention increases the
> > scope over which the quota is evaluated. In addition, deleting or purging
> > your jobs has the effect of modifying the reported quota value.
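A minimal C sketch of the quota definition above, assuming a simplified in-memory view of catalog Job rows. The `job_rec` type and `quota_usage()` function are hypothetical (not taken from the patch or from Bacula), and the sketch assumes a job's age is measured from its end time; the patch's actual catalog query may use a different column.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, minimal view of a catalog Job row. */
typedef struct {
    uint64_t job_bytes;   /* JobBytes */
    time_t   end_time;    /* assumption: age is measured from job end */
} job_rec;

/* Sum JobBytes over the jobs still inside the retention window.
 * Jobs older than 'retention' seconds are ignored; jobs that were
 * pruned or purged are simply absent from 'jobs', so they never count. */
static uint64_t quota_usage(const job_rec *jobs, size_t n,
                            time_t now, time_t retention)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (now - jobs[i].end_time <= retention)
            total += jobs[i].job_bytes;
    }
    return total;
}
```

Because purged jobs drop out of the input, deleting them lowers the reported usage, and a longer retention widens the window, which matches the two effects described above.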
> > 
> > Quotas are checked at two spots. The first is when the job is started:
> > if you exceed your quota, the job is terminated before connections to
> > bacula-fd are initialized.
> 
> Ok, nice.
> 
> > The second place they are
> > checked is in the Job Monitor job. Every minute those jobs which have
> > quotas enabled are checked again for their quotas against their running
> > jobs.
> 
> 
> I don't like the idea of spawning a storage daemon connection every minute
> for every job in the watchdog; this is a big overhead. What are the benefits
> of checking the quota while the job is running?
> I can understand that it's more "strict", but the associated cost seems to be
> very high for a very small benefit. For example, you can have cases where the
> job is almost complete and you cancel it for a few bytes and have to
> restart the whole job later; this job will use resources and won't be usable
> for restore (directly).

Well, this only occurs for clients that have quotas enabled. But I agree
that doing this adds a lot of implementation complexity.

I've tried to make this as passive as possible. The SD connection needed
to query job status doesn't signal a fatal error if the connection does
not work; rather, it returns without updating the JobBytes value. What
this does do, though, is potentially fill up the maximum number of SD
connections, although the worst case I can envision is a job being
delayed from starting because of it.
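The passive behaviour described above, combined with the in-job total from the original post (completed jobs plus the bytes the SD reports for the running job), can be sketched in C. This is a hedged illustration only: `running_job`, `apply_sd_status()` and `in_job_usage()` are hypothetical names, and the SD status request itself is not modelled; its outcome is passed in as `query_ok` and `sd_bytes`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record for a job that is still running. */
typedef struct {
    uint32_t jobid;
    uint64_t job_bytes;   /* last JobBytes value obtained from the SD */
} running_job;

/* Apply the outcome of one (hypothetical) SD job-status request.
 * A failed request is not fatal: the previous JobBytes value is kept
 * and false is returned so the caller knows nothing was refreshed. */
static bool apply_sd_status(running_job *job, bool query_ok, uint64_t sd_bytes)
{
    if (!query_ok)
        return false;
    job->job_bytes = sd_bytes;
    return true;
}

/* In-job usage: bytes of completed jobs inside the retention window
 * plus what the running job has written so far. */
static uint64_t in_job_usage(uint64_t completed_bytes, const running_job *job)
{
    return completed_bytes + job->job_bytes;
}
```

The design choice is that an unreachable SD degrades to a stale value rather than a failed job, so the worst outcome of a connection problem is a slightly late quota decision.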

I like the idea of performing this check in case, for example, the job
being run is a Full: your quota is almost full and the new job now takes
up 200 extra gigs of space.

Would adding a per-client config option to enable in-job quota checks
(perhaps run every five minutes instead) be more digestible, or is the
scheme fundamentally problematic?
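A sketch of how such a per-client interval could gate the existing once-a-minute watchdog pass. The `inquota_sched` type and `quota_check_due()` function are hypothetical, and treating an interval of 0 as "in-job checks disabled" is an assumption, not something the patch defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-client schedule for in-job quota checks. */
typedef struct {
    time_t interval;     /* seconds between SD polls; 0 = disabled (assumed) */
    time_t last_check;   /* 0 = never checked */
} inquota_sched;

/* Called on every watchdog tick (roughly once a minute). Returns true
 * only when this client's in-job check is due, and records the time. */
static bool quota_check_due(inquota_sched *s, time_t now)
{
    if (s->interval <= 0)
        return false;                       /* in-job checks disabled */
    if (s->last_check != 0 && now - s->last_check < s->interval)
        return false;                       /* checked recently */
    s->last_check = now;
    return true;
}
```

With `interval = 300`, a client is polled at most once every five minutes even though the watchdog still ticks every minute, which cuts SD connections for that client by roughly a factor of five.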

> 
> If you absolutely don't want to use more resources than expected, you can
> probably estimate the size of the next job by looking at previous jobs in the
> catalog.
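A sketch of the catalog-based estimate suggested above, assuming "estimate" means the largest JobBytes among previous jobs of the same level (a deliberately conservative heuristic; an average would be another choice). The names are hypothetical; the level letters 'F', 'D' and 'I' follow Bacula's Full/Differential/Incremental codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical catalog row: level is 'F', 'D' or 'I'. */
typedef struct {
    char     level;
    uint64_t job_bytes;
} past_job;

/* Conservative estimate: the largest JobBytes among previous jobs of
 * the same level. Returns 0 when there is no history for that level. */
static uint64_t estimate_next_job(const past_job *hist, size_t n, char level)
{
    uint64_t est = 0;
    for (size_t i = 0; i < n; i++)
        if (hist[i].level == level && hist[i].job_bytes > est)
            est = hist[i].job_bytes;
    return est;
}

/* Start-of-job decision using the estimate instead of polling the SD. */
static bool projected_within_quota(uint64_t used, uint64_t estimate,
                                   uint64_t limit)
{
    return used + estimate <= limit;
}
```

The trade-off is the one raised in the reply below: a job can be refused on a projection that might not have materialised.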

From a hosting provider's perspective, I am not sure explaining to one of
our customers that we didn't take a backup because it *might* be over quota
is a very tenable position for us to be in. Other suggestions, such as
doing something with an estimate job, might work though.

> 
> 
> > This is done by taking the sum total of all jobs so far, plus the
> > bytes value (as reported by the storage daemon for that job) and
> > performing the quota check against that.
> > 
> > To enable this facility, the following four directives have been
> > introduced in the Client resource of a Bacula config file:
> >
> >   Hard Quota - (takes a byte size) The absolute ceiling on the total quota.
> >   Soft Quota - (takes a byte size) The quota limit that applies once your
> >     grace period has expired.
> >   Soft Quota Grace Period - (takes a time period) The amount of grace time
> >     you are permitted before soft quotas are enforced.
> >   Strict Quotas - (takes yes or no) This changes the behavior of quotas.
> >     When over the soft quota you can 'burst' up to the hard limit until
> >     the grace time expires (as per filesystem quotas). When strict is off
> >     (the default), the client ends up receiving the total quota they burst
> >     up to as their new soft quota. If strict is on, the true soft quota is
> >     enforced the next time a backup is run.
> 
> Sorry, I'm a bit lost with the "Strict Quotas" keyword; can you explain it
> again with some examples?

Yes; effectively this is a throwback to something we do as a company, so
it might end up being entirely superfluous and be removed.

For example, a client has a 50G soft quota, a 200G hard quota, a 7-day
job retention and a 2-day grace period.

Date  Type  Size  Total  Note
1st   F     45G   45G    In quota.
2nd   I     10G   55G    Warning issued, over soft quota. Grace day 1.
3rd   I     15G   70G    Warning issued, over soft quota. Grace day 2.

With strict quotas off:

4th   I     15G   70G    Backup failed (0G written). Their permitted
                         quota is now a maximum of 70G.

With strict quotas on:

4th   I     15G   70G    Backup failed (0G written). Their permitted
                         quota is now a maximum of 50G.


So with strict quotas on, just like standard filesystem quotas, once
grace has expired you must not exceed your soft quota. With strict
quotas off, once grace expires you must not exceed the total you had
reached when your grace expired. Hope this helps. I'm happy to remove
this feature if it doesn't really offer any benefit. I agree its scope
is quite limited.
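To make the soft/hard/grace rules and the strict flag concrete, here is a hedged C sketch that replays the example above. All type and function names are hypothetical, not the patch's. It assumes the grace state is cleared only by the proposed "purge -> quotas" command (not by usage dropping back under the soft quota), and it treats the 4th backup as failing because its 15G would push usage to 85G.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GB  (1024ULL * 1024 * 1024)
#define DAY ((int64_t)24 * 60 * 60)

/* Hypothetical per-client settings mirroring the four proposed directives. */
typedef struct {
    uint64_t hard_quota;    /* Hard Quota */
    uint64_t soft_quota;    /* Soft Quota */
    int64_t  grace_period;  /* Soft Quota Grace Period, in seconds */
    bool     strict;        /* Strict Quotas */
} quota_conf;

/* Hypothetical per-client state (would be stored in the catalog). */
typedef struct {
    bool     in_grace;      /* soft quota has been exceeded */
    int64_t  grace_start;   /* when it was first exceeded */
    uint64_t burst_peak;    /* highest usage seen while in grace */
} quota_state;

typedef enum { QUOTA_OK, QUOTA_WARN, QUOTA_FAIL } quota_result;

/* Limit that applies once the grace period has expired. */
static uint64_t effective_limit(const quota_conf *c, const quota_state *s)
{
    if (c->strict || s->burst_peak < c->soft_quota)
        return c->soft_quota;     /* strict: back to the real soft quota */
    return s->burst_peak;         /* non-strict: keep what was burst up to */
}

static quota_result quota_check(const quota_conf *c, quota_state *s,
                                uint64_t usage, int64_t now)
{
    if (usage > c->hard_quota)
        return QUOTA_FAIL;                 /* hard ceiling always applies */
    if (!s->in_grace) {
        if (usage <= c->soft_quota)
            return QUOTA_OK;
        s->in_grace = true;                /* first time over: start grace */
        s->grace_start = now;
    }
    if (now - s->grace_start < c->grace_period) {
        if (usage > s->burst_peak)
            s->burst_peak = usage;         /* still bursting within grace */
        return QUOTA_WARN;
    }
    return usage > effective_limit(c, s) ? QUOTA_FAIL : QUOTA_OK;
}

/* What the proposed "purge -> quotas" command would reset. */
static void quota_purge(quota_state *s)
{
    s->in_grace = false;
    s->grace_start = 0;
    s->burst_peak = 0;
}
```

Replaying the table with a 2-day grace period: 45G gives QUOTA_OK, 55G and 70G give QUOTA_WARN, and the 85G check on the 4th gives QUOTA_FAIL in both modes. The modes differ only in effective_limit(): 50G when strict, 70G when not, so a later 70G job would pass only with strict quotas off.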

Perhaps 'Strict Quotas' would be a more fitting name for enabling the
in-job SD quota checks I mentioned above? :)


> 
> > In addition, I added a "purge -> quotas" option in bconsole to reset quotas
> > (this effectively resets the grace time so exceeding soft quota starts the
> > grace timer again).
> 
> Ok
> 
> > Onto the code. Now, I'm certainly open to criticisms and suggestions on how
> > to improve this. There are a few things I consciously did which might need
> > to change.
> 
> Your code is rather clean. I see a few tweaks that can wait, but this is a
> very good start. A very important point is to have a simple test for this
> feature. You can adapt an existing regress script and ensure that your code
> does the right thing; I can help you on this part. First, take a look at
> regress/tests/prune-test, for example: it shows how to add directives and
> how to check results.

I'll have a look. I might have more questions about this at some
point.

> 
> > Firstly, the nature of this code means that locking inside of it must be
> > avoided. Thus, I have had to rewrite some functions already used elsewhere
> > in bacula omitting the locks and working around potential indefinite
> > blocks.
> 
> This part is associated with the SD polling code, which is not essential
> (IMHO).
> 
> > These have been added to a quota.c file along with the actual
> > quota checking code. For the most part they duplicate already used
> > functions but I felt updating existing functions to support what work I
> > was doing was dangerous. This might not be the best approach and I'm open
> > to suggestions here.
> 
> What about groups of clients? I can imagine that some users will want to
> allocate quota per "group" of clients (or by Pool), rather than on a
> client-by-client basis. Is it something that should be considered now?
> (IMHO, it can wait)

From a design perspective I think this is a good idea, but I'd prefer
the config options to stay out of the Pool or Storage resources, as I
think quota management is a job-related feature, not a volume-related
one. Perhaps adding it to a Job resource, or a JobDefs resource, instead?
There's no relevant SQL support for that currently, so something would
have to be produced at a later date.

> 
> From what I can see, the Quota table can be merged with the Client table.
> 

Yes, this was me not wanting to fiddle with the existing table layout
in Bacula. I felt changing it would be obtrusive without getting a
second opinion :). Thanks.

> > The majority of the quota checking code is in quota.c; there is some code
> > in other header files and some in the cats directory for the database
> > access the quota checks need.
> > 
> > This patch is running on our own network of 300 backup servers, spread
> > amongst the 1200 clients (so far) using it. We are gradually cranking up
> > the usage week by week, and so far this patch works flawlessly. Other
> > torture tests I've done include immediately running every quota-enabled
> > job on a backup server (about 15) to see if it would cope. So far things
> > have been alright.
> > 
> > This does need more testing, particularly on PostgreSQL and SQLite3, which
> > I have not done.
> >
> > I also don't know how this code would cope on one director
> > running 1000 concurrent jobs (this is something we intend to be doing in
> > the coming months).
> 
> With the SD polling code, I can imagine that if you have a network problem,
> it can lead to strange things.

I would be greatly appreciative if you could expand here please :).

> 
> > OK, enough rambling on. Here's the patch below; I apologize for the size.
> > Feedback is greatly appreciated. Again, I'm not a developer, so if anyone
> > knows how I can make the patch put quota.c/quota.h in the correct
> > directory (not the one I originally diffed from) I would be much obliged!
> 
> Having specific files for quota is ok, no worry. Once we have made a
> decision on the polling part, it will be easier to decide.

If the polling isn't necessary, it's probably easier to put the functions
somewhere else, such as in the job files.

> 
> To use git, first clone the git repo from sourceforge, create your own branch 
> from the Branch-5.1 code, add your changes, and generate diffs (you can also 
> publish them on github for example).
> 
> git clone git://bacula.git.sourceforge.net/gitroot/bacula/bacula
> cd bacula
> git checkout -b quota origin/Branch-5.1
> 
> git add ...
> git commit
> 
> git diff origin/Branch-5.1
> 
> 
> You can find information on our git usage on:
> http://www.bacula.org/5.1.x-manuals/en/developers/developers/Git_Usage.html
> 
> Bye
> 

Thanks for your input.


_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
