Re: [Opencast Matterhorn] Matterhorn 1.3.x: job_arguments table gets very large

Tobias Wunden Wed, 16 Jan 2013 06:43:12 -0800

Rüdiger,

The main reason for the job arguments growing is that the mediapackage is a 
common job argument. We were just recently discussing this problem internally 
at Entwine and were looking at the existing options to solve this problem. 
Unfortunately, there are a number of issues that prevent a simple solution, and 
we are convinced that this needs a closer look with more time at hand than what 
is left for 1.4.

One alternative we considered was providing the (large) job arguments as 
downloads on the working file repository and passing urls as the arguments 
instead of the actual data (references instead of values). The same would be 
necessary for the job's return value (payload). However, this approach has two 
major drawbacks:

1) All REST endpoints and their remote implementations would need to be changed 
to accept URLs instead of values. An additional caveat would be that now you 
need to put your values on an HTTP server (or service) first before you can 
start using the REST docs for manual operation/debugging.

2) Once the job has gone through (succeeded or failed), there has to be proper 
cleanup of the values (likely a removal of the artifacts from the working file 
repository), and now your operations are incomplete (they only contain the 
references but not the values). This would then be the same as removing the 
operations from the job arguments table in the database.
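
To make the references-instead-of-values idea and its second drawback 
concrete, here is a minimal sketch. It is not Matterhorn code: the 
InMemoryRepository class, its put/get/delete methods and the base URL are 
hypothetical stand-ins for the working file repository.

```python
import uuid


class InMemoryRepository:
    """Hypothetical stand-in for the working file repository."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._files = {}

    def put(self, data):
        # Store the value and hand back a URL that can be used as the job argument.
        url = f"{self.base_url}/{uuid.uuid4()}"
        self._files[url] = data
        return url

    def get(self, url):
        # Resolve a reference back to its value; raises KeyError once cleaned up.
        return self._files[url]

    def delete(self, url):
        # Cleanup after the job has succeeded or failed.
        self._files.pop(url, None)


if __name__ == "__main__":
    repo = InMemoryRepository("http://localhost:8080/files")  # assumed URL
    reference = repo.put("<mediapackage>...</mediapackage>")

    # The job table would now store only the short reference, not the XML.
    print(len(reference), "characters stored instead of the full document")

    repo.delete(reference)
    try:
        repo.get(reference)
    except KeyError:
        print("reference is dangling after cleanup")
```

After the cleanup step the stored job only holds a URL that no longer 
resolves, which is the incompleteness described in 2).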

All this means that you are likely to run into the same problem with 1.4 as 
well, while switching to a different database engine may or may not help. Until 
then, scheduled removal of the jobs is our best bet and could easily be 
implemented as a configurable option as part of a 1.4.1.
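
As a rough illustration of what such a scheduled removal could look like, 
here is a minimal sketch against an in-memory SQLite database. The schema 
(a job table with id, status and date_completed, and a job_arguments table 
with job_id and argument), the status values and the function name are 
assumptions for the example, not the actual Matterhorn schema.

```python
import sqlite3
from datetime import datetime, timedelta

# Assumed status values for jobs that have gone through.
DONE_STATES = ("FINISHED", "FAILED")


def purge_job_arguments(conn, max_age_days, now=None):
    """Delete arguments of jobs that completed more than max_age_days ago.

    Returns the number of argument rows removed.
    """
    now = now or datetime.now()
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    cur = conn.execute(
        "DELETE FROM job_arguments WHERE job_id IN ("
        " SELECT id FROM job"
        " WHERE status IN (?, ?) AND date_completed < ?)",
        (*DONE_STATES, cutoff),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, status TEXT, date_completed TEXT)"
    )
    conn.execute("CREATE TABLE job_arguments (job_id INTEGER, argument TEXT)")
    conn.execute("INSERT INTO job VALUES (1, 'FINISHED', '2012-01-01T00:00:00')")
    conn.execute("INSERT INTO job_arguments VALUES (1, '<mediapackage/>')")
    removed = purge_job_arguments(conn, max_age_days=30)
    print("removed", removed, "argument rows")
```

The maximum age would be the configurable part; running or recently 
completed jobs keep their arguments.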

I would welcome thoughts and comments from others on how to solve the problem. 
One thing that is of interest to me: are you using database indices on the job 
table (and on others)?

Tobias

On 16.01.2013, at 13:47, Ruediger Rolf <rr...@uni-osnabrueck.de> wrote:

> Hi list,
> 
> I want to raise a problem with our 1.3 production system and ask if we will 
> run into this in 1.4 too.
> 
> After nearly a year in production and around 750 recordings our database got 
> extremely large ( > 2GB). The reason is mainly that the table job_arguments 
> got very large (>1.7GB). This results in a mysql database that is running on 
> 100% load permanently.
> 
> We reported this in MH-9342 [1], and I just noticed that Stephen Marquard 
> has reported the same in MH-9031 [2].
> I have tested that I can delete the job_arguments without doing any harm, 
> and Tobias seems to be of the same opinion in his comment on MH-9031.
> 
> So my question would be: is this addressed or even fixed in 1.4 that we 
> delete finished jobs in the DB? This bug won't come up in our QA testing, as 
> we will not process several hundred jobs there. But it will be a serious 
> problem for any production system, although there is an easy fix for this.
> 
> Thanks
> Rüdiger
> 
> [1] http://opencast.jira.com/browse/MH-9342
> [2] http://opencast.jira.com/browse/MH-9031
