Looking at the current list of bugs in Jira, it becomes obvious that a number 
of them are centered around archival. One problem is that the archive service, 
unless correctly configured, will try to archive things that shouldn't be 
archived, such as the files that have been copied to the streaming server 
featuring an rtmp:// url. But to allow for retraction some elements--namely 
those that represent distributions--have to be retained in the media package 
even though their content is not subject to archival.

At the moment it is also very difficult for the archive to figure out whether a 
media package has been distributed, and if so, to which distribution channels. 
Making this clearly determinable is indispensable if Matterhorn is to move 
towards being a media management system.

Another goal of this proposal is to unify the concept of distribution. In the 
current situation Engage is handled quite differently from e.g. YouTube where 
separate steps for copying and publishing to search have to be done. The goal 
is to have a 1:1 relationship between a distribution channel, its distribution 
service and associated workflow operation handler and the corresponding entry 
in the media package.

PROBLEM

The reason for these problems is that we are currently using a concept called 
"derived tracks" to detect distribution status. For example, if there is a 
track that is derived from the source track (the ingested media) and whose 
url starts with the download url, then we know that the media package has been 
distributed to the download server. This is not ideal for many reasons, the 
most prominent ones being:

- the download artifact may be used by different presentations of the 
mediapackage, e.g. the Media Module, a video repository connected through 
OAI-PMH metadata harvesting, RSS/ATOM feeds etc. So what does retracting this 
mediapackage really mean, or what does it mean for the representations if the 
media is removed from the download server?

- the admin ui, which wants to provide the administrator with the link to the 
"final product" (i.e. the representation of the media package on the Engage UI 
or on YouTube), needs to have in-depth knowledge about these representations. 
For example, it needs to know that YouTube tracks have a URL starting with 
youtube.com, so it would determine distribution status to YouTube by going 
through the mediapackage and looking at all tracks to find one with a matching 
url.
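
To make the problem concrete, here is a minimal sketch of the kind of URL 
matching that the "derived tracks" approach forces on every consumer. It is 
not Matterhorn code: the track dictionaries, the function name and the prefix 
table are all hypothetical and only illustrate the heuristic described above.

```python
# Hypothetical illustration of the "derived tracks" heuristic.
# The channel names and URL prefixes are assumptions, not Matterhorn settings.
CHANNEL_URL_PREFIXES = {
    "download": "http://downloads.myinstitution.edu/",
    "streaming": "rtmp://",
    "youtube": "http://www.youtube.com/",
}


def guess_distribution_channels(tracks, source_track_id):
    """Guess channels by matching the URLs of tracks derived from the source."""
    channels = set()
    for track in tracks:
        if track.get("derived_from") != source_track_id:
            continue  # only tracks derived from the ingested media count
        for channel, prefix in CHANNEL_URL_PREFIXES.items():
            if track["url"].startswith(prefix):
                channels.add(channel)
    return channels


tracks = [
    {"id": "track-1", "url": "file:///archive/source.mov",
     "derived_from": None},
    {"id": "track-2", "url": "http://downloads.myinstitution.edu/a/b.mp4",
     "derived_from": "track-1"},
    {"id": "track-3", "url": "rtmp://streaming.myinstitution.edu/a/b.flv",
     "derived_from": "track-1"},
]
print(sorted(guess_distribution_channels(tracks, "track-1")))
```

Every component that wants to know the distribution status needs its own copy 
of such a prefix table, and a matching track still says nothing about where 
the "final product" can actually be viewed.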

PROPOSED SOLUTION

We are proposing a solution to all these problems that allows Matterhorn to 
indicate to the administrator which channels a certain MediaPackage has been 
distributed to without the need for the admin ui to have knowledge about 
specific track properties for a given distribution channel.

A new element called "<presentation>" is introduced to the media package. It 
identifies the distribution channel as well as the url that is used to consume 
the distributed artifact. This url can point to e.g. a web page with the 
embedded video in the case of channels like Engage/Player or YouTube, or to a 
feed URL in the case of an RSS feed.

Which elements have actually been used to make up a distribution, and how to 
keep track of them in order to allow for retraction, now lies completely within 
the responsibility of the distribution channel. To support some simple data 
storage right in the media package, the new element features a simple key/value 
dictionary. These key/value pairs may also be used to implement efficient 
storage and retraction strategies.

<mediapackage>
 ...
 <presentations>

   <presentation id="p-1" channel="youtube">
     <uri>http://www.youtube.com/watch?v=D1R-jKKp3NA</uri>
     <mimetype>text/html</mimetype>
     <!-- the dictionary is freely managed by the distribution channel and may
          take arbitrary key/value data -->
     <dict>
       <value key="access-token">D1R-jKKp3NA</value>
     </dict>
   </presentation>

   <presentation id="p-2" channel="engage">
     <uri>http://downloads.myinstitution.edu/engage/ui/watch.html?id=123123s</uri>
     <mimetype>text/html</mimetype>
   </presentation>

   <presentation id="p-3" channel="feeds">
     <uri>http://downloads.myinstitution.edu/feeds/entries/342345</uri>
     <mimetype>application/rss+xml</mimetype>
   </presentation>

 </presentations>
 ...
</mediapackage>

By adding this element to a media package, it would immediately be obvious to 
which channels it has been distributed (and how it can be reached in each 
channel). Rather than creating data structures in the mediapackage and using 
those to derive the distribution status and guess the actual representation, 
the entry points into consumption of the mediapackage are clearly defined.
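
As a rough sketch of what a consumer such as the admin ui could then do, the 
following reads the proposed element with nothing but a generic XML parser. 
The function name is hypothetical, and the sample assumes the un-namespaced 
layout shown above; a real media package may carry an XML namespace, which 
would have to be added to the lookup paths.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example above, without the "..." placeholders.
MEDIAPACKAGE_XML = """
<mediapackage>
  <presentations>
    <presentation id="p-1" channel="youtube">
      <uri>http://www.youtube.com/watch?v=D1R-jKKp3NA</uri>
      <mimetype>text/html</mimetype>
      <dict>
        <value key="access-token">D1R-jKKp3NA</value>
      </dict>
    </presentation>
    <presentation id="p-3" channel="feeds">
      <uri>http://downloads.myinstitution.edu/feeds/entries/342345</uri>
      <mimetype>application/rss+xml</mimetype>
    </presentation>
  </presentations>
</mediapackage>
"""


def read_presentations(xml_text):
    """Map each channel name to its uri, mimetype and key/value dictionary."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for pres in root.findall("./presentations/presentation"):
        result[pres.get("channel")] = {
            "uri": pres.findtext("uri"),
            "mimetype": pres.findtext("mimetype"),
            # The dictionary is optional; a missing <dict> yields {}.
            "dict": {v.get("key"): v.text
                     for v in pres.findall("./dict/value")},
        }
    return result


presentations = read_presentations(MEDIAPACKAGE_XML)
for channel, entry in sorted(presentations.items()):
    print(channel, entry["uri"])
```

No channel-specific URL knowledge is needed here: the channel name and the 
entry point are read directly, and the dictionary stays opaque to everyone 
except the distribution channel that wrote it.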

This change doesn't touch existing data structures and will therefore not 
impact existing functionality. But it will allow us to close out all of the 
remaining bugs that are related to archival and retraction.

Looking forward to your +/-1's and/or comments.

Tobias
_______________________________________________
Matterhorn mailing list
Matterhorn@opencastproject.org
http://lists.opencastproject.org/mailman/listinfo/matterhorn

