Looking at the current list of bugs in Jira, it becomes obvious that a number of them are centered around archival. One problem is that the archive service, unless correctly configured, will try to archive things that shouldn't be archived, such as files that have been copied to the streaming server and carry an rtmp:// URL. But to allow for retraction, some elements (namely those that represent distributions) have to be retained in the media package even though their content is not subject to archival.

In addition, it is currently very difficult for the archive to figure out whether a media package has been distributed and, if so, to which distribution channels. In order to move Matterhorn towards a media management system, it is indispensable to make this clearly determinable.

Another goal of this proposal is to unify the concept of distribution. Currently, Engage is handled quite differently from e.g. YouTube, in that separate steps for copying and for publishing to search have to be performed. The goal is to have a 1:1 relationship between a distribution channel, its distribution service with the associated workflow operation handler, and the corresponding entry in the media package.

PROBLEM

The reason for these problems is that we are currently using a concept called "derived tracks" to detect distribution status. For example, if there is a track that is derived from the source track (the ingested media) and whose URL starts with the download URL, then we know that the media package has been distributed to the download server. This is not ideal for many reasons, the most prominent being:

- The download artifact may be used by different presentations of the media package, e.g. the Media Module, a video repository connected through OAI-PMH metadata harvesting, RSS/Atom feeds etc. So what does retracting this media package really mean, and what does it mean for those representations if the media is removed from the download server?
- The admin UI, which wants to provide the administrator with the link to the "final product" (i.e. the representation of the media package on the Engage UI or on YouTube), needs in-depth knowledge about these representations. For example, it needs to know that YouTube tracks have a URL starting with youtube.com, so it would determine distribution status to YouTube by going through the media package and looking at all tracks to find one with a matching URL.
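
To make the heuristic described above concrete, here is a minimal sketch in Python (chosen purely for illustration; this is not code from Matterhorn). The Track type, its derived_from field, the channel names and the URL prefixes are all hypothetical stand-ins and not the actual Matterhorn API.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a media package track; not Matterhorn's real model.
@dataclass
class Track:
    id: str
    url: str
    derived_from: Optional[str] = None  # id of the source track, if any


# Assumed per-channel URL prefixes. Every consumer (archive, admin UI)
# would have to carry this channel-specific knowledge around.
CHANNEL_URL_PREFIXES = {
    "download": "http://downloads.myinstitution.edu/",
    "youtube": "http://www.youtube.com/",
}


def guess_channels(tracks, source_id):
    """Guess distribution status from derived tracks and their URLs."""
    channels = set()
    for track in tracks:
        if track.derived_from != source_id:
            continue  # only tracks derived from the ingested media count
        for channel, prefix in CHANNEL_URL_PREFIXES.items():
            if track.url.startswith(prefix):
                channels.add(channel)
    return channels


tracks = [
    Track("track-1", "file:///archive/source.mov"),
    Track("track-2", "http://downloads.myinstitution.edu/engage/track-2.mp4", "track-1"),
    Track("track-3", "rtmp://streaming.myinstitution.edu/track-3.flv", "track-1"),
]
print(sorted(guess_channels(tracks, "track-1")))
```

In this sketch the streaming copy (track-3) goes undetected simply because no rtmp:// prefix was registered, which is the kind of channel-specific guesswork the two points above are about.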

PROPOSED SOLUTION

We are proposing a solution to all these problems that allows Matterhorn to indicate to the administrator which channels a certain media package has been distributed to, without the need for the admin UI to have knowledge about specific track properties for a given distribution channel.

A new element called <presentation> is introduced to the media package. It identifies the distribution channel as well as the URL that is used to consume the distributed artifact. This URL can point to e.g. a web page with the embedded video in the case of channels like Engage/Player or YouTube, or to a feed URL in the case of an RSS feed.

Knowing which elements were actually used to make up a distribution, and keeping track of them in order to allow for retraction, now lies completely within the responsibility of the distribution channel. To support some simple data storage right in the media package, the new element features a simple key/value dictionary. These key/value pairs may also be used to implement efficient storage and retraction strategies.

<mediapackage>
  ...
  <presentations>
    <presentation id="p-1" channel="youtube">
      <uri>http://www.youtube.com/watch?v=D1R-jKKp3NA</uri>
      <mimetype>text/html</mimetype>
      <!-- the dictionary is freely managed by the distribution channel
           and may take arbitrary key/value data -->
      <dict>
        <value key="access-token">D1R-jKKp3NA</value>
      </dict>
    </presentation>
    <presentation id="p-2" channel="engage">
      <uri>http://downloads.myinstitution.edu/engage/ui/watch.html?id=123123s</uri>
      <mimetype>text/html</mimetype>
    </presentation>
    <presentation id="p-3" channel="feeds">
      <uri>http://downloads.myinstitution.edu/feeds/entries/342345</uri>
      <mimetype>application/rss+xml</mimetype>
    </presentation>
  </presentations>
  ...
</mediapackage>

With this element added to a media package, it would immediately be obvious which channels it has been distributed to (and how it can be reached in each channel). Rather than creating data structures in the media package and using those to derive the distribution status and guess the actual representation, the entry points for consuming the media package are clearly defined.

This change doesn't touch existing data structures and will therefore not impact existing functionality. But it will allow us to close out all of the remaining bugs that are related to archival and retraction.

Looking forward to your +/-1's and/or comments.

Tobias
_______________________________________________
Matterhorn mailing list
Matterhorn@opencastproject.org
http://lists.opencastproject.org/mailman/listinfo/matterhorn

To unsubscribe please email
matterhorn-unsubscr...@opencastproject.org
_______________________________________________