Hi all,

On Friday at the Un-Conference, I mentioned that we had series catalog data loss 
from capture agents that unintentionally modified the series catalog. The 
capture agent parses the series catalog received from Matterhorn into a 
structure that is not able to preserve all of its elements. The unparsed 
elements are lost when the series catalog is reconstructed for its return trip 
to Matterhorn via ingest.
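To make the failure mode concrete, here is a minimal, self-contained sketch (not actual capture agent code) of how a lossy parse loses catalog elements: the agent keeps only the fields it knows about, so anything else silently disappears on the round trip. The field names and the `LossyRoundTrip` class are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a capture agent that parses only the catalog fields it
// understands drops every other element when it rebuilds the catalog for ingest.
public class LossyRoundTrip {

    // The only fields this hypothetical agent knows how to handle
    static final Set<String> KNOWN = Set.of("dcterms:title", "dcterms:creator");

    // Parse step: unknown elements are simply not carried over
    static Map<String, String> parse(Map<String, String> catalog) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : catalog.entrySet()) {
            if (KNOWN.contains(e.getKey())) {
                parsed.put(e.getKey(), e.getValue());
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        Map<String, String> original = new LinkedHashMap<>();
        original.put("dcterms:title", "Lecture 1");
        original.put("dcterms:creator", "Smith");
        original.put("dcterms:license", "CC-BY");   // unknown to the agent

        Map<String, String> rebuilt = parse(original);
        System.out.println(rebuilt.containsKey("dcterms:license")); // prints "false": license was lost
    }
}
```

If the rebuilt catalog is then honored by the ingest service as an update, the dropped elements are lost from Matterhorn as well.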

I would like feedback from the community on Matterhorn's policy for catalog 
updates by the ingest service. My preference is to keep the ability for 
Matterhorn to create new catalogs on ingest, but to disallow updates to 
existing series or episode catalogs through the ingest service. Capture agents 
that create episode and series catalogs fresh from the capture agent interfaces 
would still have those catalogs created in Matterhorn. However, capture agents 
that support editing series and episode catalogs would need to use the 
recording and series REST endpoints directly.

I posted a potential patch at http://opencast.jira.com/browse/MH-928 (just 5 
lines of new code, adding one try/catch and one extra method call; see the 
second diff below). But the patch conflicts with the existing header comment 
(see the first diff below), which says that a "potentially modified dublin core 
document" of series data should be honored by the ingest service. This is the 
policy that needs clarification from the community.

// "-" signifies a current line proposed to be removed, "+" signifies the 
replacement
//  The IngestServiceImpl.updateSeries() comment change:

-   * Updates the persistent representation of a series based on a potentially modified dublin core document.
+   * Only create a series if it does not exist. This ensures that the series exists for the ingested
+   * package's reference to isPartOf, yet prevents more current data in the system from being overwritten.

//  The IngestServiceImpl.updateSeries() code change proposal:

-          seriesService.updateSeries(dc);
+          try {
+            // test if series exists
+            seriesService.getSeries(id); 
+            logger.debug("Series id {} already exists. Ignoring series catalog from ingest.", id);
+          } catch (NotFoundException nf) { 
+            // safe to create series
+            seriesService.updateSeries(dc);
+            isCreated = true;
+            logger.info("Ingest created new series with id {} ", id);
+          }
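For anyone who wants to try the guard in isolation, here is a runnable sketch of the same create-if-absent logic. `NotFoundException`, `SeriesService`, and the in-memory service are simplified stand-ins for illustration, not the real Matterhorn interfaces (the real `updateSeries` takes the Dublin Core document rather than an explicit id).

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the proposed create-if-absent policy, with stand-in types.
public class IngestSeriesGuard {

    static class NotFoundException extends Exception {}

    interface SeriesService {
        String getSeries(String id) throws NotFoundException;
        void updateSeries(String id, String dublinCore);
    }

    /** Creates the series only when it does not already exist; never overwrites. */
    static boolean createIfAbsent(SeriesService service, String id, String dc) {
        try {
            service.getSeries(id);          // series exists: ignore the ingested catalog
            return false;
        } catch (NotFoundException nf) {
            service.updateSeries(id, dc);   // safe to create
            return true;
        }
    }

    /** Tiny in-memory service used only to exercise the guard. */
    static class InMemorySeriesService implements SeriesService {
        final Map<String, String> store = new HashMap<>();
        public String getSeries(String id) throws NotFoundException {
            String dc = store.get(id);
            if (dc == null) throw new NotFoundException();
            return dc;
        }
        public void updateSeries(String id, String dublinCore) {
            store.put(id, dublinCore);
        }
    }

    public static void main(String[] args) {
        InMemorySeriesService service = new InMemorySeriesService();
        System.out.println(createIfAbsent(service, "s1", "original catalog"));      // prints "true": created
        System.out.println(createIfAbsent(service, "s1", "agent's lossy catalog")); // prints "false": ignored
        System.out.println(service.store.get("s1")); // prints "original catalog": existing data preserved
    }
}
```

The point of the pattern is that the catalog arriving from the capture agent only ever wins when Matterhorn has no series for that id at all.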

FYI - From previous code commit comments, the reason for the series 
update/create at this early point in the ingest process is that an ingest 
associated with a series not yet found in Matterhorn puts the ingest in error. 
The embedded series catalog cannot be processed during the regular ingest zip 
file processing, because the threads launched there cannot flush the new series 
in time for the package's isPartOf reference to find a series match. The ingest 
fails if the series is not created prior to processing the ingest zip.

During the Friday Matterhorn Un-Conference, there was an idea of de-coupling 
rich metadata from Matterhorn versus enabling Matterhorn to accommodate it. It 
would be helpful to investigate how Matterhorn could accommodate at least the 
transport and integrity of rich metadata so it can be harvested as supplemental 
data to the event media. This would remove the requirement of placing 
Matterhorn inside a metadata infrastructure, allowing Matterhorn to be more 
self-sufficient and self-standing.

- Karen
_______________________________________________
Matterhorn mailing list
Matterhorn@opencastproject.org
http://lists.opencastproject.org/mailman/listinfo/matterhorn