Kostas Stamatis created DS-1226:
-----------------------------------

             Summary: Batch import from basic bibliographic formats (Endnote, 
BibTex, RIS, TSV, CSV)
                 Key: DS-1226
                 URL: https://jira.duraspace.org/browse/DS-1226
             Project: DSpace
          Issue Type: New Feature
          Components: DSpace API
            Reporter: Kostas Stamatis
         Attachments: biblio-transformation-engine-0.8.jar, import-patch.diff, 
jbibtex-r45.jar, README.txt

This proposed extension (implemented by the National Documentation Centre/EKT - 
http://www.ekt.gr) allows the batch import of metadata (and/or bitstreams) into 
DSpace using the import script and the Biblio-Transformation-Engine tool. The 
input can be in any bibliographic format (this patch includes support for the 
Endnote, RIS, BibTex, TSV and CSV formats).

The Biblio-Transformation-Engine 
(http://code.google.com/p/biblio-transformation-engine/) is an open source Java 
framework developed by the Hellenic National Documentation Centre (EKT, 
www.ekt.gr). It consists of programmatic APIs for filtering and modifying 
records retrieved from various types of data sources (e.g. databases, files, 
legacy data sources) and for outputting them in appropriate standard formats 
(e.g. database files, txt, xml, Excel). The framework is built from independent 
abstract modules that are executed separately, so in many cases the user can 
choose between alternatives depending on the input data set, the transformation 
workflow to be executed, and the output format to be generated.

The attached patch adds support for using the Biblio-Transformation-Engine in 
the DSpace batch import procedure; the user only needs to specify the mapping 
between the input metadata and the DSpace metadata. Default mappings are also 
provided for the default DSpace Dublin Core metadata schema.


USEFULNESS
---------------------
Suppose a researcher at your institute gives you a file of his/her 
publications that you need to import into the repository. If the file is in one 
of the following formats: CSV, TSV, Endnote, BibTex, RIS (all commonly used for 
bibliographic metadata), a single command imports all the records into the 
DSpace repository. At the same time, configuration files control which metadata 
is imported and which DC field (or field of any other schema in the DSpace 
repository) each input field maps to.

For those who know the Biblio-Transformation-Engine well, this extension is 
especially powerful, because they can write their own DataLoaders to support 
more input formats. Filtering records and modifying metadata is also possible 
with very little effort (using the Biblio-Transformation-Engine's filters and 
modifiers). The same applies to adding bitstreams to the records.


CONFIGURATION FILES
---------------------------------------
Since the Biblio-Transformation-Engine supports Spring, the only configuration 
the user must work with is the set of Spring XML files used for dependency 
injection. These files are located in the "config" directory, and in them the 
user can specify the mapping between the input metadata and the DSpace Dublin 
Core schema (or any other schema users have in their repository).
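
The real mapping is declared in the Spring XML files shipped with the patch and 
is not reproduced in this message. Purely to illustrate the idea of mapping 
input fields to DSpace fields, here is a minimal Java sketch. The class name 
FieldMappingSketch, the input keys and the chosen DC fields are assumptions 
made for this example; they are not taken from the patch or from the 
Biblio-Transformation-Engine API. For brevity each field holds a single value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the patch or of the
// Biblio-Transformation-Engine API.
final class FieldMappingSketch {

    // Assumed mapping from BibTeX-style input keys to DSpace Dublin Core fields.
    static final Map<String, String> BIBTEX_TO_DC = new LinkedHashMap<>();
    static {
        BIBTEX_TO_DC.put("title", "dc.title");
        BIBTEX_TO_DC.put("author", "dc.contributor.author");
        BIBTEX_TO_DC.put("year", "dc.date.issued");
        BIBTEX_TO_DC.put("abstract", "dc.description.abstract");
    }

    // Returns a new record keyed by DC field name.
    // Input fields without a mapping are dropped, i.e. not imported.
    static Map<String, String> toDublinCore(Map<String, String> input) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : input.entrySet()) {
            String dcField = BIBTEX_TO_DC.get(entry.getKey());
            if (dcField != null) {
                out.put(dcField, entry.getValue());
            }
        }
        return out;
    }
}
```

In this sketch, removing an entry from the map is what stops a field from being 
imported, which mirrors the control the configuration files are described as 
giving.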


EXTERNAL LIBRARIES
-----------------------------------
This extension makes use of three external Java libraries:
a) jbibtex, a Java library for reading BibTex files (under the BSD licence - 
http://www.linfo.org/bsdlicense.html)
b) opencsv, a Java library for reading CSV files (under Apache License V2.0 - 
http://www.apache.org/licenses/LICENSE-2.0)
c) biblio-transformation-engine, a Java library for metadata transformation, 
filtering and modification (under the European Union Public Licence (EUPL) - 
http://www.osor.eu/eupl/european-union-public-licence-eupl-v.1.1)


HOW TO RUN
----------------------

The import script has a new option (-b) to import using the 
Biblio-Transformation-Engine and a new option (-i) to declare the type of the 
input format. All the other options are unchanged, except that -s now points to 
the input data file rather than to a directory, as it used to.

Thus, to import metadata from the various input formats, use the following 
commands:

for BibTex input:
  ./dspace import -b -m mapFile -e [email protected] -c 123456789/1 -s /DATA/export-bibtex -i bibtex
for csv input:
  ./dspace import -b -m mapFile -e [email protected] -c 123456789/1 -s /DATA/export-csv -i csv
for tsv input:
  ./dspace import -b -m mapFile -e [email protected] -c 123456789/1 -s /DATA/export-tsv -i tsv
for ris input:
  ./dspace import -b -m mapFile -e [email protected] -c 123456789/1 -s /DATA/export-ris -i ris
for endnote input:
  ./dspace import -b -m mapFile -e [email protected] -c 123456789/1 -s /DATA/export-endnote -i endnote

(-e must be the valid email address of a DSpace user and -c must be the handle 
of the collection into which the items will be imported)

Before you run the commands, feel free to change the configuration files 
(config/spring-bibtex2dspace.xml, config/spring-csv2dspace.xml, 
config/spring-tsv2dspace.xml, config/spring-ris2dspace.xml, 
config/spring-endnote2dspace.xml) in order to specify the mapping of the input 
format to the DC metadata schema of DSpace.
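
The five configuration file names follow the pattern 
config/spring-<format>2dspace.xml, where <format> is the value passed to -i. 
How the patch itself locates these files is not stated in this message, so the 
following Java sketch is only an illustration of that naming pattern and of how 
the import arguments fit together. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; it only mirrors the option layout and file naming
// described in this message and is not code from the patch.
final class ImportArgsSketch {

    // The -i values this message lists as supported.
    static final List<String> FORMATS =
            Arrays.asList("bibtex", "csv", "tsv", "ris", "endnote");

    // Builds the configuration file name for a supported input format.
    static String configFileFor(String format) {
        if (!FORMATS.contains(format)) {
            throw new IllegalArgumentException("Unsupported input format: " + format);
        }
        return "config/spring-" + format + "2dspace.xml";
    }

    // Assembles the argument list in the same order as the commands above:
    // -b, -m map file, -e user email, -c collection handle, -s input file, -i format.
    static List<String> importArgs(String format, String mapFile, String email,
                                   String collectionHandle, String inputFile) {
        configFileFor(format); // rejects unsupported formats early
        return Arrays.asList("./dspace", "import", "-b",
                "-m", mapFile,
                "-e", email,
                "-c", collectionHandle,
                "-s", inputFile,
                "-i", format);
    }
}
```

A caller would pass the resulting list to whatever process launcher it uses; the 
email address and collection handle must be real values from the target DSpace 
instance.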


        
