Kostas Stamatis created DS-1226:
-----------------------------------
Summary: Batch import from basic bibliographic formats (Endnote,
BibTex, RIS, TSV, CSV)
Key: DS-1226
URL: https://jira.duraspace.org/browse/DS-1226
Project: DSpace
Issue Type: New Feature
Components: DSpace API
Reporter: Kostas Stamatis
Attachments: biblio-transformation-engine-0.8.jar, import-patch.diff,
jbibtex-r45.jar, README.txt
This proposed extension (implemented by the National Documentation Centre/EKT -
http://www.ekt.gr) allows batch import of metadata (and/or bitstreams) into
DSpace using the import script and the Biblio-Transformation-Engine tool. The
input can be in any bibliographic format; this patch includes support for the
Endnote, RIS, BibTeX, TSV and CSV formats.
The Biblio-Transformation-Engine
(http://code.google.com/p/biblio-transformation-engine/) is an open-source Java
framework developed by the Hellenic National Documentation Centre (EKT,
www.ekt.gr). It provides programmatic APIs for filtering and modifying
records retrieved from various types of data sources (e.g. databases,
files, legacy data sources) and for outputting them in appropriate standard
formats (e.g. database files, TXT, XML, Excel). The framework consists of
independent abstract modules that are executed separately, in many cases
offering the user alternative choices depending on the input data set, the
transformation workflow to be executed and the output format to be generated.
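The load, filter, modify, output flow described above can be pictured with a
minimal Java sketch. The types below (DataLoader, OutputGenerator, run) are
hypothetical stand-ins written only for illustration; they are not the actual
Biblio-Transformation-Engine API, and records are simplified to single-valued
field maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical load -> filter -> modify -> output pipeline.
// The types below only illustrate the idea; they are NOT the BTE API.
class PipelineSketch {

    // Records are plain field-name -> value maps (single-valued for brevity).
    interface DataLoader {
        List<Map<String, String>> load();
    }

    interface OutputGenerator<T> {
        T generate(List<Map<String, String>> records);
    }

    static <T> T run(DataLoader loader,
                     Predicate<Map<String, String>> filter,
                     Function<Map<String, String>, Map<String, String>> modifier,
                     OutputGenerator<T> output) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> rec : loader.load()) {
            if (filter.test(rec)) {            // drop records the filter rejects
                kept.add(modifier.apply(rec)); // modify the records that remain
            }
        }
        return output.generate(kept);          // serialise in the target format
    }

    public static void main(String[] args) {
        DataLoader loader = () -> List.of(
                Map.of("title", "  Sample Title  ", "year", "2011"),
                Map.of("year", "2012"));       // no title, so it is filtered out
        Predicate<Map<String, String>> hasTitle = r -> r.containsKey("title");
        Function<Map<String, String>, Map<String, String>> trimTitle = r -> {
            Map<String, String> copy = new LinkedHashMap<>(r);
            copy.put("title", r.get("title").trim());
            return copy;
        };
        OutputGenerator<String> titlesAsText = recs -> {
            StringBuilder sb = new StringBuilder();
            for (Map<String, String> r : recs) {
                sb.append(r.get("title")).append('\n');
            }
            return sb.toString();
        };
        System.out.print(run(loader, hasTitle, trimTitle, titlesAsText)); // prints Sample Title
    }
}
```

Keeping each stage behind its own small interface is what lets one stage (for
example the loader) be swapped without touching the others.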
The attached patch therefore adds support for using the
Biblio-Transformation-Engine in the DSpace batch import procedure; the user
only needs to specify the mapping between the input metadata and the DSpace
metadata. Default mappings are also provided for the default DSpace Dublin Core
metadata schema.
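To make the idea of a field mapping concrete, the sketch below applies a
mapping from input field names to qualified Dublin Core fields and skips any
field that has no mapping. The mapping entries (e.g. BibTeX "title" to
dc.title) and the class name are illustrative assumptions; in the patch the
mapping is declared in the Spring XML files, not in Java code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative mapping only: the field names and DC targets are assumptions,
// not the default mapping shipped with the patch (that lives in Spring XML).
class FieldMappingSketch {

    static final Map<String, String> BIBTEX_TO_DC = Map.of(
            "title", "dc.title",
            "author", "dc.contributor.author",
            "year", "dc.date.issued",
            "publisher", "dc.publisher");

    // Returns DC field -> values; input fields with no mapping are skipped.
    static Map<String, List<String>> apply(Map<String, List<String>> input,
                                           Map<String, String> mapping) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : input.entrySet()) {
            String target = mapping.get(e.getKey().toLowerCase(Locale.ROOT));
            if (target == null) {
                continue;                      // unmapped field: not imported
            }
            out.computeIfAbsent(target, k -> new ArrayList<>()).addAll(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> entry = new LinkedHashMap<>();
        entry.put("Title", List.of("Sample Title"));
        entry.put("author", List.of("Doe, J.", "Roe, R."));
        entry.put("keywords", List.of("repositories")); // no mapping: dropped
        System.out.println(apply(entry, BIBTEX_TO_DC));
        // prints {dc.title=[Sample Title], dc.contributor.author=[Doe, J., Roe, R.]}
    }
}
```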
USEFULNESS
---------------------
Suppose a researcher at your institution gives you a file with his/her
publications that you need to import into the repository. If the file is in
CSV, TSV, Endnote, BibTeX or RIS format (all commonly used for bibliographic
metadata), you can import all of its records into the DSpace repository with a
single command, while configuration files control which metadata are imported
and which DC field (or field of any other schema in the DSpace repository) each
one maps to.
For users familiar with the Biblio-Transformation-Engine, this extension is
especially powerful, since they can write their own DataLoaders to support more
input formats. Filtering records and modifying their metadata is also possible
with very little effort (using the Biblio-Transformation-Engine's filters and
modifiers). The same applies to adding bitstreams to the records.
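As an illustration of what a custom DataLoader has to do, here is a minimal
loader for RIS-tagged text (lines such as "TY  - JOUR", with "ER  -" closing a
record). The class and method names are hypothetical, and it does not implement
the real Biblio-Transformation-Engine DataLoader interface; it only shows the
parsing step such a loader performs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical RIS loader for illustration; not the loader shipped with the patch.
class RisLoaderSketch {

    // Each record maps a two-letter RIS tag to its (possibly repeated) values.
    static List<Map<String, List<String>>> load(Reader in) throws IOException {
        List<Map<String, List<String>>> records = new ArrayList<>();
        Map<String, List<String>> current = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            // A tagged line: 2-char tag, two spaces, '-', then the value.
            if (line.length() < 5 || !line.startsWith("  -", 2)) {
                continue;                       // skip blank or malformed lines
            }
            String tag = line.substring(0, 2);
            String value = line.substring(5).trim();
            if (tag.equals("ER")) {             // end of record
                if (!current.isEmpty()) {
                    records.add(current);
                }
                current = new LinkedHashMap<>();
            } else {
                current.computeIfAbsent(tag, k -> new ArrayList<>()).add(value);
            }
        }
        return records;                         // a trailing record without ER is dropped
    }

    public static void main(String[] args) throws IOException {
        String ris = "TY  - JOUR\nTI  - Sample Title\nAU  - Doe, J.\nAU  - Roe, R.\nER  - \n";
        System.out.println(load(new StringReader(ris)));
        // prints [{TY=[JOUR], TI=[Sample Title], AU=[Doe, J., Roe, R.]}]
    }
}
```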
CONFIGURATION FILES
---------------------------------------
Since the Biblio-Transformation-Engine supports Spring, the only configuration
the user must work with is the Spring XML files used for Dependency Injection.
These files are located in the "config" directory, and in them the user can
specify the mapping between the input metadata and the DSpace Dublin Core
schema (or any other schema used in the repository).
EXTERNAL LIBRARIES
-----------------------------------
This extension uses three external Java libraries:
a) jbibtex, a Java library for reading BibTeX files (under the BSD licence -
http://www.linfo.org/bsdlicense.html)
b) opencsv, a Java library for reading CSV files (under the Apache License V2.0 -
http://www.apache.org/licenses/LICENSE-2.0)
c) biblio-transformation-engine, a Java library for metadata transformation,
filtering and modification (under the European Union Public Licence (EUPL),
http://www.osor.eu/eupl/european-union-public-licence-eupl-v.1.1)
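Part of what a CSV library such as opencsv handles is quoting: a field may
itself contain the separator, as in an author list like "Doe, J.". The
stdlib-only sketch below splits a single line to illustrate the problem; it is
not opencsv's API, it ignores quoted line breaks, and the library remains the
right tool in practice. The same loop with '\t' as separator covers simple TSV.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal RFC 4180-style splitter for ONE line (no quoted newlines).
// Illustration only; real imports should rely on opencsv.
class CsvLineSketch {

    static List<String> split(String line, char sep) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"');       // "" inside quotes is a literal quote
                        i++;
                    } else {
                        inQuotes = false;      // closing quote
                    }
                } else {
                    cur.append(c);             // separators inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;               // opening quote
            } else if (c == sep) {
                fields.add(cur.toString());    // end of the current field
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());            // last field
        return fields;
    }

    public static void main(String[] args) {
        List<String> f = split("\"Doe, J.\",Sample Title,2011", ',');
        System.out.println(f.size() + " | " + f.get(0)); // prints 3 | Doe, J.
    }
}
```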
HOW TO RUN
----------------------
The import script has a new option, -b, to import using the
Biblio-Transformation-Engine, and an option, -i, to declare the input format.
All other options are unchanged, except that -s now points to a file (not a
directory, as it did before) containing the input data.
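The option rules just described (with -b, -s names a single file and -i names
its format) can be checked before any import starts. The sketch below is a
hypothetical pre-flight check, not code from the patch; the accepted -i values
are taken from the example commands in this issue.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.Set;

// Hypothetical pre-flight check for -b imports; not part of the patch.
class BteOptionsCheck {

    // -i values taken from the example commands in this issue.
    static final Set<String> INPUT_TYPES = Set.of("bibtex", "csv", "tsv", "ris", "endnote");

    // Returns an error message, or Optional.empty() if the options look usable.
    static Optional<String> check(String inputType, String source) {
        if (inputType == null || !INPUT_TYPES.contains(inputType)) {
            return Optional.of("-i must be one of " + INPUT_TYPES);
        }
        if (source == null) {
            return Optional.of("-s is required");
        }
        Path path = Paths.get(source);
        if (Files.isDirectory(path)) {
            return Optional.of("-s must name a file, not a directory, when -b is used");
        }
        if (!Files.isRegularFile(path)) {
            return Optional.of("-s file not found: " + source);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: BteOptionsCheck <input-type> <source-file>");
            return;
        }
        System.out.println(check(args[0], args[1]).orElse("options OK"));
    }
}
```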
Thus, to import metadata from the various input formats, use the following
commands:
for BibTeX input:
    ./dspace import -b -m mapFile -e [email protected] -c 123456789/1 -s /DATA/export-bibtex -i bibtex
for CSV input:
    ./dspace import -b -m mapFile -e [email protected] -c 123456789/1 -s /DATA/export-csv -i csv
for TSV input:
    ./dspace import -b -m mapFile -e [email protected] -c 123456789/1 -s /DATA/export-tsv -i tsv
for RIS input:
    ./dspace import -b -m mapFile -e [email protected] -c 123456789/1 -s /DATA/export-ris -i ris
for Endnote input:
    ./dspace import -b -m mapFile -e [email protected] -c 123456789/1 -s /DATA/export-endnote -i endnote
(-e must be the email address of a valid DSpace user, and -c must be the handle
of the collection into which the items will be imported.)
Before you run the commands, feel free to change the configuration files
(config/spring-bibtex2dspace.xml, config/spring-csv2dspace.xml,
config/spring-tsv2dspace.xml, config/spring-ris2dspace.xml,
config/spring-endnote2dspace.xml) in order to specify the mapping of the input
format to the DC metadata schema of DSpace.
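The configuration file names above follow a single pattern,
config/spring-<type>2dspace.xml. The helper below derives the file for a given
-i value; it is a hypothetical sketch inferred from the listed names, not
necessarily how the patch itself resolves them.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper inferred from the file names listed above.
class SpringConfigSketch {

    static final Set<String> INPUT_TYPES = Set.of("bibtex", "csv", "tsv", "ris", "endnote");

    // Maps an -i value to config/spring-<type>2dspace.xml.
    static Path springConfigFor(String inputType) {
        String type = inputType.toLowerCase(Locale.ROOT);
        if (!INPUT_TYPES.contains(type)) {
            throw new IllegalArgumentException("Unsupported input type: " + inputType);
        }
        return Paths.get("config", "spring-" + type + "2dspace.xml");
    }

    public static void main(String[] args) {
        // prints config/spring-ris2dspace.xml on Unix-like systems
        System.out.println(springConfigFor("ris"));
    }
}
```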
--
This message is automatically generated by JIRA.
_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel