Mark and all:
Even if the proposed patch doesn't fit in with the current architecture of the
system, I think it would be useful to make a binary easily available with the
fast import code.
Graham made some excellent points yesterday evening. I'm paraphrasing and may
have muddled this a bit, but:
- Just because a system has been made faster in one area doesn't mean
it's now scalable.
- A gigantic system may break or become unusable in other areas and
need other adjustments; for example, search indexes may need to be sharded.
Making the fast import tool available, at least as an option, would give
organizations one means of quickly loading large amounts of their data into
test systems so that they can start to poke at prototypes of gigantic systems
and see where they might break.
I know there are people with data collection, testing, and research skills,
at organizations with access to large amounts of data and experience with
the DSpace system, who could justify spending staff resources on identifying
scalability issues if they could demonstrate a gigantic system now. The fast
import tool would help them build that giant test system.
Can the fast importer be made readily available somewhere as an aid to
identifying and testing scalability issues in the current and future versions
of DSpace?
thanks,
keith
----- Original Message -----
From: "Mark Diggory" <[email protected]>
To: "Simon Brown" <[email protected]>, [email protected]
Sent: Wednesday, January 27, 2010 6:32:48 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Dspace-devel] [DSJ] Commented: (DS-470) Batch import times
increase drastically as repository size increases; patch to mitigate the
problem
We discuss it because we seek to maintain an appropriate separation of
concerns in our architecture. And because Graham usually challenges us
to look at aspects of that architecture that are important. What is
under discussion is not whether your patch improves performance;
you've identified a very important issue in batch processing.
We are discussing whether, architecturally, we want to alter the
Context/EventManager framework and expose calls to pruneIndex. We
want to be careful to avoid exposing too much of the internals of the
Browse system outside in the application architecture.
Excellent work on finding a means to improve DSpace performance.
Cheers,
Mark
--
Mark R. Diggory
Head of U.S. Operations - @mire
_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel