Re: [Dspace-devel] Different strategies to implement metadata browse with Discovery

Mark Diggory Wed, 25 Aug 2010 12:25:09 -0700

Andrea,

I would like to talk a bit about the design direction of Discovery and
make some recommendations for how you should approach the work on the
JSPUI.

1.) Regarding parity with the Browse system: we do not consider parity
important, but we have seen requests for similar browse functionality.
We would rather not impose the original Browse API as a design
requirement on Discovery, and would instead let Faceted Search and
Term Completion be the primary navigation methods for restricting the
result set.  OPAC-like browsing, paging through lists of terms, is
neither an efficient way to navigate to a wanted set of results nor
very useful for discovering what is available in the repository.
Ultimately, we only maintained a minimal capability to do it so that,
without JavaScript autocomplete/term suggest, the values are still
accessible to non-JavaScript browsers and accessibility standards are
met.

Solr supports facet pagination only as far as is needed to determine
whether a new page is required to view more results.  This is actually
a query-efficiency design: it avoids calculating the entire facet
value frequency result set when that is not necessary.
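
The paging pattern described above could be sketched roughly as
follows.  This is only an illustration: it uses the standard Solr
facet parameters (facet.field, facet.offset, facet.limit), but the
field name "author_filter", the helper names and the idea of asking
for one extra value to detect a next page are assumptions made for
the example, not what Discovery actually does.

```python
from urllib.parse import urlencode


def facet_page_params(field, page, page_size, query="*:*"):
    """Build Solr query parameters for one page of facet values.

    Asks for page_size + 1 values so the caller can tell whether a
    further page exists without knowing the total number of values.
    """
    return {
        "q": query,
        "rows": 0,                      # only facet counts are wanted
        "facet": "true",
        "facet.field": field,           # hypothetical field name
        "facet.sort": "index",          # alphabetical, like a browse list
        "facet.mincount": 1,
        "facet.offset": page * page_size,
        "facet.limit": page_size + 1,   # one extra value as a probe
    }


def split_page(facet_values, page_size):
    """Return (values to display, whether a next page exists)."""
    return facet_values[:page_size], len(facet_values) > page_size


params = facet_page_params("author_filter", page=2, page_size=10)
query_string = urlencode(params)
```

The total number of facet values is never computed here; the UI only
learns whether a "next" link should be shown.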

2.) Any enhancements to the fields that are created in Solr by the
discovery module should be discussed within the module by the whole
development team and should not differ between the XMLUI and JSPUI
implementations.  The target design goal is that the Discovery design
does not vary between implementations.  We abandoned the original
Browse and Search APIs because they were both bloated and
over-engineered.  Reimplementing them on top of Solr does not do away
with this, and actually negates the benefit of relying on the
third-party application and its API, because it still burdens us with
the cost of maintaining custom Browse and Search interfaces for our
application.

3.) I do not want to see the Solr schema over-complicated with
semantically complex field names or fields such as those you suggest
in your option 2.  The closest approach I can see any possibility of
(and it is still a kludge) is to add first-character,
first-two-character and/or first-three-character wildcard fields to
support limited hierarchical faceting on the alphabet.
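
As a rough illustration of that kludge, the values for such fields
could be derived at index time along these lines.  The field naming
("author_prefix1" and so on) and the lower-casing are assumptions for
the sketch, not an agreed schema.

```python
def prefix_fields(base_field, value, max_len=3):
    """Map a metadata value to its 1..max_len character prefix fields.

    Field names of the form <base_field>_prefix<n> are hypothetical.
    """
    normalized = value.strip().lower()
    fields = {}
    for n in range(1, max_len + 1):
        # Skip prefixes longer than the value itself.
        if len(normalized) >= n:
            fields[f"{base_field}_prefix{n}"] = normalized[:n]
    return fields


doc_fields = prefix_fields("author", "Bollini, Andrea")
```

Faceting on the one-character field would then give the alphabet, and
filtering on a chosen letter while faceting on the two-character field
would give the next level down.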

We are currently researching the addition and use of Bobo Browse to
the Solr implementation to further enhance the capability of Solr to
also support hierarchical Facets, Sorting of Facets, paging through
Facet results and Grouping of Search Results.

The Terms Component is best applied across the entire repository for
a specific field.  While your approach may provide something that
emulates restricting browse to the Community/Collection levels, the
resulting Solr instance would be very bloated and would deviate
heavily from traditional approaches, and it would be difficult for
others to customize later if such an approach were the default.
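
To make the repository-wide nature of the Terms Component concrete,
here is a small sketch of the parameters such a request might carry.
The terms.* parameter names are the ones documented on the
TermsComponent wiki page; the field name "author" and the helper
itself are assumptions for illustration only.

```python
def terms_page_params(field, start_term="", limit=20):
    """Build TermsComponent parameters for one page of terms.

    There is deliberately no q or fq here: the component walks the
    terms of a field for the whole index, not for a query result.
    """
    params = {
        "terms": "true",
        "terms.fl": field,          # hypothetical field name
        "terms.limit": limit,
        "terms.sort": "index",      # alphabetical order
    }
    if start_term:
        # "Jump to" a position in the term list.
        params["terms.lower"] = start_term
        params["terms.lower.incl"] = "true"
    return params


page = terms_page_params("author", start_term="Dig", limit=10)
```

Because nothing in the request scopes it to a community or collection,
any such scoping has to be baked into the field itself, which is what
leads to the per-community field copies.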

I won't be in the committers meeting this week, but I intend to be back
working on projects and in the community next week.

Mark

On Sun, Aug 22, 2010 at 5:50 AM, Andrea Bollini <[email protected]> wrote:
>  Hi all,
> as discussed in the last IRC Dev meeting I'm working on porting the
> dspace-discovery idea to the JSPUI.
> As a side effect I'm trying to better understand and (if possible)
> improve Discovery itself...
>
> I have mostly completed the replacement of the DSpace Lucene engine
> with the one provided by Solr/Discovery; faceting on search results
> works well, also for metadata with an authority key.
> Now I'm thinking about using SOLR to replace the DSpace browse
> system. I'm facing several issues that I want to try to summarize by
> showing different strategies.
> Using SOLR for browsing has IMHO the following pros:
> - we will use an external, well-established library to manage our
> browse system
> - we will use a unified approach for search and browse
> - performance? Probably, but I have no real data comparing indexing
> and query time between our current Browse system and SOLR
>
>
> 1) SOLR facets are not good for pagination:
>  - As far as I know there is no out-of-the-box way to get an answer to
> this question: "how many facets do I have for this field in this query?"
>  - you can navigate the "facet result" using offset and limit (show
> facets from position X to position Y) but you can't ask to start with
> the facet "My Value" or from a facet that starts with the letter "X"
>
> This means that with SOLR we are not able to reproduce the same
> features as our current browse system: no total count of authors,
> keywords, etc. and no jump to a position in the index...
> So if we use this approach we should remove some existing
> functionality or look at the SOLR facet component to see if we are
> able to improve it and contribute back to the SOLR community.
>
>
> 2) Using SOLR TermsComponent: during my exploration I found this new
> component in SOLR 1.4
> http://wiki.apache.org/solr/TermsComponent
> It allows great pagination on field terms (total count, offset, limit,
> jump to are all supported)... but it doesn't work in combination with a
> query.
>
> This means that we are not able to use it to provide browse of metadata
> values within a community or collection.
> We could work around this limit by making several copies of the "browse
> metadata" in SOLR fields specific to a community or collection, i.e. we
> will have SOLR fields like author_m64 (author in community with id 64)
> and so on.
> I'm not sure whether there are issues with putting so many fields in a
> document. For each metadata browse we will get one additional field for
> every community and collection, so in a repository with a high number
> of communities/collections, for example 200 communities and 2k
> collections, we will get documents with potentially 2.2K fields for
> each browse.
>
> 3) the last option that I see is to add a new core to SOLR (i.e.
> browse). The SOLR "browse document" could have the following fields:
> - browse-type (author, keywords, publishers, etc.)
> - browse-unique-value (the value to look up)
> - value (the value to display)
> - authority_key (the authority key, if any)
> - sort (the sort value)
> - item_id (repeatable, the ids of all items that use this term)
> Using a "browse centric" SOLR core instead of an "item centric" core
> will simplify and resolve all our pagination issues. Instead, new
> issues arise related to filling this new index and keeping it
> up-to-date...
> After a first rough evaluation I think that we need as many "solr
> insert/update" operations as the current db browse insert/update...
>
> pros of this strategy vs the previous ones:
> - integration of additional information: indexing of an "authority
> source" could be easily integrated. If you have a directory of
> institutional authors and you want to put all the "institutional
> authors" in the browse index, you can easily accomplish this even if
> there is no item for an "institutional author". The same applies to
> subject classification, etc.
> cons:
> - there are no facet opportunities: we can't filter authors in the
> repository by a specific topic (based on item keywords)
>
> My preference is for solution 2 but I will be happy to hear other ideas.
>
> Andrea
>
> Dott. Andrea Bollini
> Project Manager, IT Architect & Systems Integrator
> Sezione Servizi per le Biblioteche e l'Editoria Elettronica
> CILEA, http://www.cilea.it
> tel. +39 06-59292853
> cel. +39 348-8277525
>
> _______________________________________________
> Dspace-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-devel
>

-- 
Mark R. Diggory
Head of U.S. Operations - @mire

http://www.atmire.com - Institutional Repository Solutions
http://www.togather.eu - Before getting together, get t...@ther
