Andrea,

I would like to talk a bit about the design direction of Discovery and make some recommendations for how you should approach working on the JSPUI.
1.) Regarding parity with the Browse system: we do not consider parity important, but we have seen requests for similar browse functionality. We would rather not impose the original Browse API as a design requirement on Discovery, and instead let Faceted Search and Term Completion be the primary navigation methods for restricting the result set. OPAC-like browsing, paging through lists of terms, is neither an efficient way to navigate to a wanted set of results nor very useful for discovering what is available in the repository. Ultimately, we only maintained a minimal capability for it to ensure that, without JavaScript autocomplete/term suggest, the values were still accessible to non-JavaScript browsers, in order to meet accessibility standards. Solr supports pagination only far enough to determine whether a new page is required to continue viewing more results; this is actually a query-efficiency design that avoids calculating the entire facet value frequency result set when it is not necessary.

2.) Any enhancements to the fields that the Discovery module creates in Solr should be discussed within the module by the whole development team, and should not differ between the XMLUI and JSPUI implementations. The design goal should be that Discovery does not vary between implementations. We abandoned the original Browse and Search APIs because they were both bloated and over-engineered. Reimplementing them on top of Solr does not do away with this; it actually negates the benefit of relying on a third-party application and its API, because we would still be burdened with the cost of maintaining custom Browse and Search interfaces for our application.

3.) I do not want to see the Solr schema over-complicated with semantically complex field names, or with fields such as those you suggest in 2.
The closest approach I can see any possibility for (and it is still a kludge) is to add first-character, first-two-character and/or first-three-character wildcard fields to support limited hierarchical faceting on the alphabet. We are currently researching adding Bobo Browse to the Solr implementation to further enhance Solr so that it also supports hierarchical facets, sorting of facets, paging through facet results and grouping of search results. The TermsComponent is best applied only across the entire repository for a specific field. While your approach may emulate restricting browse to the Community/Collection level, the resulting Solr instance would be very bloated and would deviate heavily from traditional approaches, and if such an approach were the default it would be difficult for others to customize later.

I won't be in the committers meeting this week, but I intend to be back working on projects and in the community next week.

Mark

On Sun, Aug 22, 2010 at 5:50 AM, Andrea Bollini <[email protected]> wrote:
> Hi all,
> as discussed in the last IRC Dev meeting, I'm working on porting the
> dspace-discovery idea to the JSPUI.
> As a side effect, I'm trying to better understand and (if possible) improve
> Discovery itself...
>
> I have mostly completed the replacement of the DSpace Lucene engine with
> the one provided by Solr/Discovery; faceting on search results works well,
> also for metadata with an authority key.
> Now I'm thinking about using Solr to replace the DSpace browse system,
> and I'm facing several issues that I want to summarize here, showing
> different strategies.
> Using Solr for browsing has, IMHO, the following pros:
> - we will use an external, well-established library to manage our browse
>   system
> - we will use a unified approach for search and browse
> - performance?
>   probably, but I have no real data comparing indexing and query time
>   between our current Browse system and Solr
>
>
> 1) Solr facets are not good for pagination:
> - As far as I know there is no out-of-the-box way to answer the
>   question: "how many facet values do I have for this field in this query?"
> - you can navigate the "facet result" using offset and limit (show facets
>   from position X to position Y), but you can't ask to start from the
>   facet "My Value" or from a facet that starts with the letter "X"
>
> This means that with Solr we are not able to reproduce the same features
> as our current browse system: no total count of authors, keywords,
> etc., and no jumping to a position in the index...
> So if we use this approach we should either remove some existing
> functionality or look at the Solr facet component to see if we are
> able to improve it and contribute back to the Solr community.
>
>
> 2) Using the Solr TermsComponent: during my exploration I found this new
> component in Solr 1.4
> http://wiki.apache.org/solr/TermsComponent
> It allows great pagination on field terms (total count, offset, limit and
> jump-to are all supported)... but it can't be combined with a query.
>
> This means that we are not able to use it to browse metadata values
> within a community or collection.
> We could work around this limit by making several copies of the "browse
> metadata" in Solr fields specific to a community or collection, i.e. we
> would have Solr fields like author_m64 (author in community with id 64)
> and so on.
> I'm not sure whether there are issues with putting so many fields in each
> document. For every metadata browse we would get one additional field for
> every community and collection, so a repository with a high number of
> communities/collections, for example 200 communities and 2k
> collections, would get documents with potentially 2.2K fields for each
> browse.
>
> 3) The last option that I see is to add a new core to Solr (i.e.
> browse). The Solr "browse document" could have the following fields:
> browse-type (author, keywords, publishers, etc.), browse-unique-value
> (the value to look up), value (the value to display), authority_key
> (the authority key, if any), sort (the sort value), item_id
> (repeatable, the ids of all items that use this term).
> Using a "browse-centric" Solr core instead of an "item-centric" one would
> simplify and resolve all our pagination issues. On the other hand, new
> issues arise related to filling this new index and keeping it up to date...
> After a first rough evaluation I think we would need as many Solr
> inserts/updates as the current DB browse inserts/updates...
>
> Pros of this strategy vs. the previous ones:
> - integration of additional information: indexing of an "authority source"
>   could be easily integrated. If you have a directory of institutional
>   authors and you want to put all the "institutional authors" in the browse
>   index, you can easily accomplish this even if there is no item for an
>   "institutional author". The same applies to subject classification, etc.
> Cons:
> - there are no faceting opportunities: we can't filter the authors in the
>   repository by a specific topic (based on item keywords)
>
> My preference is for solution 2, but I will be happy to hear other ideas.
>
> Andrea
>
> Dott. Andrea Bollini
> Project Manager, IT Architect & Systems Integrator
> Sezione Servizi per le Biblioteche e l'Editoria Elettronica
> CILEA, http://www.cilea.it
> tel. +39 06-59292853
> cel. +39 348-8277525
>
> ---
>
> Disclaimer: the content of this email is confidential and may be privileged,
> and it must not be disclosed or copied without the sender's consent. If you
> have received this message in error, please notify the sender and remove it
> from your system. The content of this email does not constitute legal advice,
> nor any responsibility is accepted for loss or damage incurred as a result of
> acting upon its contents or attachments.
> The statements and opinions expressed in this email are those of the author
> and do not necessarily reflect those of the employer.
>
> _______________________________________________
> Dspace-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-devel

--
Mark R. Diggory
Head of U.S. Operations - @mire

http://www.atmire.com - Institutional Repository Solutions
http://www.togather.eu - Before getting together, get t...@ther
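Both Mark's point 1 and Andrea's point 1 turn on how far Solr facet paging goes. The following is a minimal Python sketch of the look-ahead idea, not DSpace or Discovery code: request one more facet value than the page size through the standard `facet.offset` and `facet.limit` parameters, display one page, and offer a "next" link only if the extra value comes back. The `author` field, the helper names and the stubbed response are hypothetical, and no Solr server is contacted.

```python
from urllib.parse import urlencode


def facet_page_params(query, field, page, page_size):
    """Solr request parameters for one page of values of a facet field.

    One value more than page_size is requested so the caller can tell
    whether another page exists without knowing how many distinct
    values there are in total.
    """
    return {
        "q": query,
        "rows": 0,                      # only facet counts are wanted, no documents
        "facet": "true",
        "facet.field": field,
        "facet.sort": "index",          # alphabetical, browse-like order
        "facet.mincount": 1,            # skip values with no hits in this result set
        "facet.offset": page * page_size,
        "facet.limit": page_size + 1,   # the extra value is the look-ahead
    }


def split_page(values, page_size):
    """Return (values to display, True if a 'next page' link is needed)."""
    return values[:page_size], len(values) > page_size


# Third page (index 2) of a hypothetical 'author' facet, 20 values per page.
params = facet_page_params("*:*", "author", page=2, page_size=20)
print(urlencode(params))

# Stubbed (value, count) pairs standing in for Solr's answer to these parameters.
returned = [("Author %d" % i, 1) for i in range(21)]
shown, has_next = split_page(returned, 20)
print(len(shown), has_next)
```

The look-ahead only says whether another page exists; it still provides no total number of distinct values and no way to start at a given value, the two gaps listed in Andrea's point 1.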
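Mark's suggested kludge for alphabetical navigation adds first-character, first-two-character and first-three-character fields. A sketch of deriving them at index time, under assumptions: the `_1char`, `_2char` and `_3char` field names and the lowercase/strip normalisation are hypothetical, and a real schema would presumably declare such fields as wildcard (dynamic) fields.

```python
def prefix_fields(field, values, depths=(1, 2, 3)):
    """Derive first-N-character fields from a multi-valued metadata field.

    Returns a dict mapping hypothetical field names such as 'author_1char'
    to the sorted, de-duplicated, lowercased prefixes of the given values.
    """
    doc = {}
    for n in depths:
        # Blank values are skipped; values shorter than n keep their full text.
        prefixes = {v.strip().lower()[:n] for v in values if v.strip()}
        doc[f"{field}_{n}char"] = sorted(prefixes)
    return doc


doc = prefix_fields("author", ["Bollini, Andrea", "Diggory, Mark R."])
for name, prefixes in doc.items():
    print(name, prefixes)
```

Faceting on `author_1char` would then return one entry per initial letter with its count, and choosing "b" could add a filter query such as `fq=author_1char:b`. Because the fields are multi-valued, that filter matches documents having at least one author starting with "b", not individual author values, which is one of the limits of this approach.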
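Andrea's option 2 copies each browse field once per community and collection, so that the TermsComponent, which cannot be combined with a query, can be pointed at a scope-specific field. The sketch below generates those field names and reproduces the 200 communities + 2,000 collections = 2,200 fields per browse estimate from the message. Only the `author_m64` pattern comes from the message; the `_c` suffix for collections, the numeric ids and the helper names are hypothetical.

```python
def scoped_field_names(browse_field, community_ids, collection_ids):
    """Names of the per-community and per-collection copies of a browse field.

    'author_m64' (author in community 64) follows the example in the message;
    the '_c' suffix used for collections is a hypothetical choice.
    """
    names = [f"{browse_field}_m{cid}" for cid in community_ids]
    names += [f"{browse_field}_c{cid}" for cid in collection_ids]
    return names


def copy_into_scopes(browse_field, values, community_ids, collection_ids):
    """Fields for one item: the base field plus one copy per scope it belongs to."""
    doc = {browse_field: list(values)}
    for name in scoped_field_names(browse_field, community_ids, collection_ids):
        doc[name] = list(values)
    return doc


# An item in hypothetical collection 123, inside community 64.
item_doc = copy_into_scopes("author", ["Bollini, Andrea"], [64], [123])
print(sorted(item_doc))

# Potential field count for ONE browse index in the 200 + 2,000 example.
all_names = scoped_field_names("author", range(200), range(2000))
print(len(all_names))
```

A TermsComponent request with `terms.fl=author_m64` would then list only the author terms of community 64, at the cost of one extra field per community and collection for every browse index, which is the schema growth Mark's point 3 objects to.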
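Andrea's option 3 turns each distinct browse term into its own document in a separate, browse-centric Solr core. A minimal Python model of that document, using the fields listed in the message (`browse-type`, `browse-unique-value`, `value`, `authority_key`, `sort` and the repeatable `item_id`), plus a hypothetical grouping step that builds the documents from (item id, value) pairs. The class and function names and the normalisation rule are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BrowseDoc:
    """One term in a hypothetical browse-centric Solr core (option 3)."""
    browse_type: str                     # e.g. "author", "keywords", "publisher"
    browse_unique_value: str             # the value to look up
    value: str                           # the value to display
    sort: str                            # the sort value
    authority_key: Optional[str] = None  # the authority key, if any
    item_id: List[int] = field(default_factory=list)  # repeatable: items using this term

    def to_solr(self):
        """Flatten to a dict keyed by the field names proposed in the message."""
        doc = {
            "browse-type": self.browse_type,
            "browse-unique-value": self.browse_unique_value,
            "value": self.value,
            "sort": self.sort,
            "item_id": sorted(self.item_id),
        }
        if self.authority_key is not None:
            doc["authority_key"] = self.authority_key
        return doc


def build_browse_docs(browse_type, item_values):
    """Group (item id, metadata value) pairs into one BrowseDoc per distinct term.

    Lowercasing and stripping to derive the lookup and sort values is an
    assumption of this sketch; the first spelling seen is kept for display.
    """
    docs = {}
    for item_id, raw in item_values:
        key = raw.strip().lower()
        if key not in docs:
            docs[key] = BrowseDoc(browse_type, key, raw.strip(), key)
        docs[key].item_id.append(item_id)
    return [docs[k] for k in sorted(docs)]


terms = build_browse_docs("author", [(1, "Bollini, Andrea"),
                                     (2, "Diggory, Mark R."),
                                     (3, "bollini, andrea ")])
# An authority-only entry with no items yet, e.g. from an institutional directory.
orphan = BrowseDoc("author", "example institute author", "Example Institute Author",
                   "example institute author", authority_key="hypothetical-key-001")
for t in terms + [orphan]:
    print(t.to_solr())
```

Because every term is its own document, the number of matching documents gives a total term count, and ordinary `start`/`rows` paging sorted on the `sort` field gives offsets; the authority-only entry shows the "institutional author" case from the pros. The cost, as the message notes, is keeping this index in step with every item change.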
