Yes, it is the second PatternReplaceFilterFactory. The string "Arslanagic, Aida ; Siqveland, Elisabeth" is reduced to "a", whereas the other strings become:
"Alexander, Kvam ; Bjørn, Nyland ; Bjørn, Reiten ; Øystein, Huse" --> "alexanderkvambj"
"Brennmoen, Ingar ; Hauklien, Øystein ; Hedalen, Trond ; Kvam, Erik" --> "brennmoeningarhauk"
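A minimal sketch that reproduces these values outside Solr, assuming both PatternReplaceFilterFactory steps behave like java.util.regex `Matcher.replaceAll` (my understanding of how Lucene's PatternReplaceFilter handles replace="all"). The cause is backtracking: `(.{31,})` insists on at least 31 characters, so for a key of 32 to 61 characters the greedy `(.{1,30})` gives characters back until exactly 31 remain, and `$1` keeps only the first n - 31. Keys of 31 characters or fewer do not match at all, and only keys of 61 or more are cut to 30. After lowercasing and the first filter (which strips commas, spaces and semicolons), "Arslanagic, Aida ; Siqveland, Elisabeth" is 32 characters long, so group 1 shrinks to a single "a". The class `SortKeyChain` and method `sortKey` are hypothetical names, and the method only approximates the analyzer chain.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class SortKeyChain {
    // First PatternReplaceFilterFactory (copied from alphaOnlySortLim): removes
    // ASCII punctuation, spaces, control characters and some Unicode format chars.
    static final Pattern ILLEGAL_CHARS = Pattern.compile(
        "([\\x00-\\x2F\\x3A-\\x40\\x5B-\\x60\\x7B-\\x9F\\u2000-\\u206F\\uFEFF\\uFFF9-\\uFFFD])");

    // Second PatternReplaceFilterFactory, copied unchanged.
    static final Pattern LIMIT = Pattern.compile("(.{1,30})(.{31,})");

    // Hypothetical approximation of the analyzer: KeywordTokenizer keeps the whole
    // value as one token, followed by lowercase, trim and both pattern filters.
    static String sortKey(String value) {
        String s = value.toLowerCase(Locale.ROOT).trim();
        s = ILLEGAL_CHARS.matcher(s).replaceAll("");
        // (.{31,}) needs at least 31 chars, so (.{1,30}) backtracks until 31 remain.
        return LIMIT.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        String[] authors = {
            "Arslanagic, Aida ; Siqveland, Elisabeth",
            "Alexander, Kvam ; Bjørn, Nyland ; Bjørn, Reiten ; Øystein, Huse",
            "Brennmoen, Ingar ; Hauklien, Øystein ; Hedalen, Trond ; Kvam, Erik"
        };
        for (String author : authors) {
            System.out.println(author + " --> " + sortKey(author));
        }
    }
}
```

Running `main` prints "a", "alexanderkvambj" and "brennmoeningarhauk" for the three author strings, the same values reported above.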
Now this explains the sorting (shit in --> shit out). But why is the first string reduced to "a"? Is the regular expression wrong?

Bernd

On 12.11.2012 14:51, Bernd Fehling wrote:
> The field type is derived from the distributed alphaOnlySort as follows:
>
> <fieldType name="alphaOnlySortLim" class="solr.TextField"
>            sortMissingLast="true" omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.TrimFilterFactory" />
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="([\x00-\x2F\x3A-\x40\x5B-\x60\x7B-\x9F\u2000-\u206F\uFEFF\uFFF9-\uFFFD])"
>             replacement=""
>             replace="all"/>
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="(.{1,30})(.{31,})"
>             replacement="$1"
>             replace="all"/>
>   </analyzer>
> </fieldType>
>
> It reduces long lists of author names (100 and more authors) to the first 30
> chars for sorting and removes some illegal chars to keep sorting with UTF-8 solid.
>
> I don't see any problems there.
>
> Will check with the admin/analysis page.
>
> Bernd
>
>
> On 12.11.2012 14:28, Erick Erickson wrote:
>> First, sorting on tokenized fields is undefined/unsupported. You _might_
>> get away with it if the author field always reduces to one token, i.e. if
>> you're always indexing only the last name.
>>
>> I should say unsupported/undefined when more than one token is the result
>> of analysis. You can do things like use the KeywordTokenizer followed by
>> transformations on the _entire_ input field (lowercasing is popular, for
>> instance).
>>
>> So somehow the analysis chain you have defined for this field grabs
>> "Arslanagic" and translates it into "a". Synonyms? Stemming? Some
>> "interesting" sequence?
>>
>> The fastest way to look at that would be in Solr's admin/analysis page.
>> Just put Arslanagic into the index box and you should see which of the
>> steps does the translation.
>> Although changing it to "a" is really weird,
>> it's almost certainly something you've defined in the indexing analysis
>> chain.
>>
>> FWIW,
>> Erick
>>
>>
>> On Mon, Nov 12, 2012 at 8:19 AM, Bernd Fehling <
>> bernd.fehl...@uni-bielefeld.de> wrote:
>>
>>> Hi list,
>>> a user reported wrong sorting in our search service running on Solr.
>>> While chasing this issue I traced it back through Lucene into the index.
>>> I have a text field for sorting
>>> (stored, indexed, tokenized, omitNorms, sortMissingLast)
>>> and three docs with author names.
>>>
>>> If I trace at org.apache.lucene.document.Document.add(IndexableField) while
>>> indexing, I can see all three author names added as a field to each document.
>>>
>>> After searching with *:* for the three docs and sorting, the order is wrong
>>> because one of the author names is reduced to its first char; all other
>>> chars are lost.
>>>
>>> So with the author names (Alexander, Arslanagic, Brennmoen) indexed,
>>> the result of an ascending sort is (Arslanagic, Alexander, Brennmoen),
>>> which is wrong. But this happens because the author "Arslanagic" is
>>> reduced to "a" during indexing (???), and when sorted, "a" comes before
>>> "alexander".
>>>
>>> Currently I use 4.0 but have the same issue with 3.6.1.
>>>
>>> Without tracing through tons of code:
>>> - which is the last breakpoint for debugging to see the docs right before
>>>   they go into the index?
>>> - which is the first breakpoint for debugging to see the docs coming right
>>>   out of the index?
>>>
>>> Regards
>>> Bernd
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>>
>>
>
-- 
*************************************************************
Bernd Fehling                    Bielefeld University Library
Dipl.-Inform. (FH)                LibTec - Library Technology
Universitätsstr. 25                  and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************
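On the question of whether the regular expression is wrong: if the intent is simply "keep the first 30 characters", one possible alternative (a suggestion, not something proposed in this thread) is `(.{30}).+` with `replacement="$1"`. It matches only keys longer than 30 characters and keeps exactly 30 of them, while shorter keys pass through unchanged. The sketch below compares both patterns on the 32-character key that "Arslanagic, Aida ; Siqveland, Elisabeth" becomes after lowercasing and the first pattern filter. `TruncateCheck` and `apply` are hypothetical names, and the sketch again assumes `Matcher.replaceAll` semantics for replace="all".

```java
import java.util.regex.Pattern;

final class TruncateCheck {
    // Pattern from the schema: group 2 must be at least 31 characters long.
    static final Pattern ORIGINAL = Pattern.compile("(.{1,30})(.{31,})");

    // Hypothetical alternative: capture exactly 30 characters and require at least
    // one more, so keys of 30 characters or fewer do not match and stay unchanged.
    static final Pattern FIRST_30 = Pattern.compile("(.{30}).+");

    static String apply(Pattern pattern, String key) {
        return pattern.matcher(key).replaceAll("$1");
    }

    public static void main(String[] args) {
        // "Arslanagic, Aida ; Siqveland, Elisabeth" after lowercasing and the first
        // pattern filter: 32 characters.
        String key = "arslanagicaidasiqvelandelisabeth";
        System.out.println("original: " + apply(ORIGINAL, key));
        System.out.println("first 30: " + apply(FIRST_30, key));
    }
}
```

This prints "original: a" and "first 30: arslanagicaidasiqvelandelisabe". Because the first filter already removes \x00-\x2F, including line breaks, `.` not matching line terminators should not matter for these keys. The alternative has not been checked against Solr 3.6.1 or 4.0.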