Re: content disappears in the index

2012-11-15 Thread Erick Erickson
Oddly I had the exact same thought. Although it's not obvious from the name (and common usage) of trim-like functions that you'd also have a way to specify maximum length (after trimming I'd assume). And the other thought I had was that TrimFilter should optionally take a list of characters to tri

Re: content disappears in the index

2012-11-13 Thread Bernd Fehling
Hi Geoff, cool, that will eliminate possible regex pitfalls in schema.xml. I was thinking about enhancing an existing filter into a multi-purpose filter, e.g. TrimFilter: if maxLength is set, then also limit the termAtt to maxLength. This will keep the number of available filters small, especially for s

Re: content disappears in the index

2012-11-13 Thread Geoff Cooney
Hi, I've been following this thread and happen to have a simple TruncatingFilter class I wrote for the same purpose. I think this should do what you want: import java.io.IOException; import org.apache.lucene.analysis.TokenFilter; import org.apache.lucene.analysis.TokenStream; import org.apach

Re: content disappears in the index

2012-11-13 Thread Erick Erickson
There's nothing in Solr that I know of that does this. It would be a pretty easy custom filter to create, though. FWIW, Erick On Tue, Nov 13, 2012 at 7:02 AM, Robert Muir wrote: > On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling > wrote: > > By the way, why does TrimFilter option updateOffse

Re: content disappears in the index

2012-11-13 Thread Robert Muir
On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling wrote: > By the way, why does the TrimFilter option updateOffset default to false, > just to keep it backwards compatible? > In my opinion this option should be removed. TokenFilters shouldn't muck with offsets, for a lot of reasons, but especially becau

Re: content disappears in the index

2012-11-12 Thread Bernd Fehling
Hi Erick, I like the fortune cookie :-) I came to the same solution as you did, but with a short Java program that tries different patterns, so trial and error ;-) This brings me to the question: is there now (with 4.0) any filter that does the job for me? I took a look at LengthFilter but it has a differ

Re: content disappears in the index

2012-11-12 Thread Erick Erickson
Because your regex is wrong? (sorry, couldn't resist). Regexes always give me indigestion. But if you look at your results, your regex isn't working in any case at all. The second group is being removed from the end of the string. I _think_ what's happening is that the longest possible string is b
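For illustration only, here is a minimal sketch of one way to keep just the first 30 characters of a string with a regex in plain Java. The pattern, the class name TruncateRegexSketch and the method truncate are assumptions made for this example; the regex actually used in the thread's schema.xml is not shown in the archive, so this is not a reconstruction of it.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Solr or Lucene.
final class TruncateRegexSketch {
    // Anchored at both ends so a single match covers the whole input.
    // Group 1 greedily takes up to 30 characters; the trailing .* consumes the rest.
    // DOTALL lets '.' match line terminators as well.
    static final Pattern FIRST_30 = Pattern.compile("^(.{0,30}).*$", Pattern.DOTALL);

    static String truncate(String s) {
        // Replace the entire match with group 1, i.e. drop everything after 30 chars.
        return FIRST_30.matcher(s).replaceFirst("$1");
    }

    public static void main(String[] args) {
        System.out.println(truncate("arslanagicaidasiqvelandelisabeth"));
    }
}
```

Anchoring the pattern means there is exactly one match spanning the whole input, so which part ends up in group 1 does not depend on where the regex engine happens to start matching.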

Re: content disappears in the index

2012-11-12 Thread Bernd Fehling
Yes, it is the second PatternReplaceFilterFactory. The string "Arslanagic, Aida ; Siqveland, Elisabeth" is reduced to "a", whereas the other strings are: "Alexander, Kvam ; Bjørn, Nyland ; Bjørn, Reiten ; Øystein, Huse" --> "alexanderkvambj" "Brennmoen, Ingar ; Hauklien, Øystein ; Hedalen, Trond

Re: content disappears in the index

2012-11-12 Thread Bernd Fehling
The field type is derived from the distributed alphaOnlySort as follows: It reduces long lists of author names (100 and more authors) to the first 30 chars for sorting and removes some illegal chars to keep sorting with utf8 solid. Don't see any problems there.
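The intended behaviour described above (lowercase, trim, strip unwanted characters, keep the first 30 chars) can be sketched outside Solr in plain Java. This is only a rough approximation under stated assumptions: the class SortKeySketch and method sortKey are hypothetical, and the character class [^a-z] is assumed because the real patterns of the field type are not included in the archived message.

```java
import java.util.Locale;

// Hypothetical helper for illustration; it mimics the described analysis chain
// but is not the field type from the thread.
final class SortKeySketch {
    static final int MAX_LEN = 30; // "first 30 chars", as stated in the thread

    static String sortKey(String raw) {
        // Lowercase and trim first, as alphaOnlySort-style chains typically do.
        String s = raw.toLowerCase(Locale.ROOT).trim();
        // Assumption: drop everything that is not an ASCII letter a-z.
        s = s.replaceAll("[^a-z]", "");
        // Truncate with substring instead of a second regex.
        return s.length() <= MAX_LEN ? s : s.substring(0, MAX_LEN);
    }

    public static void main(String[] args) {
        System.out.println(sortKey("Arslanagic, Aida ; Siqveland, Elisabeth"));
    }
}
```

Using substring for the length limit avoids a second regex entirely, which is the kind of pitfall later messages in the thread discuss.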

Re: content disappears in the index

2012-11-12 Thread Jack Krupansky
: http://wiki.apache.org/solr/CommonQueryParameters For example, have an "author" field that is "text" and an "author_s" (or "author_sorted" or "author_string") field that you copy the name to: Query on "author", but sort on "

Re: content disappears in the index

2012-11-12 Thread Erick Erickson
First, sorting on tokenized fields is undefined/unsupported. You _might_ get away with it if the author field always reduces to one token, i.e. if you're always indexing only the last name. I should say unsupported/undefined when more than one token is the result of analysis. You can do things lik

RE: content disappears in the index

2012-11-12 Thread Uwe Schindler
rg > Subject: content disappears in the index > > Hi list, > a user reported wrong sorting of our search service running on solr. > While chasing this issue I traced it back through lucene into the index. > I have a text field for sorting > (stored,indexed,tokenized,omitNorms,sortM

content disappears in the index

2012-11-12 Thread Bernd Fehling
Hi list, a user reported wrong sorting in our search service running on Solr. While chasing this issue I traced it back through Lucene into the index. I have a text field for sorting (stored,indexed,tokenized,omitNorms,sortMissingLast) and three docs with author names. If I trace at org.apache.lu