Oddly I had the exact same thought. Although it's not obvious from the name
(and common usage) of trim-like functions that you'd also have a way to
specify maximum length (after trimming I'd assume).
And the other thought I had was that TrimFilter should optionally take a
list of characters to tri
Hi Geoff,
cool, that will eliminate possible regex pitfalls in schema.xml
I was thinking about enhancing an existing filter as multi-purpose filter.
E.g. TrimFilter, if maxLength is set then also limit the termAtt to maxLength.
This will keep the number of available filters small, especially for s
Hi,
I've been following this thread and happen to have a simple
TruncatingFilter class I wrote for the same purpose. I think this should
do what you want:
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apach
There's nothing in Solr that I know of that does this. It would be a pretty
easy custom filter to create though
FWIW,
Erick
On Tue, Nov 13, 2012 at 7:02 AM, Robert Muir wrote:
> On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling
> wrote:
> > By the way, why does TrimFilter option updateOffse
On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling
wrote:
> By the way, why does the TrimFilter option updateOffset default to false,
> just to keep it backwards compatible?
>
In my opinion this option should be removed.
TokenFilters shouldn't muck with offsets, for a lot of reasons, but
especially becau
Hi Erik,
I like the fortune cookie :-)
I came to the same solution as you did, but with a short Java proggy by
trying different patterns, so trial and error ;-)
This brings me to the question, is there now (with 4.0) any filter doing
the job for me? I took a look at LengthFilter but it has a differ
Because your regex is wrong? (sorry, couldn't resist).
Regexes always give me indigestion. But if you look at your results, your
regex isn't working in any case at all. The second group is being removed
from the end of the string. I _think_ what's happening is that the longest
possible string is b
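
To illustrate the kind of pitfall described here, the following is a minimal Java sketch. The actual regex from schema.xml is not shown in this preview, so the pattern `(.{1,30})(.{31,})` is purely a hypothetical stand-in, as is the assumption that the author string has already been lowercased and stripped down to letters before the pattern runs. Under those assumptions, the second group demands at least 31 characters, so the regex engine backtracks the first group down to whatever is left over, and a 32-character key shrinks to a single character. An anchored pattern that captures exactly the first 30 characters avoids this.

```java
import java.util.regex.Pattern;

class TruncateRegexDemo {
    // Hypothetical pattern (NOT taken from the thread): group 1 is meant to keep
    // up to 30 chars, but group 2 insists on at least 31 more chars.
    static final Pattern GREEDY_WITH_MIN_TAIL = Pattern.compile("(.{1,30})(.{31,})");

    // Anchored alternative: capture exactly the first 30 chars, drop the rest.
    // Strings shorter than 30 chars do not match and are left unchanged.
    static final Pattern ANCHORED = Pattern.compile("^(.{30}).*$");

    static String truncateBroken(String s) {
        return GREEDY_WITH_MIN_TAIL.matcher(s).replaceAll("$1");
    }

    static String truncateAnchored(String s) {
        return ANCHORED.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Assumed intermediate form of "Arslanagic, Aida ; Siqveland, Elisabeth"
        // after lowercasing and removing non-letters (32 chars).
        String key = "arslanagicaidasiqvelandelisabeth";
        System.out.println(truncateBroken(key));   // prints "a"
        System.out.println(truncateAnchored(key)); // prints "arslanagicaidasiqvelandelisabe"
    }
}
```

In the first pattern, 32 characters minus the 31 required by group 2 leaves only one character for group 1, which is why the whole key collapses to "a" in this sketch.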
Yes, it is the second PatternReplaceFilterFactory.
the String "Arslanagic, Aida ; Siqveland, Elisabeth" is reduced to "a",
whereas the other strings are:
"Alexander, Kvam ; Bjørn, Nyland ; Bjørn, Reiten ; Øystein, Huse" -->
"alexanderkvambj"
"Brennmoen, Ingar ; Hauklien, Øystein ; Hedalen, Trond
The field type is derived from the distributed alphaOnlySort as follows:
It reduces long lists of author names (100 and more authors) to the first 30
chars for sorting and removes some illegal chars to keep sorting with UTF-8
solid.
Don't see any problems there.
http://wiki.apache.org/solr/CommonQueryParameters
For example, have an "author" field that is "text" and an "author_s" (or
"author_sorted" or "author_string") field that you copy the name to:
Query on "author", but sort on "
First, sorting on tokenized fields is undefined/unsupported. You _might_
get away with it if the author field always reduces to one token, i.e. if
you're always indexing only the last name.
I should say unsupported/undefined when more than one token is the result
of analysis. You can do things lik
> Subject: content disappears in the index
>
> Hi list,
> a user reported wrong sorting of our search service running on solr.
> While chasing this issue I traced it back through lucene into the index.
> I have a text field for sorting
> (stored,indexed,tokenized,omitNorms,sortM
Hi list,
a user reported wrong sorting of our search service running on solr.
While chasing this issue I traced it back through lucene into the index.
I have a text field for sorting
(stored,indexed,tokenized,omitNorms,sortMissingLast)
and three docs with author names.
If I trace at org.apache.lu