such an approach to problem solving.
Tell us the full problem and then we can focus on legitimate "solutions".
-- Jack Krupansky
-----Original Message-----
From: Erick Erickson
Sent: Sunday, November 04, 2012 8:06 AM
To: java-user
Subject: Re: using CharFilter to inject a space
Ahh, I don't know of a better way. I can imagine complex solutions
involving something akin to WordDelimiterFilter... and I can imagine that
that would be ridiculously expensive to maintain when there are really
simple solutions like you're looking at.
Mostly I was curious about your use-case
well, my main goal is to use a ShingleFilter that will only take
shingles that are not separated by commas etc.
for example, the phrase:
"red apples, green tomatoes, and brown potatoes"
should yield the shingles "red apples", "green tomatoes", "and brown",
"brown potatoes"; but not "apple
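To make the goal above concrete, here is a minimal plain-Java sketch of two-word shingles that never cross a comma or semi-colon. It is only an illustration of the desired output, not Lucene's ShingleFilter and not necessarily how anyone in this thread implemented it; the class and method names (PunctuationBoundShingles, shingles) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PunctuationBoundShingles {

    // Hypothetical helper (not a Lucene API): builds two-word shingles,
    // treating every comma or semi-colon as a hard boundary.
    static List<String> shingles(String text) {
        List<String> out = new ArrayList<>();
        // Each segment between punctuation marks is shingled on its own.
        for (String segment : text.split("[,;]")) {
            String trimmed = segment.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty segments such as those from ",,"
            }
            String[] words = trimmed.split("\\s+");
            // Pair each word with its right-hand neighbour inside the segment.
            for (int i = 0; i + 1 < words.length; i++) {
                out.add(words[i] + " " + words[i + 1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints one shingle per line.
        for (String s : shingles("red apples, green tomatoes, and brown potatoes")) {
            System.out.println(s);
        }
    }
}
```

For the example phrase this yields "red apples", "green tomatoes", "and brown" and "brown potatoes", and nothing that spans a comma.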
So I've gotta ask... _why_ do you want to inject the spaces?
If it's just to break this up into tokens, wouldn't something like
LetterTokenizer do? Assuming you aren't interested in
leaving in numbers. Or even StandardTokenizer, unless you have
e-mail & etc.
Or what about PatternReplaceCharFilt
You're right. I'm not sure what I was thinking.
Thanks for all your help,
Igal
On Nov 3, 2012 5:44 PM, "Robert Muir" wrote:
> On Sat, Nov 3, 2012 at 8:32 PM, Igal @ getRailo.org
> wrote:
> > hi Robert,
> >
> > thank you for your replies.
> >
> > I couldn't find much documentation/examples of
On Sat, Nov 3, 2012 at 8:32 PM, Igal @ getRailo.org wrote:
> hi Robert,
>
> thank you for your replies.
>
> I couldn't find much documentation/examples of this, but this is what I came
> up with (below). is that the way I'm supposed to use the MappingCharFilter?
>
You don't need to extend anythi
hi Robert,
thank you for your replies.
I couldn't find much documentation/examples of this, but this is what I
came up with (below). is that the way I'm supposed to use the
MappingCharFilter?
also, if that is the correct way, wouldn't it make sense to return a
reference to "this" from Norm
On Sat, Nov 3, 2012 at 7:47 PM, Igal @ getRailo.org wrote:
> I considered it, and it's definitely an option.
>
> but I read in the book "Lucene In Action" that MappingCharFilter is
> inefficient and I'm not sure that I need that. if implementing my own
> involves a lot of coding then I might reso
I considered it, and it's definitely an option.
but I read in the book "Lucene In Action" that MappingCharFilter is
inefficient and I'm not sure that I need that. if implementing my own
involves a lot of coding then I might resort to it as I don't have large
data sets to index at this time.
On Sat, Nov 3, 2012 at 7:35 PM, Igal @ getRailo.org wrote:
> hi,
>
> I want to make sure that every comma (,) and semi-colon (;) is followed by a
> space prior to tokenizing.
>
> the idea is to then use a WhitespaceTokenizer which will keep commas but
> still split the phrase in a case like:
>
>
hi,
I want to make sure that every comma (,) and semi-colon (;) is followed
by a space prior to tokenizing.
the idea is to then use a WhitespaceTokenizer which will keep commas but
still split the phrase in a case like:
"I bought red apples,green pears,and yellow oranges"
I'm thinking
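As a rough illustration of the idea described above (a space after every comma or semi-colon, then splitting on whitespace so the punctuation stays attached to its token), here is a plain-Java sketch using java.util.regex. It does not use Lucene's CharFilter or WhitespaceTokenizer classes; the names SpaceAfterPunctuation, injectSpaces and whitespaceTokens are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SpaceAfterPunctuation {

    // A comma or semi-colon that is immediately followed by a non-space character.
    private static final Pattern NO_SPACE_AFTER = Pattern.compile("([,;])(?=\\S)");

    // Hypothetical helper: inserts one space after such punctuation.
    // Punctuation already followed by whitespace (or at end of input) is left alone.
    static String injectSpaces(String text) {
        return NO_SPACE_AFTER.matcher(text).replaceAll("$1 ");
    }

    // Hypothetical helper: splits on runs of whitespace, which is roughly what a
    // whitespace-based tokenizer would do; commas stay attached to the preceding word.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(injectSpaces(text).trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String phrase = "I bought red apples,green pears,and yellow oranges";
        System.out.println(injectSpaces(phrase));
        for (String token : whitespaceTokens(phrase)) {
            System.out.println(token);
        }
    }
}
```

With the example phrase, "apples,green" becomes the two tokens "apples," and "green", so the comma survives as a marker that a later stage could use as a boundary.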