There is a small problem in how you have formulated the question: Lucene
doesn't count words, it counts terms. Terms are produced by the Analyzer that
you configure for indexing (the IndexWriter phase). That analyzer tokenizes
(which does not necessarily mean splitting on the white space between words) a
sequence of strings
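
To make the words-versus-terms point concrete, here is a minimal sketch in plain Java. It does not use Lucene's API: `analyze` is a hypothetical toy stand-in for an analyzer chain, and its "stemming" step is a deliberately crude assumption that only handles the example sentence from the original question. It shows that the count for "love" depends on how the text is turned into terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TermCountSketch {

    // Toy "analyzer" (not Lucene): split on runs of non-letters, then lowercase.
    static List<String> analyze(String text, boolean stem) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String term = raw.toLowerCase(Locale.ROOT);
            // Crude stand-in for a stemming filter: drops the final "d" of "...ed",
            // so "loved" becomes "love". A real stemmer is far more careful.
            if (stem && term.length() > 3 && term.endsWith("ed")) {
                term = term.substring(0, term.length() - 1);
            }
            terms.add(term);
        }
        return terms;
    }

    // Counts exact matches of one term in the analyzed output.
    static int count(List<String> terms, String term) {
        int n = 0;
        for (String t : terms) {
            if (t.equals(term)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "I loved cats, but now I really love dogs";
        System.out.println(count(analyze(text, false), "love")); // prints 1
        System.out.println(count(analyze(text, true), "love"));  // prints 2
    }
}
```

With a real Lucene analyzer the same idea applies: whether "loved" and "love" end up as the same term depends on the filters in the analysis chain.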
Steve
Thanks for the contact. I believe UAX29URLEmailTokenizer tokenizes email
addresses as follows: john@mycompany.com.au john.doe
mycompany.com.au john doe mycompany com au com.au. We have an overridden
query parser that swaps out anyaddress: with to, from, cc, bcc, etc.
Inside the overri
Hello,
I would like to use Apache Lucene 4.x to count words in a string, for
example:
"I loved cats, but now I really love dogs" - count the word "love" in the
String - the result should be 2.
I would also like to count how many times "give up" occurs in the String.
I spend a lot of time to r
Hi Jamie,
What does EmailFilter do?
Why is the expanded form "required for the UAX29URLEmailTokenizer"? Seems like
an exact match would work on the email address alone, without the expanded
components?
Do you have an example of a query that reproducibly matches more documents than
it shoul
Hi Jamie,
Is your Query Filter also implemented by your team? If so, maybe you are not
correctly implementing the random access getDocIdSet(), bits(), or you are not
correctly handling the acceptDocs parameter in your own DocIdSet / Filter
implementation, leading to random failures.
Uwe
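
As an illustration of what handling acceptDocs means, here is a minimal sketch in plain Java. It is not Lucene's Filter API: `applyAcceptDocs` is a hypothetical helper and `java.util.BitSet` stands in for Lucene's per-segment bit sets. The assumptions are that a null acceptDocs means "all documents are allowed" and that the filter's own bits may be cached and shared between searches, so they must not be modified in place.

```java
import java.util.BitSet;

public class AcceptDocsSketch {

    // filterBits: documents the filter matches in one segment.
    // acceptDocs: documents the searcher allows (for example, not deleted);
    //             null is treated as "every document is allowed".
    static BitSet applyAcceptDocs(BitSet filterBits, BitSet acceptDocs) {
        // Work on a copy so a cached or shared filter set is never mutated.
        BitSet result = (BitSet) filterBits.clone();
        if (acceptDocs != null) {
            result.and(acceptDocs);
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet filterBits = new BitSet();
        filterBits.set(0);
        filterBits.set(2);
        filterBits.set(5);

        BitSet acceptDocs = new BitSet();
        acceptDocs.set(0);
        acceptDocs.set(1);
        acceptDocs.set(5);

        System.out.println(applyAcceptDocs(filterBits, acceptDocs)); // prints {0, 5}
        System.out.println(applyAcceptDocs(filterBits, null));       // prints {0, 2, 5}
    }
}
```

A filter that skips the intersection, or that modifies a shared set while another search is reading it, could return results that look unfiltered only some of the time.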
I beg your pardon. It's our EmailFilter class that emits the tokens. We
do it this way, since users like to search using individual components
of an email address. e.g. joe or mycompany.com.au. I think we may have a
synchronization issue at play. I will perform some further testing and
will get
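
For readers following along, here is a rough sketch in plain Java of the kind of expansion described: one address in, several searchable components out. It is not the actual EmailFilter (its code is not shown in this thread) and it does not use Lucene's TokenFilter API. The method name `expand`, the sample address, and the exact set of components emitted are all assumptions for illustration; for instance, it does not emit partial domain suffixes such as com.au.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class EmailExpandSketch {

    // Hypothetical expansion: full address, local part, domain, and their dot-separated pieces.
    static Set<String> expand(String address) {
        Set<String> tokens = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        tokens.add(address);

        int at = address.indexOf('@');
        if (at <= 0 || at == address.length() - 1) {
            return tokens; // no usable local part or domain: emit the input only
        }

        String local = address.substring(0, at);
        String domain = address.substring(at + 1);
        tokens.add(local);
        tokens.add(domain);

        for (String part : local.split("\\.")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        for (String label : domain.split("\\.")) {
            if (!label.isEmpty()) {
                tokens.add(label);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(expand("john.doe@mycompany.com.au"));
        // prints [john.doe@mycompany.com.au, john.doe, mycompany.com.au, john, doe, mycompany, com, au]
    }
}
```

Emitting the components at index time is what lets a search for joe or mycompany.com.au match without a wildcard query.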
Jamie,
UAX29URLEmailTokenizer does not emit email components as tokens;
“john@mycompany.com.au” will be tokenized as “john@mycompany.com.au”,
nothing more. That’s why I asked what EmailFilter does.
If the filter really is ignored by Lucene, that would be a bug in Lucene. I
think some
Greetings
We have a problem whereby Lucene 4.7 occasionally does not apply a
filter query during searching. The problem is intermittent. One in
thirty or so searches will return what appears to be an unfiltered
result set. There are no exceptions or errors occurring, just incorrect
results.