Thanks, Erick.
I have got the latest Indexer example from LIA2 working properly.
Thanks a lot,
Seid M
On 2/19/09, Michael McCandless wrote:
>
> The early access version of LIA2 (accessible at
> http://www.manning.com/hatcher3/)
> has updated this example to work with recent Lucene releases (though
> some changes were made to the StandardTokenizer.jflex grammar (you can
> svn diff the two URLs fairly trivially) to better deal with correctly
> identifying word characters, but from what I can tell that should have
> reduced the number of splits, not increased them).
>
> It's hard to tell from you
: In 2.3.2, if the token "Co�mo" came through this it would get changed to
: "como" by the time it made it through the filters. In 2.4.0 this isn't
: the case. It treats this one token as two, so we get "co" and "mo". So
: instead of searching "como" or "Co�mo" to get all the hits we now have t
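A quick way to see which of the two behaviors you are getting is to print
the tokens the analysis chain actually emits. A minimal sketch against the
2.4-era TokenStream API; "contents" is just a placeholder field name:

    import java.io.StringReader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    public class ShowTokens {
      public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();
        // Pass the problem word on the command line and print one
        // line per token the analyzer produces.
        TokenStream ts =
            analyzer.tokenStream("contents", new StringReader(args[0]));
        for (Token t = ts.next(new Token()); t != null; t = ts.next(t)) {
          System.out.println(t.term());
        }
      }
    }

Run it against both the 2.3.2 and 2.4.0 jars and diff the output; one token
versus two shows up immediately.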
The explanations of scores for the same document returned from two similar
queries differ in an unexpected way. There are two fields involved, 'contents'
and 'literals'. The 'literals' field has setBoost = 0. As you can see from
the explanations below, the total weight of the matching terms from the
'li
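For digging into cases like this, Searcher.explain() prints the full
scoring tree for a single document, including each term's weight and the
field norms. A minimal sketch; the index path, query string, and doc id 0
are placeholders for illustration:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Explanation;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class ShowExplanation {
      public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Query query = new QueryParser("contents",
            new StandardAnalyzer()).parse("contents:foo literals:foo");
        // Each term's contribution, boost, and norm appears as a
        // nested entry in the printed explanation.
        Explanation exp = searcher.explain(query, 0);
        System.out.println(exp.toString());
        searcher.close();
      }
    }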
Hi,
Has there been any work done on getting confidence scores at runtime, so
that scores of documents can be compared across queries? I found one
reference in the mailing list to some work in 2003, but couldn't find any
follow-up:
http://osdir.com/ml/jakarta.lucene.user/2003-12/msg00093.html
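One crude workaround is to rescale each query's scores by its own top hit,
which maps them into [0, 1] but is only a per-query rescaling, not a true
confidence measure. A minimal sketch against the 2.4-era search API:

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;

    public class NormalizedScores {
      // Prints each hit's score divided by the top score for this query.
      // Note: two documents with normalized score 1.0 from different
      // queries are NOT necessarily equally good matches.
      static void printNormalized(IndexSearcher searcher, Query query)
          throws Exception {
        TopDocs topDocs = searcher.search(query, null, 10);
        if (topDocs.scoreDocs.length == 0) return;
        float max = topDocs.scoreDocs[0].score;
        for (ScoreDoc sd : topDocs.scoreDocs) {
          System.out.println(sd.doc + " " + (sd.score / max));
        }
      }
    }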
Yusuf,
You are 100% correct; it is bad that this uses a custom tokenizer.
This was my motivation for attacking it from this angle:
https://issues.apache.org/jira/browse/LUCENE-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
(unfinished)
otherwise, at some point jflex rules
I'm not sure why using a PhraseQuery allows you to search within a
sentence. PhraseQuery just makes sure that the terms appear next to
each other (or within some slop), but it isn't aware of sentence or
paragraph boundaries.
See
http://www.lucidimagination.com/search/document/6a5dfb8df2ce
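For reference, this is everything a slopped PhraseQuery expresses; the
field name and terms below are just illustrative:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PhraseQuery;

    public class SentenceUnawarePhrase {
      public static PhraseQuery build() {
        // Matches docs where "quick" and "fox" appear within 2 positions
        // of each other; nothing here knows about sentence boundaries.
        PhraseQuery pq = new PhraseQuery();
        pq.add(new Term("contents", "quick"));
        pq.add(new Term("contents", "fox"));
        pq.setSlop(2);
        return pq;
      }
    }

One common trick, if you do want matches confined to a sentence, is to
index each sentence as a separate value of the same field and return a
gap larger than any slop you use from the analyzer's
getPositionIncrementGap(), so slopped phrases effectively stop matching
across sentence boundaries.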
It's been a few years since I've worked on Arabic, but it sounds
reasonable. Care to submit a patch with unit tests showing the
StandardTokenizer properly handling all Arabic characters?
http://wiki.apache.org/lucene-java/HowToContribute
On Feb 20, 2009, at 6:22 AM, Yusuf Aaji wrote:
Hi Everyone,
My question is related to the arabic analysis package under:
org.apache.lucene.analysis.ar
It is cool and it is doing a great job, but it uses a special tokenizer:
ArabicLetterTokenizer
The problem with this tokenizer is that it fails to handle emails, URLs,
and acronyms the
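You can see the failure directly by running an email address through the
tokenizer. A minimal sketch, assuming the contrib analyzers from trunk and
the 2.4-style TokenStream API:

    import java.io.StringReader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.ar.ArabicLetterTokenizer;

    public class ArabicTokenizerDemo {
      public static void main(String[] args) throws Exception {
        // A letter-based tokenizer splits on '@' and '.', so this
        // address comes out as three tokens: test, example, com.
        ArabicLetterTokenizer ts =
            new ArabicLetterTokenizer(new StringReader("test@example.com"));
        for (Token t = ts.next(new Token()); t != null; t = ts.next(t)) {
          System.out.println(t.term());
        }
      }
    }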