On Tuesday 19 September 2006 22:41, eks dev wrote:
> ahh, another one: when you strip a suffix, check if the last char of the
> remaining "stem" is "s" (a magic thing in German); delete it if it is not the
> only letter. Do not ask why; a long unexplained mystery of the German language.
This is called "Fugenelement" a
Hi Vladimir,
Yes, you are close. Solr doesn't use SOAP, though, and JSON is only one of its
output formats. Solr can be described as a REST-ish web service: you trigger it
via HTTP GET requests, and responses come back as XML, JSON, or possibly other
formats in the future.
I think you are right about Compass, bu
Hi,
A couple of people here have mentioned SOLR as a 'new' Lucene-based search
server. But NUTCH is also Lucene-based. Also, there is an OpenSymphony
initiative called 'Compass', which is rather an integration framework than a server.
I wonder if anyone can come up with a small summary of what are scope
Mark Miller wrote:
> I'll one up you:
> http://www.manning.com/hatcher2/
> Might as well save yourself a whole lot of time and just buy the book.
> If you're going to use Lucene it might as well be required.
There is also "Getting Started" on the Lucene web site:
http://lucene.apache.org/java/doc
I'll one up you:
http://www.manning.com/hatcher2/
Might as well save yourself a whole lot of time and just buy the book.
If you're going to use Lucene it might as well be required.
Simon Willnauer wrote:
> Rather than writing some more introductions to lucene I just give you
> a hand with google
please see the FAQ "Can I filter by score?" ...
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-912c1f237bb00259185353182948e5935f0c2f03
: Date: Tue, 19 Sep 2006 14:07:43 +0530
: From: Bhavin Pandya <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org, Bhavin Pandya <[EMAIL PROTECTE
Rather than writing some more introductions to lucene I just give you
a hand with google.
GoogleQuery: lucene java intro
http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html
This should lead you to what you are looking for.
best regards simon
On 9/19/06, S R <[EMAIL PROTECTED]> wrote
I just remembered one minor thing that made our life easier: the recursive loop
has some primitive stripEndings() method that removes most of the variable
endings (all these ungs/ungen/...) before looking up in the SuffixTree. This
reduces your dictionary needs dramatically. I think this is partially done
Hi Otis,
Depends on what you need to do with it. If you need this only as a "kind of
stemming" for searching documents, the solution is not all that complex. If you
need linguistically correct splitting, then it gets complicated.
for the first case:
Build SuffixTree with your dictionary (hope you
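The "kind of stemming" approach described above can be sketched in plain Java. The ending list and the method name stripEndings are assumptions for illustration, not the poster's actual code; the trailing-"s" rule is the Fugenelement trick mentioned earlier in the thread:

```java
import java.util.Arrays;
import java.util.List;

public class EndingStripper {
    // A few common German endings, checked in order; longer endings are
    // listed before their own suffixes. A real list would be longer (assumption).
    private static final List<String> ENDINGS =
            Arrays.asList("ungen", "heit", "keit", "ung", "en", "er", "e", "s");

    // Strip the first (longest applicable) matching ending, then drop a
    // trailing linking "s" (Fugenelement) unless it is the only letter left.
    public static String stripEndings(String word) {
        String stem = word;
        for (String ending : ENDINGS) {
            if (stem.length() > ending.length() && stem.endsWith(ending)) {
                stem = stem.substring(0, stem.length() - ending.length());
                break;
            }
        }
        if (stem.length() > 1 && stem.endsWith("s")) {
            stem = stem.substring(0, stem.length() - 1);
        }
        return stem;
    }
}
```

Looking up the reduced stem (rather than every surface form) in the SuffixTree is what shrinks the dictionary requirements.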
The "i" you pass to Hits.score is the index of the result in that Hits
object ... the "i" you pass to Searcher.explain should be the absolute
docid (the searcher has no way of knowing about your Hits, or what order
they are in).
Try something like...
searcher.explain(disjunctQuery, hits.id(i));
Forgot to add the hits.score() to print out the hits score.
public void explainSearchScore(String indexLocation, DisjunctionMaxQuery
disjunctQuery) throws IOException {
    IndexSearcher searcher = new IndexSearcher(IndexReader.open(indexLocation));
    Hits hits = searcher.search(disjunctQuery);
    if (hits == null) return;
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.score(i));
        System.out.println(searcher.explain(disjunctQuery, i));
    }
}
public void explainSearchScore(String indexLocation, DisjunctionMaxQuery
disjunctQuery) throws IOException {
    IndexSearcher searcher = new IndexSearcher(IndexReader.open(indexLocation));
    Hits hits = searcher.search(disjunctQuery);
    if (hits == null) return;
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(searcher.explain(disjunctQuery, i));
    }
}
: In the following output, each hit has two lines. The first line is the hit
: score and the second line is the explanation given by the
: DisjunctionMaxQuery.
how are you printing the Explanation? .. are you using the toString()?
can you post a small self contained code example showing how you
I was trying to print out the score explanation by a DisjunctionMaxQuery.
Though there is a hit score > 0 for the results, there is no detailed
explanation. Am I doing something wrong?
In the following output, each hit has two lines. The first line is the hit
score and the second line is the expl
Thanks Yonik for the reply.
What I want is to index a set of text documents (about 200 .txt files) in a
Windows environment so I can search in them. What I am doing is actually
evaluating different search and indexing tools.
Thank you.
Yonik Seeley <[EMAIL PROTECTED]> wrote: On
On 9/19/06, S R <[EMAIL PROTECTED]> wrote:
> I have just downloaded LUCENE. I am not an expert in Java. Could someone lead
> me in the first few steps..
The first few steps to what?
First, figure out if you want straight lucene-java, or another
application using lucene.
Lucene is a library that
Hello,
I have just downloaded LUCENE. I am not an expert in Java. Could someone lead
me in the first few steps..
Thank you
Sorry, I sent the message before completing it.
On Tuesday 19 September 2006 19:45, Paul Elschot wrote:
> On Tuesday 19 September 2006 11:49, karl wettin wrote:
> > On 9/19/06, Bhavin Pandya <[EMAIL PROTECTED]> wrote:
> > > Hi all,
> > >
> > > How to put limit in lucene that "dont return me any do
On Tuesday 19 September 2006 11:49, karl wettin wrote:
> On 9/19/06, Bhavin Pandya <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > How to put limit in lucene that "dont return me any document which has
> > score less than 0.25"
>
> You implement a HitCollector and break out when you reach such low sco
On Sep 19, 2006, at 9:21 AM, Otis Gospodnetic wrote:
> How do people typically analyze/tokenize text with compounds (e.g.
> German)? I took a look at GermanAnalyzer hoping to see how one can
> deal with that, but it turns out GermanAnalyzer doesn't treat
> compounds in any special way at all.
O
Otis,
I can't offer you any practical advice, but as a student of German I can tell you that beginners find it difficult to read German words and split them properly. The larger your vocabulary, the easier it is. The whole topic sounds like an AI problem:
A possible algorithm for German (no ide
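The AI-ish flavor of the problem shows even in a naive version. Below is a sketch of a recursive dictionary-based compound splitter, under the assumption that a reasonable vocabulary set is available; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CompoundSplitter {
    // Try to cover the whole word with dictionary entries, preferring
    // longer prefixes first (greedy with backtracking). Returns null
    // when no full decomposition exists.
    public static List<String> split(String word, Set<String> vocab) {
        if (word.isEmpty()) return new ArrayList<>();
        for (int end = word.length(); end >= 1; end--) {
            String prefix = word.substring(0, end);
            if (vocab.contains(prefix)) {
                List<String> rest = split(word.substring(end), vocab);
                if (rest != null) {
                    rest.add(0, prefix);
                    return rest;
                }
            }
        }
        return null;
    }
}
```

This ignores linking elements (the Fugen-"s"), inflection, and ambiguity between alternative splits, which is exactly where the hard, vocabulary-dependent part of the problem lives.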
Glad I actually wrote something helpful ..
Memory for filters shouldn't be a problem; filters take up 1 bit per
document (plus some tiny overhead for a BitSet). I think the time is
actually taken up by the number of terms that match each wildcard, as well as
the total number of terms.
Really, I expec
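Both points above (one bit per document, and capping the number of matched terms) can be illustrated with plain-Java stand-ins for the Lucene pieces. The fake term index here is an assumption for the sake of a self-contained example; a real WildcardFilter would enumerate terms via the index's term enumeration:

```java
import java.util.BitSet;
import java.util.Map;

public class CappedBitFilter {
    // Build a BitSet with one bit per document from the doc lists of
    // every term matching the prefix; give up once too many terms match
    // (the cap the original poster wants).
    public static BitSet bits(Map<String, int[]> termDocs,
                              String prefix, int maxTerms, int numDocs) {
        BitSet bits = new BitSet(numDocs); // 1 bit per document
        int matched = 0;
        for (Map.Entry<String, int[]> e : termDocs.entrySet()) {
            if (!e.getKey().startsWith(prefix)) continue;
            if (++matched > maxTerms) {
                throw new RuntimeException("too many terms match: " + prefix + "*");
            }
            for (int doc : e.getValue()) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

The cost that remains is walking each matching term's doc list, which is why the number of matching terms, not the BitSet, dominates the run time.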
Hi,
How do people typically analyze/tokenize text with compounds (e.g. German)? I
took a look at GermanAnalyzer hoping to see how one can deal with that, but it
turns out GermanAnalyzer doesn't treat compounds in any special way at all.
One way to go about this is to have a word dictionary and
Thanks for the answer. It is not really necessary for me to read the documents.
That's what you get when you find code by searching the net and use it without
really thinking about or understanding it. I will just step through the terms and set
the bits as you said. I will add some maximum number of term
I'll side-step the explanations part of your mail since I don't know how to
answer.. But a few observations, see below.
On 9/19/06, Kroehling, Thomas <[EMAIL PROTECTED]> wrote:
> Hi,
> I am trying to write a WildcardFilter in order to prevent
> TooManyBooleanClauses and high memory usage. I wrap a Fi
Hi,
I am trying to write a WildcardFilter in order to prevent
TooManyBooleanClauses and high memory usage. I wrap a Filter in a
ConstantScoreQuery. I enumerate over the WildcardTerms for a query. This
way I can set a maximum number of terms which I will evaluate. If too
many terms match, I throw an
On 9/19/06, Bhavin Pandya <[EMAIL PROTECTED]> wrote:
> Hi all,
> How to put limit in lucene that "dont return me any document which has score less
> than 0.25"
You implement a HitCollector and break out when you reach such low score.
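The collector idea can be sketched without Lucene. One caveat: a HitCollector's collect() callback is not guaranteed to see hits in score order, so this sketch filters out low scores rather than breaking out of the loop; class and method layout here are an assumption, and in Lucene you would subclass HitCollector and receive the (doc, score) calls from the searcher:

```java
import java.util.ArrayList;
import java.util.List;

public class ScoreThresholdCollector {
    private final float minScore;
    private final List<Integer> docs = new ArrayList<>();

    public ScoreThresholdCollector(float minScore) {
        this.minScore = minScore;
    }

    // In Lucene this would override HitCollector.collect(int doc, float score);
    // hits below the cutoff are simply ignored.
    public void collect(int doc, float score) {
        if (score >= minScore) {
            docs.add(doc);
        }
    }

    public List<Integer> getDocs() {
        return docs;
    }
}
```

Also note that Lucene scores are not absolute; a cutoff like 0.25 means different things for different queries, which is why a fixed-score filter is usually discouraged.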
Hi all,
How to put limit in lucene that "dont return me any document which has score
less than 0.25"
Thanks.
Bhavin pandya