On Tuesday 19 September 2006 22:41, eks dev wrote:
> ahh, another one: when you strip a suffix, check if the last char of the
> remaining "stem" is "s" (a magic thing in German); delete it if it is not the
> only letter. Do not ask why; a long unexplained mystery of the German language.
This is called "Fugenelement" a
Hi Vladimir,
Yes, you are close. Solr doesn't use SOAP, though, and JSON is only one of its
output formats. Solr can be described as a REST-ish web service: you trigger it
via HTTP GET requests, and responses come back as XML, JSON, or possibly other
formats in the future.
I think you are right about Compass, bu
Hi,
A couple of people here have mentioned SOLR as a 'new' Lucene-based search
server. But NUTCH is also Lucene-based. Also, there is an OpenSymphony
initiative called 'Compass', which is rather an integration framework than a server.
I wonder if anyone can come up with a small summary of what are scope
Mark Miller wrote:
> I'll one up you:
> http://www.manning.com/hatcher2/
> Might as well save yourself a whole lot of time and just buy the book.
> If you're going to use Lucene it might as well be required.
There is also "Getting Started" on the Lucene web site:
http://lucene.apache.org/java/doc
I'll one up you:
http://www.manning.com/hatcher2/
Might as well save yourself a whole lot of time and just buy the book.
If you're going to use Lucene it might as well be required.
Simon Willnauer wrote:
> Rather than writing some more introductions to lucene I just give you
> a hand with google
please see the FAQ "Can I filter by score?" ...
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-912c1f237bb00259185353182948e5935f0c2f03
: Date: Tue, 19 Sep 2006 14:07:43 +0530
: From: Bhavin Pandya <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org, Bhavin Pandya <[EMAIL PROTECTE
Rather than writing some more introductions to lucene I just give you
a hand with google.
GoogleQuery: lucene java intro
http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html
This should lead you to what you are looking for.
best regards simon
On 9/19/06, S R <[EMAIL PROTECTED]> wrote
I just remembered one minor thing that made our life easier: the recursive loop
has some primitive stripEndings() method that removes most of the variable
endings (all these ungs/ungen/...) before looking up in the SuffixTree. This
reduces your dictionary needs dramatically. I think this is partially done
Hi Otis,
Depends on what you need to do with it. If you need this only as a "kind of
stemming" for searching documents, the solution is not all that complex. If you
need linguistically correct splitting, then it gets complicated.
for the first case:
Build SuffixTree with your dictionary (hope you
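The "kind of stemming" approach described above can be sketched in plain Java. The ending list and the method name stripEndings are assumptions for illustration, not the poster's actual code; the trailing-"s" rule is the Fugenelement trick mentioned earlier in the thread:

```java
import java.util.Arrays;
import java.util.List;

public class EndingStripper {
    // A few common German endings, checked in order; longer endings are
    // listed before their own suffixes. A real list would be longer (assumption).
    private static final List<String> ENDINGS =
            Arrays.asList("ungen", "heit", "keit", "ung", "en", "er", "e", "s");

    // Strip the first (longest applicable) matching ending, then drop a
    // trailing linking "s" (Fugenelement) unless it is the only letter left.
    public static String stripEndings(String word) {
        String stem = word;
        for (String ending : ENDINGS) {
            if (stem.length() > ending.length() && stem.endsWith(ending)) {
                stem = stem.substring(0, stem.length() - ending.length());
                break;
            }
        }
        if (stem.length() > 1 && stem.endsWith("s")) {
            stem = stem.substring(0, stem.length() - 1);
        }
        return stem;
    }
}
```

Looking up the reduced stem (rather than every surface form) in the SuffixTree is what shrinks the dictionary requirements.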
The "i" you pass to Hits.score is the index of the result in that Hits
object ... the "i" you pass to Searcher.explain should be the absolute
docid (the searcher has no way of knowing about your Hits, or what order
they are in).
Try something like...
searcher.explain(disjunctQuery, hits.id(i));
Forgot to add the hits.score() to print out the hits score.
public void explainSearchScore(String indexLocation, DisjunctionMaxQuery
disjunctQuery) throws IOException {
    IndexSearcher searcher = new IndexSearcher(IndexReader.open(indexLocation));
    Hits hits = searcher.search(disjunctQuery);
    if (hits == null) return;
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.score(i));
        System.out.println(searcher.explain(disjunctQuery, i));
    }
}
public void explainSearchScore(String indexLocation, DisjunctionMaxQuery
disjunctQuery) throws IOException {
    IndexSearcher searcher = new IndexSearcher(IndexReader.open(indexLocation));
    Hits hits = searcher.search(disjunctQuery);
    if (hits == null) return;
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(searcher.explain(disjunctQuery, i));
    }
}
: In the following output, each hit has two lines. The first line is the hit
: score and the second line is the explanation given by the
: DisjunctionMaxQuery.
how are you printing the Explanation? .. are you using the toString()?
can you post a small self contained code example showing how you
I was trying to print out the score explanation by a DisjunctionMaxQuery.
Though there is a hit score > 0 for the results, there is no detailed
explanation. Am I doing something wrong?
In the following output, each hit has two lines. The first line is the hit
score and the second line is the expl
Thanks Yonik for the reply.
What I want is to index a set of text documents (about 200 .txt files) in a
Windows environment so I can search in them. What I am doing is actually
evaluating different search and indexing tools.
Thank you.
Yonik Seeley <[EMAIL PROTECTED]> wrote: On
On 9/19/06, S R <[EMAIL PROTECTED]> wrote:
> I have just downloaded LUCENE. I am not an expert in Java. Could someone lead
> me in the first few steps..
The first few steps to what?
First, figure out if you want straight lucene-java, or another
application using lucene.
Lucene is a library that
Hello,
I have just downloaded LUCENE. I am not an expert in Java. Could someone lead
me in the first few steps..
Thank you
Sorry, I sent the message before completing it.
On Tuesday 19 September 2006 19:45, Paul Elschot wrote:
> On Tuesday 19 September 2006 11:49, karl wettin wrote:
> > On 9/19/06, Bhavin Pandya <[EMAIL PROTECTED]> wrote:
> > > Hi all,
> > >
> > > How to put limit in lucene that "dont return me any do
On Tuesday 19 September 2006 11:49, karl wettin wrote:
> On 9/19/06, Bhavin Pandya <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > How to put limit in lucene that "dont return me any document which has
> > score less than 0.25"
>
> You implement a HitCollector and break out when you reach such low sco
On Sep 19, 2006, at 9:21 AM, Otis Gospodnetic wrote:
> How do people typically analyze/tokenize text with compounds (e.g.
> German)? I took a look at GermanAnalyzer hoping to see how one can
> deal with that, but it turns out GermanAnalyzer doesn't treat
> compounds in any special way at all.
O
Otis,
I can't offer you any practical advice, but as a student of German I can tell you that beginners find it difficult to read German words and split them properly. The larger your vocabulary, the easier it is. The whole topic sounds like an AI problem:
A possible algorithm for German (no ide
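The AI-ish flavor of the problem shows even in a naive version. Below is a sketch of a recursive dictionary-based compound splitter, under the assumption that a reasonable vocabulary set is available; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CompoundSplitter {
    // Try to cover the whole word with dictionary entries, preferring
    // longer prefixes first (greedy with backtracking). Returns null
    // when no full decomposition exists.
    public static List<String> split(String word, Set<String> vocab) {
        if (word.isEmpty()) return new ArrayList<>();
        for (int end = word.length(); end >= 1; end--) {
            String prefix = word.substring(0, end);
            if (vocab.contains(prefix)) {
                List<String> rest = split(word.substring(end), vocab);
                if (rest != null) {
                    rest.add(0, prefix);
                    return rest;
                }
            }
        }
        return null;
    }
}
```

This ignores linking elements (the Fugen-"s"), inflection, and ambiguity between alternative splits, which is exactly where the hard, vocabulary-dependent part of the problem lives.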
Glad I actually wrote something helpful ..
Memory for filters shouldn't be a problem; filters take up 1 bit per
document (plus some tiny overhead for a BitSet). I think the time is
actually taken up by the number of terms that match each wildcard, as well as
the total number of terms.
Really, I expec
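Both points above (one bit per document, and capping the number of matched terms) can be illustrated with plain-Java stand-ins for the Lucene pieces. The fake term index here is an assumption for the sake of a self-contained example; a real WildcardFilter would enumerate terms via the index's term enumeration:

```java
import java.util.BitSet;
import java.util.Map;

public class CappedBitFilter {
    // Build a BitSet with one bit per document from the doc lists of
    // every term matching the prefix; give up once too many terms match
    // (the cap the original poster wants).
    public static BitSet bits(Map<String, int[]> termDocs,
                              String prefix, int maxTerms, int numDocs) {
        BitSet bits = new BitSet(numDocs); // 1 bit per document
        int matched = 0;
        for (Map.Entry<String, int[]> e : termDocs.entrySet()) {
            if (!e.getKey().startsWith(prefix)) continue;
            if (++matched > maxTerms) {
                throw new RuntimeException("too many terms match: " + prefix + "*");
            }
            for (int doc : e.getValue()) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

The cost that remains is walking each matching term's doc list, which is why the number of matching terms, not the BitSet, dominates the run time.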
Hi,
How do people typically analyze/tokenize text with compounds (e.g. German)? I
took a look at GermanAnalyzer hoping to see how one can deal with that, but it
turns out GermanAnalyzer doesn't treat compounds in any special way at all.
One way to go about this is to have a word dictionary and
Thanks for the answer. It is not really necessary for me to read the documents.
That's what you get when you find code by searching the net and use it without
really thinking about or understanding it. I will just step through the terms and set
the bits as you said. I will add some maximum number of term
I'll side-step the explanations part of your mail since I don't know how to
answer.. But a few observations, see below.
On 9/19/06, Kroehling, Thomas <[EMAIL PROTECTED]> wrote:
> Hi,
> I am trying to write a WildcardFilter in order to prevent
> TooManyBooleanClauses and high memory usage. I wrap a Fi
Hi,
I am trying to write a WildcardFilter in order to prevent
TooManyBooleanClauses and high memory usage. I wrap a Filter in a
ConstantScoreQuery. I enumerate over the WildcardTerms for a query. This
way I can set a maximum number of terms which I will evaluate. If too
many terms match, I throw an
On 9/19/06, Bhavin Pandya <[EMAIL PROTECTED]> wrote:
> Hi all,
> How to put limit in lucene that "dont return me any document which has score less
> than 0.25"
You implement a HitCollector and break out when you reach such low score.
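The collector idea can be sketched without Lucene. One caveat: a HitCollector's collect() callback is not guaranteed to see hits in score order, so this sketch filters out low scores rather than breaking out of the loop; class and method layout here are an assumption, and in Lucene you would subclass HitCollector and receive the (doc, score) calls from the searcher:

```java
import java.util.ArrayList;
import java.util.List;

public class ScoreThresholdCollector {
    private final float minScore;
    private final List<Integer> docs = new ArrayList<>();

    public ScoreThresholdCollector(float minScore) {
        this.minScore = minScore;
    }

    // In Lucene this would override HitCollector.collect(int doc, float score);
    // hits below the cutoff are simply ignored.
    public void collect(int doc, float score) {
        if (score >= minScore) {
            docs.add(doc);
        }
    }

    public List<Integer> getDocs() {
        return docs;
    }
}
```

Also note that Lucene scores are not absolute; a cutoff like 0.25 means different things for different queries, which is why a fixed-score filter is usually discouraged.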
Hi all,
How to put limit in lucene that "dont return me any document which has score
less than 0.25"
Thanks.
Bhavin pandya