Re: MultiPhraseQuery

2006-03-06 Thread Eric Jain
Daniel Naber wrote: Please try to add this to MultiPhraseQuery and let us know if it helps: public List getTerms() { return termArrays; } That is indeed all I need (the list wouldn't have to be mutable though). Any chance this could be committed? Incidentally, would be helpful if th

Re: sumOfSquaredWeights for lengthNorm

2006-03-06 Thread Chris Hostetter
: > 1) the "boosts" associated with Fields and Documents at indexing time, : > which are combined with the lengthNorm at index time to determine a single : > "norm" value for the doc/field pair. : : I don't think this is what I want because the lengthNorm is still using : the # of terms. You can

Re: Help interpreting explanation

2006-03-06 Thread Chris Hostetter
: on using Lucene but info for the internal workings of Lucene is hard to : come by. As with many OS code bases: the code is the documentation. : 1) I'm using the default QueryParser to parse and return a query so it's : a Boolean-OR query. So does this mean it uses the DisjunctionSumScorer : or

Re: BooleanQuery$TooManyClauses with 1.9.1 when Number RangeQuery

2006-03-06 Thread Chris Hostetter
: I upgraded to 1.9.1 and reindexed. : I used NumberTool when indexing the numbers. : : After the upgrade I got the following error on a number range query. : with query The possibility of a TooManyClauses exception has always existed with RangeQuery and numbers, even when using NumberTool. Even if you neve
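
The clause count behind this exception can be pictured with a toy model. A RangeQuery is rewritten into a BooleanQuery with one clause per indexed term inside the range, and BooleanQuery rejects more clauses than its limit (1024 by default). The sketch below does not use Lucene: the class RangeClauseSketch, its methods, and the zero-padded decimal encoding standing in for NumberTool's output are all assumptions made for illustration.

```java
import java.util.TreeSet;

// Toy model only: counts how many distinct indexed terms fall inside a range,
// which is the number of clauses a term-expanding range query would need.
class RangeClauseSketch {
    // Assumed to mirror BooleanQuery's default clause limit.
    static final int MAX_CLAUSES = 1024;

    // Hypothetical stand-in for a sortable number encoding (not NumberTool itself).
    static String encode(long n) {
        return String.format("%010d", n);
    }

    // Number of indexed terms between lower and upper, both inclusive.
    static int clausesFor(TreeSet<String> terms, long lower, long upper) {
        return terms.subSet(encode(lower), true, encode(upper), true).size();
    }

    // True when the expansion would exceed the clause limit.
    static boolean wouldOverflow(TreeSet<String> terms, long lower, long upper) {
        return clausesFor(terms, lower, upper) > MAX_CLAUSES;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        for (long i = 0; i < 2000; i++) {
            terms.add(encode(i)); // 2000 distinct numeric terms in the index
        }
        System.out.println(wouldOverflow(terms, 0, 1999)); // wide range
        System.out.println(wouldOverflow(terms, 0, 99));   // narrow range
    }
}
```

In this model the encoding does not matter; only the number of distinct terms inside the range does, which is why a wide range over many distinct numbers can overflow even with a sortable encoding.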

Re: sumOfSquaredWeights for lengthNorm

2006-03-06 Thread Eugene
Hi, My comments in-line. Chris Hostetter wrote: : I would like to override the Similarity class lengthNorm(String : fieldName, int numTerms) so that it behaves similar to queryNorm(float : sumOfSquaredWeights). So the method signature becomes lengthNorm(String : fieldName, float sumOfSquaredWe

Re: Distributed Lucene..

2006-03-06 Thread Prasenjit Mukherjee
I think nutch has a distributed lucene implementation. I could have used nutch straightaway, but I have a different crawler, and also don't want to use NDFS (which is used by nutch). What I proposed earlier is basically based on the MapReduce paradigm, which is used by nutch as well. It would

Re: Help interpreting explanation

2006-03-06 Thread Eugene
Thanks, Chris, for your clear explanations. It seems there is a lot of info on using Lucene, but info on the internal workings of Lucene is hard to come by. I have some more questions, which I'll ask in-line. Chris Hostetter wrote: : Since i'm using a boolean OR query i figured it must be related

BooleanQuery$TooManyClauses with 1.9.1 when Number RangeQuery

2006-03-06 Thread Youngho Cho
Hello, I upgraded to 1.9.1 and reindexed. I used NumberTool when indexing the numbers. After the upgrade I got the following error on a number range query. with query 2006-03-07 09:08:03,216 [TP-Processor3] DEBUG com.nannet.jettiger.util.word.JetTigerAdapter - Queryafter=+prntid:56 +(+(+attid:113 +[0

Re: Using NOT queries inside parentheses

2006-03-06 Thread Daniel Noll
Satuluri, Venu_Madhav wrote: Hi, The following query does not work as expected for me: "alwaysTrueField:true (-name:john)" neither does this: "alwaysTrueField:true +(-name:john)" It returns zero results, despite there being many documents without name john. (alwaysTrueField is, needless to say,

Re: Multisearch

2006-03-06 Thread Raul Raja Martinez
Wouldn't it make sense to have a Hit know where it came from, such as hit.getIndex(), instead of having to invoke subSearcher or subDoc? Just a thought. Erik Hatcher wrote: On Mar 6, 2006, at 10:05 AM, WATHELET Thomas wrote: I made a multi search into my Lucene index. It works properly but I wou

Lucene Merge Algorithm, max number of segments

2006-03-06 Thread Dalton, Jeffery
I am just going to wax philosophical for a minute. I am trying to understand lucene's merging algorithm in depth. Let's say I create an index of 25M web pages on a single machine. While creating this index I am doing both search and indexing / re-indexing at the same time, a bit like Technorat

Re: MultiPhraseQuery

2006-03-06 Thread Erik Hatcher
On Mar 6, 2006, at 4:43 PM, Daniel Naber wrote: On Sunday 05 March 2006 19:03, Eric Jain wrote: I need to write a function that copies a MultiPhraseQuery and changes the field the query applies to. Unfortunately the API allows access to neither the contained terms nor the field! The other qu

Re: MultiPhraseQuery

2006-03-06 Thread Daniel Naber
On Sunday 05 March 2006 19:03, Eric Jain wrote: > I need to write a function that copies a MultiPhraseQuery and changes > the field the query applies to. Unfortunately the API allows access to > neither the contained terms nor the field! The other query classes I > have so far dealt with all seem

Re: sumOfSquaredWeights for lengthNorm

2006-03-06 Thread Chris Hostetter
: I would like to override the Similarity class lengthNorm(String : fieldName, int numTerms) so that it behaves similar to queryNorm(float : sumOfSquaredWeights). So the method signature becomes lengthNorm(String : fieldName, float sumOfSquaredWeights) where sumOfSquaredWeights = sum of : the squ

sumOfSquaredWeights for lengthNorm

2006-03-06 Thread Eugene
Hi, I would like to override the Similarity class lengthNorm(String fieldName, int numTerms) so that it behaves similar to queryNorm(float sumOfSquaredWeights). So the method signature becomes lengthNorm(String fieldName, float sumOfSquaredWeights) where sumOfSquaredWeights = sum of the squa
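
For reference, the two formulas being compared can be written side by side. In the stock DefaultSimilarity, lengthNorm is 1/sqrt(numTerms) and queryNorm is 1/sqrt(sumOfSquaredWeights). The sketch below is plain Java with no Lucene dependency; the class NormSketch, its method names, and the example weights are hypothetical, and because the post is cut off it is an assumption that the weights are per-term values chosen by the poster.

```java
// Plain-Java illustration of the two normalizations discussed in the post.
class NormSketch {
    // Same formula as the stock lengthNorm: depends only on the term count.
    static float lengthNormByCount(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Sum of squared term weights (the weights themselves are hypothetical inputs).
    static float sumOfSquares(float[] weights) {
        float sum = 0f;
        for (float w : weights) {
            sum += w * w;
        }
        return sum;
    }

    // queryNorm-style normalization applied to the sum of squared weights.
    static float normBySquaredWeights(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        float[] weights = {3f, 4f}; // two terms with made-up weights
        System.out.println(lengthNormByCount(weights.length));
        System.out.println(normBySquaredWeights(sumOfSquares(weights)));
    }
}
```

The count-based norm ignores how heavy each term is, while the weight-based norm shrinks as individual weights grow, which is the behavior the post asks for.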

Re: Using NOT queries inside parentheses

2006-03-06 Thread Chris Hostetter
: The following query does not work as expected for me: : "alwaysTrueField:true (-name:john)" : neither does this: : "alwaysTrueField:true +(-name:john)" : Does lucene run a sub-query for each part of the query inside : parentheses, which is why the NOT query that is alone doesn't work? I am Bas
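
One way to picture why the parenthesized clause matches nothing is a toy set model: a boolean query first collects the documents matched by its positive clauses and only then removes the prohibited ones, so a sub-query containing nothing but a prohibited clause has nothing to remove from. The sketch below is not Lucene's scorer code; the class NotQuerySketch, its eval method, and the document ids are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model: documents are integer ids, clauses are sets of matching ids.
class NotQuerySketch {
    // Union of the positive clauses, minus every prohibited clause.
    static Set<Integer> eval(List<Set<Integer>> positive, List<Set<Integer>> prohibited) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> p : positive) {
            result.addAll(p);
        }
        for (Set<Integer> n : prohibited) {
            result.removeAll(n);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> all = new TreeSet<>(Arrays.asList(1, 2, 3));  // alwaysTrueField:true
        Set<Integer> john = new TreeSet<>(Arrays.asList(2));       // name:john

        // (-name:john) evaluated on its own: no positive clause, so nothing matches.
        System.out.println(eval(Collections.<Set<Integer>>emptyList(), Arrays.asList(john)));

        // alwaysTrueField:true -name:john at the same level: the prohibition has a base set.
        System.out.println(eval(Arrays.asList(all), Arrays.asList(john)));
    }
}
```

Under this model, placing the prohibited clause at the same level as a positive clause, for example alwaysTrueField:true -name:john, gives it a set of documents to subtract from.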

Re: Help on Similarity

2006-03-06 Thread Chris Hostetter
: I tried implementing my own Similarity and setting it in : IndexWriter.setSimilarity(new CosSimilarity()). That only changes the Similarity used by the IndexWriter when writing out the index files (which is really only used to get the lengthNorm). If you want to change the Similarity used at qu

Re: Help interpreting explanation

2006-03-06 Thread Chris Hostetter
: Since i'm using a boolean OR query i figured it must be related to the : BooleanScorer (though there's a more complicated BooleanScorer2 which : I'm not sure when it's used). There are actually three possible scorers used: ConjunctionScorer can be used if all of the clauses are required. Most of

RE: Search for synonyms - implemenetation for review

2006-03-06 Thread Rami Hansenne
Hi, I've been working on a project where Lucene queries were expanded with synonyms/related concepts and used a DisjunctionMaxQuery with lower boost factors for the synonym subqueries. This solved part of the problem, but still a number of annoying side effects remained. I've experimented a little

Using NOT queries inside parentheses

2006-03-06 Thread Satuluri, Venu_Madhav
Hi, The following query does not work as expected for me: "alwaysTrueField:true (-name:john)" neither does this: "alwaysTrueField:true +(-name:john)" It returns zero results, despite there being many documents without name john. (alwaysTrueField is, needless to say, true for all documents). This

Re: Help on Similarity

2006-03-06 Thread Eugene
With respect to the earlier post there seems to be a bug in lucene 1.9.1 I tried using the similarity below and changed idf to: public float idf(int docFreq, int numDocs) { float f = (float)(Math.log((double)numDocs/(double)(docFreq+1) + 1.0)); return f; } Now, when I print the explana
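
To make the change in the post concrete, the modified idf can be compared with the stock formula from DefaultSimilarity, which is log(numDocs / (docFreq + 1)) + 1. In the variant quoted above, the + 1.0 sits inside the logarithm. This is a standalone sketch with no Lucene dependency; the class IdfSketch and its method names are hypothetical.

```java
// Side-by-side comparison of the stock idf and the variant quoted in the post.
class IdfSketch {
    // Stock formula: the + 1.0 is added after taking the logarithm.
    static float defaultIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Variant from the post: the + 1.0 is added before taking the logarithm.
    static float variantIdf(int docFreq, int numDocs) {
        return (float) (Math.log((double) numDocs / (double) (docFreq + 1) + 1.0));
    }

    public static void main(String[] args) {
        int numDocs = 10;
        for (int docFreq : new int[] {0, 4, 9}) {
            System.out.println(docFreq + " " + defaultIdf(docFreq, numDocs)
                    + " " + variantIdf(docFreq, numDocs));
        }
    }
}
```

Because the argument of the logarithm is always greater than 1 in the variant, its result stays positive for any non-negative docFreq and positive numDocs.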

RE: Multisearch

2006-03-06 Thread WATHELET Thomas
Thanks a lot. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: 06 March 2006 16:25 To: java-user@lucene.apache.org Subject: Re: Multisearch On Mar 6, 2006, at 10:05 AM, WATHELET Thomas wrote: > I made a multi search into my Lucene index. It works properly but I > w

Help on Similarity

2006-03-06 Thread Eugene
Hi, I tried implementing my own Similarity and setting it in IndexWriter.setSimilarity(new CosSimilarity()). But, there's something weird, it doesn't seem to call the methods in my Similarity. For example, when I set the idf to return 0.0f the Similarity still gives me a score > 0.0f. How

Re: File Name Search

2006-03-06 Thread Erik Hatcher
If and how you tokenize is entirely dependent on how the queries need to work. Lucene index design really is driven from querying needs backwards. Erik On Mar 6, 2006, at 10:00 AM, Brian wrote: Cool, Basically I have something similar to: name_division.date_order_code So I'm gue

Re: Multisearch

2006-03-06 Thread Erik Hatcher
On Mar 6, 2006, at 10:05 AM, WATHELET Thomas wrote: I made a multi search into my Lucene index. It works properly, but I would like to know whether it's possible to find out which index a document belongs to. This just came up the other day as well, and was covered in the past. Here's the thread w

Re: Search for synonyms - implemenetation for review

2006-03-06 Thread mark harwood
Sounds like you've been tackling a number of the issues I was concerned with "fuzzy" searching. It's essentially the same problem - the user types one word and the engine searches for several variants. The FuzzyLikeThisQuery class in the "queries" module of the contrib area in SVN contains similar

RE: Distributed Lucene..

2006-03-06 Thread Andrew Schetinin
Hi Samuru, No, it is a part of a bigger project (quite small part), and nobody is going to sell parts of it, at least for less than $X00,000 :-) Best Regards, Andrew Schetinin -Original Message- From: Samuru Jackson [mailto:[EMAIL PROTECTED] Sent: Monday, March 06, 2006 5:05 PM To: ja

Re: Distributed Lucene..

2006-03-06 Thread Samuru Jackson
Do you plan to release some kind of a commercial product including an API? I ask because I'm evaluating different technologies for a prototype which is part of my diploma thesis. The problem is that I have to deal with really huge amounts of data and one machine is simply not enough to handle those am

Multisearch

2006-03-06 Thread WATHELET Thomas
I made a multi search into my Lucene index. It works properly, but I would like to know whether it's possible to find out which index a document belongs to.

Re: Exact Search

2006-03-06 Thread Erik Hatcher
Index the original with only basic tokenization into another field, or index the originals into the same field with a zero position increment to allow for accurate phrase querying. I personally would put the original words into the same field and the same position, along with the lexed to

Re: File Name Search

2006-03-06 Thread Brian
GREAT!! I don't have any questions today, I just wanted to make sure it was possible first. I'll be starting this in a few days (when I get an Okie Dokie...) Then I'm sure I'll have some questions. Thanks for the link and the reply. V/R B --- Volodymyr Bychkoviak <[EMAIL PROTECTED]> wrote: > Yes.

Re: File Name Search

2006-03-06 Thread Brian
Cool, Basically I have something similar to: name_division.date_order_code So I'm guessing I need to tokenize. Thanks, B --- Erik Hatcher <[EMAIL PROTECTED]> wrote: > On Mar 6, 2006, at 8:07 AM, Brian wrote: > > Quick Question, > > Is it possible to create an index & search > based > > on

Re: File Name Search

2006-03-06 Thread Erik Hatcher
On Mar 6, 2006, at 8:07 AM, Brian wrote: Quick Question, Is it possible to create an index & search based on file names? Of course. One option is to simply make filename another field. The question is, should it be an exact match on filename for querying? Or should the filename get

Re: File Name Search

2006-03-06 Thread Volodymyr Bychkoviak
Yes, it is possible. I've developed such a search for our LAN shared files. I'm using the technique of rotating filenames to improve wildcard query performance. Details: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200506.mbox/[EMAIL PROTECTED] Other improvements: WildCardQuery is rewr
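
The linked thread is not reproduced here, so the following is only one plausible reading of "rotating filenames": the classic permuterm idea, in which every rotation of a name plus an end marker is indexed, so that a pattern with a single * can be answered as a prefix lookup. The class RotationSketch, its methods, and the '$' end marker are assumptions for illustration and are not taken from the poster's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Permuterm-style sketch: wildcard matching reduced to a prefix test on rotations.
class RotationSketch {
    static final char END = '$'; // assumed marker that never occurs in a filename

    // All rotations of name + END, e.g. "abc" -> abc$, bc$a, c$ab, $abc.
    static List<String> rotations(String name) {
        String marked = name + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    // Rewrites a pattern with exactly one '*' (X*Y) into the prefix Y + END + X.
    static String toPrefix(String pattern) {
        int star = pattern.indexOf('*');
        if (star < 0 || star != pattern.lastIndexOf('*')) {
            throw new IllegalArgumentException("pattern must contain exactly one '*'");
        }
        return pattern.substring(star + 1) + END + pattern.substring(0, star);
    }

    // A name matches when one of its rotations starts with the rewritten prefix.
    static boolean matches(String name, String pattern) {
        String prefix = toPrefix(pattern);
        for (String rotation : rotations(name)) {
            if (rotation.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(rotations("abc"));
        System.out.println(toPrefix("*.txt"));
        System.out.println(matches("report.txt", "*.txt"));
    }
}
```

In an index, the rotations would be stored as terms so that the prefix test becomes a prefix query; the cost is several extra terms per filename in exchange for avoiding a leading-wildcard scan.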

RE: Distributed Lucene..

2006-03-06 Thread Andrew Schetinin
Hello, We are implementing a distributed searcher and indexer based on Lucene. I cannot share its code but I may provide hints based on our experience. What we basically did is have several machines index documents and create small Lucene indexes. We hacked :-) IndexWriter of Lucene to s

Search for synonyms - implemenetation for review

2006-03-06 Thread Andrew Schetinin
Dear all, My colleague, Mr. Ziv Gome, and I would like to present here an implementation of synonym search that we use in our server. It will probably be interesting for those who have worked on synonyms or are going to implement synonym search. We hope that this mail will raise interesting ideas and wi

Re: Distributed Lucene..

2006-03-06 Thread Samuru Jackson
> Does it make any sense ? Also would like to know if there are other ways > to distribute lucene's indexing/searching ? I'm interested in such a distributed architecture too. What I have got in mind is some kind of lucene index cluster where you have got several machines having subindexes in me

File Name Search

2006-03-06 Thread Brian
Quick Question, Is it possible to create an index & search based on file names? Thanks, B

RE: Exact Search

2006-03-06 Thread Waleed Tayea
I'm working on Arabic text, so the example might not be understandable. What I mean is that with the morphological analysis the tokens are reduced to their lexemes. And those lexemes are the terms that are stored in the index. So when I perform an exact search and get a result, I will not be able to

Re: Help interpreting explanation

2006-03-06 Thread Eugene
Hi, Since I'm using a boolean OR query I figured it must be related to the BooleanScorer (though there's a more complicated BooleanScorer2 which I'm not sure when it's used). Looking at the BooleanScorer code it's probably a little over my head as I'm still a beginner to Lucene. But, I woul

Re: How to intergrate snowball in lucene

2006-03-06 Thread Erik Hatcher
On Mar 6, 2006, at 6:30 AM, Haritha_Parvatham wrote: Hi kimber, Thanks for replying to my query. I have downloaded the snowball. After building it, what is the next step? How do I implement snowball in lucene? Please reply. Simply use the SnowballAnalyzer that is part of the JAR file that got built. It

RE: How to intergrate snowball in lucene

2006-03-06 Thread Haritha_Parvatham
Hi kimber, Thanks for replying to my query. I have downloaded the snowball. After building it, what is the next step? How do I implement snowball in lucene? Please reply. Thanks haritha -Original Message- From: Patrick Kimber [mailto:[EMAIL PROTECTED] Sent: Monday, March 06, 2006 3:52 PM To: java-u

Re: Exact Search

2006-03-06 Thread Erik Hatcher
Could you please provide an example of some sample text, the terms that are emitted by the analyzer, and a query you'd like to work? Erik On Mar 6, 2006, at 5:50 AM, Waleed Tayea wrote: Dear All. How can I perform an exact search on an index constructed with a morphological analyze

Re: carrot2 vs. vivisimo

2006-03-06 Thread Dawid Weiss
Hello, my team has been working for the last couple of days on integrating carrot2 into our project as a sort of src (search result clustering) solution. Great to hear this; is there a public URL or something? i was rather impressed with the results, until i checked out vivisimo's demo and

Exact Search

2006-03-06 Thread Waleed Tayea
Dear All. How can I perform an exact search on an index constructed with a morphological analyzer? Thanks in advance, Waleed

Re: How to intergrate snowball in lucene

2006-03-06 Thread Patrick Kimber
Hi You should download the snowball contribution which is in the SubVersion repository: http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/snowball This can be built using ANT. Patrick On 06/03/06, Haritha_Parvatham <[EMAIL PROTECTED]> wrote: > Hi, > Can anyone guide me to integrate s