Re: Lucene implementation/performance question

2008-11-27 Thread Greg Shackles
The queries I'm doing really aren't anything clever... just searching for phrases on pages of text, sometimes narrowing results by other words that must appear on the page, or words that cannot appear on the same page. I don't have experience with those span queries so I can't say much about them.
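As a rough illustration of the kind of per-page matching described above (a phrase that must occur, plus words that must or must not appear on the same page), here is a minimal plain-Java sketch. It does not use Lucene; the class and method names are hypothetical, and it assumes each page has already been tokenized and lowercased.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Plain-Java illustration only; these names are hypothetical, not Lucene APIs. */
final class PageFilterSketch {

    /** True if the phrase tokens occur consecutively, in order, on the page. */
    static boolean containsPhrase(List<String> page, List<String> phrase) {
        return Collections.indexOfSubList(page, phrase) >= 0;
    }

    /** Phrase must occur, every required word must occur, no prohibited word may occur. */
    static boolean matches(List<String> page, List<String> phrase,
                           List<String> required, List<String> prohibited) {
        if (!containsPhrase(page, phrase)) {
            return false;
        }
        if (!page.containsAll(required)) {
            return false;
        }
        return Collections.disjoint(page, prohibited);
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList("the", "quick", "brown", "fox", "jumps");
        System.out.println(matches(page,
                Arrays.asList("quick", "brown"),
                Arrays.asList("fox"),
                Arrays.asList("dog")));
    }
}
```

In Lucene itself this would more likely be a boolean combination of a phrase clause with required and prohibited term clauses; the sketch only shows the matching logic.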

Re: SpanFirstQuery is not taking wildcard characters (like *) as a logical operator for the preffix

2008-11-27 Thread Karl Wettin
SpanTermQuery is a TermQuery and not a WildcardQuery. You could use a SpanRegexQuery. You could also make your own SpanWildcardQuery based on either WildcardQuery or SpanRegexQuery. You should probably tell us a bit about the problem you are trying to solve rather than asking about the solution y
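To make the intended behaviour concrete, here is a minimal plain-Java sketch of what "a prefix match within the first N positions" means. It is not Lucene code and the names are hypothetical; it assumes the field is already tokenized, and that a single-term span at token index i ends at position i + 1, so it lies within the first `end` positions when i < end.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration only; these names are hypothetical, not Lucene APIs. */
final class SpanFirstPrefixSketch {

    /** True if some token among the first `end` positions starts with `prefix`. */
    static boolean matchesWithinFirst(List<String> tokens, String prefix, int end) {
        int limit = Math.min(end, tokens.size());
        for (int i = 0; i < limit; i++) {
            if (tokens.get(i).startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Field 110_a of the example document, tokenized.
        List<String> field = Arrays.asList("library", "information");
        System.out.println(matchesWithinFirst(field, "lib", 1));
        System.out.println(matchesWithinFirst(field, "info", 1));
    }
}
```

A span query that expands the prefix into its matching terms (for example via SpanRegexQuery, as suggested above) would be the index-side equivalent of this check.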

Re: Query time document group boosting

2008-11-27 Thread Karl Wettin
On 27 Nov 2008, at 10:15, Toke Eskildsen wrote: On Thu, 2008-11-27 at 07:30 +0100, Karl Wettin wrote: The most scary part is that you will have to score each and every document that has a source, probably all of the documents in your corpus. I now see my query-logic was flawed. In order t

Re: [OT] About stopwords

2008-11-27 Thread David Causse
Thanks for the tip, but I can't imagine the number of documents Google has to join in order to process such results... There must be a trick. Maybe stopwords are not indexed alone but twice with previous and next token, some sort of 2-gram index? David. Aleksander M. Stensby wrote: Your que
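The 2-gram idea guessed at above can be sketched in plain Java. This only illustrates the hypothesis (stop words indexed as bigrams with their neighbours instead of on their own); it says nothing about how Google actually works, it is not Lucene or Nutch code, and the class name, method name, and "_" separator are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java illustration only; these names are hypothetical. */
final class StopWordBigramSketch {

    /**
     * Emits ordinary tokens unchanged. A stop word is never emitted alone:
     * it is emitted joined to its previous and next neighbours.
     */
    static List<String> indexTerms(List<String> tokens, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (!stopWords.contains(token)) {
                terms.add(token);
                continue;
            }
            // Skip the backward bigram when the previous token is also a stop
            // word: that token already emitted the same pair as its forward bigram.
            if (i > 0 && !stopWords.contains(tokens.get(i - 1))) {
                terms.add(tokens.get(i - 1) + "_" + token);
            }
            if (i + 1 < tokens.size()) {
                terms.add(token + "_" + tokens.get(i + 1));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("at", "the"));
        System.out.println(indexTerms(Arrays.asList("how", "at", "the", "library"), stopWords));
    }
}
```

With an index like this, a phrase made mostly of stop words becomes a lookup of a few comparatively rare bigram terms rather than a join over the huge posting lists of the individual stop words.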

SpanFirstQuery is not taking wildcard characters (like *) as a logical operator for the preffix

2008-11-27 Thread naveen.a
Below is a document in lucene -- ID : 1 110_a : library information -- Case 1: Term term1 = new Term("110_a", "library"); SpanFirstQuery spanFirstQuery = new SpanFirstQuery(new SpanTermQuery(term1), 1); Case 2

Re: [OT] About stopwords

2008-11-27 Thread Michael McCandless
That's a phrase search, so it's conceivable Google could be doing something similar to Nutch, whereby adjacent ngrams are indexed as unique terms. But if you do the same search without quotes: http://www.google.fr/search?hl=fr&q=HOW+at+at+of+a+A+a&btnG=Rechercher&meta= they still find

Re: [OT] About stopwords

2008-11-27 Thread Aleksander M. Stensby
Your query includes quotation marks, which tell Google to include common words in the query. But, if you remove the quotation marks, you will still get results, as Google states: "Google ignores stop words when they're placed in searches alongside less common words. For example, a search for [ The

Re: Lucene implementation/performance question

2008-11-27 Thread Eran Sevi
Hi Greg, Thanks for the quick and detailed answer. What kind of queries do you run? Is it going to work for SpanNearQueries/SpanNotQueries as well? Do you also get the word itself at each position? It would be great if I could search on the content of each payload as well, but since the payload cont

[OT] About stopwords

2008-11-27 Thread David Causse
Hi, Look at this Google query: http://www.google.fr/search?q=%22HOW+at+at+of+a+A+a%22 What do you think about that concerning stop words? Does Google have no stop words? David.

Re: FIltering with booleanFilter

2008-11-27 Thread Albert Juhe
Fantastic, now it's working perfectly. Thank you, Albert prabin meitei wrote: > > Hi, > > You can use MUST at the end. > Using your code, use it as > codisFiltre="XX07_04141_00853#XX06_03002_00852#UX06_07019_02994" > String[] codi = codisFiltre.split("#"); > *finalFilter = new BooleanFilter();* >

Re: Query time document group boosting

2008-11-27 Thread Toke Eskildsen
On Thu, 2008-11-27 at 07:30 +0100, Karl Wettin wrote: > The most scary part is that you will have to score each and every > document that has a source, probably all of the documents in your > corpus. I now see my query-logic was flawed. In order to avoid matching all documents every time,