Unexpected Query Results

2010-02-03 Thread Jamie
Hi I have some unexpected query results. When attempting two queries: 1) All fields, exact phrase query returns 48 hits (priority:"было время" attach:"было время" score:"было время" size:"было время" sentdate:"было время" archivedate:"было время" receiveddate:"было время" from:"было время" t

Retrieving field information for each hit when using "MultiFieldQueryParser"

2010-02-03 Thread prashant ullegaddi
Hi, I'm using MultiFieldQueryParser to search over different fields of documents in the index. Whenever I get a hit for a query, is it possible to know in which field the query match occurred? And is it possible to retrieve the field(s) for each hit? To make things clearer, suppose I have four fi

Re: Limiting search result for web search engine

2010-02-03 Thread mpolzin
I changed one line below... realized I missed the ! (NOT).. corrected in original reply. if ((hq.Size() < numHits || score >= minScore) && !collectedBaseURLArray.Contains(doc.BaseURL)) { mpolzin wrote: > > > if (score > 0.0f) > { > >
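
A note on the roll-up condition above: skipping any hit whose base URL has already been collected keeps whichever hit from a site arrives first, which is not necessarily that site's best one. The sketch below shows one alternative in plain Java, with no Lucene classes: keep the best-scoring hit per base URL, then take the top numHits. The SiteHit and RollUpSketch names are hypothetical and stand in for whatever the real collector sees.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a search hit; not a Lucene class.
final class SiteHit {
    final String baseUrl;
    final float score;

    SiteHit(String baseUrl, float score) {
        this.baseUrl = baseUrl;
        this.score = score;
    }
}

final class RollUpSketch {
    // Returns at most numHits hits, one per base URL, best score first.
    static List<SiteHit> rollUp(List<SiteHit> hits, int numHits) {
        Map<String, SiteHit> bestPerSite = new HashMap<>();
        for (SiteHit h : hits) {
            if (h.score <= 0.0f) {
                continue; // mirrors the "score > 0.0f" guard in the quoted code
            }
            // Keep whichever hit for this base URL has the higher score.
            bestPerSite.merge(h.baseUrl, h, (a, b) -> a.score >= b.score ? a : b);
        }
        List<SiteHit> result = new ArrayList<>(bestPerSite.values());
        result.sort((a, b) -> Float.compare(b.score, a.score)); // descending score
        if (result.size() > numHits) {
            return new ArrayList<>(result.subList(0, numHits));
        }
        return result;
    }

    public static void main(String[] args) {
        List<SiteHit> hits = new ArrayList<>();
        hits.add(new SiteHit("a.example", 0.5f));
        hits.add(new SiteHit("b.example", 0.9f));
        hits.add(new SiteHit("a.example", 0.8f));
        for (SiteHit h : rollUp(hits, 10)) {
            System.out.println(h.baseUrl + " " + h.score);
        }
    }
}
```

This holds one entry per distinct base URL in memory, so it trades the bounded queue of the original for a correct per-site winner.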

Re: Limiting search result for web search engine

2010-02-03 Thread mpolzin
Hi thanks for the suggestion. I am relatively new to Lucene, so I have a few more questions on this implementation. I looked at the source code for Lucene and found the TopDocCollector class. It appears this class derives from the HitCollector class, so I should be able to simply extend TopDocColl

Where to download Mark Miller's Qsol Parser?

2010-02-03 Thread Chris Harris
The QSol query parser (brief overview here: http://www.lucidimagination.com/blog/2009/02/22/exploring-query-parsers/) used to be available at http://myhardshadow.com/qsol.php (there was documentation as well as a link to an SVN server), but it looks like myhardshadow.com has been relinquished t

Match span of capitalized words

2010-02-03 Thread Max Lynch
Hi, I would like to do a search for "Microsoft Windows" as a span, but not match if words before or after "Microsoft Windows" are upper cased. For example, I want this to match: another crash for Microsoft Windows today But not this: another crash for Microsoft Windows Server today Is this possib

Re: Sort memory usage

2010-02-03 Thread Jake Mannix
On Wed, Feb 3, 2010 at 1:33 PM, tsuraan wrote: > > The FieldCache loads per segment, and the NRT reader is reloading only > > new segments from disk, so yes, it's "smarter" about this caching in this > > case. > > Ok, so the cache is tied to the index, and not to any particular > reader. The act

Re: Sort memory usage

2010-02-03 Thread tsuraan
> The FieldCache loads per segment, and the NRT reader is reloading only > new segments from disk, so yes, it's "smarter" about this caching in this > case. Ok, so the cache is tied to the index, and not to any particular reader. The actual FieldCacheImpl keeps a mapping from Reader to its terms,

Re: Sort memory usage

2010-02-03 Thread Jake Mannix
The FieldCache loads per segment, and the NRT reader is reloading only new segments from disk, so yes, it's "smarter" about this caching in this case. -jake On Wed, Feb 3, 2010 at 1:07 PM, tsuraan wrote: > Is the cache used by sorting on strings separated by reader, or is it > a global thing?
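
The per-segment behaviour described above can be pictured with a toy cache in plain Java. This is only an illustration of the idea, not how FieldCacheImpl is written: the class name, the use of segment names as keys, and the empty placeholder arrays are all assumptions made for the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a cache that loads sort values once per segment.
final class PerSegmentCacheSketch {
    private final Map<String, String[]> cache = new HashMap<>();
    private int loads = 0;

    // Returns the cached values for a segment, loading them on first use only.
    String[] get(String segmentName) {
        return cache.computeIfAbsent(segmentName, name -> {
            loads++;              // stands in for the expensive read from disk
            return new String[0]; // placeholder for the segment's terms
        });
    }

    // Simulates opening a reader over the given segments.
    void open(List<String> segmentNames) {
        for (String name : segmentNames) {
            get(name);
        }
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        PerSegmentCacheSketch cache = new PerSegmentCacheSketch();
        cache.open(Arrays.asList("_0", "_1"));
        cache.open(Arrays.asList("_0", "_1", "_2")); // reopen: only "_2" is new
        System.out.println("segment loads: " + cache.loadCount());
    }
}
```

Under these assumptions, reopening a reader costs one load per new segment rather than one load per segment in the index.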

Sort memory usage

2010-02-03 Thread tsuraan
Is the cache used by sorting on strings separated by reader, or is it a global thing? I'm trying to use the near-realtime search, and I have a few indices with a million docs apiece. If I'm opening a new reader every minute, am I going to have every term in every sort field read into RAM for each

Re: Sort and Collector

2010-02-03 Thread tsuraan
> It's not really possible. > Lucene must iterate over all of the hits before it knows for sure that > it has the top sorted by any criteria (other than docid). > A Collector is called for every hit as it happens, and thus one can't > specify a sort order (sorting itself is actually implemented wit
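
The point that every hit must be visited before the sorted top is known can be sketched in plain Java with a bounded heap. This is a generic top-N selection, not Lucene's collector code; the TopNSketch name and its signature are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopNSketch {
    // Returns the first n elements of the input under the given order.
    // Every element is visited once; only n are held in memory at a time.
    static <T> List<T> topN(Iterable<T> hits, int n, Comparator<T> order) {
        if (n <= 0) {
            return new ArrayList<>(); // PriorityQueue needs a capacity of at least 1
        }
        // Reversed order puts the worst retained element at the head of the heap.
        PriorityQueue<T> heap = new PriorityQueue<>(n, order.reversed());
        for (T hit : hits) {
            if (heap.size() < n) {
                heap.add(hit);
            } else if (order.compare(hit, heap.peek()) < 0) {
                heap.poll();   // drop the current worst
                heap.add(hit); // keep the better hit
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(order);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> hits = Arrays.asList(5, 1, 4, 2, 3);
        System.out.println(topN(hits, 2, Comparator.<Integer>naturalOrder()));
    }
}
```

Because a late hit can displace anything already in the heap, the result is only final once the last hit has been seen, which is why a per-hit callback cannot receive hits in sorted order.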

Re: Sort and Collector

2010-02-03 Thread Yonik Seeley
On Wed, Feb 3, 2010 at 1:40 PM, tsuraan wrote: > Is there any way to run a search where I provide a Query, a Sort, and > a Collector?  I have a case where it is sometimes, but rarely, > necessary to get all the results from a query, but usually I'm > satisfied with a smaller amount.  That part I c

Sort and Collector

2010-02-03 Thread tsuraan
Is there any way to run a search where I provide a Query, a Sort, and a Collector? I have a case where it is sometimes, but rarely, necessary to get all the results from a query, but usually I'm satisfied with a smaller amount. That part I can do with just a query and a collector, but I'd like th

Re: Index corruption using Lucene 2.4.1 - thread safety issue?

2010-02-03 Thread Frank Geary
For the record - I haven't proven this yet - but here's my current theory of what is causing the problem: 1) We start with a new RAMDir IW[0] and do some deletes and adds. 2) We create at least one IndexReader based on that IW. The last of which we'll call IndexReader[A]. 3) Then we switch to usi

Re: Limiting search result for web search engine

2010-02-03 Thread Hayri
Mike Polzin wrote: I am working on building a web search engine and I would like to build a results page similar to what Google does. The functionality I am looking to include is what I refer to as "rolling up" sites, meaning that even if a particular site (defined by its base URL) has many relevant

RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

2010-02-03 Thread Uwe Schindler
Just add the field a second time with Field.Store.YES and Field.Index.NO in original case. For searching, add it using the Tokenizer approach described before, via the TokenStream. Internally this is handled exactly like this (if you enable both Field.Index.ANALYZED and Field.Store.YES). -

RE: confused by the lucene boolean query with wildcard result

2010-02-03 Thread java8964 java8964
Thanks for your help. I upgraded Lucene to 2.9.1 and the problem is gone. It looks like a boolean query bug in Lucene 2.9.0 that was fixed in 2.9.1. Thanks > From: ian@gmail.com > Date: Wed, 3 Feb 2010 10:02:27 + > Subject: Re: confused by the lucene boolean query with wildcard result

RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

2010-02-03 Thread java8964 java8964
Thanks for your help. My concern now is that the field could be defined as stored. So when the user receives the field data, we still want to show the original data, which is upper case in this case. First, I don't think I can use queryParser.SetLowercaseExpandedTerms(false), which will remove the wi

Re: Searching compressed text using CompressionTools

2010-02-03 Thread Ian Lea
Are you saying that by using compression your index size goes up by a factor of more than 1024? From c10 kilobytes to 12 megabytes? Compressing small fields can cause the index to get bigger rather than smaller but obviously not by that much. -- Ian. On Wed, Feb 3, 2010 at 11:01 AM, Suraj Pari
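
The remark that compressing small fields can make them bigger is easy to check with java.util.zip.Deflater from the JDK. The sketch below only measures deflated sizes for a tiny and a large input; it makes no claim about how the index in this thread was built, and the class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class SmallFieldCompressionSketch {
    // Returns the number of bytes produced by deflating the input.
    static int deflatedLength(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[64];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // count output bytes, discard content
        }
        deflater.end(); // release native resources
        return total;
    }

    public static void main(String[] args) {
        byte[] tiny = "abc".getBytes(StandardCharsets.UTF_8);
        byte[] large = new byte[10000]; // highly repetitive input
        System.out.println("tiny:  " + tiny.length + " -> " + deflatedLength(tiny));
        System.out.println("large: " + large.length + " -> " + deflatedLength(large));
    }
}
```

The fixed header and checksum overhead dominates for very short values, so the growth is a few bytes per field and cannot account for a jump from kilobytes to megabytes.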

Re: Searching compressed text using CompressionTools

2010-02-03 Thread Suraj Parida
Ian, Small correction made ... Thanks for solving my previous problems. Now I tested the compression with 100 docs and found: 1. Without Compression size of FS directory (on disk) = 10.8 KB 2. With Compression size of FS directory (on disk) = 12.0 MB and with 500 docs: 1. Without Compres

Re: Searching compressed text using CompressionTools

2010-02-03 Thread Suraj Parida
Ian, Thanks for solving my previous problems. Now I tested the compression with 100 docs and found: 1. With Compression size of FS directory (on disk) = 10.8 KB 2. Without Compression size of FS directory (on disk) = 12.0 MB and with 500 docs: 1. With Compression size of FS directory (on

RE: Getting DF & IDF

2010-02-03 Thread Asif Nawaz
In the HotelDatabase project of Lucene, the following code is written in the performSearch method of the SearchEngine class. Let queryString = "Located in the heart of paris" Analyzer analyzer = new StandardAnalyzer(); IndexSearcher is = new IndexSearcher("index"); QueryParser parser = new QueryParser("content

Lucene User Group Meetup in Amsterdam

2010-02-03 Thread Uri Boness
Hi All, On 17th February we'll host the first Dutch Lucene User Group Meetup. This meet-up will be split into two parts: - The first part will be dedicated to the user group itself. We'll have an introduction to the members and have an open discussion about the goals of the user group and th

RE: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

2010-02-03 Thread Uwe Schindler
For specific fields using a special TokenStream chain, there is no need to write a separate analyzer. You can add fields to a document using a TokenStream as parameter: new Field(name, TokenStream). As the TokenStream, just create a chain from a Tokenizer and all the Filters, like: TokenStream ts = new Key

Re: During the wild card search, will lucene 2.9.0 to convert the search string to lower case?

2010-02-03 Thread Ian Lea
I think you'll have to write your own. Or just downcase the text yourself first. -- Ian. On Tue, Feb 2, 2010 at 9:30 PM, java8964 java8964 wrote: > > Is there an analyzer like the keyword analyzer, but one that will also lowercase the data? Or do I have to write a custom analyzer myself? > >
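
The "downcase the text yourself first" suggestion can be sketched without any Lucene classes: keep the original value for display and derive a lowercased key for matching, lowercasing the query the same way. The names below are hypothetical, and the matcher handles only a single trailing `*`, which is an assumption made to keep the example short.

```java
import java.util.Locale;

final class LowercaseKeySketch {
    // Lowercases with a fixed locale so results do not depend on the default locale.
    static String indexKey(String original) {
        return original.toLowerCase(Locale.ROOT);
    }

    // Matches a stored value against a query, case-insensitively.
    // Supports an exact match or a single trailing '*' (prefix match) only.
    static boolean matches(String storedOriginal, String query) {
        String key = indexKey(storedOriginal);
        String q = indexKey(query);
        if (q.endsWith("*")) {
            return key.startsWith(q.substring(0, q.length() - 1));
        }
        return key.equals(q);
    }

    public static void main(String[] args) {
        String stored = "ABC-123"; // shown to the user unchanged
        System.out.println(stored + " matches abc*: " + matches(stored, "abc*"));
    }
}
```

The stored value is never modified, so results can still be displayed in their original case while matching stays case-insensitive.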

Re: confused by the lucene boolean query with wildcard result

2010-02-03 Thread Ian Lea
You should probably be using your PerFieldAnalyzerWrapper in your calls to QueryParser but apart from that I can't see any obvious reason. General advice: use Luke to check what has been indexed and read http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3

Re: How further reward documents matching more query terms?

2010-02-03 Thread Ian Lea
If you read the javadocs and source for DefaultSimilarity you'll know as much about it as I do, and see what the default is. To customize it, write your own subclass as I said before. -- Ian. On Tue, Feb 2, 2010 at 7:56 PM, Phan The Dai wrote: > Dear Lan Lea, > Thanks much for your reply. > P