Re: Approches/semantics for arbitrarily combining boolean and proximity search operators?

2012-05-16 Thread Mike Sokolov
It sounds to me as if there could be a market for a new kind of query that would implement A w/5 (B and C) the way people understand it: the same A near both B and C, not just any A. Maybe it's too hard to implement using rewrites into existing SpanQueries? In terms of the Pos
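The difference between "the same A" and "any A" can be sketched over plain term positions. This is a minimal illustration, not Lucene code: the class `SameAnchorProximity`, its methods, and the reading of w/n as "positions differ by at most n" are all assumptions made for the example.

```java
import java.util.List;

// Hypothetical helper, not part of Lucene. Positions are token offsets of
// each term within one document; "w/n" is assumed to mean |p - q| <= n.
final class SameAnchorProximity {

    /** True if some position in other lies within n positions of p. */
    static boolean near(int p, List<Integer> other, int n) {
        for (int q : other) {
            if (Math.abs(p - q) <= n) {
                return true;
            }
        }
        return false;
    }

    /** Strict reading: one single occurrence of A is near both B and C. */
    static boolean sameAnchor(List<Integer> a, List<Integer> b, List<Integer> c, int n) {
        for (int p : a) {
            if (near(p, b, n) && near(p, c, n)) {
                return true;
            }
        }
        return false;
    }

    /** Loose reading: (A w/n B) and (A w/n C), each half may use a different A. */
    static boolean anyAnchor(List<Integer> a, List<Integer> b, List<Integer> c, int n) {
        boolean nearB = false;
        boolean nearC = false;
        for (int p : a) {
            nearB |= near(p, b, n);
            nearC |= near(p, c, n);
        }
        return nearB && nearC;
    }
}
```

With A at positions 0 and 100, B at 3 and C at 98, the loose reading matches for n = 5 while the strict reading does not, which is the gap described above.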

Re: Approches/semantics for arbitrarily combining boolean and proximity search operators?

2012-05-16 Thread Trejkaz
On Thu, May 17, 2012 at 7:11 AM, Chris Harris wrote:
> but also crazier ones, perhaps like
>
> agreement w/5 (medical and companion)
> (dog or dragon) w/5 (cat and cow)
> (daisy and (dog or dragon)) w/25 (cat not cow)
[skip]
Everything in your post matches our experience. We ended up writing some

Re: Approches/semantics for arbitrarily combining boolean and proximity search operators?

2012-05-16 Thread Ahmet Arslan
> medical w/5 agreement
> (medical w/5 agreement) and (doctor w/10 rights)
>
> but also crazier ones, perhaps like
>
> agreement w/5 (medical and companion)
> (dog or dragon) w/5 (cat and cow)
> (daisy and (dog or dragon)) w/25 (cat not cow)
This syntax reminds me of Surround. http://wiki.apache.o

Approches/semantics for arbitrarily combining boolean and proximity search operators?

2012-05-16 Thread Chris Harris
I'm working on a product for librarians and similar people, who apparently expect to be able to combine classic boolean operators (i.e. AND, OR, NOT) with proximity operators (especially w/n and pre/n -- which basically map to unordered and ordered SpanQueries with slop n, respectively) in unrestri
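The two proximity operators mentioned above can be sketched over term positions as follows. This is an illustration only: the class `ProximityOperators` and its methods are hypothetical, and it assumes that w/n means "within n positions in either order" and pre/n means "the left term comes first, within n positions". How n would translate into a Lucene slop value is not covered here.

```java
// Hypothetical helper, not part of Lucene. Each array holds the token
// positions of one term within a single document.
final class ProximityOperators {

    /** left w/n right: unordered, some pair of positions differs by at most n. */
    static boolean within(int[] left, int[] right, int n) {
        for (int l : left) {
            for (int r : right) {
                if (Math.abs(l - r) <= n) {
                    return true;
                }
            }
        }
        return false;
    }

    /** left pre/n right: ordered, right follows left by at most n positions. */
    static boolean pre(int[] left, int[] right, int n) {
        for (int l : left) {
            for (int r : right) {
                int gap = r - l;
                if (gap > 0 && gap <= n) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

For "agreement medical" (agreement at position 0, medical at position 1), `medical w/5 agreement` matches under this sketch but `medical pre/5 agreement` does not, because the order is reversed.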

Optional Terms

2012-05-16 Thread Meeraj Kunnumpurath
Hi, I have the following documents:
Document doc1 = new Document();
doc1.add(new Field("searchText", "ABC Takeaway f...@company.com f...@company.com", Field.Store.YES, Field.Index.ANALYZED));
Document doc2 = new Document();
doc2.add(new Field("searchText", "XYZ Takeaway f...@company.com", Field.St

Re: Search Ranking

2012-05-16 Thread Meeraj Kunnumpurath
Also, if I do the below:
Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("Takeaway f...@company.com^100")
I get them in reverse order. Do I need to boost the term, even if it appears more than once in the document?
Regards
Meeraj
On Wed, May 16, 2012 at 9:52 PM, Meeraj

Re: Search Ranking

2012-05-16 Thread Meeraj Kunnumpurath
This is the output I get from explaining the plan:
Found 2 hits.
1. XYZ Takeaway f...@company.com
0.5148823 = (MATCH) sum of:
  0.17162743 = (MATCH) weight(searchText:takeaway in 1), product of:
    0.57735026 = queryWeight(searchText:takeaway), product of:
      0.5945349 = idf(docFreq=2, maxDo
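The idf line in this output can be checked by hand. The sketch below assumes the classic formula used by Lucene 3.x's DefaultSimilarity, idf = 1 + ln(numDocs / (docFreq + 1)), and an index of two documents. The document count is cut off in the output above, so the value 2 is an assumption, and the class `IdfSketch` is a hypothetical name used only for this illustration.

```java
// Hypothetical stand-alone check, not Lucene code.
final class IdfSketch {

    /** Classic idf: 1 + ln(numDocs / (docFreq + 1)), computed in double and narrowed to float. */
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // docFreq=2 is taken from the explain output; numDocs=2 is assumed.
        System.out.println(idf(2, 2));
    }
}
```

Under these assumptions the result agrees with the 0.5945349 shown for searchText:takeaway. Because the term occurs in every document, its idf is low and it contributes little to separating the two listings.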

Re: Search Ranking

2012-05-16 Thread Meeraj Kunnumpurath
The actual query is:
Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("Takeaway f...@company.com");
If I use:
Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse(" f...@company.com");
I get them in the reverse order.
Regards
Meeraj
On Wed, May 16

Re: Search Ranking

2012-05-16 Thread Meeraj Kunnumpurath
I have tried the same using Lucene directly with the following code:
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;
import org

Re: Search Ranking

2012-05-16 Thread Meeraj Kunnumpurath
Thanks Ivan. I don't use Lucene directly; it is used behind the scenes by the Neo4J graph database for full-text indexing. According to their documentation for full-text indexes, they use a whitespace tokenizer in the analyzer. Yes, I do get Listing 2 first now. Though if I exclude the term "Takeaway

Re: Search Ranking

2012-05-16 Thread Ivan Brusic
Use the explain function to understand why the query is producing the results you see. http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query, int) Does your current query return Listing 2 first? That might be because of term fre

Search Ranking

2012-05-16 Thread Meeraj Kunnumpurath
Hi, I am quite new to Lucene. I am trying to use it to index listings of local businesses. The index has only one field, that stores the attributes of a listing as well as email addresses of users who have rated that business. For example, Listing 1: "XYZ Takeaway London f...@company.com bar...@

Re: Memory question

2012-05-16 Thread Chris Bamford
Thanks everyone. Looks like I have lots of reading to do :-)
-----Original Message-----
From: Nader, John P
To: java-user@lucene.apache.org
Sent: Wed, 16 May 2012 16:27
Subject: Re: Memory question
Another good link is http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.ht

Re: Memory question

2012-05-16 Thread Nader, John P
Another good link is http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html, which also includes details on iCMS, the Incremental Mode for CMS.
On 5/15/12 6:32 PM, "Lutz Fechner" wrote:
>CMS is the concurrent mark sweep garbage collector. Instead of waiting
>for the memor

Re: Memory question

2012-05-16 Thread Christoph Kaser
Another option to consider is to *decrease* the JVM maximum heap size. This in effect leaves more memory for swapped-in mmio pages and decreases the GC effort, which might increase system performance and stability.
Regards,
Christoph
On 15.05.2012 21:38, Chris Bamford wrote: Thanks Uwe. W