Re: lucene anchor-distance based search

2010-11-17 Thread yang Yang
Thank you very much!!! :) I will have a look at the docs. 2010/11/18 Anshum > Hi. > The way you're forming the BooleanQuery seems fine to me (minus the ture > should've been true, and I'm guessing it's a typo). > About the geo-spatial search, you may have a look at the various approaches > there

Re: lucene anchor-distance based search

2010-11-17 Thread Anshum
Hi. The way you're forming the BooleanQuery seems fine to me (minus the ture should've been true, and I'm guessing it's a typo). About the geo-spatial search, you may have a look at the various approaches there are for the same. Have a look at the contrib module in Lucene. http://wiki.apache.org/luc

lucene anchor-distance based search

2010-11-17 Thread yang Yang
We are using Hibernate Search, which is based on Lucene, as the search engine to build a full-text search for our position-related data in the MySQL DB. This is the main structure of the table (it stores the id, coordinate and name of one Surface_Feature): +++-++ | id

Re: KeywordAnalyzer and Boosting

2010-11-17 Thread Pulkit Singhal
Based on my experimentation and what it says in the Lucene 2nd edition book: "Using a KeywordAnalyzer on special fields during indexing would eliminate the use of Index.NOT_ANALYZED_NO_NORMS during indexing and replace it with Index.ANALYZED." I guess that there is no way to use KeywordAnalyzer du

KeywordAnalyzer and Boosting

2010-11-17 Thread Pulkit Singhal
Greetings! When using KeywordAnalyzer for indexing a field which has the Field.Index.ANALYZED option selected, does the use of KeywordAnalyzer automatically mean that there is no point in trying to set the index-time boosts on that field in the document because it will be treated as a full token

Re: IndexWriter.close() performance issue

2010-11-17 Thread Mark Kristensson
Sure. There is only one stack trace (that seems to be how the output for this tool works) for java.lang.String.intern: TRACE 300165: java.lang.String.intern(:Unknown line) org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:74) org.apache.lucene.

Re: IndexWriter.close() performance issue

2010-11-17 Thread Michael McCandless
Lucene interns field names... since you have a truly enormous number of unique fields, it's expected that intern will be called a lot. But that said, it's odd that it's this costly. Can you post the stack traces that call intern? Mike On Fri, Nov 5, 2010 at 1:53 PM, Michael McCandless wrote: > Hmm...

RE: Search returning documents matching a NOT range

2010-11-17 Thread David Fertig
I noticed there is still no JIRA ticket for this. Do we have any type of consensus on how this issue will/will not be resolved? If MultiSearcher and MultiReader do not give the same results, I would think one would be considered "broken" and/or possibly "unfixable". Is MultiSearcher goin

Re: IndexWriter.close() performance issue

2010-11-17 Thread Mark Kristensson
After a week away, I'm back and still working to get to the bottom of this issue. We run Lucene from the binaries, so making changes to the source code is not something we are really set up to do right now. I have, however, created a trivial Java app that just opens an IndexReader for our proble

Re: asking about index verification tools

2010-11-17 Thread Erick Erickson
How could there be such a tool? Consider the number of ways that a given input stream can be defined. WordDelimiter, Stopwords, synonyms, etc. Eventually, you'd reconstruct all of the logic embedded in the analysis process in your checking program. Then you'd wonder if that was correct. There's qu

Re: uncorrect results

2010-11-17 Thread Simon Willnauer
Jan, can you elaborate on the problem a little more? I see you do index-time analysis with lowercasing (look at LowerCaseTokenizer btw.) but you don't do lowercasing at query time. You could also use the QueryParser to create a phrase query automatically though. Could you give us an idea what the "wron

Re: uncorrect results

2010-11-17 Thread Jan
That's what I figured... I can't find out what I'm doing wrong though ;) So the query is "experiment" (I know, not really a phrase... but the assignment requested precisely so). The program constructs the following query +(AbstractText:"experiment" ArticleTitle:"experiment") which looks good to me.

Re: uncorrect results

2010-11-17 Thread Donna L Gresh
As it is probably more likely that you're doing something incorrect than that Lucene is reporting incorrect results :), it might help if you reported the exact query that is being submitted to the IndexSearcher, and then showed us the document that was incorrectly returned. My guess is that ei

uncorrect results

2010-11-17 Thread Jan
Hi, I have an assignment in my Text Analytics class. I am supposed to create an index and search it. The corpus is a PubMed-like XML file. It is possible to query terms (programcall a few terms) and phrases (programcall "a phrase"). When a phrase is queried the program should answer how often the

Re: asking about index verification tools

2010-11-17 Thread Yakob
Yes, you're correct, but I was just wondering about my chances here though. Are there any tools that do this crosschecking of the index? Or else, when you make a search engine, do you just feel complacent about it and feel the crosschecking of the index isn't really necessary? What do you do in this situation? :-)

Re: asking about index verification tools

2010-11-17 Thread Anshum
Lance, CheckIndex would only check for the sanity of the index and not really if all words from the source got added into the index or not. CheckIndex would only check for corrupt indexes and in the process also take a lot of time. Perhaps what Yakob wanted here is just a cross check between the in

Re: asking about index verification tools

2010-11-17 Thread Lance Norskog
The Lucene CheckIndex program does this. It is a class somewhere in Lucene with a main() method. Samarendra Pratap wrote: It is not guaranteed that every term will be indexed. There is a limit on the maximum number of terms (as in Lucene 3.0 and maybe earlier too) per field. Check out this http://