Re: highlighting performance

2011-06-21 Thread Michael Sokolov
OK - it seems as if there is a blow-up in FieldPhraseList if a document has a large number of occurrences of a term that is in the query. In one example, I searched for "1", and this occurs just under 2000 times in one of my test documents (as the value of HTML attributes). Admittedly a weird

any optimizations I can make on this code

2011-06-21 Thread Hiller, Dean x66079
I am running over a 100-million-row NoSQL set and unfortunately building 1 million indexes. Each row I get may or may not be for the index I just wrote to, so I can't keep IndexWriter open very long. I am currently simulating how long it would take me to build all the indexes and it looks like

Re: highlighting performance

2011-06-21 Thread Michael Sokolov
I did that, and the benchmark indicates FVH is 10x faster than Highlighter now. I ran with a subset of the wikipedia data since I didn't want to deal with the whole thing. I'm trying to reconcile these weirdly varying results. One difference is that the benchmark doesn't use PhraseQueries -

Re: Solution for FHV and NGram Max Min Gram Restriction

2011-06-21 Thread Koji Sekiguchi
(11/06/22 2:03), Anupam Tangri wrote: Hi, We are using Lucene 3.2 for our project where I needed to highlight search matches. I earlier used the default highlighter, which did not work correctly all the time. So, I started using FHV which worked beautifully till I started searching multiple t

I have seen this exception on some posts around but don't see the cause/solution(RamDirectory)..

2011-06-21 Thread Hiller, Dean x66079
Anyone know how to do a simple RAMDirectory...I just created it but it is failing with this... Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.RAMDirectory@1d5a7f6 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@64804:

Re: question about wildcards

2011-06-21 Thread Danny Lade
IMO, a "reversed word Index" does not work in this case, because he's looking for a word in the middle (See curi*). Another idea is to build word chunks and save them in a second index plus docID of the first index. e.g. security go to "security ecurity curity ... ity" This is much faster to
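A minimal sketch of the word-chunk idea described above, written in plain Python rather than against the Lucene API. The function names, the `min_len` cutoff of 3, and the dictionary-based index are all assumptions made for illustration; the original message only gives the "security ecurity curity ... ity" example. The point is that once every suffix is stored, a query for a fragment in the middle of a word becomes an ordinary prefix lookup.

```python
def suffix_chunks(word, min_len=3):
    """Return every suffix of `word` that is at least `min_len` characters long.

    Words shorter than `min_len` produce no chunks (an assumption of this sketch).
    """
    return [word[i:] for i in range(len(word) - min_len + 1)]


def build_chunk_index(docs, min_len=3):
    """Map each suffix chunk to the set of doc IDs containing a word with that suffix."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            for chunk in suffix_chunks(word, min_len):
                index.setdefault(chunk, set()).add(doc_id)
    return index


def infix_search(index, fragment):
    """Answer an infix query as a prefix match over the stored chunks.

    A linear scan keeps the sketch short; a real term dictionary is sorted,
    so the prefix match would be a range seek instead.
    """
    hits = set()
    for chunk, doc_ids in index.items():
        if chunk.startswith(fragment):
            hits |= doc_ids
    return hits
```

With the documents `{1: "security is a real problem", 2: "a secure door"}`, looking up `curi` finds only document 1, because `curity` is one of the chunks stored for "security". The trade-off is index size: each word contributes one entry per suffix.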

RE: anyway to store value as bytes?

2011-06-21 Thread Hiller, Dean x66079
Sweet, thanks hmmm, yeah, you should have just given me a http://lmgtfy.com/ link ;)...darn, should have found that one myself. I am storing pretty small data. Thanks, Dean -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, June 21, 2011 5:39 AM

RE: question about wildcards

2011-06-21 Thread Hiller, Dean x66079
I wonder if you would be better off creating a second index with the words reversed. It depends on your application profile, I guess, and what you want, but an additional index may not be too bad in some cases to speed up the search. Dean -Original Message- From: G.Long [mailto:jde...@gma
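A small sketch of the reversed-word index suggested above, in plain Python and not tied to any Lucene class. The helper names and the dictionary layout are hypothetical. Reversing each word at index time turns a leading-wildcard query such as `*rity` into a prefix match on `ytir`, which is the cheap direction for a sorted term dictionary.

```python
def build_reversed_index(docs):
    """Map each reversed word to the set of doc IDs that contain the word."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word[::-1], set()).add(doc_id)
    return index


def leading_wildcard_search(index, suffix):
    """Answer a `*suffix` query by prefix-matching the reversed suffix."""
    prefix = suffix[::-1]
    hits = set()
    for reversed_term, doc_ids in index.items():
        if reversed_term.startswith(prefix):
            hits |= doc_ids
    return hits
```

This only helps when the wildcard is at the start of the term. A fragment from the middle of a word, such as `curi`, is neither a prefix of "security" nor of its reversal, so the lookup returns nothing in that case.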

Re: questions about fieldCache

2011-06-21 Thread Erick Erickson
> So action that starts a new searcher and closes the old one (like > replication) > should release cache from fieldCache through garbage collection? Absolutely. It won't be immediate, because the JVM has some heuristics it uses to initiate garbage collection. You could try attaching to the Solr i

why is query picking up extra result

2011-06-21 Thread Hiller, Dean x66079
Because when I use [ 20110601 TO * ] Lucene does not return my results greater than 20110601, but when I use [20110601 TO ], it works fine. Why is this? How do I get everything larger than 20110601? I have another case of sequence numbers and want to get everything above a certain numbe
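The following sketch does not explain the syntax difference asked about above. It only illustrates one thing worth checking for the sequence-number case, assuming the values are indexed as plain string terms: string ranges compare lexicographically, so fixed-width values such as `20110601` order correctly, while variable-length numbers need zero-padding. The function names and the width of 10 are assumptions for illustration.

```python
def pad(number, width=10):
    """Left-pad a non-negative integer with zeros so all terms share one width."""
    return str(number).zfill(width)


def terms_at_or_above(terms, lower):
    """Open-ended range over string terms: keep everything sorting at or after `lower`."""
    return sorted(t for t in terms if t >= lower)
```

Unpadded, a lower bound of `"50"` over the terms `"9"`, `"10"`, and `"200"` keeps `"9"` and drops `"200"`, which is the wrong answer numerically. After padding every term and the bound with `pad`, only the term for 200 remains.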

Solution for FHV and NGram Max Min Gram Restriction

2011-06-21 Thread Anupam Tangri
Hi, We are using Lucene 3.2 for our project where I needed to highlight search matches. I earlier used the default highlighter, which did not work correctly all the time. So, I started using FHV which worked beautifully till I started searching multiple terms. On digging I came to know FHV has

Re: About IndexReader.reopen with very similar indexes

2011-06-21 Thread Michael McCandless
Reopening is based entirely on the latest segments_N file present in the index. Lucene loads that file and checks if it refers to any new segments not already open and if so opens those new ones. And segments in common with what the reader already has open (ie same segment name) are simply reused
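The reuse rule described above can be modelled in a few lines of plain Python. This is a simplified sketch, not Lucene's implementation: `reopen`, `open_segments`, and the `open_segment` callback are hypothetical names, and a "reader" here is just whatever object the callback returns. It shows only the decision the message describes, which is made by segment name.

```python
def reopen(open_segments, latest_segment_names, open_segment):
    """Return the per-segment readers for the latest commit.

    open_segments: dict of segment name -> reader already held open.
    latest_segment_names: names listed by the newest segments_N file.
    open_segment: callback that opens a reader for a segment name.
    """
    reopened = {}
    for name in latest_segment_names:
        if name in open_segments:
            # Same segment name as before: the existing reader is reused.
            reopened[name] = open_segments[name]
        else:
            # Not seen before: open a new reader for it.
            reopened[name] = open_segment(name)
    return reopened
```

Under this model, the comparison looks at names only. If a different index is moved into the same directory and happens to contain a segment with a name the reader already holds, the model would treat it as already open rather than reading the new files.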

Re: ComplexPhraseQueryParser with multiple fields

2011-06-21 Thread lichman
Say, Which of the solutions did you find to work better? Can you please say which package should I change it to if I choose to do it that way? Thanks, Moshe Lichman. -- View this message in context: http://lucene.472066.n3.nabble.com/ComplexPhraseQueryParser-with-multiple-fields-tp2879290p30902

About IndexReader.reopen with very similar indexes

2011-06-21 Thread Marc Sturlese
Hey there, I have a doubt about the behaviour of IndexReader.reopen. I have a Tomcat server holding a Lucene index over an IndexSearcher. If I move the index.folder to index.folder.old and another index, let's say index.folder.2 to index.folder and then I reopen readers, something weird happens if

Re: question about wildcards

2011-06-21 Thread G.Long
Thank you for the tip :) I'll try it. Regards, Gary Le 21/06/2011 17:38, Ian Lea a écrit : See the javadocs for QueryParser.setAllowLeadingWildcard(boolean allowLeadingWildcard). And from the FAQ, see http://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_

Re: question about wildcards

2011-06-21 Thread Ian Lea
See the javadocs for QueryParser.setAllowLeadingWildcard(boolean allowLeadingWildcard). And from the FAQ, see http://wiki.apache.org/lucene-java/LuceneFAQ#What_wildcard_search_support_is_available_from_Lucene.3F Be sure to heed the warnings about performance. -- Ian. On Tue, Jun 21, 2011 at 4:

question about wildcards

2011-06-21 Thread G.Long
Hi :) I've got the following text indexed with SimpleAnalyzer: "security is a real problem." If I try to search for secu*, it will find the document. But if I try to search for curi*, there are no results. I read that it's not possible to add a * wildcard at the beginning of the query so wh
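A short sketch of why `secu*` matches and `curi*` does not, in plain Python. It assumes an analyzer that splits on non-letters and lowercases, approximated here with an ASCII-only regular expression; the function names are made up for the example. A trailing-wildcard query is a prefix test against each indexed token, not a substring test against the original text.

```python
import re


def simple_tokens(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def prefix_matches(tokens, prefix):
    """Return the tokens that start with `prefix`, as a `prefix*` query would."""
    return [token for token in tokens if token.startswith(prefix)]
```

For the sentence in the question, the tokens are `security`, `is`, `a`, `real`, and `problem`. The prefix `secu` matches `security`, while no token starts with `curi`, so that query comes back empty.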

RE: need help

2011-06-21 Thread karl.wright
You might want to look at ManifoldCF too. http://incubator.apache.org/connectors/ Karl -Original Message- From: ext Marlen [mailto:zmach...@facinf.uho.edu.cu] Sent: Tuesday, June 21, 2011 9:49 AM To: java-user@lucene.apache.org Subject: need help I need to create a search engine that s

RE: need help

2011-06-21 Thread Vinaya Kumar Thimmappa
Hello Cheta, Check this site : http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ Vinaya -Original Message- From: Marlen [mailto:zmach...@facinf.uho.edu.cu] Sent: Tuesday, June 21, 2011 7:19 PM To: java-user@lucene.apache.org Subject: need help I need to create a search engi

need help

2011-06-21 Thread Marlen
I need to create a search engine that searches on an intranet and my FTP. I want to use Lucene as the search engine. My question is: which best fits my needs, Nutch or Solr? Thanks a lot, cheta On 13/06/2011 9:17, Ian Lea wrote: Hello Lucene can be used for searching pretty much anything.

Re: questions about fieldCache

2011-06-21 Thread Bernd Fehling
Currently I'm using version 3.2. I already used 4.x some months ago, but there was too much change at that time, so I decided to go with 3.0.x and updated to 3.1 and now to 3.2. I'm still dealing with my fieldCache OOM issue and want to understand why things are as they are. I have already removed/s

Re: questions about fieldCache

2011-06-21 Thread Erick Erickson
Hmmm, I'm not going to even try to talk about the code itself, but I will add a couple of clarifications: Jetty has nothing to do with it. It's in Lucene, and it's used for sorting and sometimes faceting. The cache is associated with a reader on a machine used to search. When replication happens,
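The idea of a cache "associated with a reader" can be sketched in plain Python as a cache keyed weakly by the reader object. This is an illustration under that assumption, not a description of Lucene's code: `Reader`, `field_cache`, and `get_cached` are hypothetical names. Entries exist only as long as something still holds the reader; once the old reader is dropped, its cached arrays become collectable.

```python
import weakref


class Reader:
    """Stand-in for an index reader; only its identity matters here."""

    def __init__(self, name):
        self.name = name


# Keys are held weakly: an entry disappears once its reader is unreachable.
field_cache = weakref.WeakKeyDictionary()


def get_cached(reader, field, loader):
    """Return the cached values for (reader, field), loading them on first use."""
    per_reader = field_cache.setdefault(reader, {})
    if field not in per_reader:
        per_reader[field] = loader(reader, field)
    return per_reader[field]
```

In CPython the entry goes away as soon as the last reference to the reader is deleted, because of reference counting. On a JVM the equivalent release waits for a garbage collection cycle, so the memory is not freed immediately.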

Re: anyway to store value as bytes?

2011-06-21 Thread Erick Erickson
Does this help? http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/util/IndexableBinaryStringTools.html If not, here's a note from Ryan McKinley on another thread (googling lucene storing binary data brought it up)... ** You can store binary data using a binary field type -- then y

questions about fieldCache

2011-06-21 Thread Bernd Fehling
I'm trying to understand the logic of/behind fieldCache. Who has written this piece of code or has good knowledge about it? Why is it under the hood of Jetty? I see FieldCache$StringIndex with - f_dccollection - f_dcyear - f_dctype but also - dctitle --> f_dctitle --> f_dccreator - title --> f_

Re: getting OutOfMemoryError

2011-06-21 Thread Ian Lea
Complicated with all those indexes. 3 suggestions: 1. Just give it more memory. 2. Profile it to find out what is actually using the memory. 3. Cut down the number of indexes. See recent threads on pros and cons of multiple indexes vs one larger index. -- Ian. On Mon, Jun 20, 2011 at 2: