Re: lucene 4.0 and DocumentsWriterPerThreadPool compared to lucene 3.4

2011-10-11 Thread Marc Sturlese
Simon, in this example I've set the DEFAULT_MAX_THREAD_STATES of DocumentsWriterPerThreadPool to 1. I've debugged the code and made sure that ThreadAffinityDocumentsWriterThreadPool has the value set to 1 (as I was trying to make it behave like Lucene 3.4 using a single thread). I'm ind

lucene 4.0 and DocumentsWriterPerThreadPool compared to lucene 3.4

2011-10-11 Thread Marc Sturlese
I'm running some performance tests on bulk indexing with Lucene 4.0 and I'm seeing weird results. I've read http://www.gossamer-threads.com/lists/lucene/java-dev/127190?do=post_view_threaded#127190 but I still have doubts. I'm building a 1 GB index containing 1 million docs. When building the

Lucene 3.1 search parallelism per segment doubt

2011-10-10 Thread Marc Sturlese
I've read in another thread (http://lucene.472066.n3.nabble.com/Indexing-slower-in-trunk-td3059836.html#a3062991): "Since Lucene 2.9, Lucene works on a per segment basis when searching. Since Lucene 3.1 it can even parallelize on multiple segments. If you optimize your index you only have one segm

About IndexReader.reopen with very similar indexes

2011-06-21 Thread Marc Sturlese
Hey there, I have a question about the behaviour of IndexReader.reopen. I have a Tomcat server holding a Lucene index behind an IndexSearcher. If I move index.folder to index.folder.old and another index, let's say index.folder.2, to index.folder and then reopen the readers, something weird happens if

Re: performance merging indexes with addIndexesNoOptimize

2010-11-12 Thread Marc Sturlese
Thanks. So, to clarify: as far as I understand, if I end up optimizing the index right after merging it, it doesn't matter whether I use the Lucene 3.x addIndexes or addIndexesNoOptimize, since the total time for both steps will be the same in either case. Am I right?

Re: performance merging indexes with addIndexesNoOptimize

2010-11-12 Thread Marc Sturlese
Thanks a lot, Shai. A couple of questions: >> In Lucene 3x there is a new addIndexes which accepts Directory… that >> simply registers the new indexes in the index, without running merges. >> That makes addIndexes very fast. With the Lucene 3.x addIndexes which accepts Directory, if after the mer

performance merging indexes with addIndexesNoOptimize

2010-11-12 Thread Marc Sturlese
I am running some tests on merging indexes and have a performance question. I am doing the merge in a simple way, something like: FSDirectory indexes[] = new FSDirectory[indexList.size()]; for (int i = 0; i < indexList.size(); i++) { indexes[i] = FSDirectory.open(new File(indexList

Re: score and multiValued fields

2010-03-17 Thread Marc Sturlese
elevant changes for 3.x... > > I'm pretty sure that your supposition 2 is the right one. > > HTH > Erick > > On Tue, Mar 16, 2010 at 2:58 PM, Marc Sturlese > wrote: > >> >> I would like to know how Lucene deals with the score on multiValued >> fi

score and multiValued fields

2010-03-16 Thread Marc Sturlese
I would like to know how Lucene deals with the score on multiValued fields. I am wondering if: 1) a score is computed per field and the maximum among them wins, or 2) all terms of all fields (from the multivalued field) influence each other to compute the score. Let's say I have a document with a m

Re: DisjunctionMaxQuery with tie breaker=1 same as MultiFieldQueryParser?

2010-03-12 Thread Marc Sturlese
Thanks, Hoss, for the useful info. According to the coord(q,d) definition, it's calculated at the document level. The definition says it "is a score factor based on how many of the query terms are found in the specified document". If I am searching for just one term, "ipod" in this case, how would coord be computed? Would i
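For reference, Lucene's DefaultSimilarity computes coord as the number of matching query clauses divided by the total number of clauses. The sketch below is plain Java with no Lucene dependency; the class name CoordSketch is hypothetical and used only for illustration. Under that definition, a one-clause query that matches gets coord(1, 1) = 1, while a three-clause OR query (for example, "ipod" expanded over three fields) that matches in only one clause would get 1/3.

```java
// Hypothetical sketch, not Lucene code: mirrors the overlap / maxOverlap
// ratio that DefaultSimilarity uses for coord.
final class CoordSketch {

    // overlap:    number of query clauses that matched the document
    // maxOverlap: total number of clauses in the query (must be > 0)
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}
```

Calling CoordSketch.coord(1, 1) models the single-term case asked about above, so coord should not penalize a document when the query has only one clause.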

DisjunctionMaxQuery with tie breaker=1 same as MultiFieldQueryParser?

2010-03-11 Thread Marc Sturlese
Hey there, if I want to search for, let's say, "ipod" in three different fields (device, sound, technology), would it be the same to use a DisjunctionMaxQuery with the tie breaker = 1 as to use a MultiFieldQueryParser with an OR to build the boolean queries? As far as I understood in the API documentation
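One way to reason about this question is through the combination rule documented for DisjunctionMaxQuery: the score is the maximum sub-query score plus the tie breaker multiplied by the sum of the remaining sub-query scores. The sketch below is plain Java with no Lucene dependency; the class name DisMaxSketch, its method names, and the per-field scores are made up for illustration. It suggests that with a tie breaker of 1 the result equals the plain sum of the per-field scores. A real BooleanQuery may additionally apply coord and other factors, so the two queries are not guaranteed to produce identical scores.

```java
// Hypothetical sketch, not Lucene code: models only the documented
// DisjunctionMaxQuery combination rule on already-computed field scores.
final class DisMaxSketch {

    // Returns max sub-score + tieBreaker * (sum of the other sub-scores).
    // Assumes non-negative scores; an empty input yields 0.
    static double disMax(double tieBreaker, double... fieldScores) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : fieldScores) {
            sum += s;
            if (s > max) {
                max = s;
            }
        }
        return max + tieBreaker * (sum - max);
    }

    // Plain sum of the per-field scores, i.e. an OR of the fields
    // with coord and normalization deliberately ignored.
    static double sum(double... fieldScores) {
        double total = 0.0;
        for (double s : fieldScores) {
            total += s;
        }
        return total;
    }
}
```

With illustrative scores 0.5, 0.25 and 0.125 for device, sound and technology, disMax(1.0, ...) and sum(...) agree, while disMax(0.0, ...) keeps only the best field's score.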

FastVectorHighlighter and query with multiple fields

2010-01-29 Thread Marc Sturlese
I have FastVectorHighlighter working with a query like: title:Ipod OR title:IPad but it's not working (0 snippets are returned) with: title:Ipod OR content:IPad. Could this be because, when the FieldQuery is created, the query used to build it must have just one field? If that's not the case I may be missing

Adding segments to an optimized index

2009-10-28 Thread Marc Sturlese
I am running some tests with optimize and adding segments, and I am wondering if someone knows whether what I am doing can cause document inconsistency. I have 2 folders with one index each. One has a non-optimized index1 with 1 million docs and a mergeFactor=10. The other one, index2, has the same index op

Lucene 2.9 and performance of readers per segment.

2009-10-01 Thread Marc Sturlese
Hey there, until now, when using Lucene 2.4, I was always optimizing my index using the compound file format after updating it. I was doing that because otherwise I noticed a large performance loss in search responses. Now in Lucene 2.9 there are per-segment readers and I have read something about it performs b

Doubt about FieldCache.DEFAULT.getStrings[] and FieldCache.DEFAULT.getStringIndex

2009-09-04 Thread Marc Sturlese
Hey there, I am iterating over a DocSet and for every id I need to get the value of a field which is analyzed with KeywordAnalyzer and is not stored. I have noticed two ways of doing it using FieldCache. Can someone please explain to me the pros and cons of using one or the other? Using StringIndex:

Re: memory leak with CustomComparatorSource class variables

2009-06-13 Thread Marc Sturlese
/solr/search/MissingStringLastComparatorSource.html From there I try to link to org.apache.lucene.search.FieldComparatorSource but get a 404 error. Any idea how I can get access to that documentation? Thanks in advance! Michael McCandless-2 wrote: > > On Fri, Jun 12, 2009 at 6:09 PM,

memory leak with CustomComparatorSource class variables

2009-06-12 Thread Marc Sturlese
Hey there, I have noticed I am experiencing a sort of memory leak with a CustomComparatorSource (which implements SortComparatorSource). I have a HashMap declared as a class variable in CustomComparatorSource: final HashMap docs_to_modify. This HashMap contains ids of documents and priorities use

Re: boost and score doubt

2009-04-07 Thread Marc Sturlese
if used) are discarded (have no effect). > > Mike > > On Mon, Apr 6, 2009 at 4:01 PM, Marc Sturlese > wrote: >> >> Hey there, >> Does de function doc.setBoost(x.y) accept negative values or values minor >> than 1?? I mean... it compile and doesn't give e

boost and score doubt

2009-04-06 Thread Marc Sturlese
Hey there, does the function doc.setBoost(x.y) accept negative values or values less than 1? I mean... it compiles and doesn't give errors but the behaviour is not exactly what I was expecting. In my use case I have the field title... I want to give very, very low relevance to the documents which t

Re: check if document is deleted using indexwriter

2009-01-22 Thread Marc Sturlese
> processed in bulk when the deletes are flush. So at the time of that > call, IndexWriter does not know how many documents were affected by > the delete. > > But why do you need to check this in the first place? EG searching > will never return to you a deleted do

check if document is deleted using indexwriter

2009-01-21 Thread Marc Sturlese
Hey there, I would like to know how to check whether a document has been deleted when I am using an IndexWriter and the functions deleteDocument or updateDocument. I have seen that deleteDocument from IndexReader returns an integer but in the IndexWriter's case it's a void. Any advice? Thanks in advance

Re: Boosting fields are searching or indexing time?

2008-12-02 Thread Marc Sturlese
> > > On Nov 30, 2008, at 11:11 AM, Marc Sturlese wrote: > >> >> Hey there, >> I have a simple question about boosting fields, >> I have a lucene indexer app that indexes data from a db. At indexing >> time I >> give different boost to the fields de

Boosting fields are searching or indexing time?

2008-11-30 Thread Marc Sturlese
Hey there, I have a simple question about boosting fields. I have a Lucene indexer app that indexes data from a db. At indexing time I give a different boost to the fields depending on whether the field is title or content. Would it be the same to set the boost at searching time instead of at indexing? I

memory leak getting docs

2008-11-05 Thread Marc Sturlese
Hey there, I have posted about this problem before but I think I didn't explain myself very well. I'll try to explain my problem in context: I get ids from a database and I look for the documents in an index that correspond to each id. There is just one match for every id. Once I have the doc

Memory problem dealing with indexsearcher and topdocs

2008-10-25 Thread Marc Sturlese
Hey there, I am having some memory trouble with my Lucene app. I need to get the info for, and delete, about 1000 docs every time I execute the app. I get the IDs of the documents to delete from a database and for every single ID I get the data from the indexed doc using an index searcher and topdocs (sea