Re: delete by docid in lucene 4

2012-07-12 Thread Sean Bridges
Thanks for the advice everyone, I'll try updateDocument() for now. Sean On Thu, Jul 12, 2012 at 3:25 PM, Michael McCandless wrote: > On Thu, Jul 12, 2012 at 6:17 PM, Simon Willnauer > wrote: >> Sean seriously a couple of hundred docs a second, don't bother just >> use updateDocument. My benchma

Re: delete by docid in lucene 4

2012-07-12 Thread Michael McCandless
On Thu, Jul 12, 2012 at 6:17 PM, Simon Willnauer wrote: > Sean seriously a couple of hundred docs a second, don't bother just > use updateDocument. My benchmarks show that there is only a smallish > impact during indexing especially with concurrent flushing in lucene > 4. I don't know how resource

Re: Is creating an analyzer expensive?

2012-07-12 Thread Simon Willnauer
You can safely reuse a single analyzer across threads. The Analyzer class maintains ThreadLocal storage for TokenStreams internally so you can just create the analyzer once and use it throughout your application. simon On Thu, Jul 12, 2012 at 10:13 PM, Dave Seltzer wrote: > I have one more quest
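
The per-thread reuse described above can be illustrated with a minimal plain-JDK sketch. It is not Lucene code: `PerThreadReuseSketch` and `ReusableStream` are hypothetical stand-ins for an analyzer and its token stream, and the sketch only assumes that the shared object keeps one reusable component per thread in a `ThreadLocal`.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a shared analyzer; not a Lucene class.
class PerThreadReuseSketch {
    // Hypothetical stand-in for a reusable, non-thread-safe token stream.
    static final class ReusableStream {
        int uses = 0;
    }

    // One ReusableStream is created lazily for each thread that asks for it.
    private final ThreadLocal<ReusableStream> perThread =
            ThreadLocal.withInitial(ReusableStream::new);

    // Returns the calling thread's own stream, reusing it on later calls.
    ReusableStream stream() {
        ReusableStream s = perThread.get();
        s.uses++;
        return s;
    }

    // Shares ONE sketch instance across threads and counts how many
    // distinct per-thread streams were handed out in total.
    static int distinctStreamsAcrossThreads(int threadCount) {
        PerThreadReuseSketch shared = new PerThreadReuseSketch();
        Set<ReusableStream> seen = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                seen.add(shared.stream()); // first call creates the stream
                seen.add(shared.stream()); // second call reuses the same one
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctStreamsAcrossThreads(3));
    }
}
```

Each thread calls `stream()` twice but ends up with a single instance of its own, which is why one shared object can be created once and used from many threads without extra locking in this model.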

Re: delete by docid in lucene 4

2012-07-12 Thread Simon Willnauer
Sean seriously a couple of hundred docs a second, don't bother just use updateDocument. My benchmarks show that there is only a smallish impact during indexing especially with concurrent flushing in lucene 4. I don't know how resource intensive your analysis chain is but on a decent machine you can

Re: Direct memory footprint of NIOFSDirectory

2012-07-12 Thread Lance Norskog
You can choose another directory implementation. On Thu, Jul 12, 2012 at 1:42 PM, Vitaly Funstein wrote: > Just thought I'd bump this. To clarify - for reasons outside my > control, I can't just run the JVM hosting Lucene-enabled application > with -XX:MaxDirectMemorySize=100G or some other huge

Re: Direct memory footprint of NIOFSDirectory

2012-07-12 Thread Vitaly Funstein
Just thought I'd bump this. To clarify - for reasons outside my control, I can't just run the JVM hosting Lucene-enabled application with -XX:MaxDirectMemorySize=100G or some other huge value for the ceiling and never worry about this. Due to preallocation and other restrictions, this parameter has

Is creating an analyzer expensive?

2012-07-12 Thread Dave Seltzer
I have one more question to pose to the group today: I have several thousand searches being performed against MemoryIndexes on a regular basis. I'd like the ability for each search to choose its own Analyzer, such that some queries could use a regex pattern, other queries could use the Standard

RE: delete by docid in lucene 4

2012-07-12 Thread Uwe Schindler
Hi Sean, Without checking the performance in your case, it makes no sense to discuss this. Lucene 4.0 changed a lot, there are several improvements. Please read the following: - Because of the new term dictionary, Term lookups on non-existing terms are fail-fast, they don't do any disk IO i

Re: delete by docid in lucene 4

2012-07-12 Thread Sean Bridges
I never used updateDocument() due to ignorance. We are indexing several hundred documents per second, and most of the analysis takes place on the non-indexer machines to reduce load on the indexers. For our use case, deleteDocument(int docId) will be faster as there are very few duplicates, but

Re: delete by docid in lucene 4

2012-07-12 Thread Simon Willnauer
On Thu, Jul 12, 2012 at 6:55 PM, Sean Bridges wrote: > Thanks for the tip. > > Does using updateDocument instead of addDocument affect > indexing/search performance? it does affect index performance compared to add document but that might be minor compared to your analysis chain. I wouldn't worry

RES: BrazilianAnalyzer don't woks with any BooleanQuery

2012-07-12 Thread Marcelo Neves
Ok. I'm using positions at ANALYZED fields where search is by terms. The others fields, "NOT_ANALYZED", the search is by complete term, as culture code, url, document code. The index has documents in three languages (Spanish, English and Portuguese (BR)). When perform a search, I realize filters

Pattern Analyzer

2012-07-12 Thread Dave Seltzer
Hello, I have a search project which uses the Lucene PatternAnalyzer for its text/query analysis. At the moment it's configured like so: analyzer = new PatternAnalyzer(Version.LUCENE_35, Pattern.compile("\\s+"), true, null); My goal here was to split words based on spaces and make things case in
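
The stated goal (split on whitespace, ignore case) can be approximated with the JDK alone. The sketch below is not PatternAnalyzer and makes no claim about its exact output; `WhitespaceLowercaseTokens` and `tokenize` are hypothetical names, and only the `\\s+` pattern and the lowercasing intent are taken from the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper; models "split on whitespace, then lowercase".
class WhitespaceLowercaseTokens {
    // Same pattern as in the message: one or more whitespace characters.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // trim() avoids an empty leading piece when the text starts with whitespace.
        for (String piece : WHITESPACE.split(text.trim())) {
            if (!piece.isEmpty()) {
                // Locale.ROOT keeps lowercasing independent of the default locale.
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello   Lucene\tWorld"));
    }
}
```

Note that punctuation stays attached to the neighbouring word in this model, because only whitespace acts as a separator.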

RE: delete by docid in lucene 4

2012-07-12 Thread Edward W. Rouse
Constants.DEFAULT_ID_FIELD is the name of our unique documentId. The lucene docId has no purpose for us as we consider it for internal use by lucene only and use our own id for document tracking purposes. > -Original Message- > From: Sean Bridges [mailto:sean.brid...@gmail.com] > Sent: Thu

Re: delete by docid in lucene 4

2012-07-12 Thread Sean Bridges
Thanks for the tip. Does using updateDocument instead of addDocument affect indexing/search performance? Sean On Thu, Jul 12, 2012 at 9:27 AM, Uwe Schindler wrote: > The trick is to index not with addDocument(Document) but instead with > updateDocument(Term, Document). Lucene then adds the docu

RE: delete by docid in lucene 4

2012-07-12 Thread Uwe Schindler
The trick is to index not with addDocument(Document) but instead with updateDocument(Term, Document). Lucene then adds the document atomically while deleting any previous documents with the given term (which is your unique ID). If the key does not exist it simply indexes without deleting anything.
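
The delete-then-add behaviour described above can be modelled in a few lines of plain Java. This is a toy in-memory model, not Lucene's implementation or API: `UpdateByKeySketch` is a hypothetical class, documents are plain maps, and the only assumption carried over is that an update removes every earlier document with the same key value before adding the new one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of "update by unique key"; hypothetical, not a Lucene class.
class UpdateByKeySketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Plain add: never checks for duplicates.
    synchronized void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    // Removes every document whose field equals the given value, then adds
    // the new document. Both steps happen under one lock, so other callers
    // of this object never observe the intermediate state.
    synchronized void updateDocument(String field, String value, Map<String, String> doc) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(doc);
    }

    synchronized int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        UpdateByKeySketch index = new UpdateByKeySketch();
        index.updateDocument("id", "42", Map.of("id", "42", "body", "first delivery"));
        index.updateDocument("id", "42", Map.of("id", "42", "body", "redelivered"));
        System.out.println(index.size());
    }
}
```

In this model, sending the same document twice through `updateDocument` leaves one copy, while `addDocument` would leave two; when the key is not present yet, the update degenerates to a plain add.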

Re: delete by docid in lucene 4

2012-07-12 Thread Sean Bridges
Does that return a Term which matches the lucene docId? What is the value of Constants.DEFAULT_ID_FIELD ? Thanks, Sean On Thu, Jul 12, 2012 at 6:54 AM, Edward W. Rouse wrote: > I get around this by creating an id based term like: > > new Term(Constants.DEFAULT_ID_FIELD, id) > >> -Original

Re: delete by docid in lucene 4

2012-07-12 Thread Sean Bridges
We have indexer machines which are fed documents by other machines. If an error occurs (machine crashing etc) the same document may be sent to an indexer multiple times. Serial ids are assigned before documents reach the indexer, so a document may be in the index multiple times, each time with th

Re: about some date store

2012-07-12 Thread Erick Erickson
You can only show fields that are stored (Field.Store.YES). Only then can you use document.get(...) and get something to display. Best Erick On Thu, Jul 12, 2012 at 2:55 AM, sam wrote: > it's take a new problem,what even I seaching,I can only get the first line > data,if the data can be seach.and ,when i

RE: delete by docid in lucene 4

2012-07-12 Thread Edward W. Rouse
I get around this by creating an id based term like: new Term(Constants.DEFAULT_ID_FIELD, id) > -Original Message- > From: Sean Bridges [mailto:sean.brid...@gmail.com] > Sent: Wednesday, July 11, 2012 9:09 PM > To: java-user@lucene.apache.org > Subject: delete by docid in lucene 4 > > Is

Re: BrazilianAnalyzer don't woks with any BooleanQuery

2012-07-12 Thread Simon Willnauer
Can you tell us more about your index side of things? Are you using positions in the index since I see PhraseQuery in your code? Where are you passing the text you are searching for to the BrazilianAnalyzer? I don't see it in your code. You need to process your text at search time too to get results

Re: delete by docid in lucene 4

2012-07-12 Thread Simon Willnauer
On Thu, Jul 12, 2012 at 3:09 AM, Sean Bridges wrote: > Is it possible to delete by docId in lucene 4? I can delete by docid > in lucene 3 using IndexReader.deleteDocument(int docId), but that > method is gone in lucene 4, and IndexWriter only allows deleting by > Term or Query. that is correct.