Thanks for the advice, everyone. I'll try updateDocument() for now.
Sean
On Thu, Jul 12, 2012 at 3:25 PM, Michael McCandless
wrote:
> On Thu, Jul 12, 2012 at 6:17 PM, Simon Willnauer
> wrote:
>> Sean seriously a couple of hundred docs a second, don't bother just
>> use updateDocument. My benchma
On Thu, Jul 12, 2012 at 6:17 PM, Simon Willnauer
wrote:
> Sean seriously a couple of hundred docs a second, don't bother just
> use updateDocument. My benchmarks show that there is only a smallish
> impact during indexing especially with concurrent flushing in lucene
> 4. I don't know how resource
You can safely reuse a single analyzer across threads. The Analyzer
class maintains ThreadLocal storage for TokenStreams internally so you
can just create the analyzer once and use it throughout your
application.
simon
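As a rough illustration of the per-thread reuse pattern described above, here is a minimal sketch in plain Java. It does not use Lucene at all: SharedAnalyzerSketch, normalize() and buffersCreated are hypothetical names made up for this example, and java.lang.ThreadLocal stands in for whatever storage the real Analyzer uses internally.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a shared analyzer; this is NOT a Lucene class.
final class SharedAnalyzerSketch {
    // Counts how many per-thread buffers were created (illustration only).
    final AtomicInteger buffersCreated = new AtomicInteger();

    // Each thread lazily gets its own reusable buffer on first use.
    private final ThreadLocal<StringBuilder> perThreadBuffer =
        ThreadLocal.withInitial(() -> {
            buffersCreated.incrementAndGet();
            return new StringBuilder();
        });

    // Safe to call from many threads: mutable state is never shared.
    String normalize(String text) {
        StringBuilder buf = perThreadBuffer.get();
        buf.setLength(0); // reuse the buffer instead of allocating a new one
        buf.append(text.trim().toLowerCase(Locale.ROOT));
        return buf.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        SharedAnalyzerSketch analyzer = new SharedAnalyzerSketch(); // created once
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                analyzer.normalize("  Some Text  ");
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("buffers created: " + analyzer.buffersCreated.get());
    }
}
```

The single instance is shared, while the mutable working state is created once per thread and then reused by that thread.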
On Thu, Jul 12, 2012 at 10:13 PM, Dave Seltzer wrote:
> I have one more quest
Sean, seriously, at a couple of hundred docs a second, don't bother; just
use updateDocument. My benchmarks show that there is only a smallish
impact during indexing, especially with concurrent flushing in Lucene
4. I don't know how resource-intensive your analysis chain is, but on a
decent machine you can
You can choose another directory implementation.
On Thu, Jul 12, 2012 at 1:42 PM, Vitaly Funstein wrote:
> Just thought I'd bump this. To clarify - for reasons outside my
> control, I can't just run the JVM hosting Lucene-enabled application
> with -XX:MaxDirectMemorySize=100G or some other huge
Just thought I'd bump this. To clarify - for reasons outside my
control, I can't just run the JVM hosting Lucene-enabled application
with -XX:MaxDirectMemorySize=100G or some other huge value for the
ceiling and never worry about this. Due to preallocation and other
restrictions, this parameter has
I have one more question to pose to the group today:
I have several thousand searches being performed against MemoryIndexes on
a regular basis.
I'd like the ability for each search to choose its own Analyzer, such
that some queries could use a regex pattern, other queries could use the
Standard
Hi Sean,
Without checking the performance in your case, it makes no sense to discuss
this. Lucene 4.0 changed a lot; there are several improvements. Please
read the following:
- Because of the new term dictionary, Term lookups on non-existing terms are
fail-fast, they don't do any disk IO i
I never used updateDocument() due to ignorance.
We are indexing several hundred documents per second, and most of the
analysis takes place on the non-indexer machines to reduce load on
the indexers. For our use case, deleteDocument(int docId) will be
faster as there are very few duplicates, but
On Thu, Jul 12, 2012 at 6:55 PM, Sean Bridges wrote:
> Thanks for the tip.
>
> Does using updateDocument instead of addDocument affect
> indexing/search performance?
It does affect indexing performance compared to addDocument, but that
might be minor compared to your analysis chain. I wouldn't worry
OK. I'm using positions on ANALYZED fields, where the search is by terms. The other
fields, "NOT_ANALYZED", are searched by complete term, such as culture code, URL,
document code.
The index has documents in three languages (Spanish, English and Portuguese
(BR)). When performing a search, I apply filters
Hello,
I have a search project which uses the Lucene PatternAnalyzer for its
text/query analysis.
At the moment it's configured like so:
analyzer = new PatternAnalyzer(Version.LUCENE_35, Pattern.compile("\\s+"),
true, null);
My goal here was to split words based on spaces and make things case
in
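For readers unfamiliar with that configuration, the tokenization it appears to aim for (split on runs of whitespace, then lowercase each token) can be sketched with plain java.util.regex. This is only an illustration and an assumption about the intent: WhitespaceLowercaseSketch and tokenize() are hypothetical names, and the sketch does not reproduce PatternAnalyzer itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: mimics "split on whitespace, lowercase".
final class WhitespaceLowercaseSketch {
    // Same pattern as in the configuration above: one or more whitespace characters.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : WHITESPACE.split(text.trim())) {
            if (!piece.isEmpty()) { // skip the empty piece produced by blank input
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello   Lucene World"));
    }
}
```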
Constants.DEFAULT_ID_FIELD is the name of our unique documentId. The lucene
docId has no purpose for us as we consider it for internal use by lucene
only and use our own id for document tracking purposes.
> -Original Message-
> From: Sean Bridges [mailto:sean.brid...@gmail.com]
> Sent: Thu
Thanks for the tip.
Does using updateDocument instead of addDocument affect
indexing/search performance?
Sean
On Thu, Jul 12, 2012 at 9:27 AM, Uwe Schindler wrote:
> The trick is to index not with addDocument(Document) but instead with
> updateDocument(Term, Document). Lucene then adds the docu
The trick is to index not with addDocument(Document) but instead with
updateDocument(Term, Document). Lucene then adds the document atomically
while deleting any previous documents with the given term (which is your
unique ID). If the key does not exist it simply indexes without deleting
anything.
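To make the semantics described above concrete, here is a toy in-memory model in plain Java. It is only a sketch of the "delete everything with this key, then add" behaviour and is not Lucene code: ToyIndex, Doc, count() and size() are hypothetical names, and a synchronized method stands in for the atomicity that the real updateDocument(Term, Document) is said to provide.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of add vs. update-by-key; NOT the Lucene IndexWriter.
final class ToyIndex {
    static final class Doc {
        final String id;
        final String body;

        Doc(String id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    // Plain add: duplicates of the same id accumulate.
    synchronized void addDocument(String id, String body) {
        docs.add(new Doc(id, body));
    }

    // Update: remove any documents with this id, then add the new one.
    // If the id is not present yet, this simply adds.
    synchronized void updateDocument(String id, String body) {
        docs.removeIf(d -> d.id.equals(id));
        docs.add(new Doc(id, body));
    }

    synchronized int count(String id) {
        int n = 0;
        for (Doc d : docs) {
            if (d.id.equals(id)) {
                n++;
            }
        }
        return n;
    }

    synchronized int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("42", "first copy");
        index.addDocument("42", "resent after a crash");
        System.out.println("copies after add: " + index.count("42"));
        index.updateDocument("42", "latest copy");
        System.out.println("copies after update: " + index.count("42"));
    }
}
```

With plain adds, a document that is resent ends up in the index twice; with the update path, only the latest copy for a given id remains.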
Does that return a Term which matches the lucene docId? What is the
value of Constants.DEFAULT_ID_FIELD ?
Thanks,
Sean
On Thu, Jul 12, 2012 at 6:54 AM, Edward W. Rouse wrote:
> I get around this by creating an id based term like:
>
> new Term(Constants.DEFAULT_ID_FIELD, id)
>
>> -Original
We have indexer machines which are fed documents by other machines.
If an error occurs (machine crashing etc) the same document may be
sent to an indexer multiple times. Serial ids are assigned before
documents reach the indexer, so a document may be in the index
multiple times, each time with th
You can only show data that is stored (Field.Store.YES). Only then can
you use document.get(...) and get something to display.
Best
Erick
On Thu, Jul 12, 2012 at 2:55 AM, sam wrote:
> it's take a new problem,what even I seaching,I can only get the first line
> data,if the data can be seach.and ,when i
I get around this by creating an id based term like:
new Term(Constants.DEFAULT_ID_FIELD, id)
> -Original Message-
> From: Sean Bridges [mailto:sean.brid...@gmail.com]
> Sent: Wednesday, July 11, 2012 9:09 PM
> To: java-user@lucene.apache.org
> Subject: delete by docid in lucene 4
>
> Is
Can you tell us more about the index side of things? Are you using
positions in the index, since I see PhraseQuery in your code?
Where are you passing the text you are searching for to the
BrazilianAnalyzer? I don't see it in your code. You need to process your
text at search time too to get results
On Thu, Jul 12, 2012 at 3:09 AM, Sean Bridges wrote:
> Is it possible to delete by docId in lucene 4? I can delete by docid
> in lucene 3 using IndexReader.deleteDocument(int docId), but that
> method is gone in lucene 4, and IndexWriter only allows deleting by
> Term or Query.
that is correct.