Re: [VOTE] Lucene logo contest, third time's a charm

2020-09-08 Thread Simon Willnauer
Thank you ryan for pushing on this, being persistent and getting the vote out. On Tue, Sep 8, 2020 at 5:55 PM Ryan Ernst wrote: > This vote is now closed. The results are as follows: > > Binding Results > A1: 12 (55%) > D: 6 (27%) > A2: 4 (18%) > > All Results > A1: 16 (55%) > D: 7 (

Random Index Corruption exceptions during bulk indexing

2017-06-08 Thread simon
ment and workflow, and have managed to get this to throw the exception (the first stack trace above is from a run with that). My semi-informed guess is that this is due to a race condition between segment merges and index updates... -Simon

Customizing levels in DateRangePrefixTree (or using number ranges directly?)

2015-04-09 Thread Simon Rainer
Hi, I'm using DateRangePrefixTree and NumberRangePrefixTreeStrategy to show "time histograms" for my search results. This works perfectly. But is there a way to configure the way the levels are defined in the index? In my case, documents only come with integer ranges (from year X to year Y), b

AW: Lucene Spatial: sort by best fit

2015-04-01 Thread Simon Rainer
oo, thanks! Cheers, Rainer From: david.w.smi...@gmail.com [david.w.smi...@gmail.com] Sent: Wednesday, 01 April 2015 21:51 To: java-user@lucene.apache.org Subject: Re: Lucene Spatial: sort by best fit On Wed, Apr 1, 2015 at 3:21 PM, Simon Rainer wrote:

AW: Lucene Spatial: sort by best fit

2015-04-01 Thread Simon Rainer
is feature was non-obvious, I think I may need to make this more prominent from the BBoxStrategy class level javadocs. Did you at least find this strategy? ~ David Smiley Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley On Wed, Apr 1, 2015 at 1:49

Lucene Spatial: sort by best fit

2015-04-01 Thread Simon Rainer
Hi, I'm trying to implement sorting by 'best fit' in Lucene spatial. I.e. I want to query my index for documents that intersect a query rectangle, and get my results sorted by the amount of overlap between the query rectangle and the document shape. I was wondering whether this is a use case that
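
The "best fit" ordering asked about here can be illustrated outside Lucene. What follows is only a sketch under assumptions: shapes are taken to be axis-aligned rectangles, and BestFitSortSketch, Rect, overlapArea and sortByOverlap are hypothetical names, not part of Lucene Spatial.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: none of these types come from Lucene or Spatial4j.
final class BestFitSortSketch {

    /** Hypothetical axis-aligned rectangle. */
    static final class Rect {
        final double minX, minY, maxX, maxY;

        Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        /** Area of the intersection with other, or 0 if the two do not overlap. */
        double overlapArea(Rect other) {
            double w = Math.min(maxX, other.maxX) - Math.max(minX, other.minX);
            double h = Math.min(maxY, other.maxY) - Math.max(minY, other.minY);
            return (w > 0 && h > 0) ? w * h : 0.0;
        }
    }

    /** Keeps only the shapes that intersect the query, largest overlap first. */
    static List<Rect> sortByOverlap(Rect query, List<Rect> docs) {
        List<Rect> hits = new ArrayList<>();
        for (Rect doc : docs) {
            if (query.overlapArea(doc) > 0) {
                hits.add(doc);
            }
        }
        hits.sort(Comparator.comparingDouble((Rect doc) -> query.overlapArea(doc)).reversed());
        return hits;
    }
}
```

Raw overlap area favours large document shapes; dividing by the document's own area would be one possible way to normalise, but that is a design choice the original question leaves open.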

Time range facets on documents associated with a time interval

2015-03-25 Thread Simon Rainer
Hi, I'm trying to implement dynamic range facets in Lucene, along the same lines as in the org.apache.lucene.demo.facet.RangeFacetsExample. However, in my case I'm dealing with documents that don't have a single timestamp, but an interval defined by a start- and end-timestamp. What I'm trying

AW: Can't get distance sorting to work in Lucene Spatial 4.10.3

2015-02-26 Thread Simon Rainer
the maxLevel parameter of the GeoHashPrefixTree (d'oh) and this is what messed things up. Anyways. Issue solved, and lesson learned ;-) Thanks, Rainer From: Simon Rainer Sent: Wednesday, 25 February 2015 17:11 To: java-user@lucene.apache.org Betref

AW: Can't get distance sorting to work in Lucene Spatial 4.10.3

2015-02-25 Thread Simon Rainer
ant it to, testing each time that the sort works. ~ David On Wed, Feb 25, 2015 at 7:18 AM, Simon Rainer wrote: > Hi! > > I have problems getting distance sorting to work in Lucene Spatial. (I'm > using v4.10.3.) I'm following the SpatialExample.java from the Lucene docs.

Can't get distance sorting to work in Lucene Spatial 4.10.3

2015-02-25 Thread Simon Rainer
Hi! I have problems getting distance sorting to work in Lucene Spatial. (I'm using v4.10.3.) I'm following the SpatialExample.java from the Lucene docs. My code is below (it's Scala, but translates 1:1 into Java). When I run the query, results don't seem to be affected by the sorting at all. Ch

[ANNOUNCE] Apache Lucene 4.7.0 released.

2014-02-26 Thread Simon Willnauer
February 2014, Apache Lucene™ 4.7 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.7 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text

[ANNOUNCE] Apache Lucene 4.6 released

2013-11-24 Thread Simon Willnauer
mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. Happy searchi

Re: Omitting term frequencies while preserving positions

2013-08-05 Thread Simon Willnauer
sEnum requires you to call nextPos() up to freq() times otherwise the behaviour is undefined. So essentially if you don't want to take the TF into account in your scoring model you are kind of left with changing your similarity. simon On Tue, Aug 6, 2013 at 1:41 AM, Ivan Brusic wrote: > As the sub

Re: lucene 4.3 seems to be much slower in indexing than lucene 3.6?

2013-08-01 Thread Simon Willnauer
one thing I wonder is if you could just publish your benchmark code? simon On Thu, Aug 1, 2013 at 7:45 PM, Michael McCandless wrote: > On Wed, Jul 31, 2013 at 7:17 PM, Zhang, Lisheng > wrote: >> >> Hi Mike, >> >> I retested and results are the same: >> >

Re: MemoryIndex in Lucene 4.x

2013-07-15 Thread Simon Willnauer
hey, can you share your benchmark and/or tell us a little more about how your data looks like and how you analyze the data. There might be analysis changes that contribute to that? simon On Sun, Jul 14, 2013 at 7:56 PM, cischmidt77 wrote: > I use Lucene/MemoryIndex for a large number

Re: ERROR help me please ,org.apache.lucene.search.IndexSearcher.(Ljava/lang/String;)V

2013-05-17 Thread Simon Willnauer
Well IndexSearcher doesn't have a constructor that accepts a string, maybe you should pass in an IndexReader instead? simon On Fri, May 17, 2013 at 3:11 PM, fifi wrote: > please,how I can solve this error? > > Exception in thread "main" jav

Re: Deadlock in DocumentsWriterFlushControl

2013-05-15 Thread Simon Willnauer
This seems like a bug caused due to the fact that we moved the CFS building into DWPT. Can you open an issue for this? simon On Wed, May 15, 2013 at 5:50 PM, Sergiusz Urbaniak wrote: > Hi all, > > We have an obvious deadlock between a "MaybeRefreshIndexJob"

Re: lucene and mongodb

2013-05-15 Thread Simon Willnauer
there is also elasticsearch (elasticsearch.org) built on top of lucene that might feel more natural if you come from mongo simon On Wed, May 15, 2013 at 11:38 AM, Rider Carrion Cleger wrote: > Thanks you Hendrik, > I'm new with Apache Lucene, the problem that arises is like st

[ANNOUNCE] Apache Lucene 4.3 released

2013-05-06 Thread Simon Willnauer
May 2013, Apache Lucene™ 4.3 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.3 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text searc

Re: How to use TokenStream build two fields

2013-04-23 Thread Simon Willnauer
track that down to this class or debug it but from my perspective we can't really help here. simon On Tue, Apr 23, 2013 at 1:51 PM, 808 wrote: > I am a lucene user from China,so my English is bad.I will try my best to > explain my problem. > The version I use is 4.2.I have a pro

Re: Token Stream with Offsets (Token Sources class)

2013-04-07 Thread Simon Willnauer
hey, first, please don't crosspost! second, can you provide more infos like that part where you index the data. maybe something that is selfcontained? simon On Mon, Apr 8, 2013 at 1:16 AM, vempap wrote: > Hi, > > I've the following snippet code where I'm trying

Re: "4.1 consuming more memory than 3.0.2 while Indexing"

2013-04-01 Thread Simon Willnauer
can you provide some information how much ram you are setting on the index writer config? also how many threads are you using for indexing? simon On Mon, Apr 1, 2013 at 2:21 PM, Arun Kumar K wrote: > Hi Adrien, > > I have seen memory usage using linux command top for RES memory &

Re: Get BitSet from Filter object in 4.1

2013-03-26 Thread Simon Willnauer
You can do Filter#getDocIdSet(reader, acceptedDocs).bits(); yet this method might return null if the filter cannot be represented as bits or for other reasons like performance. simon On Tue, Mar 26, 2013 at 10:37 AM, Ramprakash Ramamoorthy wrote: > Team, > > We are migrating

Re: Compression and Highlighter

2013-03-26 Thread Simon Willnauer
^5 ;) On Mon, Mar 25, 2013 at 11:02 PM, Bushman, Lamont wrote: > Thank you very much for the help Simon. I am amazed I was able to accomplish > what I wanted. I didn't store the body in the Index. And I used Highlighter > to return the best fragments by parsing my ori

Re: Assert / NPE using MultiFieldQueryParser

2013-03-25 Thread Simon Willnauer
you know, a lot of what we do is pressing buttons ;) But luckily not everything. simon On Mon, Mar 25, 2013 at 7:19 PM, Erick Erickson wrote: > @Simon > > did I actually catch a reference to: http://xkcd.com/722/ > ??? that's one of my all-time favorites on XKCD, I think

Re: Assert / NPE using MultiFieldQueryParser

2013-03-25 Thread Simon Willnauer
adam, thanks for opening it and reporting the bug! Very much appreciated and definitely 50% of the work. I just pressed buttons until tests passed! simon On Mon, Mar 25, 2013 at 5:37 PM, Adam Rauch wrote: > Thanks, Simon. You've obviously seen (and fixed!) the issue already, but fo

Re: Compression and Highlighter

2013-03-25 Thread Simon Willnauer
not using the FastVectorHighlighter. You can just pass in the string value you wanna highlight no matter if you stored it in lucene or not. You just need to see if that works for you performance wise without storing TV. simon > > Thanks

Re: [ANNOUNCE] Wiki editing change

2013-03-25 Thread Simon Willnauer
ContributorsGroup page - this is a one-time step. please add me to the list "simonwillnauer" simon > > Steve > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For add

Re: Using AnalyzingSuggester with stopwords

2013-03-24 Thread Simon Willnauer
Alex, did you try to get it working with a single term like adding "the foobar" and then drawing suggestions for "the foo" ? simon On Sun, Mar 24, 2013 at 8:51 PM, Alexander Reelsen wrote: > Hey there, > > I am trying to get up some working example with the Analy

Re: Assert / NPE using MultiFieldQueryParser

2013-03-24 Thread Simon Willnauer
Hey, this is in fact a bug in the MultiFieldQueryParser, can you open a ticket for this please in our bugtracker? MultiFieldQueryParser should override getRegexpQuery but it doesn't simon On Sun, Mar 24, 2013 at 3:57 PM, Adam Rauch wrote: > I'm using MultiFieldQueryParser to

Re: Field.Index deprecation ?

2013-03-23 Thread Simon Willnauer
>> > -Original Message- >> > From: Michael McCandless [mailto:luc...@mikemccandless.com] >> > Sent: Friday, March 22, 2013 9:41 PM >> > To: java-user@lucene.apache.org; simon.willna...@gmail.com >> > Subject: Re: Field.Index deprecation ? >>

Re: Field.Index deprecation ?

2013-03-22 Thread Simon Willnauer
On Fri, Mar 22, 2013 at 5:28 PM, Michael McCandless wrote: > We badly need Lucene in Action 3rd edition! go mike go!!! ;) > > The easiest approach is to use one of the new XXXField classes under > oal.document, eg StringField for your example. > > If none of the existing XXXFields "fit", you can

Re: Lucene reliability as primary store

2013-03-22 Thread Simon Willnauer
ere is a bug, the index will not be corrupted and B is ignored / lost. CheckIndex will not be able to recover your lost docs; it will only delete broken segments if you ask it to do so. Once you commit and lucene returned successfully you should also survive a power outage. If your disk is broken then

Re: Segment file clean-up and codecs

2013-03-22 Thread Simon Willnauer
can you send this to d...@lucene.apache.org? simon On Fri, Mar 22, 2013 at 7:52 PM, Ravikumar Govindarajan wrote: > Most of us, writing custom codec use segment-name as a handle and push data > to a different storage > > Would it be possible to get a hook in the codec APIs, w

Re: question about document-frequency in score

2013-03-22 Thread Simon Willnauer
all statistics in lucene are per field so is document frequency simon On Fri, Mar 22, 2013 at 10:48 AM, Nicole Lacoste wrote: > Hi > > I am trying to figure out if the document-frequency of a term used in > calculating the score. Is it per field? Or is independent of the field
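
To make "per field" concrete, here is a toy in-memory sketch, not Lucene code. The class PerFieldDocFreqSketch and its methods are hypothetical, and the whitespace tokenisation is an assumption made for brevity. The point it illustrates is that the same text in two different fields is counted as two separate terms.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy statistics holder; illustrative only, not how Lucene stores postings.
final class PerFieldDocFreqSketch {
    // field -> token -> ids of the documents containing that token in that field
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    /** Records every whitespace-separated token of text under the given field. */
    void add(int docId, String field, String text) {
        Map<String, Set<Integer>> byToken = postings.computeIfAbsent(field, f -> new HashMap<>());
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                byToken.computeIfAbsent(token, t -> new HashSet<>()).add(docId);
            }
        }
    }

    /** Number of documents that contain token in field; other fields are ignored. */
    int docFreq(String field, String token) {
        Map<String, Set<Integer>> byToken = postings.get(field);
        if (byToken == null) {
            return 0;
        }
        Set<Integer> docs = byToken.get(token);
        return docs == null ? 0 : docs.size();
    }
}
```

A document that repeats a token several times in one field still contributes 1 to that field's document frequency, since document ids are kept in a set.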

Re: Lucene slow performance -- still broke

2013-03-20 Thread Simon Willnauer
quick question, why on earth do you set: lbsm.setMaxMergeDocs(10); if you have 10 docs in a segment you don't want to merge anymore? I don't think you should set this at all. simon On Wed, Mar 20, 2013 at 10:48 PM, Scott Smith wrote: > First, I decided I wasn't comfortab

Re: Overall doc-count in TermStats, during flush...

2013-03-20 Thread Simon Willnauer
statistics of the doc count per field in the index since 4.0 so we can't use the segments doc count. hope that helps simon On Wed, Mar 20, 2013 at 1:12 PM, Ravikumar Govindarajan wrote: > This is an internal code I came across in lucene today and unable to >

Re: Lucene slow performance

2013-03-15 Thread Simon Willnauer
ry odd though, do you see files that actually get removed / merged if you call IndexWriter#forceMerge(1) simon > > Thanks > > Scott > > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Friday, March 15, 2013 4:49 PM > To: java-user@lucene.apache

Re: Lucene slow performance

2013-03-15 Thread Simon Willnauer
we went to 4.x. what do you mean what didn't you notice, the slowness or the CFS files? simon > > Scott > > - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Concurrent indexing performance problem

2013-03-07 Thread Simon Willnauer
need to merge more segments but it also means your MP can make better decisions and merge smaller segments first. I am not convinced that this would not help you. Especially if you keep the background process merging this could be a win overall. simon > > Mike McCandless > > http://blog.mikemcc

Re: Concurrent indexing performance problem

2013-03-07 Thread Simon Willnauer
would you expect > such a solution to perform by comparison? if you can do that against different harddisks that will certainly give you a boost since this process is pretty IO bound I would guess. simon > > Best regards, > Jan > > > > > > On 7 March 2013 17:44, Michael

Re: Field seems to have become binary field on update to Lucene 4.1

2013-02-19 Thread Simon Willnauer
phew! thanks for clarifying simon On Tue, Feb 19, 2013 at 11:19 PM, Paul Taylor wrote: > On 19/02/2013 20:56, Paul Taylor wrote: >> >> >> Strange test failure after converting code from Lucene 3.6 to Lucene 4.1 >> >> public void testIndexPuid() throws Except

Re: IndexSearcher.close() removed in 4.0

2013-02-19 Thread Simon Willnauer
) which I recommend using. The process is usually something like this: IndexReader indexReader = manager.acquire(); try { IndexSearcher s = new IndexSearcher(indexReader); //do your search } finally { manager.release(indexReader); } hope that helps simon On Mon, Feb 18, 2013 at 11:30 PM
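
The acquire / try / finally / release shape in this reply rests on reference counting. The following is a minimal sketch of that idea in plain Java, assuming nothing about Lucene's own manager classes: RefCountedManagerSketch, Resource and refresh are hypothetical stand-ins.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative stand-in for a reader/searcher manager; not Lucene's implementation.
final class RefCountedManagerSketch {

    /** Stand-in for a reader: it counts as closed once the last reference is released. */
    static final class Resource {
        // Starts at 1: the manager itself holds one reference to the current resource.
        private final AtomicInteger refCount = new AtomicInteger(1);
        private volatile boolean closed;

        boolean isClosed() {
            return closed;
        }

        private void incRef() {
            refCount.incrementAndGet();
        }

        private void decRef() {
            if (refCount.decrementAndGet() == 0) {
                closed = true; // a real reader would free files and buffers here
            }
        }
    }

    private Resource current = new Resource();

    /** Hands out the current resource; callers must pair this with release(). */
    synchronized Resource acquire() {
        current.incRef();
        return current;
    }

    /** Gives back a resource obtained from acquire(). */
    void release(Resource resource) {
        resource.decRef();
    }

    /** Swaps in a fresh resource; the old one closes after its last user releases it. */
    synchronized void refresh() {
        Resource old = current;
        current = new Resource();
        old.decRef(); // drop the manager's own reference
    }
}
```

Because release sits in a finally block, a search that throws still returns its reference, so a refresh in another thread never closes a resource that is still in use.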

Re: Need Help:How to Get the enumeration of Terms Ending with a given word

2013-02-18 Thread Simon Willnauer
On Thu, Feb 14, 2013 at 11:42 AM, VIGNESH S wrote: > Hi, > > I have two questions > > 1.How to Get the enumeration of Terms Ending with a given word > I saw we can get enumerations of word starting at a given word by > Indexreader.terms(term())) method unless you want to iterate all terms and che

Re: IndexSearcher.close() removed in 4.0

2013-02-18 Thread Simon Willnauer
so close the index reader that was created. Since we removed this constructor we also removed close since it's a no-op. IndexSearcher is just a wrapper to add some functionality on top of the reader. You can ignore the IS#close() if you are closing the IndexReader properly. simon [1] http://lucen

Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

2013-01-25 Thread Simon Willnauer
On Fri, Jan 25, 2013 at 3:29 PM, saisantoshi wrote: > Thanks a lot. If we want to wrap TopScoreDocCollector into > PositiveScoresOnlyCollector. Can we do that? > I need only positive scores and I dont think topscore collector can handle > by itself right? > I guess so! But how do you get neg. sco

Re: how to add attributes to a field just like term's payload ?

2013-01-25 Thread Simon Willnauer
Directory directory = ... final SegmentInfos sis = new SegmentInfos(); sis.read(directory); Map commitUserData = sis.getUserData(); simon On Fri, Jan 25, 2013 at 2:32 AM, wgggfiy wrote: > hello, but there is no getCommitUserData in IndexReader, > how can I get the userdata ??

Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

2013-01-25 Thread Simon Willnauer
dInOrder); and use the specialized collector for your settings in the delegate? simon On Thu, Jan 24, 2013 at 11:37 PM, saisantoshi wrote: > Can someone please help us here to validate the above? > > Thanks, > Sai. > > > > -- > View this message in context: &

Re: StoredFieldsFormat / documentation

2013-01-24 Thread Simon Willnauer
ar for the lucene 4 > release. > > Does there exist any further documentation, especially with examples > for the new releases? I don't think we have examples on the wiki for that stuff but we should I guess. simon > > Regards, > > Bernd > >

Re: Tool for Lucene storage recovery

2013-01-18 Thread Simon Willnauer
hey, do you wanna open a jira issue for this and attach your code? this might help others too and if the shit hits the fan its good to have something in the lucene jar that can bring some data back. simon On Fri, Jan 18, 2013 at 6:37 PM, Michał Brzezicki wrote: > in lucene (*.fdt). Code

Re: Excessive use of IOException without proper documentation

2012-11-04 Thread Simon Willnauer
n, people might > be tempted to wrap any Ex w/ LuceneEx, even ArrayIndexOutOfBound etc. I > don't want that. > > If you've hit exceptions which could use better messages, I prefer that we > concentrate on them, rather than complicating the exceptions throwing > mechanism. +1 to all

Re: Excessive use of IOException without proper documentation

2012-11-02 Thread Simon Willnauer
an > up exception management? I'd really like to hear what you have in mind. can you elaborate? simon > > Mike > > - > To unsubscribe, e-mail

Re: 4.0 tokenStream or SimpleAnalyzer bug?

2012-11-02 Thread Simon Willnauer
hey scott, this is intentional see the javadoc step 2: http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/TokenStream.html simon On Fri, Nov 2, 2012 at 2:07 AM, Scott Smith wrote: > I was doing some tokenizer/filter analysis attempting to fix a bug I have in > highli

Re: How to correctly use SearcherManager#close?

2012-11-01 Thread Simon Willnauer
hey michael, On Thu, Nov 1, 2012 at 11:30 PM, Michael-O <1983-01...@gmx.net> wrote: > Thanks for the quick response. Any chance this could be clearer in the > JavaDoc of this class? sure thing, do you wanna open an issue / create a patch? I am happy to commit it. simon > >

Re: Norms and Term Vectors in Lucene 4.0

2012-10-30 Thread Simon Willnauer
nly > option? > > I assume I also have to go through the new Field() if I need to control > TermVectors? > > Where's LIA3 when you need it :) yeah man that is a good question! simon > > Scott - To u

Re: Term Positions added to one document forward

2012-10-29 Thread Simon Willnauer
you should call currDocsAndPositions.nextPosition() before you call currDocsAndPositions.getPayload(). payloads are per position so you need to advance the pos first! simon On Mon, Oct 29, 2012 at 6:44 PM, Ivan Vasilev wrote: > Hi Guys, > > I use the following code to index document

Re: Scoring based on document

2012-10-23 Thread Simon Willnauer
/IndexSearcher.html#termStatistics(org.apache.lucene.index.Term, org.apache.lucene.index.TermContext) http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/IndexSearcher.html#collectionStatistics(java.lang.String) simon On Mon, Oct 22, 2012 at 11:25 PM, Siraj Haider wrote: > I am us

Re: How could i take into account the other part of a field which not matches with the query

2012-10-14 Thread Simon Willnauer
ll use the norm value that every field has (given you don't omit norms) and use it for scoring. look into similarity how to decode / fetch norms. simon > > >> Best Regards >> >> >> E - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: I want to know that why transform numeric to string

2012-09-24 Thread Simon Willnauer
quick answer, Lucene only operates on strings (from a high level perspective) simon On Fri, Sep 21, 2012 at 11:54 AM, 惠达 王 wrote: > hi all: > I want to know that why transform numeric to string? > > public static int longToPrefixCoded(final long val, final int shift, > final
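
Why numbers have to be turned into strings at all can be shown with a small sketch: if terms are compared as strings, a numeric value needs an encoding whose string order matches its numeric order. The encoding below (flip the sign bit, then write 16 hex digits) is an assumption chosen for readability; it is not the format longToPrefixCoded produces, and SortableLongSketch is a hypothetical name.

```java
// Illustrative only: a simple order-preserving long-to-String encoding.
final class SortableLongSketch {

    /** Encodes val as 16 hex digits whose String order equals the numeric order. */
    static String encode(long val) {
        // Flipping the sign bit makes negative values sort before positive ones
        // when the bits are read as an unsigned number.
        long flipped = val ^ Long.MIN_VALUE;
        return String.format("%016x", flipped);
    }

    /** Inverse of encode. */
    static long decode(String encoded) {
        return Long.parseUnsignedLong(encoded, 16) ^ Long.MIN_VALUE;
    }
}
```

With such an encoding a range query over numbers becomes a range query over term strings, which is the reason the conversion exists in the first place.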

Re: Directory flushing / commit / openIfChanged

2012-08-06 Thread Simon Willnauer
= nrtManager.acquire(); try { IndexReader reader = s.getReader(); // do something } finally { nrtManager.release(s); } from time to time you can prune the mapping for sequence ids that are already flushed. hope that helps simon > > Harald. > > -

Re: questions about DocValues in 4.0 alpha

2012-08-06 Thread Simon Willnauer
hey, On Mon, Aug 6, 2012 at 11:34 AM, Li Li wrote: > hi everyone, > in lucene 4.0 alpha, I found the DocValues are available and gave > it a try. I am following the slides in > http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene

Re: Problem with near realtime search

2012-08-05 Thread Simon Willnauer
Hey Harald, On Sat, Aug 4, 2012 at 7:58 AM, Harald Kirsch wrote: > Hello Simon, > > now that I knew what to search for I found > > http://wiki.apache.org/lucene-java/LuceneFAQ#When_is_it_possible_for_document_IDs_to_change.3F > > So that clearly explains this issue for me

Re: Problem with near realtime search

2012-08-03 Thread Simon Willnauer
values. Just keep around the searcher you used and NRTManager / SearcherManager will do the job for you. simon On Fri, Aug 3, 2012 at 3:41 PM, Harald Kirsch wrote: > I am trying to (mis)use Lucene a bit like a NoSQL database or, rather, a > persistent map. I am entering 38000 documents at a r

Re: Analyzer on query question

2012-08-03 Thread Simon Willnauer
tions if you want to create phrase queries etc. just add a PositionIncrementAttribute like this: PositionIncrementAttribute posAttr = stream.addAttribute(PositionIncrementAttribute.class); pls. doublecheck the code it's straight from the top of my head. simon > > Like I

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-01 Thread Simon Willnauer
ike to follow it AFAIK Robert already created an issue here: https://issues.apache.org/jira/browse/LUCENE-4279 and it seems fixed. Given the massive commit last night it's already committed and backported so it will be in 4.0-BETA. simon > > Thanks again > Saroj > > > > >

Re: Usage of NoMergePolicy and its potential implications

2012-07-25 Thread Simon Willnauer
performance and low compression. I think you should really consider fixing your app instead of hacking lucene. simon > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Usage-of-NoMergePolicy-and-its-potential-implications-tp3996630p3996784.html > Sent f

Re: FixedStraightBytesImpl - flushing

2012-07-24 Thread Simon Willnauer
hey SimonM :) On Mon, Jul 23, 2012 at 6:37 PM, Simon McDuff wrote: > > Hello, (LUCENE 4.0.0-ALPHA) > > We are using the DocValues features (very nice). cool! > > We are using FixedBytesRef. > > In that specific case, we were wondering why does it flush at the end (w

FixedStraightBytesImpl - flushing

2012-07-23 Thread Simon McDuff
Hello, (LUCENE 4.0.0-ALPHA) We are using the DocValues features (very nice). We are using FixedBytesRef. In that specific case, we were wondering why does it flush at the end (when we commit) ? Would be more efficient (for memory) to write its buffer as it goes ? Thank you Simon

Re: Flushing Thread

2012-07-21 Thread Simon Willnauer
On Fri, Jul 20, 2012 at 2:43 PM, Simon McDuff wrote: > > Hi Simon W., > See comments below. >> Date: Fri, 20 Jul 2012 11:49:03 +0200> Subject: Re: Flushing Thread >> From: simon.willna...@gmail.com >> To: java-user@lucene.apache.org >> >> hey simon ;

RE: Flushing Thread

2012-07-20 Thread Simon McDuff
Hi Simon W., See comments below. > Date: Fri, 20 Jul 2012 11:49:03 +0200> Subject: Re: Flushing Thread > From: simon.willna...@gmail.com > To: java-user@lucene.apache.org > > hey simon ;) > > > On Fri, Jul 20, 2012 at 2:29 AM, Simon McDuff wrote: > &

Re: Flushing Thread

2012-07-20 Thread Simon Willnauer
hey simon ;) On Fri, Jul 20, 2012 at 2:29 AM, Simon McDuff wrote: > > Thank you Simon Willnauer! > > With your explanation, we`ve decided to control the flushing by spawning > another thread. So the thread is available to still ingest ! :-) (correct me > if I'm wrong)W

RE: Flushing Thread

2012-07-19 Thread Simon McDuff
Thank you Simon Willnauer! With your explanation, we've decided to control the flushing by spawning another thread. So the thread is available to still ingest ! :-) (correct me if I'm wrong) We do so by checking the RAM size provided by Lucene! (Thank you!) By putting the automatic flushi

Re: Flushing Thread

2012-07-19 Thread Simon Willnauer
hey, On Thu, Jul 19, 2012 at 7:41 PM, Simon McDuff wrote: > > Thank you for your answer! > > I read all your blogs! It is always interesting! for details see: http://www.searchworkings.org/blog/-/blogs/gimme-all-resources-you-have-i-can-use-them!/ and http://www.searchworki

RE: Flushing Thread

2012-07-19 Thread Simon McDuff
. Correct ? The concurrent flushing will ONLY work when I have many threads adding documents ? (In that case I will need to put a ringbuffer in front) Do I understand correctly ? Did I miss something ? Simon > From: luc...@mikemccandless.com > Date: Thu, 19 Jul 2012 13:02:42 -0400 > Su

Flushing Thread

2012-07-19 Thread Simon McDuff
while the other is flushing ? (I do understand that if my flushing is taking too much time, they will both flush... :-)) Thank you! Simon

RE: Lucene 4.0 .FDT

2012-07-19 Thread Simon McDuff
each time reset is called) Simon > Date: Thu, 19 Jul 2012 16:44:28 +0200 > From: a...@getopt.org > To: java-user@lucene.apache.org > Subject: Re: Lucene 4.0 .FDT > > On 19/07/2012 14:26, Simon McDuff wrote: > > > > I'm using Lucene 4.0. > > &

Lucene 4.0 .FDT

2012-07-19 Thread Simon McDuff
I'm using Lucene 4.0. I'm inserting around 300 000 documents / second. We do not have any stored fields. But we noticed that .fdt gets populated even so. .fdx contains useless information. .fdt contains only zerouseless... Is there a way to minimize the impact ? Thank you Simon

Re: RAM or SSD...

2012-07-18 Thread Simon Willnauer
On Wed, Jul 18, 2012 at 9:05 PM, Tim Eck wrote: > Rum is an essential ingredient in all software systems :-) Absolutely! :) simon > > -Original Message- > From: Simon Willnauer [mailto:simon.willna...@gmail.com] > Sent: Wednesday, July 18, 2012 11:49 AM > To: java-user

Re: RAM or SSD...

2012-07-18 Thread Simon Willnauer
1. use mmap directory 2. buy rum 3. get an SSD simon :) On Wed, Jul 18, 2012 at 8:36 PM, Vitaly Funstein wrote: > You do not want to store 30 G of data in the JVM heap, no matter what > library does this. > > On Wed, Jul 18, 2012 at 10:44 AM, Paul Jakubik wrote: >> If only

Re: In memory Lucene configuration

2012-07-18 Thread Simon Willnauer
ferent queries (well, some are repeated >> twice or thrice), and includes search time and doc loading (reading the two >> fields I mentioned). The queries are all straight boolean conjunctions, and >> yes, I am dropping the first few queries when calculating averages. >> >&

RE: Indexed BytesRef

2012-07-18 Thread Simon McDuff
ized at all, etc). > > > On Wed, Jul 18, 2012 at 10:35 AM, Simon McDuff wrote: > > > > Hi, > > > > I'm using Lucene 4.0. > > > > I would like to index String, > > but since my system required High volume I need to reuse always the > &

Indexed BytesRef

2012-07-18 Thread Simon McDuff
Hi, I'm using Lucene 4.0. I would like to index Strings, but since my system requires high volume I need to always reuse the same memory, so using String is out of the question. My process receives bytes and I can transform them into a BytesRef (representing a String). At the moment, it seems that when I use fiel

Re: In memory Lucene configuration

2012-07-16 Thread Simon Willnauer
try the new G1 collector, though it is usually only useful for larger heaps: java -server -Xms1G -Xmx1G -Xss128k -XX:+UseG1GC simon On Mon, Jul 16, 2012 at 8:43 AM, Doron Yaacoby wrote: > I haven't tried that yet, but it's an option. The reason I'm waiting on th

Re: In memory Lucene configuration

2012-07-15 Thread Simon Willnauer
at is important for lucene if you use MMap / NIOFS) Your queries are straight boolean conjunctions or do you use positions ie phrase queries or spans? simon > > Any ideas about what could be the ideal configuration for me? > Thanks. > -

Re: Is creating an analyzer expensive?

2012-07-12 Thread Simon Willnauer
You can safely reuse a single analyzer across threads. The Analyzer class maintains ThreadLocal storage for TokenStreams internally so you can just create the analyzer once and use it throughout your application. simon On Thu, Jul 12, 2012 at 10:13 PM, Dave Seltzer wrote: > I have one m
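
The reason one shared analyzer is safe can be sketched with a plain ThreadLocal: the shared object is stateless from the caller's point of view, while the stateful, non-thread-safe part is kept once per thread. This is an illustration of the idea only. PerThreadReuseSketch and Tokenizer are hypothetical stand-ins, and the whitespace splitting is an assumption.

```java
// Illustrative only: shows per-thread reuse, not Lucene's Analyzer internals.
final class PerThreadReuseSketch {

    /** Stand-in for a reusable, non-thread-safe component such as a token stream. */
    static final class Tokenizer {
        private final StringBuilder buffer = new StringBuilder(); // mutable state

        String[] tokenize(String text) {
            buffer.setLength(0); // reset and reuse the same buffer on every call
            buffer.append(text.trim());
            return buffer.length() == 0 ? new String[0] : buffer.toString().split("\\s+");
        }
    }

    // One Tokenizer per thread, created lazily on first use by that thread.
    private final ThreadLocal<Tokenizer> perThread = ThreadLocal.withInitial(Tokenizer::new);

    /** Returns the calling thread's own Tokenizer instance. */
    Tokenizer tokenizer() {
        return perThread.get();
    }

    /** Safe to call from many threads on one shared PerThreadReuseSketch. */
    String[] analyze(String text) {
        return tokenizer().tokenize(text);
    }
}
```

Creating the shared object once and calling analyze from any thread costs one Tokenizer per thread rather than one per call.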

Re: delete by docid in lucene 4

2012-07-12 Thread Simon Willnauer
achine you can easily go > 20k docs a second with updateDocument. If you want to give deleteByDocid a try for kicks I'd be curious how you solve some of the really tricky issues! :) simon On Thu, Jul 12, 2012 at 10:08 PM, Uwe Schindler wrote: > Hi Sean, > > Without checking the p

Re: delete by docid in lucene 4

2012-07-12 Thread Simon Willnauer
. I wouldn't worry about updateDocument, it's the only sensible way to use lucene really. Why didn't you use this before, any reason? What is your ingest rate / doc throughput and where would you get concerned? simon > > Sean > > On Thu, Jul 12, 2012 at 9:27 AM, Uwe Schindler wrote:

Re: BrazilianAnalyzer doesn't work with any BooleanQuery

2012-07-12 Thread Simon Willnauer
o get results. simon On Wed, Jul 11, 2012 at 5:32 PM, Marcelo Neves wrote: > Hi all, > > I created a method above that generates my boolean query based on many > parameters. The queries on non-analyzed fields work perfectly in debug. > > When start a se

Re: delete by docid in lucene 4

2012-07-12 Thread Simon Willnauer
of the same doc? With Lucene 4 relying on the doc id can become very tricky. If you use multiple threads you create a lot of segments which can be merged in any order. You can't tell if a document ID maintains happened-before semantics at all. Can you tell us more about your usecase

Re: index searcher leading to system freeze ?

2012-07-11 Thread Simon Willnauer
are you closing your underlying IndexReaders properly? simon On Wed, Jul 11, 2012 at 5:04 AM, Yang wrote: > I'm running 8 index searchers java processes on a 8-core node. > They all read from the same lucene index on local hard drive. > > > the index contains about 20milli

Re: Problems with hundreds of BLOCKED threads.

2012-07-09 Thread Simon Willnauer
]} arrays. * This class is optimized for small memory-resident indexes. * It also has bad concurrency on multithreaded environments. simon On Sat, Jul 7, 2012 at 1:29 PM, Simon Willnauer wrote: > On Fri, Jul 6, 2012 at 9:28 PM, Leon Rosenberg > wrote: >> Hello, >> >> we ha

Re: Problems with hundreds of BLOCKED threads.

2012-07-09 Thread Simon Willnauer
> backoffice system. > > any ideas what is happening here or what we are doing wrong? > > we replaced call to lucene with String.indexOf() to check if the > problem is in our code, it didn't show the problematic behavior. > > Is there a non-blocking search alternative in

Re: about .frq file format in doc

2012-06-27 Thread Simon Willnauer
see definitions: http://lucene.apache.org/core/3_6_0/fileformats.html#Definitions simon On Wed, Jun 27, 2012 at 6:08 PM, Simon Willnauer wrote: > a term in this context is a (field,text) tuple - does this make sense? > simon > > On Wed, Jun 27, 2012 at 11:40 AM, wangjing wr

Re: about .frq file format in doc

2012-06-27 Thread Simon Willnauer
a term in this context is a (field,text) tuple - does this make sense? simon On Wed, Jun 27, 2012 at 11:40 AM, wangjing wrote: > http://lucene.apache.org/core/3_6_0/fileformats.html#Frequencies > > The .frq file contains the lists of documents which contain each term, > along with t

Re: what is the fdx file exactly mean

2012-06-25 Thread Simon Willnauer
see http://lucene.apache.org/core/3_6_0/fileformats.html#field_index for file format documentation. simon On Mon, Jun 25, 2012 at 5:28 AM, wangjing wrote: > .fdx file  contains, for each document, a pointer to its field data. > > BUT fdx is contains pointer to WHAT? it's a poi

Re: lucene (search) performance tuning

2012-05-26 Thread Simon Willnauer
(i'm only > using lucene as a first level "rough" search, so the search quality is not > a huge issue here) , so that, for example, fewer fields are evaluated or a > simpler scoring function is used? are you using disjunction or conjunction queries? Can you make some parts

Re: IndexReader.deleteDocument(Term) in Lucene 3.6/4.0

2012-05-25 Thread Simon Willnauer
anyone please suggest how to solve this issue? Can simply run term > query before, but it seems to be absolutely inefficient. what you can do is use IndexReader#docFreq(Term) to figure out documents that have been deleted / will be deleted by

Re: FilterClause serializable

2012-05-21 Thread Simon Willnauer
we removed almost all serializable from lucene since it was causing many problems and wasn't complete either. users should serialize classes / logic themselves or use higher level impls that deal with that already. simon On Mon, May 21, 2012 at 1:05 PM, Lars Gjengedal wrote: > Hi >

Re: Lucene's internal doc ID space

2012-05-12 Thread Simon Willnauer
should not be used in the application integrating lucene or at least not in a way you would use a primary "auto-incremented" key in a DB. you can specify your own "id" field and reuse the ids (you actually have to if you want to update). does that make sense? simon > > T

Re: weird multifile problems

2012-04-06 Thread Simon Willnauer
is not using CFS. Yet, if you see multiple CFS files then you have an index with more than one segment. Those segments are written during indexing and merged together as the number of segments grows which is just fine. hope that helps. simon On Fri, Apr 6, 2012 at 7:54 AM, Chengcheng Zhao wrote
