Re: Lucene in action

2023-06-10 Thread Mark Miller
Nature abhors being anything but an author by name on a second tech book. The ruse is up after one when you have the inputs crystalized and the hourly wage in hand. Hard to find anything but executive producers after that. I’d shoot for a persuasive crowdfunding attempt.

[ANNOUNCE] Apache Lucene 4.10.3 released

2014-12-29 Thread Mark Miller
case, please try another mirror. This also goes for Maven access. Happy Holidays, Mark Miller http://www.about.me/markrmiller - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java

[ANNOUNCE] Apache Lucene 4.5.1 released.

2013-10-24 Thread Mark Miller
October 2013, Apache Lucene™ 4.5.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.5.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable

Re: Regarding Compression Tool

2013-09-16 Thread Mark Miller
Have you considered storing your indexes server-side? I haven't used compression but usually the trade-off of compression is CPU usage which will also be a drain on battery life. Or maybe consider how important the highlighter is to your users - is it worth the trade-off of either disk space or bat

[ANNOUNCE] Apache Lucene 4.2.1 released

2013-04-03 Thread Mark Miller
April 2013, Apache Lucene™ 4.2.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.2.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-te

Re: Luke?

2013-03-15 Thread Mark Miller
If anyone is able to donate some effort, a nice future scenario could be that Luke comes fully up to date with every Lucene release: https://issues.apache.org/jira/browse/LUCENE-2562 - Mark On Mar 15, 2013, at 5:58 AM, Eric Charles wrote: > For the record, I happily use Luke (with Lucene 4.1)

Re: Lucene 4.1 tentative release

2012-12-12 Thread Mark Miller
We are hoping for 4.1 very soon! With the holidays it will be difficult to say - but 4.1 talk has been going on for some time now. It's really a matter of wrapping up some short term work and getting some guys to do the release work. I don't think anyone can give you a date, but it's certainly in

Re: "read past EOF" when merge

2012-11-03 Thread Mark Miller
Can you file a JIRA Markus? This is probably related to the new code that uses Directory for replication. - Mark On Nov 2, 2012, at 6:53 AM, Markus Jelsma wrote: > Hi, > > For what it's worth, we have seen similar issues with Lucene/Solr from this > week's trunk. The issue manifests itself w

Re: Lucene 4.0 Index Format Finalization Timetable

2011-12-08 Thread Mark Miller
While we are in constant sync due to the merge, Lucene would still be updated multiple times before a Solr 4 release, and it would be subject to happen at any time - so it's really not any different. On Wednesday, December 7, 2011, Jamie Johnson wrote: > Yeah, biggest issue for us is we're using t

Re: ElasticSearch

2011-11-17 Thread Mark Miller
The XML query parser can map to Lucene one to one as well - hasn't seemed to pick up enough steam to be included with Solr yet, but there has been some commotion so it's likely to go in at some point. Not enough demand yet I guess. https://issues.apache.org/jira/browse/SOLR-839 XML Query Parser Sup

Re: optimize with num segments > 1 index keeps growing

2011-09-12 Thread Mark Miller
> we should correct the javadocs for expungeDeletes here I think: so > that its more consistent with the javadocs for optimize? > > "Requests an expunge operation..." ? > +1 - it's a documentation bug now. - Mark Miller lu

Re: implicit closing of an IndexWriter

2011-07-26 Thread Mark Miller
On Jul 26, 2011, at 9:52 AM, Clemens Wyss wrote: > Side note: I am using threads when writing and theses threads are (by design) > interrupted (from time to time) Perhaps you are seeing this: https://issues.apache.org/jira/browse/LUCENE-2239 - Mark Miller lucidimaginati

Re: Search within a sentence (revisited)

2011-07-26 Thread Mark Miller
case tests like I likely should try if I was going to commit this thing. - Mark Miller lucidimagination.com On Jul 26, 2011, at 8:56 AM, Peter Keegan wrote: > Thanks Mark! The new patch is working fine with the tests and a few more. If > you have particular test cases in mind, I'd

Re: Search within a sentence (revisited)

2011-07-25 Thread Mark Miller
y use even more tests before feeling too confident here… I've attached a patch for 3X with the new test and fix (changed that include back to exclude). - Mark Miller lucidimagination.com On Jul 25, 2011, at 10:29 AM, Mark Miller wrote: > Thanks Peter - if you supply the unit tests, I'

Re: Search within a sentence (revisited)

2011-07-25 Thread Mark Miller
Thanks Peter - if you supply the unit tests, I'm happy to work on the fixes. I can likely look at this later today. - Mark Miller lucidimagination.com On Jul 25, 2011, at 10:14 AM, Peter Keegan wrote: > Hi Mark, > > Sorry to bug you again, but there's another case that

Re: Search within a sentence (revisited)

2011-07-21 Thread Mark Miller
I just uploaded a patch for 3X that will work for 3.2. On Jul 21, 2011, at 4:25 PM, Mark Miller wrote: > Yeah, it's off trunk - I'll submit a 3X patch in a bit - just have to change > that to an IndexReader I believe. > > - Mark > > On Jul 21, 2011, at 4:01 PM, Pe

Re: Search within a sentence (revisited)

2011-07-21 Thread Mark Miller
t there. > > Peter > > On Thu, Jul 21, 2011 at 3:07 PM, Mark Miller wrote: > >> Hey Peter, >> >> Getting sucked back into Spans... >> >> That test should pass now - I uploaded a new patch to >> https://issues.apache.org/jira/browse/LUCENE-777

Re: Search within a sentence (revisited)

2011-07-21 Thread Mark Miller
.length, 1); > > clauses[1] = makeSpanTermQuery("3"); > allKeywords = new SpanNearQuery(clauses, Integer.MAX_VALUE, false); // > SpanAndQuery equivalent > query = new SpanWithinQuery(allKeywords, endSentence, 0); > System.out.println("query: "+query); > hits =

Re: Search within a sentence (revisited)

2011-07-20 Thread Mark Miller
On Jul 20, 2011, at 7:44 PM, Mark Miller wrote: > > On Jul 20, 2011, at 11:27 AM, Peter Keegan wrote: > >> Mark Miller's 'SpanWithinQuery' patch >> seems to have the same issue. > > If I remember right (It's been more than a couple years),

Re: Search within a sentence (revisited)

2011-07-20 Thread Mark Miller
nd I think the limitation that I ate was that the word could belong to both its true sentence, and the one after it. - Mark Miller lucidimagination.com - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache

Re: Questions on index Writer

2011-07-16 Thread Mark Miller
My advice: Don't close the IndexWriter - just call commit. Don't worry about forcing merges - let them happen as they do when you call commit. If you are going to use the IndexWriter again, you generally do not want to close it. Calling commit is the preferred option. - M

[Announce] Lucene-Eurocon Call for Participation Closes Friday, JULY 15

2011-07-12 Thread Mark Miller
e Lucene EuroCon 2011 is presented by Lucid Imagination, the commercial entity for Apache Solr/Lucene Open Source Search; proceeds of the conference benefit The Apache Software Foundation. "Lucene" and "Apache Solr" are trademarks of the Apache Software Foundation. - Mark

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-08 Thread Mark Miller
On Jul 8, 2011, at 5:43 AM, Jahangir Anwari wrote: > I don't think this is the best > solution, am open to other alternatives. Could also make it static public where it is? Either way. - Mark Miller lucidimag

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-07 Thread Mark Miller
sp); } else if (query instanceof TermQuery) { - extractWeightedTerms(terms, query); + extractWeightedSpanTerms(terms, new SpanTermQuery(((TermQuery)query).getTerm())); } else if (query instanceof SpanQuery) { extractWeightedSpanTerms(terms, (SpanQuery) query);

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-06 Thread Mark Miller
icitly set it higher than 0 for now. Feel free to create a JIRA issue and we can give it its own default greater than 0. - Mark Miller lucidimagination.com On Jul 6, 2011, at 5:34 PM, Jahangir Anwari wrote: > I have a CustomHighlighter that extends the SolrHighlighter and overrides &

Re: NRT consistency

2011-04-11 Thread Mark Miller
- Amazon Dynamo uses vector clocks for this. > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message >> From: Mark Miller >> To: java-user@lucene.

Re: NRT consistency

2011-04-11 Thread Mark Miller
rom the Lucene - Java Users mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > - Mark Mill

Re: NRT consistency

2011-04-11 Thread Mark Miller
T-consistency-tp2801878p2801878.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java

Re: Difference between regular Highlighter and Fast Vector Highlighter ?

2011-04-11 Thread Mark Miller
er if you do. FVH: works with fewer query types and requires that you store term vectors - but scales better than the std Highlighter to very large documents - Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org On Apr 1, 2011, at 8:

[ANN] Free technical webinar: Mastering the Lucene Index: Wednesday, August 11, 2010 11:00 AM PST / 2:00 PM EST / 20:00 CET

2010-08-09 Thread Mark Miller
Hey all - apologize for the quick cross post - just to let you know, Andrzej is giving a free webinar this wed. His presentations are always fantastic, so check it out: Lucid Imagination Presents a free technical webinar: Mastering the Lucene Index Wednesday, August 11, 2010 11:00 AM PST / 2:00 P

Re: NumericField API

2010-06-01 Thread Mark Miller
On 6/1/10 9:34 AM, Mindaugas Žakšauskas wrote: It's just an early observation as historically Lucene has been doing an amazing job in terms of API stability. Yes it has :) Get ready for even more change in that area though :) -- - Mark http://www.lucidimagination.com ---

[ANN] Lucene/Solr Meetup in NYC on May 11th

2010-05-08 Thread Mark Miller
If you haven't heard, there is a Lucene/Solr meetup in New York next week: http://www.meetup.com/NYC-Apache-Lucene-Solr-Meetup/calendar/13325754/ The scheduled talks are (in addition to lightning talks): Solr 1.5 and Beyond: Yonik Seeley, author of Solr, co-founder, Lucid Imagination Topics w

Re: Batch Indexing - best practice?

2010-03-15 Thread Mark Miller
exing (docs/sec)...just to give me an idea of what to shoot for? Paul -Original Message- From: java-user-return-45433-paul.b.murdoch=saic@lucene.apache.org [mailto:java-user-return-45433-paul.b.murdoch=saic@lucene.apache.org ] On Behalf Of Mark Miller Sent: Monday, March 15, 2010 10

Re: Batch Indexing - best practice?

2010-03-15 Thread Mark Miller
On 03/15/2010 10:41 AM, Murdoch, Paul wrote: Hi, I'm using Lucene 2.9.2. Currently, when creating my index, I'm calling indexWriter.addDocument(doc) for each Document I want to index. The Documents aren't large and I'm averaging indexing about 500 documents every 90 seconds. I'd like to try

Re: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Mark Miller
On 03/04/2010 06:52 PM, Justin wrote: Hi Mike and others, I have a test case for you (attached) that exhibits a file descriptor leak in ParallelReader.reopen(). I listed the OS, JDK, and snapshot of Lucene that I'm using in the source code. A loop adds just over 4000 documents to an index, r

Re: If you could have one feature in Lucene...

2010-02-25 Thread Mark Miller
n wrote: Who the heck is in charge here? Maybe it's Colonel Walter E. Kurtz? Intuitively perhaps people expect the committers to drive the project? When they don't see this are they less likely to contribute? On Thu, Feb 25, 2010 at 10:33 AM, Mark Miller wrote: Hahaha - you ha

Re: If you could have one feature in Lucene...

2010-02-25 Thread Mark Miller
Hahaha - you have a sly humor. I totally agree though. Features are long overdue, and the committers are lazy. I call for a cancellation of all of their paychecks and a stern warning about slacking off in Lucene land. There are dozens of features that are just taking way too long - whatever

Re: Where to download Mark Miller's Qsol Parser?

2010-02-04 Thread Mark Miller
Chris Harris wrote: > The QSol query parser (brief overview here: > http://www.lucidimagination.com/blog/2009/02/22/exploring-query-parsers/) > used to be available at > > http://myhardshadow.com/qsol.php > > (there was documentation as well as a link to a SVN server) but it > looks like the myhard

Re: Search for more than one term

2010-01-27 Thread Mark Miller
ctorresl wrote: > Hello: > I'm working with Lucene for my thesis, please I need answers to > these questions: > 1. How can I tell Lucene to search for more than one term??? (for example: > the query "house garden computer" will return documents in which at least > one of the > term appears) What cl

Re: Highlighter doesn't highlight wildcard queries after updating to 2.9.1/3.0.0

2009-12-30 Thread Mark Miller
Mohsen Saboorian wrote: > After updating to 2.9.x or 3.0, highlighter doesn't work on wildcard queries > like "abc*". I thought that it would be because of scoring, so I also set > myIndexSearcher.setDefaultFieldSortScoring(true, true) before searching. > I tested with both QueryScorer and QueryTer

Re: Tokenized fields in Lucene 3.0.0

2009-12-15 Thread Mark Miller
Any more info to share? In 2.9, Tokenized literally == Analyzed. /** @deprecated this has been renamed to {@link #ANALYZED} */ public static final Index TOKENIZED = ANALYZED; Michel Nadeau wrote: > Hi, > > I just realized that since I upgraded from Lucene 2.x to 3.0.0 (and removed > a

Re: org.apache.lucene.search.RemoteSearchable missing

2009-12-08 Thread Mark Miller
Weiwei Wang wrote: > Hi,all, > I can't find this class in the downloaded jar and I can't figure out > what's wrong. > Does anybody here know how to fix it? > > It's now in the remote Contrib. -- - Mark http://www.lucidimagination.com

Re: NearSpansUnordered payloads

2009-11-25 Thread Mark Miller
Grant Ingersoll wrote: > On Nov 20, 2009, at 6:49 PM, Jason Rutherglen wrote: > > >> I'm interested in getting the payload information from the >> matching span, however it's unclear from the javadocs why >> NearSpansUnordered is different than NearSpansOrdered in this >> regard. >> >> NearSpans

Re: SpanQuery for Terms at same position

2009-11-23 Thread Mark Miller
You're trying -1 with ordered, right? Try it with non ordered. Christopher Tignor wrote: > A slop of -1 doesn't work either. I get no results returned. > > this would be a *really* helpful feature for me if someone might suggest an > implementation as I would really like to be able to do arbitrary s

Re: Lucene Java 3.0.0 RC1 now available for testing

2009-11-17 Thread Mark Miller
Here is some of the history: https://issues.apache.org/jira/browse/LUCENE-652 https://issues.apache.org/jira/browse/LUCENE-1960 Glen Newton wrote: > Could someone send me where the rationale for the removal of > COMPRESSED fields is? I've looked at > http://people.apache.org/~uschindler/staging-a

Re: building lucene-core from source

2009-11-09 Thread Mark Miller
r, which is inconsistent with > 2.9.1. I guess that's the flux you referred to. > > Peter > > > On Mon, Nov 9, 2009 at 8:13 PM, Mark Miller wrote: > > >> Yeah - it's a debatable point. You can have issues when building though - >> did you build with java 1.5? The

Re: building lucene-core from source

2009-11-09 Thread Mark Miller
d with patches from an official release might warrant a '-dev' > version, I suppose. > (just my 2 cents.) > > Peter > > On Mon, Nov 9, 2009 at 7:57 PM, Mark Miller wrote: > > >> The build/release formula is always in flux - we likely hard coded the >>

Re: Questions about SEN patch submissions

2009-11-09 Thread Mark Miller
Marvin Humphrey wrote: > On Mon, Nov 09, 2009 at 04:07:55PM -0500, Robert Muir wrote: > >> Mark, I think my concern is that Sen itself is LGPL ( >> https://sen.dev.java.net/). >> >> this lucene-ja is just a lucene interface to this LGPL library. >> >> I think this dependency might be a problem,

Re: building lucene-core from source

2009-11-09 Thread Mark Miller
The build/release formula is always in flux - we likely hard coded the change in 2.9.0 when releasing - we likely won't again in the future. Some discussion about it came up recently on the list. -- - Mark http://www.lucidimagination.com Peter Keegan wrote: > OK. I just downloaded the 2.9.0 s

Re: ComplexPhraseQueryParser highlight problem

2009-11-03 Thread Mark Miller
AHMET ARSLAN wrote: >> Looks like it's because the query >> coming in is a ComplexPhraseQuery and >> the Highlighter doesn't currently know how to handle that >> type. >> >> It would need to be rewritten first barring the special >> handling it >> needs - but unfortunately, that will break multi-term

Re: ComplexPhraseQueryParser highlight problem

2009-11-02 Thread Mark Miller
Looks like it's because the query coming in is a ComplexPhraseQuery and the Highlighter doesn't currently know how to handle that type. It would need to be rewritten first barring the special handling it needs - but unfortunately, that will break multi-term query highlighting unless you use boolean r

Re: ComplexPhraseQueryParser highlight problem

2009-11-02 Thread Mark Miller
Yes - please share your test programs and I can investigate (ApacheCon this week, so I'm not sure when). And it's best to keep communications on the list - that allows others with similar issues (now or in the future) to benefit from whatever goes on. You will also reach a wider pool of people that

Re: IO exception during merge/optimize

2009-10-29 Thread Mark Miller
h lower values. > Btw, the indexing times are really about 5 min. shorter because of some > non-Lucene related delays after the last document. > > Peter > > > > On Thu, Oct 29, 2009 at 4:30 PM, Mark Miller wrote: > > >> Any chance I could get you to try that

Re: IO exception during merge/optimize

2009-10-29 Thread Mark Miller
Any chance I could get you to try that again with a buffer of like 800MB to a gig and do a comparison? I've been investigating the returns you get with a larger buffer size. It appears to be pretty diminishing returns over 100MB or so - at higher than that, I've gotten both slower speeds for some

[ANN] New Technical White Paper on Apache Lucene 2.9 from Lucid Imagination

2009-10-28 Thread Mark Miller
With the recent release of Apache Lucene 2.9, Lucid Imagination has put together an in-depth technical white paper on the range of performance improvements and new features (per segment indexing, trierange numeric analysis, and more), along with recommendations for upgrading your Lucene application

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-27 Thread Mark Miller
Luis Alves wrote: > Mark Miller wrote: >> Mark Miller wrote: >> >>> Michael Busch wrote: >>> >>>> Why will just saying once again "Hey, let's just release more often" >>>> work now if it hasn't in the last two ye

Re: 2.9 per segment searching/caching

2009-10-22 Thread Mark Miller
api etc., sorting > is actually faster than 2.4. > -John > > On Thu, Oct 22, 2009 at 5:07 AM, Mark Miller wrote: > > >> Bill Au wrote: >> >>> Since Lucene 2.9 has per segment searching/caching, does query >>> >> performance >>

Re: How to loop through all the entries for a field

2009-10-22 Thread Mark Miller
But with Lucene 2.9 you would want to use StringHelper.intern right? adviner wrote: > Thank you > > > Uwe Schindler wrote: > >> Use this one: >> >> >> >> String fieldname="BookTitle"; >> >> >> >> fieldname = fieldname.intern(); // because of this we need no >> String.equals() >> >> TermEnum
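The interning trick quoted above can be tried out with nothing but the JDK. What follows is only a minimal sketch using plain String.intern(), not Lucene's StringHelper.intern; the class name InternDemo and the helper sameInterned are made up for illustration and are not part of any Lucene API. It shows why an interned field name can be compared by reference instead of with String.equals().

```java
public class InternDemo {
    // Hypothetical helper: true when both strings resolve to the same
    // canonical instance in the JVM string pool.
    public static boolean sameInterned(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String literal = "BookTitle";            // string literals are interned by the JVM
        String built = new String("BookTitle");  // distinct object with equal contents

        // Reference comparison against the non-interned copy.
        System.out.println(literal == built);
        // Reference comparison after interning the copy.
        System.out.println(literal == built.intern());
        // Both sides interned through the helper.
        System.out.println(sameInterned(literal, built));
    }
}
```

The point of the design is that the reference check is only safe when both sides are known to be interned; anything else should still go through equals().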

Re: 2.9 per segment searching/caching

2009-10-22 Thread Mark Miller
Bill Au wrote: > Since Lucene 2.9 has per segment searching/caching, does query performance > degrade less than before (2.9) as more segments are added to the index? > Bill > > I think non sorting cases are actually faster now over multiple segments - though you will still see performance degrad

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Mark Miller
Mark Miller wrote: > Michael Busch wrote: > >> Why will just saying once again "Hey, let's just release more often" >> work now if it hasn't in the last two years? >> >> Mich >> > > I don't know that we need to release m

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Mark Miller
Michael Busch wrote: > Why will just saying once again "Hey, let's just release more often" > work now if it hasn't in the last two years? > > Mich I don't know that we need to release more often to take advantage of major numbers. 2.2 was released in 07 - we could have just released 2.9 right a

Re: Difference between 2.4.1 and 2.9.0 (possible regression?)

2009-10-16 Thread Mark Miller
It was a bug and Mike fixed it. The bug was that exact matches were not being returned as you state. Will be fixed in 2.9.1. stefcl wrote: > Thanks, > Even if you add to the example a document called "giga", I'm not sure that > searching "giga~0.8" would return anything. > > It seems a bit weir

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Mark Miller
Steven A Rowe wrote: > On 10/16/2009 at 2:58 AM, Michael Busch wrote: > >> B) best effort drop-in back compatibility for the next minor version >> number only, and deprecations may be removed after one minor release >> (e.g. v3.3 will be compat with v3.2, but not v3.4) >> > > This is only t

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Mark Miller
Jukka Zitting wrote: > Hi, > > On Fri, Oct 16, 2009 at 10:23 AM, Danil ŢORIN wrote: > >> What about creating major version more often? >> > > +1 We're not going to run out of version numbers, so I don't see a > reason not to upgrade the major version number when making > backwards-incompat

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-08 Thread Mark Miller
them out. You can manage that with a delete policy now though. > Thanks, > Chris > > On Wed, Oct 7, 2009 at 4:02 PM, Mark Miller wrote: > > >> Solr just copies them into the same directory - Lucene files are write >> once, so it's not much different than what happens

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-07 Thread Mark Miller
y, and > closing the old one. We don't use IndexReader.reopen() because the updated > index is in a different directory (as opposed to being updated in-place). > > (Reading about some of the 2.9 changes motivated me to look into actually > using reopen(). And Michael Busch and Mark Mi

Re: Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-05 Thread Mark Miller
I keep considering a full response to this, but I just can't get over the hump and spend the time writing something up. Figured someone else would get to it - perhaps they still will. I will make a comment here though: >Before Lucene 2.9, I don't think this made any difference, as (I think) the

Re: TimeLimitedCollector hang on, VM process doesn't die (TOMCAT)

2009-10-02 Thread Mark Miller
Mani EZZAT wrote: > Mark Miller wrote: >> That thread will only be stopped if it's interrupted. So it would appear >> there is not a path that leads to it being interrupted ... why that is >> would be the next question ... >> >> > I found someone (a japanes

Re: TimeLimitedCollector hang on, VM process doesn't die (TOMCAT)

2009-10-02 Thread Mark Miller
That thread will only be stopped if it's interrupted. So it would appear there is not a path that leads to it being interrupted ... why that is would be the next question ... -- - Mark http://www.lucidimagination.com Mani EZZAT wrote: > Hello everyone. > I'm using solrJ for an application de

Re: Error using multireader searcher in Lucene 2.9

2009-10-02 Thread Mark Miller
Sorry Raf - technically you're not allowed to use internal Lucene ids that way. It happened to work in the past if you didn't use MultiSearcher, but it's not promised by the API, and no longer works as you'd expect in 2.9. You have to figure out another approach that doesn't use the internal ids (eg

Re: Implement SpanScorer on 2.9 lucene lib!

2009-10-01 Thread Mark Miller
" it don't. > Thanks a lot - I'll check it out and get back to you. > the name is really TermQueryScorer or is QueryTermScorer(i found that in the > package)?? > Sorry! That's what happens when I trust my memory ;) It's QueryTermScorer. > Thanks. > > > On Th

Re: Implement SpanScorer on 2.9 lucene lib!

2009-10-01 Thread Mark Miller
e package as the QueryScorer, in the Highlighter contrib. > Thanks! > > On Wed, Sep 30, 2009 at 6:38 PM, Mark Miller wrote: > > >> Felipe Lobo wrote: >> >>> Hi, i updated my lucene lib to 2.9.0 and i'm trying to insta

Re: Lucene 2.9 and performance of readers per segment.

2009-10-01 Thread Mark Miller
Per segment over many segments is actually a bit faster for non-sort cases and many sort cases - but an optimized index will still be fastest - the speed benefit of many segments comes when reopening - so say for realtime search - in that case you may want to sacrifice the optimized performance for a segment

Re: Highlighting phrases in 2.9

2009-09-30 Thread Mark Miller
Scott Smith wrote: > I've been looking at the changes I have to make in my code to go from > 2.4.1 to 2.9. One of the features I have is to highlight query hits in > documents which meet the search criteria. If the query has a phrase, > then I need to highlight the phrase, but not isolated words

Re: Implement SpanScorer on 2.9 lucene lib!

2009-09-30 Thread Mark Miller
Felipe Lobo wrote: > Hi, I updated my lucene lib to 2.9.0 and I'm trying to instantiate the > spanscorer but the constructor is protected. > I looked in the javadoc of lucene and saw 2 subclasses of it > (PayloadNearQuery.PayloadNearSpanScorer, > PayloadTermQuery.PayloadTermWeight.PayloadTermSpanSc

Re: TopDocCollector limits

2009-09-30 Thread Mark Miller
the deprecated Hits class? > > On Tue, Sep 29, 2009 at 7:40 PM, Mark Miller wrote: > > >> Max Lynch wrote: >> >>> Hi, >>> I am developing a search system that doesn't do pagination (searches are >>> >> run >> >>

Re: TSDC, TopFieldCollector & co

2009-09-30 Thread Mark Miller
If you want relevance sorting (Sort.Score not Sort.Relevance right?), I'd think you want to use TopScoreDocCollector, not TopFieldCollector. The only reason to use relevance with TopFieldCollector is if you are doing an nth sort with a field sort as well. You don't really need to worry about th

Re: TopDocCollector limits

2009-09-29 Thread Mark Miller
Max Lynch wrote: > Hi, > I am developing a search system that doesn't do pagination (searches are run > in the background and machine analyzed). However, TopDocCollector makes me > put a limit on how many results I want back. For my system, each result > found is important. How can I make it col

Re: PrefixQuery vs wildcardquery

2009-09-28 Thread Mark Miller
Though in 2.9 this is not much of a concern - the multi term queries are smart - if it matches few enough terms it will rewrite to a constant score booleanquery - if it matches a lot of terms it will rewrite to a constantscore query - using a filter underneath. So maxclause issues should no

Re: PrefixQuery vs wildcardquery

2009-09-28 Thread Mark Miller
John Seer wrote: > Hello, > > Is there any benefit of using one or other for "start with query"? > > Which one is faster? > > > Regards > Prefix query is a bit more efficient - not sure what that amounts to in the real world, but prefix just checks if the terms start with the prefix - wildcard has a bi

The Release of Lucene 2.9

2009-09-25 Thread Mark Miller
cene/ The Next Release: The next release will be Lucene 3.0. This should come along shortly, and will remove all of the deprecated code in Lucene 2.9. Lucene 3.0 will also be the first release to move from Java 1.4 to Java 1.5 as a requirement. Thanks, Mark Miller

Re: Getting Payload data from BooleanQuery results

2009-09-24 Thread Mark Miller
I should beef up that spans extractor - it can actually work on the constantscore multi term queries (the base ones that now have a constant score mode in 2.9), just like the Highlighter does. That class really belongs in contrib probably. You can use the filter and the spanquery to get the result

Lucene 2.9 RC5 now available for testing

2009-09-19 Thread Mark Miller
ucene2.9changes/CONTRIB-CHANGES.txt Download release candidate 5 here: http://people.apache.org/~markrmiller/staging-area/lucene2.9rc5/ Be sure to report back with any issues you find! Thanks, Mark Miller

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
round in 2.4. This is part of the >> DocIdSetIterator changes. >> >> Anyway - either these are just not comparable runs, or there is a major >> bug (which seems unlikely). >> >> Just to keep pointing out the obvious: >> >> 2.4 cal

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
times That just doesn't jibe. Mark Miller wrote: > Notice that while DisjunctionScorer.advance and > DisjunctionScorer.advanceAfterCurrent appear to be called > in 2.9, in 2.4, I am only seeing DisjunctionScorer.advanceAfterCurrent > called. > > Can someone explain

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
Notice that while DisjunctionScorer.advance and DisjunctionScorer.advanceAfterCurrent appear to be called in 2.9, in 2.4, I am only seeing DisjunctionScorer.advanceAfterCurrent called. Can someone explain that? Mark Miller wrote: > Something is very odd about this if they both cover the s

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
Something is very odd about this if they both cover the same search and the environ for both is identical. Even if one search was done twice, and we divide the numbers for the new api by 2 - it's still *very* odd. With 2.4, ScorerDocQueue.topDoc is called half a million times. With 2.9, it's called

Re: What would be the fastest BooleanQuery possible?

2009-09-16 Thread Mark Miller
. >> >> Mike >> >> On Wed, Sep 16, 2009 at 9:14 AM, Benjamin Pasero >> wrote: >> >>> Ah wow that sounds great. I am using 2.3.2 though (and have to use it >>> for now). Anything >>> in that version that could speed things up? >>>

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
Ah - that explains a bit. Though if you divide by 2, the new one still appears to overcall each method in comparison to 2.4. - Mark Uwe Schindler wrote: >> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png >> >> Have to verify that the last one is not by accident more than one reque

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
> and even worse: > http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png > > Have to verify that the last one is not by accident more than one request. > Will > do the run again and then post the required info. > > Mark Miller wrote: > >> bq. I'll do

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
//www.thetaphi.de >> eMail: u...@thetaphi.de >> >> >> >>> -Original Message- >>> From: Mark Miller [mailto:markrmil...@gmail.com] >>> Sent: Wednesday, September 16, 2009 6:23 PM >>> To: java-user@lucene.apache.org >>> Subject: Re

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-16 Thread Mark Miller
tests now with SimpleFSDirectory and MMapDirectory. Both are > faster than NIOFS and the response times improved. But it's still slower than > 2.4. > > I'll do some profiling now again and let you know the results. > > Thanks again for all the great support to all who

Re: What would be the fastest BooleanQuery possible?

2009-09-16 Thread Mark Miller
With the new Collector API in Lucene 2.9, you no longer have to compute the score. Now a Collector is passed a Scorer if they want to use it, but you can just ignore it. -- - Mark http://www.lucidimagination.com Benjamin Pasero wrote: > Hi, > > I am using Lucene not only for smart fulltext s

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
=SeparateFile serial=false nThreads=4 iterations=100 bufsize=1024 poolsize=2 filelen=164956707 answer=-31115729, ms=45691, MB/sec=1444.106778140115 Mark Miller wrote: > I'm jealous of your 4 3.0Ghz to my 2.0Ghz. > > I was on dynamic scaling frequency and switched to 2.0Ghz hard. >

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
nchmarking... some >> things with IO cause the freq to drop, and when it's CPU bound again >> it takes a while for Linux to scale up the freq again. >> >> For example, on my ubuntu box, ChannelFile went from 100MB/sec to >> 388MB/sec. This effect probably won't

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
I'm jealous of your 4 3.0Ghz to my 2.0Ghz. I was on dynamic scaling frequency and switched to 2.0Ghz hard. On ramdisk, my puny 2.0's almost catch you and get a bit over 1800MB/s with SeparateFile. I'm smoked on PooledPread and ChannelPread though. Still sub 500 for both, even on the ramdisk. It

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
4 poolsize=2 filelen=730554368 answer=-282295361, ms=766340, MB/sec=381.3212767179059 Mark Miller wrote: > Michael McCandless wrote: > >> I don't like that the answer is different... but it's really really >> odd that it's different-yet-almost-the-same. >>

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
ley > wrote: > >> It's been a while since I wrote that benchmarker... is it OK that the >> answer is different? Did you use the same test file? >> >> -Yonik >> http://www.lucidimagination.com >> >> >> >> On Tue, Sep 15, 2009 at

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
you use the same test file? > > -Yonik > http://www.lucidimagination.com > > > > On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller wrote: > >> The results: >> >> config: impl=SeparateFile serial=false nThreads=4 iterations=100 >> bufsize=1024 pool

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
llee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -Original Message- >> From: Mark Miller [mailto:markrmil...@gmail.com] >> Sent: Tuesday, September 15, 2009 7:15 PM >> To: java-user@lucene.apache.org >> Subject: Re: lucene
