Re: SpanQuery for Terms at same position

2009-11-25 Thread Paul Elschot
t this in the query > construction. I think requiring n terms at the same position would need a slop of 1-n, and I'd like to have some test cases added for that. Now if I only had some time... Regards, Paul Elschot > > thanks, > > C>T> > On Tue, Nov 24, 2009 at 9:17 AM, Chr
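A minimal sketch of where the 1-n figure can come from, assuming the unordered near-match test is "match length minus the summed lengths of the sub-spans must not exceed the slop". That condition is an assumption made for illustration, not something confirmed in this thread, and the class and method names below are hypothetical.

```java
class SamePositionSlop {
    /**
     * Smallest slop that still accepts the given sub-spans, under the assumed
     * condition (maxEnd - minStart - totalLength) <= slop.
     * Each sub-span i covers positions [starts[i], ends[i]).
     */
    static int smallestMatchingSlop(int[] starts, int[] ends) {
        int minStart = Integer.MAX_VALUE;
        int maxEnd = Integer.MIN_VALUE;
        int totalLength = 0;
        for (int i = 0; i < starts.length; i++) {
            minStart = Math.min(minStart, starts[i]);
            maxEnd = Math.max(maxEnd, ends[i]);
            totalLength += ends[i] - starts[i];
        }
        return maxEnd - minStart - totalLength;
    }

    public static void main(String[] args) {
        // Three single-token terms, all at position 5: 1 - n = 1 - 3.
        System.out.println(smallestMatchingSlop(new int[] {5, 5, 5}, new int[] {6, 6, 6})); // prints -2
        // Two adjacent terms, the ordinary phrase case.
        System.out.println(smallestMatchingSlop(new int[] {0, 1}, new int[] {1, 2})); // prints 0
    }
}
```

Under this assumption, n terms sharing one position give a match length of 1 and a total length of n, hence 1-n.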

Re: SpanQuery for Terms at same position

2009-11-23 Thread Paul Elschot
when spans at the same positions are considered ordered. Did I understand correctly that the unordered case with a slop of -1 and without the edit works to match terms at the same position? In that case it may be worthwhile to add that to the javadocs, and also add a few testcases. Regards, Paul El

Re: SpanQuery for Terms at same position

2009-11-23 Thread Paul Elschot
y like to be able to do arbitrary span > searches where tokens may be at the same position and also in other > positions where the ordering of subsequent terms may be restricted as per > the normal span API. My pleasure, Paul Elschot > > thanks, > > C>T> > > On Sun

Re: Efficient filtering advise

2009-11-22 Thread Paul Elschot
w can I join several such filters together? There are various ways. OpenBitSet and OpenBitSetDISI can do this, and there's also BooleanFilter and ChainedFilter in contrib. > Using FieldCacheTermsFilter sounds promising. Fortunately it is a single > value field (our unique doc id). Regards, P

Re: Efficient filtering advise

2009-11-22 Thread Paul Elschot
Try a MultiTermQueryWrapperFilter instead of the QueryFilter. I'd expect a modest gain in performance. In case it is possible to form a few groups of terms that are reused, it could even be more efficient to also use a CachingWrapperFilter for each of these groups. Regards, Paul Elscho

Re: SpanQuery for Terms at same position

2009-11-22 Thread Paul Elschot
e too much to only match at the same position. SpanNearQuery may or may not work for a slop of -1, but one could try that for both the ordered and unordered cases. One way to do that is to start from the existing test cases. Regards, Paul Elschot > > Regards, > Adriano Crestani

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Paul Elschot
compatibility for minor version numbers > (e.g. v3.5 will be compatible with v3.2) > B) best effort drop-in back compatibility for the next minor version > number only, and deprecations may be removed after one minor release > (e.g. v3.3 will be compat with v3.2, but not v3.4) I'd prefer B), with a minimum period of about two months to the next release in case it removes deprecations. Regards, Paul Elschot

Re: faceted search performance

2009-10-13 Thread Paul Elschot
ed, for example by using the ones with the best query score. Limiting the number of terms would also be good, but that is less easy. Regards, Paul Elschot > > Chris > > 2009/10/12 Paul Elschot > > > Chris, > > > > You could also store term vectors for all docs a

Re: faceted search performance

2009-10-12 Thread Paul Elschot
Chris, You could also store term vectors for all docs at indexing time, and add the termvectors for the matching docs into a (large) map of terms in RAM. Regards, Paul Elschot On Monday 12 October 2009 21:30:48 Christoph Boosz wrote: > Hi Jake, > > Thanks for your helpful explanat

Re: faceted search performance

2009-10-12 Thread Paul Elschot
the matching documents by using a counting HitCollector on the IndexSearcher. Regards, Paul Elschot

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Paul Elschot
As long as next(), skipTo(), doc() and score() on a Scorer work, the search will be done. I hope the results are correct in this case, but I'm not sure. Regards, Paul Elschot On Wednesday 15 July 2009 19:08:00 Michael McCandless wrote: > I don't think a toplevel BS2 is able to

Re: speed of BooleanQueries on 2.9

2009-07-15 Thread Paul Elschot
happening. Eks, could you try a toString() on the top level scorer for one of the affected queries to see whether it shows BS2 on top level and BS for the inner scorers? Regards, Paul Elschot > > BooleanQuery only uses BooleanScorer when there are no required terms, > and allowDocs

Re: Boolean retrieval

2009-07-04 Thread Paul Elschot
It is also possible to use the HitCollector api and simply ignore the score values. Regards, Paul Elschot On Saturday 04 July 2009 21:14:41 Mark Harwood wrote: > > Check out booleanfilter in contrib/queries. It can be wrapped in a > constantScoreQuery > > > > On 4 Jul

Re: multi-field index and search (Not MultiFieldQuery). Help setting up index and search

2009-05-04 Thread Paul Elschot
there might be of help during interactive retrieval. Your application is not really a web shop, but there are (at least) some overlaps. Regards, Paul Elschot On Monday 04 May 2009 19:16:10 Christian Bongiorno wrote: > I am trying to build a search (have been experimenting with using Lucene) >

Re: Need help : SpanNearQuery

2009-04-17 Thread Paul Elschot
test and test. > As a side note, Will the Shingle Filter help me getting all possible > combination of the input tokens? I don't know. Regards, Paul Elschot

Re: Need help : SpanNearQuery

2009-04-17 Thread Paul Elschot
different weights in SpanTermQuery. Regards, Paul Elschot On Friday 17 April 2009 12:18:46 Radhalakshmi Sreedharan wrote: > To make the question simple, > > What I need is the following : > If my document field is ( ab,bc,cd,ef) and Search tokens are (ab,bc,cd). > > Given the

Re: Index in text format

2009-04-09 Thread Paul Elschot
On Thursday 09 April 2009 21:56:44 Andy wrote: > Is there a way to have lucene to write index in a txt file? No. You could try a hexdump of the index file(s), but that isn't really human readable. Instead of that you may want to try Luke: http://www.getopt.org/luke/ Regards, Paul Elschot

Re: Internals question: BooleanQuery with many TermQuery children

2009-04-07 Thread Paul Elschot
t, and by a heap. For the time being, Lucene does not have a low level facility for key values that occur at most once per document field, so for these it normally helps to use a Filter. Regards, Paul Elschot

Re: Using SpanNearQuery.getSpans() in a Search Result

2009-04-02 Thread Paul Elschot
I'm using > ParallelMultiSearcher so I'm not even 100% sure that I know what index > each Hit is located in. It's the other way around: for span queries a search result is created (internally, by SpanScorer) from the spans resulting from the getSpans() method above. Does tha

Re: number of hits of pages containing two terms

2009-03-17 Thread Paul Elschot
. Regards, Paul Elschot On Tuesday 17 March 2009 12:35:19 Adrian Dimulescu wrote: > Ian Lea wrote: > > Adrian - have you looked any further into why your original two term > > query was too slow? My experience is that simple queries are usually > > extremely fast. > Let

Re: Speeding up RangeQueries?

2009-03-14 Thread Paul Elschot
the new TrieRangeQuery: http://wiki.apache.org/lucene-java/SearchNumericalFields Regards, Paul Elschot

Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-17 Thread Paul Elschot
when using the same criterion as in the removed methods there, your original problem might not have occurred at all. In the CachingWrapperFilter in trunk the choice is left to an overridable method. Regards, Paul Elschot > > Regards, > Raf > > On Sun, Feb 15, 2009 at 2:39 PM, P

Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-15 Thread Paul Elschot
when it is smaller than OpenBitSet), please comment at LUCENE-1296. Regards, Paul Elschot On Sunday 08 February 2009 09:47:24 Raffaella Ventaglio wrote: > Hi Paul, > > One way to implement that would be to use one of the boolean combination > > filters in contrib, BooleanFilter o

Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-08 Thread Paul Elschot
On Sunday 08 February 2009 09:53:00 Uwe Schindler wrote: > I would do so, it's really simple, you can even do it in an anonymous inner > class. It is indeed simple, but it might also help to take a look at the source code of the Lucene classes involved. Regards, Paul Elschot >

Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-08 Thread Paul Elschot
ounting. Could you describe how this compact forwarded index works? > Similar to FieldCache idea but more compact. Does this also use FieldCacheRangeFilter and/or FieldCacheTermsFilter? Regards, Paul Elschot

Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-07 Thread Paul Elschot
ng counts it uses > even 2GB of memory (and this is very bad). 50.000 facets? Well, in case the performance of the last suggestion is not good enough, one could try and implement a better data structure than OpenBitSet and SortedVIntList to provide a DocIdSetIterator, preferably with a fast skipTo() and possibly with a fast intersection count. In that case, you may want to ask further on the java-dev list. Regards, Paul Elschot

Re: TermScorer default buffer size

2009-01-08 Thread Paul Elschot
sue and mention the performance improvements? Regards, Paul Elschot > > -John > > On Thu, Jan 8, 2009 at 1:27 AM, Paul Elschot wrote: > > > John, > > > > Continuing, see below. > > > > On Wednesday 07 January 2009 14:24:15 Paul Elschot wrote: > &g

Re: TermScorer default buffer size

2009-01-08 Thread Paul Elschot
John, Continuing, see below. On Wednesday 07 January 2009 14:24:15 Paul Elschot wrote: > On Wednesday 07 January 2009 07:25:17 John Wang wrote: > > Hi: > > > >The default buffer size (for docid,score etc) is 32 in TermScorer. > > > > We have a large i

Re: TermScorer default buffer size

2009-01-07 Thread Paul Elschot
help, but not for AND queries. See also LUCENE-430 on reducing buffer sizes for the underlying TermDocs for very sparse doc sets. Regards, Paul Elschot

Re: Lucene retrieval model

2008-12-30 Thread Paul Elschot
tribute to the score. One might consider the scoring of the optional clauses to be an implementation of the extended Boolean model. Fuzzy searching is implemented by constructing a Boolean query with optional (and actually present) terms that are similar enough to the fuzzy query term. Regards, P

Re: BooleanQuery Performance Help

2008-12-20 Thread Paul Elschot
hat further caching > could be done apart from the default caching which lucene does. More caching is probably not going to help. Regards, Paul Elschot - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: RESOLVED: help: java.lang.ArrayIndexOutOfBoundsException ScorerDocQueue.downHeap

2008-12-18 Thread Paul Elschot
lthough I understand the idea behind > the setting, I am not sure why it made a difference in my case. That option chooses another algorithm to search these queries, it will only affect queries without required terms. (The change in search algorithm is from BooleanScore

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-08 Thread Paul Elschot
y lucene's OpenBitSet. Also have a look at earlier discussions on the subject: you might find a good use for OpenBitSetDISI and contrib/**/{BooleanFilter,ChainedFilter}. Regards, Paul Elschot On Tuesday 09 December 2008 07:44:20 Michael Stoppelman wrote: > Hi all, > > I'm w

Re: Term numbering and range filtering

2008-11-19 Thread Paul Elschot
to show an unexpected tradeoff possibility opened by the new Filter api. I don't know whether you followed LUCENE-584 (Decouple Filter from BitSet), but a contribution like this multi range filter makes it all worthwhile. Regards, Paul Elschot

Re: 2.4 Performance

2008-11-19 Thread Paul Elschot
need to > use it directly. Is this part of the problem https://issues.apache.org/jira/browse/LUCENE-1296 ? Also consider o.a.l.util.OpenBitSetDISI, and how that is used in contrib/queries/**/BooleanFilter Regards, Paul Elschot

Re: Term numbering and range filtering

2008-11-18 Thread Paul Elschot
range boolean query. > > Mike, Paul, I'm happy to contribute this (ugly but working) code if > there is interest. Let me know and I'll open a JIRA issue for it. In case you think more performance improvements based on this are possible... Regards, Paul Elschot.

Re: Term numbering and range filtering

2008-11-11 Thread Paul Elschot
sk, > we could do this packing during index such that loading at search > time is very fast. Perhaps we'd better continue this at LUCENE-1231 or LUCENE-1410. I think what you're referring to is PDICT, which has frame exceptions for values that occur infrequently. Regards, Paul Elsc

Re: Term numbering and range filtering

2008-11-11 Thread Paul Elschot
> > However that'd be quite a bit deeper change to Lucene. The cheap version is hierarchical prefixing here: http://wiki.apache.org/jakarta-lucene/DateRangeQueries Regards, Paul Elschot

Re: Term numbering and range filtering

2008-11-10 Thread Paul Elschot
ructure in the cache. (Sparse enough means less than 1 in 8 of all docs available in the index reader.) See also LUCENE-1296 for caching another data structure than the one used to collect the filtered docs. Regards, Paul Elschot

Re: Term numbering and range filtering

2008-11-10 Thread Paul Elschot
Tim, I didn't follow all the details, so this may be somewhat off, but did you consider using TermVectors? Regards, Paul Elschot On Monday 10 November 2008 19:18:38 Tim Sturge wrote: > Yes, that is a significant issue. What I'm coming to realize is that > either I will end u

Re: How to combine filter in Lucene 2.4?

2008-11-09 Thread Paul Elschot
PlaceAnd() is not optimal, although it should work just fine. A patch for a performance improvement will follow. Regards, Paul Elschot > > Cheers > Mark

Re: How to combine filter in Lucene 2.4?

2008-11-08 Thread Paul Elschot
/queries/**/BooleanFilter Regards, Paul Elschot On Saturday 08 November 2008 19:06:15 Timo Nentwig wrote: > Hi! > > Since Filter.bits() is deprecated and replaced by getDocIdSet() now I > wonder how I am supposed to combine (AND) filters (for facets). > > I worked around this
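The AND combination asked about above can be illustrated with the JDK alone. This is a sketch in which java.util.BitSet stands in for Lucene's OpenBitSet or OpenBitSetDISI; it is not the BooleanFilter implementation, and the class and helper names are hypothetical.

```java
import java.util.BitSet;

class AndFiltersSketch {
    /** Returns a new bit set holding the docs accepted by both filters; inputs are not modified. */
    static BitSet and(BitSet first, BitSet second) {
        BitSet result = (BitSet) first.clone();
        result.and(second);
        return result;
    }

    /** Hypothetical helper: builds a bit set from a list of doc ids. */
    static BitSet bitsOf(int... docs) {
        BitSet bits = new BitSet();
        for (int doc : docs) {
            bits.set(doc);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet category = bitsOf(1, 3, 5, 8);
        BitSet inStock = bitsOf(3, 4, 8, 9);
        BitSet both = and(category, inStock);
        System.out.println(both);               // prints {3, 8}
        System.out.println(both.cardinality()); // prints 2, usable as a facet count
    }
}
```

Cloning before and() keeps a cached filter result intact, which matters when the same bits are reused across queries.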

Re: Sorting posting lists before intersection

2008-10-13 Thread Paul Elschot
ator superclass): public abstract int estimatedDocFreq(); and implement this for all existing instances. TermScorer could implement it without estimating. For AND/OR/NOT such an estimation is straightforward but for proximity queries it would be more of a guess. Regards, Paul Elschot
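One possible reading of "straightforward" for AND/OR, sketched with plain integers. The upper-bound rules below (minimum for AND, capped sum for OR) are assumptions chosen for illustration; the thread does not say which estimate was intended, and none of these names exist in Lucene.

```java
import java.util.Arrays;

class DocFreqEstimates {
    /** Upper bound for a conjunction: no more docs than its rarest clause. */
    static int andEstimate(int... docFreqs) {
        return Arrays.stream(docFreqs).min().orElse(0);
    }

    /** Upper bound for a disjunction: the sum of the clauses, capped at the index size. */
    static int orEstimate(int maxDoc, int... docFreqs) {
        long sum = 0;
        for (int docFreq : docFreqs) {
            sum += docFreq;
        }
        return (int) Math.min(sum, maxDoc);
    }

    /** Orders clause estimates rarest first, the order in which to intersect them. */
    static int[] rarestFirst(int... docFreqs) {
        int[] sorted = docFreqs.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(andEstimate(500, 20, 3000));                  // prints 20
        System.out.println(orEstimate(10000, 500, 20));                  // prints 520
        System.out.println(Arrays.toString(rarestFirst(500, 20, 3000))); // prints [20, 500, 3000]
    }
}
```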

Re: PhraseQuery issues - differences with SpanNearQuery

2008-09-05 Thread Paul Elschot
On Friday 05 September 2008 16:57:34 Mark Miller wrote: > Paul Elschot wrote: > > On Thursday 04 September 2008 20:39:13 Mark Miller wrote: > >> Sounds like its more in line with what you are looking for. If I > >> remember correctly, the phrase query factors i

Re: PhraseQuery issues - differences with SpanNearQuery

2008-09-04 Thread Paul Elschot
and idf is not used for scoring Spans. The reason why idf is not used could be that there is no basic score value associated with inner spans; only top level spans are scored by SpanScorer. For more details, please consult the SpanScorer code. Regards, Paul Elschot > > - Mark > &g

Re: Pre-filtering for expensive query

2008-09-03 Thread Paul Elschot
On Saturday 30 August 2008 18:22:50 Matt Ronge wrote: > On Aug 30, 2008, at 6:13 AM, Paul Elschot wrote: > > On Saturday 30 August 2008 03:34:01 Matt Ronge wrote: > >> Hi all, > >> > >> I am working on implementing a new Query, Weight and Scorer that >

Re: Pre-filtering for expensive query

2008-09-03 Thread Paul Elschot
On Wednesday 03 September 2008 18:06:57 Matt Ronge wrote: > On Aug 30, 2008, at 3:01 PM, Paul Elschot wrote: > > On Saturday 30 August 2008 18:19:09 Matt Ronge wrote: > >> On Aug 30, 2008, at 4:43 AM, Karl Wettin wrote: > >>> Can you tell us a bit more ab

Re: Pre-filtering for expensive query

2008-08-30 Thread Paul Elschot
On Saturday 30 August 2008 18:22:50 Matt Ronge wrote: > On Aug 30, 2008, at 6:13 AM, Paul Elschot wrote: > > On Saturday 30 August 2008 03:34:01 Matt Ronge wrote: > >> Hi all, > >> > >> I am working on implementing a new Query, Weight and Scorer that >

Re: Pre-filtering for expensive query

2008-08-30 Thread Paul Elschot
filtering at all, because it already uses skipTo() where possible. In case you are looking for documents that contain partial phrases from an input query that has more than 2 words, have a look at Nutch. Regards, Paul Elschot > > > -- > Matt > > >> Hi all, > >> &

Re: Pre-filtering for expensive query

2008-08-30 Thread Paul Elschot
erates on? Yes, Filters. > Or should I just implement something myself in a custom scorer? In case you have a better way than skipTo(), or something to improve on this issue to allow a Filter as clause to BooleanQuery: https://issues.apache.org/j

Re: Fastest way to get just the "bits" of matching documents

2008-07-26 Thread Paul Elschot
On Thursday 24 July 2008 23:00:33 Robert Stewart wrote: > Queries are very complex in our case, some have up to 100 or more > clauses (over several fields), including disjunctions and prohibited > clauses. Other than the earlier advice, did you try setAllowDocsOutOfOrder() ? Rega

Re: Scoring filters

2008-06-11 Thread Paul Elschot
r docs and these score values. Then use this as the scorer for a new Query, via a Weight. Once this new Query is available, just add it as required to a BooleanQuery. Regards, Paul Elschot

Re: SpanNearQuery: how to get the "intra-span" matching positions?

2008-05-30 Thread Paul Elschot
positions SpanScorer will also need to be extended or even replaced. In case you want to continue this discussion, please do so on java-dev. Regards, Paul Elschot.

Re: SpanNearQuery scoring

2008-05-23 Thread Paul Elschot
't know. The Spans interface does not contain a weight() or score() method, so there is no way to pass such information to SpanScorer. Regards, Paul Elschot

Re: MultiTerm Or Query with per-term boost. Does it exist?

2008-05-18 Thread Paul Elschot
>> (often 1000) of constituent TermQueries. I'm wondering if there is > >> a better way to do this? > >> I'm open to implementing my own Query subclass if I can expect > >> significant performance improvements from doing this. Does BooleanQuery.setAllowDocsOut

Re: multi word synonyms

2008-05-18 Thread Paul Elschot
On Sunday 18 May 2008 16:30:26 Karl Wettin wrote: > On 18 May 2008 at 00:01, Paul Elschot wrote: > > On Saturday 17 May 2008 20:28:40 Karl Wettin wrote: > >> As far as I know Lucene only handle single word synonyms at index > >> time. My life would be much simple

Re: multi word synonyms

2008-05-17 Thread Paul Elschot
or the synonym. Was this one of the workarounds? The advantage of the zero position increment is that the original token positions are not affected, so at least there is no influence on scoring because of changes in the original token positions. Regards, Paul Elschot

Re: theoretical maximum score

2008-05-17 Thread Paul Elschot
ed) and/or/phrase/span) make sure that the subscore values are combined into another value that has the same theoretical maximum. Have a look here to start: https://issues.apache.org/jira/browse/LUCENE-293 Regards, Paul Elschot

Re: Filtering a SpanQuery

2008-05-12 Thread Paul Elschot
iltered case. > I guess your suggested solution is my best option without changing > the way getSpans works (which I'm not going to change any time soon ) Before doing that, have a look at the code of SpanWeight/SpanScorer, ConjunctionScorer, and the filtering code in IndexSearcher. Regards, P

Re: Filtering a SpanQuery

2008-05-07 Thread Paul Elschot
internally but I guess that if the > filter is known beforehand, A Filter needs to make a BitSet available before the query search. > it could speed things up quite a bit. I would expect a substantial speedup from using skipTo() on the Spans when only 0.1% of the results passes the fi

Re: Filtering a SpanQuery

2008-05-07 Thread Paul Elschot
On Tuesday 06 May 2008 17:39:38 Paul Elschot wrote: > Eran, > > On Tuesday 06 May 2008 10:15:10 Eran Sevi wrote: > > Hi, > > > > I am looking for a way to filter a SpanQuery according to some > > other query (on another field from the one used for the SpanQu

Re: Filtering a SpanQuery

2008-05-06 Thread Paul Elschot
use spans.start() and spans.end() here // ... more = spans.next(); } if (! more) { break; } filterDoc = bits.nextSetBit(spans.doc()); } Please check the javadocs of java.util.BitSet, there may be a 1 off error in the arguments to nextSetBit(). Regards, Paul Elschot > > I tried looking
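On the possible off-by-one: java.util.BitSet.nextSetBit(fromIndex) searches from fromIndex inclusive and returns -1 when no set bit remains. A small check of that behaviour follows; the class and method names are hypothetical and nothing here touches the Spans API.

```java
import java.util.BitSet;

class NextSetBitDemo {
    /** First doc allowed by the filter at or after the given doc, or -1 if there is none. */
    static int firstAllowedAtOrAfter(BitSet filterBits, int doc) {
        return filterBits.nextSetBit(doc);
    }

    public static void main(String[] args) {
        BitSet filterBits = new BitSet();
        filterBits.set(3);
        filterBits.set(7);
        System.out.println(firstAllowedAtOrAfter(filterBits, 3)); // prints 3 (the start index is included)
        System.out.println(firstAllowedAtOrAfter(filterBits, 4)); // prints 7
        System.out.println(firstAllowedAtOrAfter(filterBits, 8)); // prints -1 (no more allowed docs)
    }
}
```

So passing the current span doc is enough to test that doc itself, while moving strictly past the current doc needs doc + 1, and a loop has to treat -1 as the end of the filter.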

Re: Lucene Proximity Searches

2008-04-18 Thread Paul Elschot
arSpansOrdered class in the org.apache.lucene.search.spans package to allow a match for less than all subqueries. This is not going to be straightforward, but it is possible. In case you choose this last option, please continue on the java-dev list. Regards, Paul Elschot > > On Fri, Apr 4, 2008

Re: QueryWrapperFilter question...

2008-04-17 Thread Paul Elschot
mer. I had really convinced myself till the > thought came to me at lunch :). For a single query, adding a filter of course has a cost. But when the location part can be reused in later queries, give CachingWrapperFilter a try. Regards, Paul Elschot > > -M > > On Wed, Apr 16, 2008

Re: Using Lucene partly as DB and 'joining' search results.

2008-04-12 Thread Paul Elschot
On Saturday 12 April 2008 00:03:13 Antony Bowesman wrote: > Paul Elschot wrote: > > On Friday 11 April 2008 13:49:59 Mathieu Lecarme wrote: > >> Use Filter and BitSet. > >> From the personnal data, you build a Filter > >> (http://lucene.apache.org/jav

Re: Using Lucene partly as DB and 'joining' search results.

2008-04-11 Thread Paul Elschot
ne.apache.org/java/2_3_1/api/org/apache/lucene/search/Fil >ter.html) wich is used in the main index. With 1 billion mails, and possibly a Filter per user, you may want to use more compact filters than BitSets, which is currently possible in the development trunk of lucene. Regards, Paul Elschot

Re: Why Lucene has to rewrite queries prior to actual searching?

2008-04-08 Thread Paul Elschot
is no specific reason why it cannot be done, one only needs to provide the corresponding tokenizer to be used at indexing time. Kind regards, Paul Elschot > > Itamar. > > -Original Message- > From: Paul Elschot [mailto:[EMAIL PROTECTED] > Sent: Tuesday, April 08, 2008 1:5

Re: Why Lucene has to rewrite queries prior to actual searching?

2008-04-07 Thread Paul Elschot
Itamar, Have a look here: http://lucene.apache.org/java/2_3_1/scoring.html Regards, Paul Elschot On Tuesday 08 April 2008 00:34:48 Itamar Syn-Hershko wrote: > Paul and John, > > Thanks for your quick reply. > > The problem with query rewriting is the beforementioned >

Re: Why Lucene has to rewrite queries prior to actual searching?

2008-04-07 Thread Paul Elschot
match a document, as long as at least one matches. For the required query parts (AND like), Scorer.skipTo() is used, and that could well be the filter mechanism you are referring to; have a look at the javadocs of Scorer, and, if necessary, at the actual code of ConjunctionScorer. Regards, Paul
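A sketch of why skipTo() acts like a filter for required clauses: each clause jumps ahead to the other's current doc instead of stepping through every doc. Sorted int arrays stand in for posting lists here; this illustrates the idea only and is not the ConjunctionScorer code, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LeapfrogIntersection {
    /** Index of the first entry >= target, searching from position 'from'; docs.length if none. */
    static int skipTo(int[] docs, int from, int target) {
        int i = from;
        while (i < docs.length && docs[i] < target) {
            i++;
        }
        return i;
    }

    /** Docs present in both sorted lists, found by letting each list skip to the other's doc. */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> matches = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                matches.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = skipTo(a, i, b[j]); // a is behind: jump to b's current doc
            } else {
                j = skipTo(b, j, a[i]); // b is behind: jump to a's current doc
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] first = {1, 4, 7, 9, 20};
        int[] second = {4, 5, 9, 30};
        System.out.println(intersect(first, second)); // prints [4, 9]
    }
}
```

The linear scan inside skipTo() is only for brevity; a real posting list would use skip data to jump further per step.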

Re: Improving Index Search Performance

2008-03-26 Thread Paul Elschot
more data to the lucene index that can be used to reduce the number of results to be fetched. Regards, Paul Elschot On Wednesday 26 March 2008 13:51:24 Shailendra Mudgal wrote: > > The bottom line is that reading fields from docs is expensive. > > FieldCache will, I believe, lo

Re: Improving Index Search Performance

2008-03-25 Thread Paul Elschot
reason, retrieving docs is best done in doc id order, but that is unlikely to go wrong as doc ids are normally collected in increasing order. Regards, Paul Elschot On Tuesday 25 March 2008 13:43:18 Shailendra Mudgal wrote: > Hi Everyone, > > We are using Lucene to search on a index

Re: Call Lucene default command line Search from PHP script

2008-03-21 Thread Paul Elschot
On Saturday 22 March 2008 00:32:32 Paul Elschot wrote: > Milu, > > This is a PHP problem, not a Lucene one, so you might get better > response at a PHP mailing list. > > The easy way around your problem is probably by invoking a shell > script from php that exports

Re: Call Lucene default command line Search from PHP script

2008-03-21 Thread Paul Elschot
, you'll probably want to use the PHP/Java extension to avoid initializing a JVM for each call to lucene. Try this: http://www.google.nl/search?q=php+java+org+apache+lucene&ie=UTF-8&oe=UTF-8 This was one of the results: http://www.idimmu.net/index.php?blog%5Bpagenum%5D=3 Regards, Paul

Re: HELP: how to list term score inside some document?

2008-03-14 Thread Paul Elschot
approach? Have a look at Searcher.explain() Regards, Paul Elschot

Re: MultiFieldQueryParser - BooleanClause.Occur

2008-02-29 Thread Paul Elschot
irstQuery, BooleanClause.Occur.MUST); //must is > like an AND > overallquery.add(secondQuery, BooleanClause.Occur.MUST): There is no need for a QueryParser in this case when using a TermQuery instead of a Query for q1, q2, q3 and q4: TermQuery q1 = new TermQuery(new Term("title", "ter

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-15 Thread Paul Elschot
already have a firm requirement for that case? SpanNotQuery can be used to prevent matches over paragraph borders when these are indexed as such, but I would not expect that you would need those, given the fuzziness of the [10/5/2]. Regards, Paul Elschot On Friday 15 February 2008 09:45:58

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-14 Thread Paul Elschot
avoid disjunctions. For example for verbs, one could index only the stem and use a payload for the actual inflected form (singular/plural, past/present, first/second/third person, etc). Regards, Paul Elschot > > Cedric > > > On Fri, Feb 15, 2008 at 7:15 AM, Paul Elschot <[EM

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-14 Thread Paul Elschot
revert to using another field for different position info. Regards, Paul Elschot On Thursday 14 February 2008 09:44:40 Cedric Ho wrote: > Hi Paul, > > Sorry I am not sure I understand your solution. > > Because I would need to apply this scoring logic to all the different > types

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-13 Thread Paul Elschot
y on this extra field would almost do, and you will probably need https://issues.apache.org/jira/browse/LUCENE-1093 . This will be somewhat slower than using a payload, because the search will be done in two separate fields, but it will work. Regards, Paul Elschot

Re: recall/precision with lucene

2008-02-09 Thread Paul Elschot
return TopDocs. From this one can make a precision/recall graph for the query by considering the total results higher than a given score. When a lot of such computations are needed, you may also want to cache the values of a unique identifier field for all indexed docs, have a look at Field

Re: Lucene syntax query matched against a string content

2008-02-08 Thread Paul Elschot
ing the results. Regards, Paul Elschot On Friday 08 February 2008 05:48:08 Nilesh Bansal wrote: > Hi, > > I want to create a function, which takes in a query string (in lucene > syntax), and a string as content and returns back if the query matches > the content or not. This wou

Re: Lucene to index OCR text

2008-01-29 Thread Paul Elschot
On Tuesday 29 January 2008 03:32:08 Daniel Noll wrote: > On Friday 25 January 2008 19:26:44 Paul Elschot wrote: > > There is no way to do exact phrase matching on OCR data, because no > > correction of OCR data will be perfect. Otherwise the OCR would have made >

Re: Lucene to index OCR text

2008-01-25 Thread Paul Elschot
he contrib area. It has truncation and proximity based on span queries, but no fuzzy term matching, so it could also be a start for investigating. It all depends on how good the OCR was, but in some cases (think old paper) it's just not possible to do good OCR. Regards, Paul Elschot

Re: Lucene Performance

2008-01-19 Thread Paul Elschot
for all terms in the query, a separate scorer will be used during query search. The query rewrite could in principle do this, but it might affect the score values. Regards, Paul Elschot

Re: Self Join Query

2008-01-10 Thread Paul Elschot
e indexed to allow filtering, and stored to allow retrieval for filtering in another index. Retrieving stored fields is normally a performance bottleneck, so a FieldCache might be handy. Regards, Paul Elschot On Thursday 10 January 2008 12:58:44 sachin wrote: > Here are more details about my i

Re: Query processing with Lucene

2008-01-09 Thread Paul Elschot
rers and by Span Scorer. That is for the case that offsets were meant to be positions within a document. It is also possible that offsets were meant in the sense of using skipTo(doc) instead of next() on a Scorer. This is done during query search when at least one term is required. Regards, Paul Els

Re: question on the implementation of a SetFilter

2007-12-28 Thread Paul Elschot
that on top of TermEnum. The TermEnum starts at a given field/term and iterates through all indexed terms after that, including terms with field names ordered later than the given field. That's why the field name must be checked in the Term. Perhaps that could be another bit functio

Re: Can I do boosting based on term postions?

2007-12-18 Thread Paul Elschot
On Tuesday 18 December 2007 14:59:45 Peter Keegan wrote: > > Should I open a Jira issue? > What shall I say? http://www.apache.org/foundation/how-it-works.html Regards, Paul Elschot

Re: "Field weights"

2007-12-14 Thread Paul Elschot
Karl, This might work for you: https://issues.apache.org/jira/browse/LUCENE-293 Regards, Paul Elschot On Friday 14 December 2007 18:06:01 Karl Wettin wrote: > I have an index that contains three sorts of documents: > > Car brand > Tire brand > Tire pressure > > (Please b

Re: Scoring for all the documents in the index relative to a query

2007-11-20 Thread Paul Elschot
"foo^0") => returns the same X results even if all scores are 0. In the patch, Matcher is a superclass of Scorer and it does not have the score() method, so 'matching' is independent of any score value. The matchi

Re: Scoring for all the documents in the index relative to a query

2007-11-19 Thread Paul Elschot
Gentlefolk, Well, the javadocs as patched at LUCENE-584 try to change all the cases of zero scoring to 'non matching'. I'm happily bracing for a minor conflict with that patch. In case someone wants to take another look at the javadocs as patched there, don't let me stop y

Re: Search performance using BooleanQueries in BooleanQueries

2007-11-06 Thread Paul Elschot
On Tuesday 06 November 2007 23:14:01 Mike Klaas wrote: > On 29-Oct-07, at 9:43 AM, Paul Elschot wrote: > > On Friday 26 October 2007 09:36:58 Ard Schrijvers wrote: > >> +prop1:a +prop2:b +prop3:c +prop4:d +prop5:e > >> > >> is much faster than > >> >

Re: 2/3 of terms matched + coverage filter

2007-10-31 Thread Paul Elschot
t. This Y% is not directly possible, but I would expect the default document score to correlate reasonably well with coverage. In case you want an exact Y% cutoff, you'll run into the fact that the field norm (the inverse square root of the field length) is encoded in only 8 bits, which is
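To see why 8 bits get in the way of an exact cutoff, here is a sketch of the norm as described above, the inverse square root of the field length. The byte encoding itself is not reproduced; the class and method names are hypothetical.

```java
class FieldNormSketch {
    /** Field norm as described in the text: the inverse square root of the field length in terms. */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        System.out.println(lengthNorm(4));   // prints 0.5
        System.out.println(lengthNorm(100)); // prints 0.1
        // Neighbouring lengths differ only slightly:
        System.out.println(lengthNorm(100) - lengthNorm(101));
        // One byte has at most 256 distinct values, so field lengths 1..1000
        // cannot all keep a distinct stored norm.
        System.out.println(1000 > 256); // prints true
    }
}
```

Whatever the exact encoding, many different field lengths must share a stored norm, so a coverage percentage derived from the score can only be approximate.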

Re: Looking for "Exact match but no other terms"... how to express it?

2007-10-30 Thread Paul Elschot
nding your queries with these special tokens, for example: "=begin= foo bar dot =end=" . Regards, Paul Elschot

Re: Search performance using BooleanQueries in BooleanQueries

2007-10-29 Thread Paul Elschot
ding BooleanQuery.rewrite(). Take care about query weights, though. Regards, Paul Elschot > > thanks for any help, > > Regards Ard

Re: Cache BitSet or doc number?

2007-10-27 Thread Paul Elschot
and SortedVIntList. Regards, Paul Elschot On Saturday 27 October 2007 02:15:48 Yonik Seeley wrote: > On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote: > > Thom Nelson wrote: > > > Check out the HashDocSet from Solr, this is the best way to cache small > > > se

Re: Adding support for NOT NEAR construct?

2007-10-17 Thread Paul Elschot
does not work for this because it works on doc level and not within the matching text of a field. Regards, Paul Elschot On Wednesday 17 October 2007 17:57:21 Dave Golombek wrote: > We've run into a situation where having "NOT NEAR" queries would really > help. I hav

Re: Scoring a single document from a corpus based on a given query

2007-10-10 Thread Paul Elschot
for. I was hoping for a cleaner > approach. You can try this: Explanation e = indexSearcher.explain(query, documentId); and get the score value from the explanation. Have a look at the code of any Scorer.explain() method on how to get the score value only. There really is no need to filter

Re: Scorer skipTo() expectations?

2007-10-04 Thread Paul Elschot
. The reason for that is performance, BooleanScorer uses a faster data structure than a priority queue, but BooleanScorer does not implement skipTo(). Regards, Paul Elschot On Thursday 04 October 2007 09:12, Dan Rich wrote: > Hi, > > I have a custom Query class that provides a long list

Re: a query for a special AND?

2007-10-01 Thread Paul Elschot
As for suggestions on how to do this, I have no other than to make sure that you can create the queries necessary to obtain the required output. Regards, Paul Elschot On Sunday 30 September 2007 09:20, Mohammad Norouzi wrote: > Hi Paul, > thanks, I dot your idea, now I am planing to imp
