Re: Efficient filtering advice

2009-11-24 Thread Eran Sevi
…have your Collector insure that any docs were in the Filter. FWIW, Erick. On Mon, Nov 23, 2009 at 11:01 AM, Eran Sevi wrote: "I've taken TermsFilter from contrib which does exactly tha…"

Re: Efficient filtering advice

2009-11-23 Thread Eran Sevi
…IDs using TermDocs.seek(Term) to see how long assembling the filter would take. Using the Filter in a query doesn't cost much at all. Best, Erick. On Mon, Nov 23, 2009 at 8:12 AM, Eran Sevi wrote: "Erick, Maybe I didn'…"

Re: Efficient filtering advice

2009-11-23 Thread Eran Sevi
…one you send to your query... If I'm off base here, could you post a reasonable extract of your filter construction code, and how you use them to search? Because I don't think we're all talking about the same thing here. HTH, er...@thismakesnose…

Re: Efficient filtering advice

2009-11-23 Thread Eran Sevi
…or loop, and see if there's *any* noticeable difference in speed. That'll tell you whether your problems arise from the filter construction/search or what you're doing in the collector. Best, Erick. On Sun, Nov 22, 2009 at 11:41 AM, Eran Sevi…

Re: Efficient filtering advice

2009-11-22 Thread Eran Sevi
…e is being spent? That'd be a big help in suggesting alternatives. If I'm on the right track, I'd expect the time to be spent assembling the filters. Not much help here, but I'm having trouble wrapping my head around this... Best, Erick

Re: Efficient filtering advice

2009-11-22 Thread Eran Sevi
…Uwe. ----- Uwe Schindler, H.-H.-Meier-Allee 63, D-28213 Bremen, http://www.thetaphi.de, eMail: u...@thetaphi.de. -----Original Message----- From: Eran Sevi [mailto:erans...@gmail.com] Sent: Sunday, November 22, 2009 3:49…

Efficient filtering advice

2009-11-22 Thread Eran Sevi
Hi, I have a need to filter my queries using a rather large subset of terms (can be 10K or even 50K). All these terms are sure to exist in the index, so the number of results can be about the same as the number of terms in the filter. The terms are numbers but are not consecutive and are from a large set o…
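The TermsFilter approach that the replies in this thread converge on can be sketched roughly as follows. This is a minimal sketch against the Lucene 2.x contrib-queries API; the field name "id", the ID collection, and the searcher are assumptions for illustration, not details from the thread:

```java
import java.io.IOException;
import java.util.Collection;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermsFilter; // from the contrib/queries jar
import org.apache.lucene.search.TopDocs;

class TermsFilterSketch {
    // Build a filter from a large set of known IDs (10K..50K terms) and
    // apply it alongside the main query. Building the filter dominates the
    // cost; applying it per query is cheap, as Erick notes above.
    static TopDocs filteredSearch(IndexSearcher searcher, Query query,
                                  Collection<String> knownIds) throws IOException {
        TermsFilter filter = new TermsFilter();
        for (String id : knownIds) {
            filter.addTerm(new Term("id", id)); // hypothetical field name
        }
        return searcher.search(query, filter, 100);
    }
}
```

If the same ID set is reused across queries, the built filter (or a CachingWrapperFilter around it) can be kept and reused so the construction cost is paid once.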

Re: Invitation: Free Webinar - Apache Lucene 2.9: Technical Overview of New Features (Sep 24 02:00 PM EDT)

2009-10-15 Thread Eran Sevi
Is there a recording of the Webinars for anyone who's missed it? On Sat, Sep 19, 2009 at 12:03 AM, wrote: "*Description*: Free Webinar: Apache Lucene 2.9: Discover the Powerful New Features…"

Re: score from spans

2009-08-26 Thread Eran Sevi
…get any work going, don't be shy to start posting code there, and perhaps you can get some additional eyes/help as you go. I think in the end, it might have to be an optional mode, if we get the code produced. -- Mark, http://www.lucidimaginat…

Re: score from spans

2009-08-09 Thread Eran Sevi
…bit complicated, b/c actually getting the Spans is separate from doing the query. I agree there could be tighter integration. However, what you could do is use Spans.skipTo to move to the document you are examining in the search results. -Grant. On Aug…

Is it possible to receive a score when using span queries?

2009-08-04 Thread Eran Sevi
Hi, Does anyone know how to retrieve such a score for any kind of span query (especially SpanNearQueries)? Thanks, Eran.

score from spans

2009-08-02 Thread Eran Sevi
Hi, How can I get the score of a span that is the result of SpanQuery.getSpans()? The score can be the same for each document, but if it's unique per span, it's even better. I tried looking for a way to expose this functionality through the Spans class but it looks too complicated. I'm no…

Re: changing term freq in indexing time

2009-04-22 Thread Eran Sevi
…// } } } For the synonyms with the weights, I tried the following code: BooleanQuery bq = new BooleanQuery(); TermQuery tq = new TermQuery(new Term(WordIndex.FIELD_WORLDS, "3")); tq.setBoost((float) 1.0);…

Re: changing term freq in indexing time

2009-04-21 Thread Eran Sevi
Hi, You might want to take a look at Payloads. If you know the frequency of the words in each world in advance, then during tokenization for each world you could save the frequency as the payload. During searches you could use BoostingTermQuery to take the frequency into account. Eran. On Tue, Ap…
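A rough sketch of that payload idea against the Lucene 2.4-era analysis API. The field name, the weight map, and the 4-byte float encoding are all assumptions; your Similarity must override scorePayload for BoostingTermQuery to actually use the stored weight:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Map;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.Payload;

// Attach a precomputed per-term weight (e.g. its frequency in a "world")
// as a payload while tokenizing.
class WeightPayloadFilter extends TokenFilter {
    private final Map<String, Float> weights;

    WeightPayloadFilter(TokenStream in, Map<String, Float> weights) {
        super(in);
        this.weights = weights;
    }

    public Token next(Token reusable) throws IOException {
        Token token = input.next(reusable);
        if (token != null) {
            Float w = weights.get(token.term());
            if (w != null) {
                // store the weight as a 4-byte float payload
                token.setPayload(new Payload(
                        ByteBuffer.allocate(4).putFloat(w).array()));
            }
        }
        return token;
    }
}

// Search side: BoostingTermQuery (org.apache.lucene.search.payloads)
// folds Similarity.scorePayload(...) over these bytes into the score:
//   Query q = new BoostingTermQuery(new Term("worlds", "3"));
```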

Re: Lucene implementation/performance question

2008-11-27 Thread Eran Sevi
…est patch (see the case here: https://issues.apache.org/jira/browse/LUCENE-1465) since it fixed some important bugs I had come across. I hope this made sense, I haven't finished my morning coffee yet so I can't be too sure : ) Let me know if you h…

Re: Lucene implementation/performance question

2008-11-26 Thread Eran Sevi
Hi, Can you please shed some light on what your final architecture looks like? Do you manually use the PayloadSpanUtil for each document separately? How did you solve the problem with phrase results? Thanks in advance for your time, Eran. On Tue, Nov 25, 2008 at 10:30 PM, Greg Shackles <[EMAIL PROTE…

Re: Searching repeating fields

2008-11-19 Thread Eran Sevi
If you don't have a lot of entries for each invoice, you can duplicate the invoice for each entry. You'll have some field duplications (and a bigger index size) between the different invoices, but it'll be easy to find exactly what you want. If you have too many different values, I built a solution s…

Re: Lucene implementation/performance question

2008-11-13 Thread Eran Sevi
Hi, I have the same need: to obtain "attributes" for terms stored in some field. I also need all the results and can't take just the first few docs. I'm using an older version of Lucene, and the method I'm using right now is this: 1. Store the words as usual in some field. 2. Store the attributeso…

Re: Optimizing while readers are open

2008-09-25 Thread Eran Sevi
…optimized index. That will delete the old files. On other OSs, which usually implement "delete on last close", the disk space should be automatically freed up once you close the old reader. Mike. Eran Sevi wrote: "Hi, I have the…"

Optimizing while readers are open

2008-09-25 Thread Eran Sevi
Hi, I have the following scenario using Lucene 2.1: 1. Open a reader on the index to perform some searches. 2. Use the reader to check if the index is optimized. 3. Open a writer and run optimize(). 4. Close the old reader and open a new reader for further searches. I expected that after closing the old reader, the…
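The sequence above reads roughly like this in later 2.x API terms. This is a sketch only: the index path and analyzer are assumptions, and the original thread used Lucene 2.1, whose constructors and method names differ slightly from the 2.9-era calls shown here:

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

class OptimizeReopenSketch {
    static IndexReader optimizeAndReopen(File indexDir) throws IOException {
        Directory dir = FSDirectory.open(indexDir);

        IndexReader reader = IndexReader.open(dir);  // 1. reader for searches
        boolean optimized = reader.isOptimized();    // 2. check index state

        if (!optimized) {
            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                    IndexWriter.MaxFieldLength.UNLIMITED);
            writer.optimize();                       // 3. optimize
            writer.close();
        }

        reader.close();                              // 4. close the old reader...
        // ...and open a fresh one. As Mike explains in the reply above, on
        // Windows the pre-optimize files are only deleted once the last
        // reader holding them open is closed.
        return IndexReader.open(dir);
    }
}
```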

Re: SpanQuery and FilteredQuery

2008-08-26 Thread Eran Sevi
Hi Chris, I asked exactly the same question a little while ago and got a pretty good answer from Paul Elschot. Try searching the archives for 'Filtering a SpanQuery'. It was around 13/5/08. Hope it helps, Eran. On Mon, Aug 25, 2008 at 8:18 PM, Christopher M Collins <[EMAIL PROTECTED]> wrote:…

Re: Preventing index corruption

2008-06-29 Thread Eran Sevi
…is from the "IndexWriter.addIndexes(Directory[])" documentation: "This method is transactional in how Exceptions are handled: it does not commit a new segments_N file until all indexes are added. This means if an Exception occurs (for example…"

Re: Preventing index corruption

2008-06-26 Thread Eran Sevi
…are fires and floods and earthquakes to consider. Best, Erick. On Thu, Jun 26, 2008 at 10:28 AM, Eran Sevi <[EMAIL PROTECTED]> wrote: "Hi, I'm looking for the correct way to create an index given the following restrictions:…"

Preventing index corruption

2008-06-26 Thread Eran Sevi
Hi, I'm looking for the correct way to create an index given the following restrictions: 1. The documents are received in batches of variable sizes (not more than 100 docs in a batch). 2. The batch insertion must be transactional: either the whole batch is added to the index (exists physically o…
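A per-batch commit gives the all-or-nothing behavior described above. A minimal sketch against the commit/rollback API of later 2.x releases (the thread's replies discuss the transactional addIndexes approach instead; also note that rollback() in 2.9 closes the writer as a side effect):

```java
import java.io.IOException;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

class BatchAddSketch {
    // Either every document of the batch becomes visible, or none does.
    static void addBatch(IndexWriter writer, List<Document> batch)
            throws IOException {
        try {
            for (Document doc : batch) {
                writer.addDocument(doc); // buffered, not yet visible to readers
            }
            writer.commit();             // publish the whole batch atomically
        } catch (IOException e) {
            writer.rollback();           // discard everything since the last
            throw e;                     // commit (and close the writer in 2.9)
        }
    }
}
```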

Best way to get payloads

2008-05-22 Thread Eran Sevi
Hi, I'm running a SpanQuery and get the Spans result which tell me the documents and positions of what I searched for. I would now like to get the payloads in those documents and positions without having to iterate on TermPositions since I don't have a term but I do have the document and position.
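In Lucene 2.9 the Spans object can hand payloads back directly at the current doc/position, which avoids the TermPositions detour; in earlier versions PayloadSpanUtil was the usual workaround. A hedged sketch of the 2.9 path:

```java
import java.io.IOException;
import java.util.Collection;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.Spans;

class SpanPayloadSketch {
    static void dumpPayloads(SpanQuery query, IndexReader reader)
            throws IOException {
        Spans spans = query.getSpans(reader);
        while (spans.next()) {
            int doc = spans.doc();      // document of the current span
            int start = spans.start();  // position of the current span
            if (spans.isPayloadAvailable()) {
                // payloads carried by the terms of the current span;
                // decode the bytes however they were encoded at index time
                Collection<byte[]> payloads = spans.getPayload();
            }
        }
    }
}
```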

Re: Filtering a SpanQuery

2008-05-12 Thread Eran Sevi
…Wednesday 07 May 2008 10:18:38, Eran Sevi wrote: "Thanks Paul for your reply. Since my index contains a couple of millions documents and the filter is supposed to limit the search space to a few thousands, I was hoping I won't have to do the filtering m…"

Re: Filtering a SpanQuery

2008-05-07 Thread Eran Sevi
…: "Eran, On Tuesday 06 May 2008 10:15:10, Eran Sevi wrote: 'Hi, I am looking for a way to filter a SpanQuery according to some other query (on another field from the one used for the SpanQuery).…'"

Filtering a SpanQuery

2008-05-06 Thread Eran Sevi
Hi, I am looking for a way to filter a SpanQuery according to some other query (on another field from the one used for the SpanQuery). I need to get access to the spans themselves of course. I don't care about the scoring of the filter results and just need the positions of hits found in the docu
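One way to get filtered spans (rather than just filtered scores from a FilteredQuery, which would discard the positions) is to walk the spans and the filter's bit set together, in the spirit of the archived answer the follow-ups reference. A sketch against the 2.x BitSet-based filter API (deprecated in 2.9 in favor of DocIdSet); the queries and reader are assumed inputs:

```java
import java.io.IOException;
import java.util.BitSet;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.Spans;

class FilteredSpansSketch {
    static void filteredSpans(SpanQuery spanQuery, Query filterQuery,
                              IndexReader reader) throws IOException {
        // one bit per doc matching the filter query
        BitSet bits = new QueryWrapperFilter(filterQuery).bits(reader);
        Spans spans = spanQuery.getSpans(reader);
        boolean more = spans.next();
        while (more) {
            if (bits.get(spans.doc())) {
                // spans.doc(), spans.start(), spans.end() lie inside the filter
                more = spans.next();
            } else {
                // jump the spans to the next doc the filter allows
                int target = bits.nextSetBit(spans.doc() + 1);
                more = (target >= 0) && spans.skipTo(target);
            }
        }
    }
}
```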

Re: Sorting consumes hundreds of MBytes RAM

2008-04-26 Thread Eran Sevi
If you read the payloads in sequence they're not arranged by their original position, whereas when you use a stored field you get the terms in the correct order. If you need to sort the values it doesn't matter, of course. On Fri, Apr 25, 2008 at 5:42 PM, Nadav Har'El <[EMAIL PROTECTED]> wrote: "On…"

Re: Multi process writer access to an index

2008-03-19 Thread Eran Sevi
…e would be good). The code you're executing when you get the error. Imagine you're trying to advise someone else and think about what you'd find useful, and try to provide that, please. Best, Erick. On Wed, Mar 19, 2008 at 9:54 AM, Eran Sevi <[EMAIL PROTECTED]> wro…

Multi process writer access to an index

2008-03-19 Thread Eran Sevi
Hi, I'm trying to write to a specific index from several different processes and encounter problems with locked files ("deletable", for example). I don't perform any specific locking because, as I understand it, there should be a file-specific locking mechanism used by the Lucene API. This doesn't seem to be…

Re: Specialized XML handling in Lucene

2008-03-12 Thread Eran Sevi
Indeed it seems like a problematic way. I would also have a problem searching for documents with more than one value: if the query is something simple like "value1 AND value2", I would expect to get all XML docs with both values, but if I use the doc=element method, I won't get any result because…

Query for "Bigger than" specific term

2008-03-11 Thread Eran Sevi
Hi, What's the best way to query Lucene for a "bigger than" term, for example "value > 10"? I know there's a range query where I can use a large upper bound, but maybe there's something more efficient (instead of Lucene transforming the query into thousands of OR clauses). Thanks, Eran.
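Lucene 2.x compares range endpoints as strings, so a numeric field has to be indexed in a form where lexicographic order matches numeric order, typically by zero-padding (Lucene's own NumberTools class offers a ready-made encoding). A sketch, where the padding width of 12 and the field name "value" are assumptions, and only non-negative values are handled:

```java
public class RangePadding {
    // Pad a non-negative number so string comparison matches numeric order,
    // e.g. "9" would otherwise sort after "10".
    // "value > 10" then becomes an open-ended, lower-exclusive range;
    // ConstantScoreRangeQuery avoids RangeQuery's expansion into thousands
    // of OR clauses:
    //   Query q = new ConstantScoreRangeQuery("value", pad(10, 12), null,
    //                                         false, true);
    static String pad(long value, int width) {
        String s = Long.toString(value);
        StringBuilder sb = new StringBuilder(width);
        for (int i = s.length(); i < width; i++) sb.append('0');
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println(pad(10, 12)); // prints 000000000010
    }
}
```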

Re: Specialized XML handling in Lucene

2008-03-11 Thread Eran Sevi
…? Thanks in advance. On Tue, Mar 11, 2008 at 5:48 PM, Steven A Rowe <[EMAIL PROTECTED]> wrote: "Hi Eran, see my comments below inline. On 03/11/2008 at 9:23 AM, Eran Sevi wrote: 'I would like to ask for suggestions of the best design for the following scen…'"

Specialized XML handling in Lucene

2008-03-11 Thread Eran Sevi
Hi, I would like to ask for suggestions of the best design for the following scenario: I have a very large number of XML files (around 1M). Each file contains several sections. Each section contains many elements (about 1000-5000). Each element has a value and some attributes describing the value