Re: Search while typing (incremental search)

2021-10-27 Thread Michael Wechner
I have added a QnA https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-DoesLucenesupportauto-suggest/autocomplete? I will also try to provide an example, for example https://medium.com/@ekaterinamihailova/in-memory-search-and-autocomplete-with-lucene-8-5-f2df1bc71c36 https://

Re: Search while typing (incremental search)

2021-10-08 Thread Michael Wechner
On 08.10.21 at 18:49, Michael Sokolov wrote: Thank you for offering to add to the FAQ! Indeed it should mention the suggester capability. I think you have permissions to edit that wiki? yes :-) Please go ahead and I think add a link to the suggest module javadocs ok, will do! Thanks M

Re: Search while typing (incremental search)

2021-10-08 Thread Michael Sokolov
Thank you for offering to add to the FAQ! Indeed it should mention the suggester capability. I think you have permissions to edit that wiki? Please go ahead and I think add a link to the suggest module javadocs On Thu, Oct 7, 2021 at 2:30 AM Michael Wechner wrote: > > Thanks very much for your fe

Re: Search while typing (incremental search)

2021-10-06 Thread Michael Wechner
Thanks very much for your feedback! I will try it :-) As I wrote I would like to add a summary to the Lucene FAQ (https://cwiki.apache.org/confluence/display/lucene/lucenefaq) Would the following questions make sense? - "Does Lucene support incremental search?" - "Does Lucene supp

Re: Search while typing (incremental search)

2021-10-06 Thread Robert Muir
TLDR: use the lucene suggest/ package. Start with building suggester from your query logs (either a file or index them). These have a lot of flexibility about how the matches happen, for example pure prefixes, edit distance typos, infix matching, analysis chain, even now Japanese input-method integ
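
Robert's advice maps onto the suggest module roughly as follows. This is a minimal sketch, not a definitive recipe: it assumes a recent Lucene (8.x+) with the suggest module on the classpath, and the three-line "query log" is invented data.

```java
import java.io.StringReader;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.store.ByteBuffersDirectory;

public class SuggestDemo {
  public static void main(String[] args) throws Exception {
    // One suggestion per line, e.g. dumped from query logs (invented data).
    String queryLog = "lucene search\nlucene suggester\nlog analysis\n";
    try (AnalyzingInfixSuggester suggester =
        new AnalyzingInfixSuggester(new ByteBuffersDirectory(), new StandardAnalyzer())) {
      suggester.build(new PlainTextDictionary(new StringReader(queryLog)));
      // Infix lookup: matches a prefix of any token, not just the start of the string.
      List<Lookup.LookupResult> results = suggester.lookup("sugg", 5, true, false);
      for (Lookup.LookupResult r : results) {
        System.out.println(r.key);
      }
    }
  }
}
```

The suggest module also offers pure-prefix and fuzzy (edit-distance) lookups; AnalyzingInfixSuggester is just the variant that matches mid-string.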

RE: Search results/criteria validation

2021-03-17 Thread Siraj Haider
Ceccarelli Subject: Re: Search results/criteria validation See https://issues.apache.org/jira/browse/LUCENE-9640 On Wed, Mar 17, 2021 at 4:02 PM Paul Libbrecht wrote: > > Explain is a heavyweight thing. Maybe it helps you, maybe you need > something high-performance. > > I was a

RE: Search results/criteria validation

2021-03-17 Thread Siraj Haider
Thanks for the response Paul, it would be great if you can point me to that discussion. -- Regards -Siraj Haider (212) 306-0154 -Original Message- From: Paul Libbrecht Sent: Wednesday, March 17, 2021 4:02 PM To: java-user@lucene.apache.org; Diego Ceccarelli Subject: Re: Search

Re: Search results/criteria validation

2021-03-17 Thread Michael Sokolov
See https://issues.apache.org/jira/browse/LUCENE-9640 On Wed, Mar 17, 2021 at 4:02 PM Paul Libbrecht wrote: > > Explain is a heavyweight thing. Maybe it helps you, maybe you need > something high-performance. > > I was asking a similar question ~10 years ago and got a very interesting > answer on

Re: Search results/criteria validation

2021-03-17 Thread Paul Libbrecht
Explain is a heavyweight thing. Maybe it helps you, maybe you need something high-performance. I was asking a similar question ~10 years ago and got a very interesting answer on this list. If you want I can try to dig this to find it. At the end, and with some limitation in the number of queri

Re: Search in lines, so need to index lines?

2018-08-02 Thread Tomoko Uchida
August 1, 2018 2:35 PM > To: java-user@lucene.apache.org > Subject: Re: Search in lines, so need to index lines? > > Ira, > > I do not understand your requirements, but essentially lucene is not for > regex searching. > There are tools for fast regular expression search, if you do

RE: Search in lines, so need to index lines?

2018-08-01 Thread Gordin, Ira
:35 PM To: java-user@lucene.apache.org Subject: Re: Search in lines, so need to index lines? Ira, I do not understand your requirements, but essentially lucene is not for regex searching. There are tools for fast regular expression search, if you are not satisfied with the Java standard library, for

Re: Search in lines, so need to index lines?

2018-08-01 Thread Michael Sokolov
> > > Hi Tomoko, > > > > I need to search in many files and we use Lucene for this purpose. > > > > Thanks, > > Ira > > > > -Original Message- > > From: Tomoko Uchida > > Sent: Wednesday, August 1, 2018 1:49 PM > > To: jav

Re: Search in lines, so need to index lines?

2018-08-01 Thread Tomoko Uchida
. Tomoko On Wed, Aug 1, 2018 20:01, Gordin, Ira wrote: > Hi Tomoko, > > I need to search in many files and we use Lucene for this purpose. > > Thanks, > Ira > > -Original Message- > From: Tomoko Uchida > Sent: Wednesday, August 1, 2018 1:49 PM > To: java-user@lucene.ap

Re: Search in lines, so need to index lines?

2018-08-01 Thread Robert Muir
Wednesday, August 1, 2018 1:49 PM > To: java-user@lucene.apache.org > Subject: Re: Search in lines, so need to index lines? > > Hi Ira, > >> I am trying to implement regex search in file > > Why are you using Lucene for regular expression search? > You can impleme

RE: Search in lines, so need to index lines?

2018-08-01 Thread Gordin, Ira
Hi Tomoko, I need to search in many files and we use Lucene for this purpose. Thanks, Ira -Original Message- From: Tomoko Uchida Sent: Wednesday, August 1, 2018 1:49 PM To: java-user@lucene.apache.org Subject: Re: Search in lines, so need to index lines? Hi Ira, > I am trying

Re: Search in lines, so need to index lines?

2018-08-01 Thread Tomoko Uchida
ile the same as in editors, in > Notepad++ for example. > > Thanks, > Ira > > -Original Message- > From: Uwe Schindler > Sent: Tuesday, July 31, 2018 6:12 PM > To: java-user@lucene.apache.org > Subject: RE: Search in lines, so need to index lines? > > Hi

RE: Search in lines, so need to index lines?

2018-07-31 Thread Gordin, Ira
Hi Uwe, I am trying to implement regex search in file the same as in editors, in Notepad++ for example. Thanks, Ira -Original Message- From: Uwe Schindler Sent: Tuesday, July 31, 2018 6:12 PM To: java-user@lucene.apache.org Subject: RE: Search in lines, so need to index lines? Hi

RE: Search in lines, so need to index lines?

2018-07-31 Thread Uwe Schindler
Hi, you need to create your own tokenizer that splits tokens on \n or \r. Instead of using WhitespaceTokenizer, you can use: Tokenizer tok = CharTokenizer. fromSeparatorCharPredicate(ch -> ch=='\r' || ch=='\n'); But I would first think of how to implement the whole thing correctly. Using a re
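
Uwe's one-liner fits into an Analyzer like this. A sketch only: the org.apache.lucene.analysis.util package is where CharTokenizer lives in the Lucene 7/8 line this thread is about, and the field name is illustrative.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharTokenizer;

public class LineTokenizerDemo {
  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        // One token per line: split only on CR and LF, keep everything else intact.
        Tokenizer tok =
            CharTokenizer.fromSeparatorCharPredicate(ch -> ch == '\r' || ch == '\n');
        return new TokenStreamComponents(tok);
      }
    };
    try (TokenStream ts = analyzer.tokenStream("content", "first line\nsecond line")) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        System.out.println(term.toString());  // "first line", then "second line"
      }
      ts.end();
    }
  }
}
```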

Re: search any field name having a specific value

2017-03-17 Thread Marco Reis
You can add a new field called "full_text" and during the indexing time you concatenate all the values of the other fields in it. Do you think it's a good idea for this case? On Fri, Mar 17, 2017 at 6:27 PM Lokesh Madan wrote: > May be, index the field names as metadata file. When when queryin

Re: search any field name having a specific value

2017-03-17 Thread Lokesh Madan
Maybe index the field names as a metadata file. Then when querying, first get the list of all fields and then shoot a query. You can do this 2-hop query, or else maintain some cache and then shoot a query. > On Mar 17, 2017, at 11:53 AM, Cristian Lorenzetto > wrote: > > It permits to search in a

Re: search any field name having a specific value

2017-03-17 Thread Ahmet Arslan
Hi, You can retrieve the list of field names using LukeRequestHandler. Ahmet On Friday, March 17, 2017 9:53 PM, Cristian Lorenzetto wrote: It permits to search in a predefined list of fields that you have to know in advance. In my case I don't know what the fieldname is. Maybe WildcardQuer

Re: search any field name having a specific value

2017-03-17 Thread Corbin, J.D.
Hi, I am not sure if there is a way to specify a search against all fields in the index without knowing the fields. WildcardQuery probably won't work since it does target a specific field within the index. The specification of the index field comes in the definition of the Term that is passed as

Re: search any field name having a specific value

2017-03-17 Thread Cristian Lorenzetto
It permits to search in a predefined list of fields that you have to know in advance. In my case I don't know what the fieldname is. Maybe WildcardQuery? 2017-03-17 19:30 GMT+01:00 Corbin, J.D. : > You might take a look at MultiFieldQueryParser. I believe it allows you > to search multiple inde

Re: search any field name having a specific value

2017-03-17 Thread Corbin, J.D.
​You might take a look at MultiFieldQueryParser. I believe it allows you to search multiple index fields at the same time. J.D. Corbin Senior Research Engineer Advanced Computing & Data Science Lab 3075 W. Ray Road Suite 200 Chandler, AZ 85226-2495 USA M: (303) 912-0958 E: jd.cor...@pears
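
For reference, a minimal sketch of the MultiFieldQueryParser suggestion. The field names are invented, and it assumes the lucene-queryparser module is on the classpath.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.Query;

public class MultiFieldDemo {
  public static void main(String[] args) throws Exception {
    // The field list must be known up front, which is the limitation
    // raised elsewhere in this thread.
    MultiFieldQueryParser parser = new MultiFieldQueryParser(
        new String[] {"title", "body", "author"}, new StandardAnalyzer());
    Query q = parser.parse("lucene");
    System.out.println(q);  // the term expanded across all three fields
  }
}
```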

Re: Search Performance with NRT

2015-05-27 Thread kiwi clive
Hi Mike, Thanks for the very prompt and clear response. We look forward to using the new (new for us) Lucene goodies :-) Clive From: Michael McCandless To: Lucene Users ; kiwi clive Sent: Thursday, May 28, 2015 2:34 AM Subject: Re: Search Performance with NRT As long as you

Re: Search Performance with NRT

2015-05-27 Thread Michael McCandless
As long as you call SM.maybeRefresh from a dedicated refresh thread (not from a query's thread) it will work well. You may want to use a warmer so that the new searcher is warmed before becoming visible to incoming queries ... this ensures any lazy data structures are initialized by the time a que

Re: search on a field by a single word

2015-02-11 Thread wangdong
Thanks a lot for your answer. I find the second way may be useful to me. It is new to me and I will try it. Thanks, andrew On 2015/2/11 21:39, Ian Lea wrote: If you only ever want to retrieve based on exact match you could index the name field using org.apache.lucene.document.StringField. Do be a

Re: search on a field by a single word

2015-02-11 Thread Ian Lea
If you only ever want to retrieve based on exact match you could index the name field using org.apache.lucene.document.StringField. Do be aware that it is exact: if you do nothing else, a search for "a" will not match "A" or "A ". Or you could so something with start and end markers e.g. index yo
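
Ian's two points, that StringField is un-analyzed and that matching is byte-for-byte exact, can be seen in a small sketch. The field name and values are invented, and it assumes Lucene 8.x (ByteBuffersDirectory).

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ExactMatchDemo {
  public static void main(String[] args) throws Exception {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      Document doc = new Document();
      // StringField indexes the whole value as one un-analyzed token.
      doc.add(new StringField("name", "Ada Lovelace", Field.Store.YES));
      w.addDocument(doc);
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      // Query with a TermQuery, not an analyzing query parser.
      int exact = searcher.count(new TermQuery(new Term("name", "Ada Lovelace")));
      int lower = searcher.count(new TermQuery(new Term("name", "ada lovelace")));
      System.out.println(exact + " " + lower);  // case matters: the lowercased term finds nothing
    }
  }
}
```

If case-insensitive "exact" matching is wanted, lowercase the value at index time and lowercase the query term to match, as Ian's "start and end markers" alternative implies.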

Re: Search "_all" field with a term

2014-10-11 Thread haiwei.xie-soulinfo
> You should ask this on the elasticsearch mailing list. > > BTW, look at elasticsearch copy_to feature. Better than _all field. > > My 2 cents. I will try it. thanks. > > > David ;-) > Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs > > > On 11 Oct. 2014 at 11:31, "haiwei.xie-soulinfo"

RE: Search "_all" field with a term

2014-10-11 Thread Uwe Schindler
Hi, > 'Internally this is indexing every field a second time into the "_all" > field.' > This sentence mean second indexing has total different analyzer and > indexing compared with my first indexing? Exactly. > So I need rewrite the second > process to fix my problem? In Elasticsearc

Re: Search "_all" field with a term

2014-10-11 Thread David Pilato
You should ask this on the elasticsearch mailing list. BTW, look at elasticsearch copy_to feature. Better than _all field. My 2 cents. -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs > On 11 Oct. 2014 at 11:31, "haiwei.xie-soulinfo" > wrote: > > Hi, > > Thanks for y

Re: Search "_all" field with a term

2014-10-11 Thread haiwei.xie-soulinfo
Hi, Thanks for your advice, the SimpleQueryParser API is not enough in my case. Actually, I want to index data from a database, and there are so many fields. I have tested the "_all" parameter in the ElasticSearch system, but the results of '_all' and 'fieldname' are different for a Chinese term. 'Interna

RE: Search "_all" field with a term

2014-10-11 Thread Uwe Schindler
Hi, by default there is no "_all" field. E.g., Elasticsearch adds this special field depending on your index mapping at the time of indexing the data. Internally this is indexing every field a second time into the "_all" field. With Lucene you have to do this on yourself. An alternative would

RE: Search with term intersection

2014-10-10 Thread aurelien . mazoyer
Hi Mike and Uwe, Thank you for your answers. It is clear, now. Regards, Aurélien On 10.10.2014 12:32, Uwe Schindler wrote: Hi, every segment is executed on its own (every segment is its own index). Every segment returns its own document ids and the result is the union of them ranked by score

RE: Search with term intersection

2014-10-10 Thread Uwe Schindler
Hi, every segment is executed on its own (every segment is its own index). Every segment returns its own document ids and the result is the union of them ranked by score using a PriorityQueue. There is no cross-segment term dictionary and posting lists in Lucene. It was like that before Lucene

Re: Search with term intersection

2014-10-10 Thread Michael McCandless
By intersection, do you mean a MUST clause on a BooleanQuery? Lucene uses "doc at a time" scoring, so for BooleanQuery, all MUST'd clauses are visiting the same doc (if they match) at a time, so we do the intersection for that document all at once, within each segment, across the N clauses. Mike

Re: search performance

2014-06-20 Thread Vitaly Funstein
If you are using stored fields in your index, consider playing with compression settings, or perhaps turning stored field compression off altogether. Ways to do this have been discussed in this forum on numerous occasions. This is highly use case dependent though, as your indexing performance may o

RE: search performance

2014-06-20 Thread Uwe Schindler
Hi, > Am I correct that using SearchManager can't be used with a MultiReader and > NRT? I would appreciate all suggestions on how to optimize our search > performance further. Search time has become a usability issue. Just have a SearcherManger for every index. MultiReader construction is cheap

Re: search performance

2014-06-20 Thread Jamie
Greetings Lucene Users As a follow-up to my earlier mail: We are also using Lucene segment warmers, as per recommendation, segments per tier is now set to five, buffer memory is set to (Runtime.getRuntime().totalMemory()*.08)/1024/1024; See below for code used to instantiate writer:

Re: search performance

2014-06-20 Thread Jamie
Hi All Thank you for all your suggestions. Some of the recommendations hadn't yet been implemented, as our code base was using older versions of Lucene with reduced capabilities. Thus, far, all the recommendations for fast search have been implemented (e.g. using pagination with searchAfter,

RE: Search degradation on Windows when upgrading from lucene 3.6 to lucene 4.7.2

2014-06-18 Thread De Simone, Alessandro
Hi! We have switched from Lucene 3.6 to >=Lucene 4.7 (java7) and we are also experiencing a distinct slowdown using the same dataset. We are running the software under Windows 2008R2. In our case, we have identified that there are a lot more IO calls (= number of times the buffer is refilled in Ind

Re: search performance

2014-06-06 Thread Jamie
Jon I ended up adapting your approach. The solution involves keeping a LRU cache of page boundary scoredocs and their respective positions. New positions are added to the cache as new pages are discovered. To cut down on searches, when scrolling backwards and forwards, the search begins from

RE: search performance

2014-06-03 Thread Toke Eskildsen
Jamie [ja...@mailarchiva.com] wrote: > It would be nice if, in future, the Lucene API could provide a > searchAfter that takes a position (int). It would not really help with large result sets. At least not with the current underlying implementations. This is tied into your current performance pr

Re: search performance

2014-06-03 Thread Jamie
Thanks Jon I'll investigate your idea further. It would be nice if, in future, the Lucene API could provide a searchAfter that takes a position (int). Regards Jamie On 2014/06/03, 3:24 PM, Jon Stewart wrote: With regards to pagination, is there a way for you to cache the IndexSearcher, Que

Re: search performance

2014-06-03 Thread Jon Stewart
With regards to pagination, is there a way for you to cache the IndexSearcher, Query, and TopDocs between user pagination requests (a lot of webapp frameworks have object caching mechanisms)? If so, you may have luck with code like this: void ensureTopDocs(final int rank) throws IOException {
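
Jon's ensureTopDocs snippet is cut off above; the related searchAfter approach discussed in this thread looks roughly like the following. This is a self-contained sketch with invented documents and field names, not the original code: the key idea is to cache only the last ScoreDoc of the previous page between requests instead of re-collecting from rank 0.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class PagingDemo {
  public static void main(String[] args) throws Exception {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      for (int i = 0; i < 25; i++) {
        Document d = new Document();
        d.add(new TextField("body", "hello world", Field.Store.NO));
        w.addDocument(d);
      }
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      TermQuery q = new TermQuery(new Term("body", "hello"));
      int pageSize = 10, total = 0;
      ScoreDoc after = null;  // last hit of the previous page; cache this between requests
      while (true) {
        TopDocs page = (after == null)
            ? searcher.search(q, pageSize)
            : searcher.searchAfter(after, q, pageSize);
        if (page.scoreDocs.length == 0) break;
        total += page.scoreDocs.length;
        after = page.scoreDocs[page.scoreDocs.length - 1];
      }
      System.out.println(total);  // all 25 hits, fetched as pages of 10, 10 and 5
    }
  }
}
```

Skipping ahead several pages still requires walking forward page by page, which is the limitation Jamie raises later in the thread.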

Re: search performance

2014-06-03 Thread Jamie
Robert. Thanks, I've already done a similar thing. Results on my test platform are encouraging.. On 2014/06/03, 2:41 PM, Robert Muir wrote: Reopening for every search is not a good idea. this will have an extremely high cost (not as high as what you are doing with "paging" but still not good).

Re: search performance

2014-06-03 Thread Robert Muir
Reopening for every search is not a good idea. this will have an extremely high cost (not as high as what you are doing with "paging" but still not good). Instead consider making it near-realtime, by doing this every second or so instead. Look at SearcherManager for code that helps you do this. O
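
A sketch of Robert's suggestion: one SearcherManager opened over the IndexWriter, refreshed from a dedicated thread rather than per query. The one-second schedule and the empty document are illustrative only, and the two-argument SearcherManager constructor assumes a modern Lucene (the 4.x line adds an applyAllDeletes boolean).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.ByteBuffersDirectory;

public class NrtDemo {
  public static void main(String[] args) throws Exception {
    ByteBuffersDirectory dir = new ByteBuffersDirectory();
    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
    writer.addDocument(new Document());
    // Open the manager once over the writer; never reopen a reader per query.
    SearcherManager manager = new SearcherManager(writer, new SearcherFactory());
    // Refresh from a dedicated thread, e.g. once per second.
    ScheduledExecutorService refresher = Executors.newSingleThreadScheduledExecutor();
    refresher.scheduleWithFixedDelay(() -> {
      try { manager.maybeRefresh(); } catch (Exception e) { /* log and carry on */ }
    }, 1, 1, TimeUnit.SECONDS);
    // Per query: acquire, search, release. Never close the searcher yourself.
    IndexSearcher searcher = manager.acquire();
    try {
      // The NRT reader sees the uncommitted document.
      System.out.println(searcher.count(new MatchAllDocsQuery()));
    } finally {
      manager.release(searcher);
    }
    refresher.shutdownNow();
    manager.close();
    writer.close();
  }
}
```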

Re: search performance

2014-06-03 Thread Jamie
Robert FYI: I've modified the code to utilize the experimental function.. DirectoryReader dirReader = DirectoryReader.openIfChanged(cachedDirectoryReader,writer, true); In this case, the IndexReader won't be opened on each search, unless absolutely necessary. Regards Jamie On 2014/06

Re: search performance

2014-06-03 Thread Jamie
Robert Hmmm. why did Mike go to all the trouble of implementing NRT search, if we are not supposed to be using it? The user simply wants the latest result set. To me, this doesn't appear out of scope for the Lucene project. Jamie On 2014/06/03, 1:17 PM, Robert Muir wrote: No, you are

Re: search performance

2014-06-03 Thread Robert Muir
No, you are incorrect. The point of a search engine is to return top-N most relevant. If you insist you need to open an indexreader on every single search, and then return huge amounts of docs, maybe you should use a database instead. On Tue, Jun 3, 2014 at 6:42 AM, Jamie wrote: > Vitality / Rob

Re: search performance

2014-06-03 Thread Jamie
Vitality / Robert I wouldn't go so far as to call our pagination naive!? Sub-optimal, yes. Unless I am mistaken, the Lucene library's pagination mechanism, makes the assumption that you will cache the scoredocs for the entire result set. This is not practical when you have a result set that e

Re: search performance

2014-06-03 Thread Vitaly Funstein
Jamie, What if you were to forget for a moment the whole pagination idea, and always capped your search at 1000 results for testing purposes only? This is just to try and pinpoint the bottleneck here; if, regardless of the query parameters, the search latency stays roughly the same and well below

Re: search performance

2014-06-03 Thread Robert Muir
Check and make sure you are not opening an indexreader for every search. Be sure you don't do that. On Mon, Jun 2, 2014 at 2:51 AM, Jamie wrote: > Greetings > > Despite following all the recommended optimizations (as described at > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) , in so

Re: search performance

2014-06-03 Thread Jamie
Vitaly See below: On 2014/06/03, 12:09 PM, Vitaly Funstein wrote: A couple of questions. 1. What are you trying to achieve by setting the current thread's priority to max possible value? Is it grabbing as much CPU time as possible? In my experience, mucking with thread priorities like this is

Re: search performance

2014-06-03 Thread Vitaly Funstein
A couple of questions. 1. What are you trying to achieve by setting the current thread's priority to max possible value? Is it grabbing as much CPU time as possible? In my experience, mucking with thread priorities like this is at best futile, and at worst quite detrimental to responsiveness and o

Re: search performance

2014-06-03 Thread Jamie
FYI: We are also using a multireader to search over multiple index readers. Search under a million documents yields good response times. When you get into the 60M territory, search slows to a crawl. On 2014/06/03, 11:47 AM, Jamie wrote: Sure... see below: --

Re: search performance

2014-06-03 Thread Jamie
Sure... see below: protected void search(Query query, Filter queryFilter, Sort sort) throws BlobSearchException { try { logger.debug("start search {searchquery='" + getSearchQuery() + "',query='"+query.toString()+"',filterQuery='"+queryFilter+"',sort='"+sort

Re: search performance

2014-06-03 Thread Rob Audenaerde
Hi Jamie, What is included in the 5 minutes? Just the call to the searcher? seacher.search(...) ? Can you show a bit more of the code you use? On Tue, Jun 3, 2014 at 11:32 AM, Jamie wrote: > Vitaly > > Thanks for the contribution. Unfortunately, we cannot use Lucene's > pagination function

Re: search performance

2014-06-03 Thread Jamie
Vitaly Thanks for the contribution. Unfortunately, we cannot use Lucene's pagination function, because in reality the user can skip pages to start the search at any point, not just from the end of the previous search. Even the first search (without any pagination), with a max of 1000 hits, tak

Re: search performance

2014-06-03 Thread Vitaly Funstein
Something doesn't quite add up. TopFieldCollector fieldCollector = TopFieldCollector.create(sort, max,true, > false, false, true); > > We use pagination, so only returning 1000 documents or so at a time. > > You say you are using pagination, yet the API you are using to create your collector isn't

Re: search performance

2014-06-03 Thread Jamie
Toke Thanks for the contact. See below: On 2014/06/03, 9:17 AM, Toke Eskildsen wrote: On Tue, 2014-06-03 at 08:17 +0200, Jamie wrote: Unfortunately, in this instance, it is a live production system, so we cannot conduct experiments. The number is definitely accurate. We have many different sy

Re: search performance

2014-06-03 Thread Toke Eskildsen
On Tue, 2014-06-03 at 08:17 +0200, Jamie wrote: > Unfortunately, in this instance, it is a live production system, so we > cannot conduct experiments. The number is definitely accurate. > > We have many different systems with a similar load that observe the same > performance issue. To my knowle

Re: search performance

2014-06-02 Thread Christoph Kaser
Can you take thread stacktraces (repeatedly) during those 5 minute searches? That might give you (or someone on the mailing list) a clue where all that time is spent. You could try using jstack for that: http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html Regards Christoph

Re: search performance

2014-06-02 Thread Jamie
Toke Thanks for the comment. Unfortunately, in this instance, it is a live production system, so we cannot conduct experiments. The number is definitely accurate. We have many different systems with a similar load that observe the same performance issue. To my knowledge, the Lucene integrati

Re: search performance

2014-06-02 Thread Toke Eskildsen
On Mon, 2014-06-02 at 08:51 +0200, Jamie wrote: [200GB, 150M documents] > With NRT enabled, search speed is roughly 5 minutes on average. > The server resources are: > 2x6 Core Intel CPU, 128GB, 2 SSD for index and RAID 0, with Linux. 5 minutes is extremely long. Is that really the right number

Re: search performance

2014-06-02 Thread Tri Cao
This is an interesting performance problem and I think there is probably not a single answer here, so I'll just layout the steps I would take to tackle this: 1. What is the variance of the query latency? You said the average is 5 minutes, but is it due to some really bad queries or most queries h

Re: search performance

2014-06-02 Thread Jamie
I assume you meant 1000 documents. Yes, the page size is in fact configurable. However, it only obtains the page size * 3. It preloads the following and previous page too. The point is, it only obtains the documents that are needed. On 2014/06/02, 3:03 PM, Tincu Gabriel wrote: My bad, It's u

Re: search performance

2014-06-02 Thread Tincu Gabriel
My bad, It's using the RamDirectory as a cache and a delegate directory that you pass in the constructor to do the disk operations, limiting the use of the RamDirectory to files that fit a certain size. So i guess the underlying Directory implementation will be whatever you choose it to be. I'd sti

Re: search performance

2014-06-02 Thread Jamie
I was under the impression that NRTCachingDirectory will instantiate an MMapDirectory if a 64 bit platform is detected? Is this not the case? On 2014/06/02, 2:09 PM, Tincu Gabriel wrote: MMapDirectory will do the job for you. RamDirectory has a big warning in the class description stating that

Re: search performance

2014-06-02 Thread Tincu Gabriel
MMapDirectory will do the job for you. RamDirectory has a big warning in the class description stating that the performance will get killed by an index larger than a few hundred MB, and NRTCachingDirectory is a wrapper for RamDirectory and suitable for low update rates. MMap will use the system RAM

Re: search performance

2014-06-02 Thread Jamie
Jack First off, thanks for applying your mind to our performance problem. On 2014/06/02, 1:34 PM, Jack Krupansky wrote: Do you have enough system memory to fit the entire index in OS system memory so that the OS can fully cache it instead of thrashing with I/O? Do you see a lot of I/O or are t

Re: search performance

2014-06-02 Thread Jack Krupansky
Do you have enough system memory to fit the entire index in OS system memory so that the OS can fully cache it instead of thrashing with I/O? Do you see a lot of I/O or are the queries compute-bound? You said you have a 128GB machine, so that sounds small for your index. Have you tried a 256GB

Re: search performance

2014-06-02 Thread Jamie
Tom Thanks for the offer of assistance. On 2014/06/02, 12:02 PM, Tincu Gabriel wrote: What kind of queries are you pushing into the index. We are indexing regular emails + attachments. Typical query is something like: filter: to:mbox08 from:mbox08 cc:mbox08 bcc:mbox08 deliver

Re: search performance

2014-06-02 Thread Tincu Gabriel
What kind of queries are you pushing into the index. Do they match a lot of documents ? Do you do any sorting on the result set? What is the average document size ? Do you have a lot of update traffic ? What kind of schema does your index use ? On Mon, Jun 2, 2014 at 6:51 AM, Jamie wrote: > Gre

RE: search time & number of segments

2014-05-21 Thread De Simone, Alessandro
i 2014 22:09 To: java-user@lucene.apache.org Subject: RE: search time & number of segments De Simone, Alessandro [alessandro.desim...@bvdinfo.com] wrote: > We have stopped optimizing the index because everybody told us it was a bad > idea. > It makes sense if you think about it. When

RE: search time & number of segments

2014-05-20 Thread Toke Eskildsen
De Simone, Alessandro [alessandro.desim...@bvdinfo.com] wrote: > We have stopped optimizing the index because everybody told us it was a bad > idea. > It makes sense if you think about it. When you reopen the index not all > segments must be reopened then you have: > (1) better reload time >

RE: search time & number of segments

2014-05-20 Thread De Simone, Alessandro
ig impact on performance. -Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Tuesday, May 20, 2014 15:46 To: java-user@lucene.apache.org Subject: Re: search time & number of segments On Tue, 2014-05-20 at 15:04 +0200, De Simone, Alessandro wrote: Tok

Re: search time & number of segments

2014-05-20 Thread Toke Eskildsen
On Tue, 2014-05-20 at 15:04 +0200, De Simone, Alessandro wrote: Toke: > > Using the calculator, I must admit that it is puzzling that you have > 2432 / 143 = 17.001 times the amount of seeks with 16 segments. > > Do you have any clue? Is there something I could test? If your segmented index was

RE: search time & number of segments

2014-05-20 Thread De Simone, Alessandro
iginal Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Monday, May 19, 2014 16:43 To: java-user@lucene.apache.org Subject: Re: search time & number of segments On Mon, 2014-05-19 at 11:54 +0200, De Simone, Alessandro wrote: [24GB index, 8GB disk cache, only indexed fields] &

Re: search time & number of segments

2014-05-19 Thread Toke Eskildsen
On Mon, 2014-05-19 at 11:54 +0200, De Simone, Alessandro wrote: [24GB index, 8GB disk cache, only indexed fields] > The "IO calls" I was referring to is the number of time the > "BufferedIndexInput.refill()" function is called. So it means that we > have 16 times more bytes read when there are 16

RE: search time & number of segments

2014-05-19 Thread De Simone, Alessandro
iginal Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Saturday, May 17, 2014 20:04 To: java-user@lucene.apache.org Subject: RE: search time & number of segments De Simone, Alessandro [alessandro.desim...@bvdinfo.com] wrote: > We have a performance issue ever since we stopped optimiz

RE: search time & number of segments

2014-05-17 Thread Toke Eskildsen
De Simone, Alessandro [alessandro.desim...@bvdinfo.com] wrote: > We have a performance issue ever since we stopped optimizing the index. We > are using Lucene 4.8 (jvm 32bits for searching, 64bits for indexing) on > Windows 2008R2. How much RAM does your search machine have? > For instance, a s

Re: Search sentence from document based on keyword as input using lucene

2013-10-17 Thread Ian Lea
If you're using Solr you'd be better off asking this on the Solr list: http://lucene.apache.org/solr/discussion.html. You might also like to clarify what you want with regard to sentence vs document. If you want to display the sentences of a matched doc, surely you just do it: store what you need

Re: Search in a specific ScoreDoc result

2013-09-17 Thread Thomas Guttesen
Kkkutterujjjbbb hgggja On 17/09/2013 12.55, "David Miranda" wrote: > > Hi, > > I want to do a kind of 'facet search', that initial research in a field of > all documents in the Lucene index, and second search in other field of the > documents returned to the first research. > > Currently I'm do th

Re: Search in a specific ScoreDoc result

2013-09-17 Thread Erick Erickson
Why not? You can use a standard query as a filter query from the Solr side, so it's got to be possible in Lucene. What about using filters doesn't seem to work for this case? Best, Erick On Tue, Sep 17, 2013 at 6:54 AM, David Miranda wrote: > Hi, > > I want to do a kind of 'facet search', that

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-28 Thread Ankit Murarka
Bingo!! Your solution worked for me. Thanks a ton. I went through QueryParser so many times but never knew it could serve the purpose so easily. Never figured out the true significance, as I thought I could always create a normal PhraseQuery with PhraseQuery pq = new PhraseQuery() and then

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-27 Thread Michael McCandless
On Sat, Jul 27, 2013 at 3:20 AM, Ankit Murarka wrote: > Ok.I went through the Javadoc of PhraseQuery and tried using position > argument to phrasequery. > > Problem encountered: > > My text contains : Still it is not happening and generally i will be able to > complete it at the earliest. > > The

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-27 Thread Ankit Murarka
Ok.I went through the Javadoc of PhraseQuery and tried using position argument to phrasequery. Problem encountered: My text contains : Still it is not happening and generally i will be able to complete it at the earliest. The user enters search string : 1. still happening and 2. still it is

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-26 Thread Michael McCandless
Have a look at the position argument to PhraseQuery.add: it lets you control where this new term is in the phrase. So to search for "wizard of oz" when of is a stopword you would add "wizard" at position 0 and "oz" at position 2. This is different from slop, which allows for "fuzzy" matching of t

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-26 Thread Ankit Murarka
Hello, can you elaborate more on this? I seem to be lost over here. Since I am new to Lucene, yesterday I was going through ShingleFilter and its application. It seems to be a kind of N-gram thing that bloats the index, as Mike has mentioned. As of now I am only concerned with the app

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Michael McCandless
With PhraseQuery you can specify where each term must occur in the phrase. So X must occur in position 0, David in position 1, and then manager in position 4 (skipping 2 holes). QueryParser does this for you: when it analyzes the users phrase, if the resulting tokens have holes, then it sets the

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Dawn Zoë Raison
Did you consider using shingles? It solves the "to be or not to be" problem quite nicely. Dawn On 24/07/2013 12:34, Ankit Murarka wrote: I tried using Phrase Query with slops. Now since I am specifying the slop I also need to specify the 2nd term. In my case the 2nd term is not present. The w
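A minimal sketch of the shingle approach Dawn suggests, assuming the Lucene 4.x analysis-common module (`ShingleAnalyzerWrapper` wrapping a base analyzer; the `Version` constant and bigram-only settings are illustrative choices, not from the thread):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class ShingleExample {
    // Wraps a base analyzer so that "to be or not to be" is also indexed
    // as the bigrams "to be", "be or", "or not", ... Stopword-heavy
    // phrases then become searchable as single shingle tokens, at the
    // cost of a larger index (the bloat Mike mentions).
    static Analyzer bigramAnalyzer() {
        return new ShingleAnalyzerWrapper(
            new StandardAnalyzer(Version.LUCENE_43),
            2, 2);  // min and max shingle size: bigrams only
    }
}
```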

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Ankit Murarka
I tried using PhraseQuery with slops. Now since I am specifying the slop, I also need to specify the 2nd term. In my case the 2nd term is not present; the whole string to be searched is still one single term. How do I skip the holes created by stopwords? I do not know beforehand how many stop

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Michael McCandless
PhraseQuery? You can skip the holes created by stopwords ... e.g. QueryParser does this. Ie, the PhraseQuery becomes "X David _ _ manager _ _ company" if is/a/of/the are stop words, which isn't perfect (could return false matches) but should work well in practice ... Mike McCandless http://blog

Re: Search for a token appearing after another

2013-07-09 Thread Alan Woodward
IIRC, SpanQueries try and match on the smallest interval possible. So if you've got T1 … T1 … T2, then SpanNear(T1, T2) will match from the second T1. Alan Woodward www.flax.co.uk On 9 Jul 2013, at 09:56, Sébastien Druon wrote: > Thanks Alan, > > Do you know if the search would exclude other
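The "T1 followed somewhere by T2" pattern from this thread can be sketched with span queries, assuming the Lucene 4.x `org.apache.lucene.search.spans` package (the field name is a parameter and the literal terms "t1"/"t2" are placeholders):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class OrderedSpanExample {
    // Matches T1 followed, at any later position in the field, by T2.
    // Per Alan's note, spans match the smallest interval possible, so
    // with T1 ... T1 ... T2 the match starts from the second T1.
    static SpanNearQuery t1ThenT2(String field) {
        return new SpanNearQuery(
            new SpanQuery[] {
                new SpanTermQuery(new Term(field, "t1")),
                new SpanTermQuery(new Term(field, "t2"))
            },
            Integer.MAX_VALUE,  // slop: allow any distance between terms
            true);              // inOrder: T1 must precede T2
    }
}
```

Positions of each match can then be retrieved by iterating the query's `Spans` against an index reader.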

Re: Search for a token appearing after another

2013-07-09 Thread Sébastien Druon
Thanks Alan, Do you know if the search would exclude other occurrences of T1 between T1 and T2? ex: T1 (...)* T1 (...)* T2 would not match? Thanks again Sébastien On 9 July 2013 09:48, Alan Woodward wrote: > You can use Integer.MAX_VALUE as the slop parameter. > > Alan Woodward > www.flax.co

Re: Search for a token appearing after another

2013-07-09 Thread Alan Woodward
You can use Integer.MAX_VALUE as the slop parameter. Alan Woodward www.flax.co.uk On 9 Jul 2013, at 07:55, Sébastien Druon wrote: > Hello, > > I am looking for a way to search for a token appearing after another and > retrieve their positions. > > ex: T1 (...)* T2 > > I know the SpanTermQuer

RE: search-time facetting in Lucene

2013-05-06 Thread Toke Eskildsen
kiwi clive [kiwi_cl...@yahoo.com]: > Thanks very much for the reply. I see there is not a quick win here but as > we are going through an index consolidation process, it may pay to make > the leap to 4.3 and put in facetting while I'm in there. We will get facetting > slowly through the back door w

Re: search-time facetting in Lucene

2013-05-06 Thread Shai Erera
he time to explain the situation. > > Clive > > > > > From: Shai Erera > To: "java-user@lucene.apache.org" ; kiwi > clive > Sent: Monday, May 6, 2013 5:56 AM > Subject: Re: search-time facetting in Lucene > > > Hi Clive, > &

Re: search-time facetting in Lucene

2013-05-06 Thread kiwi clive
i clive Sent: Monday, May 6, 2013 5:56 AM Subject: Re: search-time facetting in Lucene Hi Clive, In order to use Lucene facets you need to make indexing time decisions. It's not that you don't make these decisions anyway, even with Solr -- for example, you need to decide how to tokeniz
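The indexing-time decision Shai describes can be sketched as below. Note this uses the facet module API introduced in Lucene 4.7 (`FacetsConfig`/`FacetField` with a sidecar taxonomy index); the 4.3 release discussed in this thread used the older `CategoryPath`-based API, so treat this as a shape, not a drop-in, and the "Author" dimension is purely illustrative:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.facet.FacetField;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.taxonomy.TaxonomyWriter;
import org.apache.lucene.index.IndexWriter;

public class FacetIndexingExample {
    // Indexing-time side of Lucene faceting: facet fields are declared on
    // the document and rewritten by FacetsConfig, which also records the
    // category in a separate taxonomy index alongside the main index.
    static void addDoc(IndexWriter writer, TaxonomyWriter taxoWriter,
                       FacetsConfig config, String author) throws Exception {
        Document doc = new Document();
        doc.add(new FacetField("Author", author));
        writer.addDocument(config.build(taxoWriter, doc));
    }
}
```

At search time the counts are then gathered with a `FacetsCollector` and read back per dimension, which is why the decision cannot be deferred entirely to query time.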
