Re: Search while typing (incremental search)

2021-10-27 Thread Michael Wechner
I have added a QnA https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-DoesLucenesupportauto-suggest/autocomplete? I will also try to provide an example, for example https://medium.com/@ekaterinamihailova/in-memory-search-and-autocomplete-with-lucene-8-5-f2df1bc71c36 https://

Re: Search while typing (incremental search)

2021-10-08 Thread Michael Wechner
On 08.10.21 at 18:49, Michael Sokolov wrote: Thank you for offering to add to the FAQ! Indeed it should mention the suggester capability. I think you have permissions to edit that wiki? yes :-) Please go ahead and I think add a link to the suggest module javadocs ok, will do! Thanks M

Re: Search while typing (incremental search)

2021-10-08 Thread Michael Sokolov
Thank you for offering to add to the FAQ! Indeed it should mention the suggester capability. I think you have permissions to edit that wiki? Please go ahead and I think add a link to the suggest module javadocs On Thu, Oct 7, 2021 at 2:30 AM Michael Wechner wrote: > > Thanks very much for your fe

Re: Search while typing (incremental search)

2021-10-06 Thread Michael Wechner
Thanks very much for your feedback! I will try it :-) As I wrote I would like to add a summary to the Lucene FAQ (https://cwiki.apache.org/confluence/display/lucene/lucenefaq) Would the following questions make sense? - "Does Lucene support incremental search?" - "Does Lucene supp

Re: Search while typing (incremental search)

2021-10-06 Thread Robert Muir
TLDR: use the lucene suggest/ package. Start with building suggester from your query logs (either a file or index them). These have a lot of flexibility about how the matches happen, for example pure prefixes, edit distance typos, infix matching, analysis chain, even now Japanese input-method integ
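
Robert's advice maps onto the suggest module roughly as follows. This is a minimal sketch, not a definitive recipe: it assumes a recent Lucene (8.x+) with the suggest module on the classpath, and the three-line "query log" is invented data.

```java
import java.io.StringReader;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.store.ByteBuffersDirectory;

public class SuggestDemo {
  public static void main(String[] args) throws Exception {
    // One suggestion per line, e.g. dumped from query logs (invented data).
    String queryLog = "lucene search\nlucene suggester\nlog analysis\n";
    try (AnalyzingInfixSuggester suggester =
        new AnalyzingInfixSuggester(new ByteBuffersDirectory(), new StandardAnalyzer())) {
      suggester.build(new PlainTextDictionary(new StringReader(queryLog)));
      // Infix lookup: matches a prefix of any token, not just the start of the string.
      List<Lookup.LookupResult> results = suggester.lookup("sugg", 5, true, false);
      for (Lookup.LookupResult r : results) {
        System.out.println(r.key);
      }
    }
  }
}
```

The suggest module also offers pure-prefix and fuzzy (edit-distance) lookups; AnalyzingInfixSuggester is just the variant that matches mid-string.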

RE: Search results/criteria validation

2021-03-17 Thread Siraj Haider
Ceccarelli Subject: Re: Search results/criteria validation See https://issues.apache.org/jira/browse/LUCENE-9640 On Wed, Mar 17, 2021 at 4:02 PM Paul Libbrecht wrote: > > Explain is a heavyweight thing. Maybe it helps you, maybe you need > something high-performance. > > I was a

RE: Search results/criteria validation

2021-03-17 Thread Siraj Haider
Thanks for the response Paul, it would be great if you can point me to that discussion. -- Regards -Siraj Haider (212) 306-0154 -Original Message- From: Paul Libbrecht Sent: Wednesday, March 17, 2021 4:02 PM To: java-user@lucene.apache.org; Diego Ceccarelli Subject: Re: Search

Re: Search results/criteria validation

2021-03-17 Thread Michael Sokolov
See https://issues.apache.org/jira/browse/LUCENE-9640 On Wed, Mar 17, 2021 at 4:02 PM Paul Libbrecht wrote: > > Explain is a heavyweight thing. Maybe it helps you, maybe you need > something high-performance. > > I was asking a similar question ~10 years ago and got a very interesting > answer on

Re: Search results/criteria validation

2021-03-17 Thread Paul Libbrecht
Explain is a heavyweight thing. Maybe it helps you, maybe you need something high-performance. I was asking a similar question ~10 years ago and got a very interesting answer on this list. If you want I can try to dig this to find it. At the end, and with some limitation in the number of queri

Re: Search in lines, so need to index lines?

2018-08-02 Thread Tomoko Uchida
August 1, 2018 2:35 PM > To: java-user@lucene.apache.org > Subject: Re: Search in lines, so need to index lines? > > Ira, > > I do not understand your requirements, but essentially lucene is not for > regex searching. > There are tools for fast regular expression search, if you do

RE: Search in lines, so need to index lines?

2018-08-01 Thread Gordin, Ira
:35 PM To: java-user@lucene.apache.org Subject: Re: Search in lines, so need to index lines? Ira, I do not understand your requirements, but essentially lucene is not for regex searching. There are tools for fast regular expression search, if you are not satisfied with the Java standard library, for

Re: Search in lines, so need to index lines?

2018-08-01 Thread Michael Sokolov
> > > Hi Tomoko, > > > > I need to search in many files and we use Lucene for this purpose. > > > > Thanks, > > Ira > > > > -Original Message- > > From: Tomoko Uchida > > Sent: Wednesday, August 1, 2018 1:49 PM > > To: jav

Re: Search in lines, so need to index lines?

2018-08-01 Thread Tomoko Uchida
. Tomoko On Wed, Aug 1, 2018 20:01, Gordin, Ira wrote: > Hi Tomoko, > > I need to search in many files and we use Lucene for this purpose. > > Thanks, > Ira > > -Original Message- > From: Tomoko Uchida > Sent: Wednesday, August 1, 2018 1:49 PM > To: java-user@lucene.ap

Re: Search in lines, so need to index lines?

2018-08-01 Thread Robert Muir
Wednesday, August 1, 2018 1:49 PM > To: java-user@lucene.apache.org > Subject: Re: Search in lines, so need to index lines? > > Hi Ira, > >> I am trying to implement regex search in file > > Why are you using Lucene for regular expression search? > You can impleme

RE: Search in lines, so need to index lines?

2018-08-01 Thread Gordin, Ira
Hi Tomoko, I need to search in many files and we use Lucene for this purpose. Thanks, Ira -Original Message- From: Tomoko Uchida Sent: Wednesday, August 1, 2018 1:49 PM To: java-user@lucene.apache.org Subject: Re: Search in lines, so need to index lines? Hi Ira, > I am trying

Re: Search in lines, so need to index lines?

2018-08-01 Thread Tomoko Uchida
ile the same as in editors, in > Notepad++ for example. > > Thanks, > Ira > > -Original Message- > From: Uwe Schindler > Sent: Tuesday, July 31, 2018 6:12 PM > To: java-user@lucene.apache.org > Subject: RE: Search in lines, so need to index lines? > > Hi

RE: Search in lines, so need to index lines?

2018-07-31 Thread Gordin, Ira
Hi Uwe, I am trying to implement regex search in file the same as in editors, in Notepad++ for example. Thanks, Ira -Original Message- From: Uwe Schindler Sent: Tuesday, July 31, 2018 6:12 PM To: java-user@lucene.apache.org Subject: RE: Search in lines, so need to index lines? Hi

RE: Search in lines, so need to index lines?

2018-07-31 Thread Uwe Schindler
Hi, you need to create your own tokenizer that splits tokens on \n or \r. Instead of using WhitespaceTokenizer, you can use: Tokenizer tok = CharTokenizer. fromSeparatorCharPredicate(ch -> ch=='\r' || ch=='\n'); But I would first think of how to implement the whole thing correctly. Using a re
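
Uwe's one-liner fits into an Analyzer like this. A sketch only: the org.apache.lucene.analysis.util package is where CharTokenizer lives in the Lucene 7/8 line this thread is about, and the field name is illustrative.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharTokenizer;

public class LineTokenizerDemo {
  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        // One token per line: split only on CR and LF, keep everything else intact.
        Tokenizer tok =
            CharTokenizer.fromSeparatorCharPredicate(ch -> ch == '\r' || ch == '\n');
        return new TokenStreamComponents(tok);
      }
    };
    try (TokenStream ts = analyzer.tokenStream("content", "first line\nsecond line")) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        System.out.println(term.toString());  // "first line", then "second line"
      }
      ts.end();
    }
  }
}
```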

Re: search any field name having a specific value

2017-03-17 Thread Marco Reis
You can add a new field called "full_text" and during the indexing time you concatenate all the values of the other fields in it. Do you think it's a good idea for this case? On Fri, Mar 17, 2017 at 6:27 PM Lokesh Madan wrote: > May be, index the field names as metadata file. When when queryin

Re: search any field name having a specific value

2017-03-17 Thread Lokesh Madan
Maybe index the field names as a metadata file. Then when querying, first get the list of all fields and then shoot a query. You can do this 2-hop query, or else maintain some cache and then shoot a query. > On Mar 17, 2017, at 11:53 AM, Cristian Lorenzetto > wrote: > > It permits to search in a

Re: search any field name having a specific value

2017-03-17 Thread Ahmet Arslan
Hi, You can retrieve the list of field names using LukeRequestHandler. Ahmet On Friday, March 17, 2017 9:53 PM, Cristian Lorenzetto wrote: It permits to search in a predefined list of fields that you have to know in advance. In my case I don't know what the fieldname is. Maybe WildcardQuer

Re: search any field name having a specific value

2017-03-17 Thread Corbin, J.D.
Hi, I am not sure if there is a way to specify a search against all fields in the index without knowing the fields. WildcardQuery probably won't work since it does target a specific field within the index. The specification of the index field comes in the definition of the Term that is passed as

Re: search any field name having a specific value

2017-03-17 Thread Cristian Lorenzetto
It permits to search in a predefined list of fields that you have to know in advance. In my case I don't know what the fieldname is. Maybe WildcardQuery? 2017-03-17 19:30 GMT+01:00 Corbin, J.D. : > You might take a look at MultiFieldQueryParser. I believe it allows you > to search multiple inde

Re: search any field name having a specific value

2017-03-17 Thread Corbin, J.D.
​You might take a look at MultiFieldQueryParser. I believe it allows you to search multiple index fields at the same time. J.D. Corbin Senior Research Engineer Advanced Computing & Data Science Lab 3075 W. Ray Road Suite 200 Chandler, AZ 85226-2495 USA M: (303) 912-0958 E: jd.cor...@pears
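
For reference, a minimal sketch of the MultiFieldQueryParser suggestion. The field names are invented, and it assumes the lucene-queryparser module is on the classpath.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.Query;

public class MultiFieldDemo {
  public static void main(String[] args) throws Exception {
    // The field list must be known up front, which is the limitation
    // raised elsewhere in this thread.
    MultiFieldQueryParser parser = new MultiFieldQueryParser(
        new String[] {"title", "body", "author"}, new StandardAnalyzer());
    Query q = parser.parse("lucene");
    System.out.println(q);  // the term expanded across all three fields
  }
}
```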

Re: Search Performance with NRT

2015-05-27 Thread kiwi clive
Hi Mike, Thanks for the very prompt and clear response. We look forward to using the new (new for us) Lucene goodies :-) Clive From: Michael McCandless To: Lucene Users ; kiwi clive Sent: Thursday, May 28, 2015 2:34 AM Subject: Re: Search Performance with NRT As long as you

Re: Search Performance with NRT

2015-05-27 Thread Michael McCandless
As long as you call SM.maybeRefresh from a dedicated refresh thread (not from a query's thread) it will work well. You may want to use a warmer so that the new searcher is warmed before becoming visible to incoming queries ... this ensures any lazy data structures are initialized by the time a que

Re: search on a field by a single word

2015-02-11 Thread wangdong
Thanks a lot for your answer. I find the second way may be useful to me. It is new to me and I will try it. Thanks, andrew On 2015/2/11 21:39, Ian Lea wrote: If you only ever want to retrieve based on exact match you could index the name field using org.apache.lucene.document.StringField. Do be a

Re: search on a field by a single word

2015-02-11 Thread Ian Lea
If you only ever want to retrieve based on exact match you could index the name field using org.apache.lucene.document.StringField. Do be aware that it is exact: if you do nothing else, a search for "a" will not match "A" or "A ". Or you could so something with start and end markers e.g. index yo
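
Ian's two points, that StringField is un-analyzed and that matching is byte-for-byte exact, can be seen in a small sketch. The field name and values are invented, and it assumes Lucene 8.x (ByteBuffersDirectory).

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ExactMatchDemo {
  public static void main(String[] args) throws Exception {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      Document doc = new Document();
      // StringField indexes the whole value as one un-analyzed token.
      doc.add(new StringField("name", "Ada Lovelace", Field.Store.YES));
      w.addDocument(doc);
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      // Query with a TermQuery, not an analyzing query parser.
      int exact = searcher.count(new TermQuery(new Term("name", "Ada Lovelace")));
      int lower = searcher.count(new TermQuery(new Term("name", "ada lovelace")));
      System.out.println(exact + " " + lower);  // case matters: the lowercased term finds nothing
    }
  }
}
```

If case-insensitive "exact" matching is wanted, lowercase the value at index time and lowercase the query term to match, as Ian's "start and end markers" alternative implies.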

Re: Search "_all" field with a term

2014-10-11 Thread haiwei.xie-soulinfo
> You should ask this on the elasticsearch mailing list. > > BTW, look at elasticsearch copy_to feature. Better than _all field. > > My 2 cents. I will try it. thanks. > > > David ;-) > Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs > > > On 11 Oct. 2014 at 11:31, "haiwei.xie-soulinfo"

RE: Search "_all" field with a term

2014-10-11 Thread Uwe Schindler
Hi, > 'Internally this is indexing every field a second time into the "_all" > field.' > This sentence mean second indexing has total different analyzer and > indexing compared with my first indexing? Exactly. > So I need rewrite the second > process to fix my problem? In Elasticsearc

Re: Search "_all" field with a term

2014-10-11 Thread David Pilato
You should ask this on the elasticsearch mailing list. BTW, look at elasticsearch copy_to feature. Better than _all field. My 2 cents. -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs > On 11 Oct. 2014 at 11:31, "haiwei.xie-soulinfo" > wrote: > > Hi, > > Thanks for y

Re: Search "_all" field with a term

2014-10-11 Thread haiwei.xie-soulinfo
Hi, Thanks for your advice, the SimpleQueryParser API is not enough in my case. Actually, I want to index data from a database, and there are so many fields. I have tested the "_all" parameter in the ElasticSearch system, but the results of '_all' and 'fieldname' are different for a Chinese term. 'Interna

RE: Search "_all" field with a term

2014-10-11 Thread Uwe Schindler
Hi, by default there is no "_all" field. E.g., Elasticsearch adds this special field depending on your index mapping at the time of indexing the data. Internally this is indexing every field a second time into the "_all" field. With Lucene you have to do this on yourself. An alternative would

RE: Search with term intersection

2014-10-10 Thread aurelien . mazoyer
Hi Mike and Uwe, Thank you for your answers. It is clear, now. Regards, Aurélien On 10.10.2014 12:32, Uwe Schindler wrote: Hi, every segment is executed on its own (every segment is its own index). Every segment returns its own document ids and the result is the union of them ranked by score

RE: Search with term intersection

2014-10-10 Thread Uwe Schindler
Hi, every segment is executed on its own (every segment is its own index). Every segment returns its own document ids and the result is the union of them ranked by score using a PriorityQueue. There is no cross-segment term dictionary and posting lists in Lucene. It was like that before Lucene

Re: Search with term intersection

2014-10-10 Thread Michael McCandless
By intersection, do you mean a MUST clause on a BooleanQuery? Lucene uses "doc at a time" scoring, so for BooleanQuery, all MUST'd clauses are visiting the same doc (if they match) at a time, so we do the intersection for that document all at once, within each segment, across the N clauses. Mike

Re: search performance

2014-06-20 Thread Vitaly Funstein
If you are using stored fields in your index, consider playing with compression settings, or perhaps turning stored field compression off altogether. Ways to do this have been discussed in this forum on numerous occasions. This is highly use case dependent though, as your indexing performance may o

RE: search performance

2014-06-20 Thread Uwe Schindler
Hi, > Am I correct that using SearchManager can't be used with a MultiReader and > NRT? I would appreciate all suggestions on how to optimize our search > performance further. Search time has become a usability issue. Just have a SearcherManger for every index. MultiReader construction is cheap

Re: search performance

2014-06-20 Thread Jamie
Greetings Lucene Users As a follow-up to my earlier mail: We are also using Lucene segment warmers, as per recommendation, segments per tier is now set to five, buffer memory is set to (Runtime.getRuntime().totalMemory()*.08)/1024/1024; See below for code used to instantiate writer:

Re: search performance

2014-06-20 Thread Jamie
Hi All Thank you for all your suggestions. Some of the recommendations hadn't yet been implemented, as our code base was using older versions of Lucene with reduced capabilities. Thus, far, all the recommendations for fast search have been implemented (e.g. using pagination with searchAfter,

RE: Search degradation on Windows when upgrading from lucene 3.6 to lucene 4.7.2

2014-06-18 Thread De Simone, Alessandro
Hi! We have switched from Lucene 3.6 to >=Lucene 4.7 (java7) and we are also experiencing a distinct slowdown using the same dataset. We are running the software under Windows 2008R2. In our case, we have identified that there are a lot more IO calls (= number of times the buffer is refilled in Ind

Re: search performance

2014-06-06 Thread Jamie
Jon I ended up adapting your approach. The solution involves keeping a LRU cache of page boundary scoredocs and their respective positions. New positions are added to the cache as new pages are discovered. To cut down on searches, when scrolling backwards and forwards, the search begins from

RE: search performance

2014-06-03 Thread Toke Eskildsen
Jamie [ja...@mailarchiva.com] wrote: > It would be nice if, in future, the Lucene API could provide a > searchAfter that takes a position (int). It would not really help with large result sets. At least not with the current underlying implementations. This is tied into your current performance pr

Re: search performance

2014-06-03 Thread Jamie
Thanks Jon I'll investigate your idea further. It would be nice if, in future, the Lucene API could provide a searchAfter that takes a position (int). Regards Jamie On 2014/06/03, 3:24 PM, Jon Stewart wrote: With regards to pagination, is there a way for you to cache the IndexSearcher, Que

Re: search performance

2014-06-03 Thread Jon Stewart
With regards to pagination, is there a way for you to cache the IndexSearcher, Query, and TopDocs between user pagination requests (a lot of webapp frameworks have object caching mechanisms)? If so, you may have luck with code like this: void ensureTopDocs(final int rank) throws IOException {
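
Jon's ensureTopDocs snippet is cut off above; the related searchAfter approach discussed in this thread looks roughly like the following. This is a self-contained sketch with invented documents and field names, not the original code: the key idea is to cache only the last ScoreDoc of the previous page between requests instead of re-collecting from rank 0.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class PagingDemo {
  public static void main(String[] args) throws Exception {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      for (int i = 0; i < 25; i++) {
        Document d = new Document();
        d.add(new TextField("body", "hello world", Field.Store.NO));
        w.addDocument(d);
      }
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      TermQuery q = new TermQuery(new Term("body", "hello"));
      int pageSize = 10, total = 0;
      ScoreDoc after = null;  // last hit of the previous page; cache this between requests
      while (true) {
        TopDocs page = (after == null)
            ? searcher.search(q, pageSize)
            : searcher.searchAfter(after, q, pageSize);
        if (page.scoreDocs.length == 0) break;
        total += page.scoreDocs.length;
        after = page.scoreDocs[page.scoreDocs.length - 1];
      }
      System.out.println(total);  // all 25 hits, fetched as pages of 10, 10 and 5
    }
  }
}
```

Skipping ahead several pages still requires walking forward page by page, which is the limitation Jamie raises later in the thread.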

Re: search performance

2014-06-03 Thread Jamie
Robert. Thanks, I've already done a similar thing. Results on my test platform are encouraging.. On 2014/06/03, 2:41 PM, Robert Muir wrote: Reopening for every search is not a good idea. this will have an extremely high cost (not as high as what you are doing with "paging" but still not good).

Re: search performance

2014-06-03 Thread Robert Muir
Reopening for every search is not a good idea. this will have an extremely high cost (not as high as what you are doing with "paging" but still not good). Instead consider making it near-realtime, by doing this every second or so instead. Look at SearcherManager for code that helps you do this. O
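
A sketch of Robert's suggestion: one SearcherManager opened over the IndexWriter, refreshed from a dedicated thread rather than per query. The one-second schedule and the empty document are illustrative only, and the two-argument SearcherManager constructor assumes a modern Lucene (the 4.x line adds an applyAllDeletes boolean).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.ByteBuffersDirectory;

public class NrtDemo {
  public static void main(String[] args) throws Exception {
    ByteBuffersDirectory dir = new ByteBuffersDirectory();
    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
    writer.addDocument(new Document());
    // Open the manager once over the writer; never reopen a reader per query.
    SearcherManager manager = new SearcherManager(writer, new SearcherFactory());
    // Refresh from a dedicated thread, e.g. once per second.
    ScheduledExecutorService refresher = Executors.newSingleThreadScheduledExecutor();
    refresher.scheduleWithFixedDelay(() -> {
      try { manager.maybeRefresh(); } catch (Exception e) { /* log and carry on */ }
    }, 1, 1, TimeUnit.SECONDS);
    // Per query: acquire, search, release. Never close the searcher yourself.
    IndexSearcher searcher = manager.acquire();
    try {
      // The NRT reader sees the uncommitted document.
      System.out.println(searcher.count(new MatchAllDocsQuery()));
    } finally {
      manager.release(searcher);
    }
    refresher.shutdownNow();
    manager.close();
    writer.close();
  }
}
```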

Re: search performance

2014-06-03 Thread Jamie
Robert FYI: I've modified the code to utilize the experimental function.. DirectoryReader dirReader = DirectoryReader.openIfChanged(cachedDirectoryReader,writer, true); In this case, the IndexReader won't be opened on each search, unless absolutely necessary. Regards Jamie On 2014/06

Re: search performance

2014-06-03 Thread Jamie
Robert Hmmm. why did Mike go to all the trouble of implementing NRT search, if we are not supposed to be using it? The user simply wants the latest result set. To me, this doesn't appear out of scope for the Lucene project. Jamie On 2014/06/03, 1:17 PM, Robert Muir wrote: No, you are

Re: search performance

2014-06-03 Thread Robert Muir
No, you are incorrect. The point of a search engine is to return top-N most relevant. If you insist you need to open an indexreader on every single search, and then return huge amounts of docs, maybe you should use a database instead. On Tue, Jun 3, 2014 at 6:42 AM, Jamie wrote: > Vitality / Rob

Re: search performance

2014-06-03 Thread Jamie
Vitality / Robert I wouldn't go so far as to call our pagination naive!? Sub-optimal, yes. Unless I am mistaken, the Lucene library's pagination mechanism, makes the assumption that you will cache the scoredocs for the entire result set. This is not practical when you have a result set that e

Re: search performance

2014-06-03 Thread Vitaly Funstein
Jamie, What if you were to forget for a moment the whole pagination idea, and always capped your search at 1000 results for testing purposes only? This is just to try and pinpoint the bottleneck here; if, regardless of the query parameters, the search latency stays roughly the same and well below

Re: search performance

2014-06-03 Thread Robert Muir
Check and make sure you are not opening an indexreader for every search. Be sure you don't do that. On Mon, Jun 2, 2014 at 2:51 AM, Jamie wrote: > Greetings > > Despite following all the recommended optimizations (as described at > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) , in so

Re: search performance

2014-06-03 Thread Jamie
Vitaly See below: On 2014/06/03, 12:09 PM, Vitaly Funstein wrote: A couple of questions. 1. What are you trying to achieve by setting the current thread's priority to max possible value? Is it grabbing as much CPU time as possible? In my experience, mucking with thread priorities like this is

Re: search performance

2014-06-03 Thread Vitaly Funstein
A couple of questions. 1. What are you trying to achieve by setting the current thread's priority to max possible value? Is it grabbing as much CPU time as possible? In my experience, mucking with thread priorities like this is at best futile, and at worst quite detrimental to responsiveness and o

Re: search performance

2014-06-03 Thread Jamie
FYI: We are also using a multireader to search over multiple index readers. Search under a million documents yields good response times. When you get into the 60M territory, search slows to a crawl. On 2014/06/03, 11:47 AM, Jamie wrote: Sure... see below: --

Re: search performance

2014-06-03 Thread Jamie
Sure... see below: protected void search(Query query, Filter queryFilter, Sort sort) throws BlobSearchException { try { logger.debug("start search {searchquery='" + getSearchQuery() + "',query='"+query.toString()+"',filterQuery='"+queryFilter+"',sort='"+sort

Re: search performance

2014-06-03 Thread Rob Audenaerde
Hi Jamie, What is included in the 5 minutes? Just the call to the searcher? seacher.search(...) ? Can you show a bit more of the code you use? On Tue, Jun 3, 2014 at 11:32 AM, Jamie wrote: > Vitaly > > Thanks for the contribution. Unfortunately, we cannot use Lucene's > pagination function

Re: search performance

2014-06-03 Thread Jamie
Vitaly Thanks for the contribution. Unfortunately, we cannot use Lucene's pagination function, because in reality the user can skip pages to start the search at any point, not just from the end of the previous search. Even the first search (without any pagination), with a max of 1000 hits, tak

Re: search performance

2014-06-03 Thread Vitaly Funstein
Something doesn't quite add up. TopFieldCollector fieldCollector = TopFieldCollector.create(sort, max,true, > false, false, true); > > We use pagination, so only returning 1000 documents or so at a time. > > You say you are using pagination, yet the API you are using to create your collector isn't

Re: search performance

2014-06-03 Thread Jamie
Toke Thanks for the contact. See below: On 2014/06/03, 9:17 AM, Toke Eskildsen wrote: On Tue, 2014-06-03 at 08:17 +0200, Jamie wrote: Unfortunately, in this instance, it is a live production system, so we cannot conduct experiments. The number is definitely accurate. We have many different sy

Re: search performance

2014-06-03 Thread Toke Eskildsen
On Tue, 2014-06-03 at 08:17 +0200, Jamie wrote: > Unfortunately, in this instance, it is a live production system, so we > cannot conduct experiments. The number is definitely accurate. > > We have many different systems with a similar load that observe the same > performance issue. To my knowle

Re: search performance

2014-06-02 Thread Christoph Kaser
Can you take thread stacktraces (repeatedly) during those 5 minute searches? That might give you (or someone on the mailing list) a clue where all that time is spent. You could try using jstack for that: http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html Regards Christoph

Re: search performance

2014-06-02 Thread Jamie
Toke Thanks for the comment. Unfortunately, in this instance, it is a live production system, so we cannot conduct experiments. The number is definitely accurate. We have many different systems with a similar load that observe the same performance issue. To my knowledge, the Lucene integrati

Re: search performance

2014-06-02 Thread Toke Eskildsen
On Mon, 2014-06-02 at 08:51 +0200, Jamie wrote: [200GB, 150M documents] > With NRT enabled, search speed is roughly 5 minutes on average. > The server resources are: > 2x6 Core Intel CPU, 128GB, 2 SSD for index and RAID 0, with Linux. 5 minutes is extremely long. Is that really the right number

Re: search performance

2014-06-02 Thread Tri Cao
This is an interesting performance problem and I think there is probably not a single answer here, so I'll just layout the steps I would take to tackle this: 1. What is the variance of the query latency? You said the average is 5 minutes, but is it due to some really bad queries or most queries h

Re: search performance

2014-06-02 Thread Jamie
I assume you meant 1000 documents. Yes, the page size is in fact configurable. However, it only obtains the page size * 3. It preloads the following and previous page too. The point is, it only obtains the documents that are needed. On 2014/06/02, 3:03 PM, Tincu Gabriel wrote: My bad, It's u

Re: search performance

2014-06-02 Thread Tincu Gabriel
My bad, It's using the RamDirectory as a cache and a delegate directory that you pass in the constructor to do the disk operations, limiting the use of the RamDirectory to files that fit a certain size. So i guess the underlying Directory implementation will be whatever you choose it to be. I'd sti

Re: search performance

2014-06-02 Thread Jamie
I was under the impression that NRTCachingDirectory will instantiate an MMapDirectory if a 64 bit platform is detected? Is this not the case? On 2014/06/02, 2:09 PM, Tincu Gabriel wrote: MMapDirectory will do the job for you. RamDirectory has a big warning in the class description stating that

Re: search performance

2014-06-02 Thread Tincu Gabriel
MMapDirectory will do the job for you. RamDirectory has a big warning in the class description stating that the performance will get killed by an index larger than a few hundred MB, and NRTCachingDirectory is a wrapper for RamDirectory and suitable for low update rates. MMap will use the system RAM

Re: search performance

2014-06-02 Thread Jamie
Jack First off, thanks for applying your mind to our performance problem. On 2014/06/02, 1:34 PM, Jack Krupansky wrote: Do you have enough system memory to fit the entire index in OS system memory so that the OS can fully cache it instead of thrashing with I/O? Do you see a lot of I/O or are t

Re: search performance

2014-06-02 Thread Jack Krupansky
Do you have enough system memory to fit the entire index in OS system memory so that the OS can fully cache it instead of thrashing with I/O? Do you see a lot of I/O or are the queries compute-bound? You said you have a 128GB machine, so that sounds small for your index. Have you tried a 256GB

Re: search performance

2014-06-02 Thread Jamie
Tom Thanks for the offer of assistance. On 2014/06/02, 12:02 PM, Tincu Gabriel wrote: What kind of queries are you pushing into the index. We are indexing regular emails + attachments. Typical query is something like: filter: to:mbox08 from:mbox08 cc:mbox08 bcc:mbox08 deliver

Re: search performance

2014-06-02 Thread Tincu Gabriel
What kind of queries are you pushing into the index. Do they match a lot of documents ? Do you do any sorting on the result set? What is the average document size ? Do you have a lot of update traffic ? What kind of schema does your index use ? On Mon, Jun 2, 2014 at 6:51 AM, Jamie wrote: > Gre

RE: search time & number of segments

2014-05-21 Thread De Simone, Alessandro
i 2014 22:09 To: java-user@lucene.apache.org Subject: RE: search time & number of segments De Simone, Alessandro [alessandro.desim...@bvdinfo.com] wrote: > We have stopped optimizing the index because everybody told us it was a bad > idea. > It makes sense if you think about it. When

RE: search time & number of segments

2014-05-20 Thread Toke Eskildsen
De Simone, Alessandro [alessandro.desim...@bvdinfo.com] wrote: > We have stopped optimizing the index because everybody told us it was a bad > idea. > It makes sense if you think about it. When you reopen the index not all > segments must be reopened then you have: > (1) better reload time >

RE: search time & number of segments

2014-05-20 Thread De Simone, Alessandro
ig impact on performance. -Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Tuesday, May 20, 2014 15:46 To: java-user@lucene.apache.org Subject: Re: search time & number of segments On Tue, 2014-05-20 at 15:04 +0200, De Simone, Alessandro wrote: Tok

Re: search time & number of segments

2014-05-20 Thread Toke Eskildsen
On Tue, 2014-05-20 at 15:04 +0200, De Simone, Alessandro wrote: Toke: > > Using the calculator, I must admit that it is puzzling that you have > 2432 / 143 = 17.001 times the amount of seeks with 16 segments. > > Do you have any clue? Is there something I could test? If your segmented index was

RE: search time & number of segments

2014-05-20 Thread De Simone, Alessandro
iginal Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Monday, May 19, 2014 16:43 To: java-user@lucene.apache.org Subject: Re: search time & number of segments On Mon, 2014-05-19 at 11:54 +0200, De Simone, Alessandro wrote: [24GB index, 8GB disk cache, only indexed fields] &

Re: search time & number of segments

2014-05-19 Thread Toke Eskildsen
On Mon, 2014-05-19 at 11:54 +0200, De Simone, Alessandro wrote: [24GB index, 8GB disk cache, only indexed fields] > The "IO calls" I was referring to is the number of time the > "BufferedIndexInput.refill()" function is called. So it means that we > have 16 times more bytes read when there are 16

RE: search time & number of segments

2014-05-19 Thread De Simone, Alessandro
iginal Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Saturday, May 17, 2014 20:04 To: java-user@lucene.apache.org Subject: RE: search time & number of segments De Simone, Alessandro [alessandro.desim...@bvdinfo.com] wrote: > We have a performance issue ever since we stopped optimiz

RE: search time & number of segments

2014-05-17 Thread Toke Eskildsen
De Simone, Alessandro [alessandro.desim...@bvdinfo.com] wrote: > We have a performance issue ever since we stopped optimizing the index. We > are using Lucene 4.8 (jvm 32bits for searching, 64bits for indexing) on > Windows 2008R2. How much RAM does your search machine have? > For instance, a s

Re: Search sentence from document based on keyword as input using lucene

2013-10-17 Thread Ian Lea
If you're using Solr you'd be better off asking this on the Solr list: http://lucene.apache.org/solr/discussion.html. You might also like to clarify what you want with regard to sentence vs document. If you want to display the sentences of a matched doc, surely you just do it: store what you need

Re: Search in a specific ScoreDoc result

2013-09-17 Thread Thomas Guttesen
Kkkutterujjjbbb hgggja On 17/09/2013 12.55, "David Miranda" wrote: > > Hi, > > I want to do a kind of 'facet search', that initial research in a field of > all documents in the Lucene index, and second search in other field of the > documents returned to the first research. > > Currently I'm do th

Re: Search in a specific ScoreDoc result

2013-09-17 Thread Erick Erickson
Why not? You can use a standard query as a filter query from the Solr side, so it's got to be possible in Lucene. What about using filters doesn't seem to work for this case? Best, Erick On Tue, Sep 17, 2013 at 6:54 AM, David Miranda wrote: > Hi, > > I want to do a kind of 'facet search', that

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-28 Thread Ankit Murarka
Bingo!! Your solution worked for me. Thanks a ton. I went through QueryParser so many times but never knew it could serve the purpose so easily. Never figured out the true significance, as I thought I could always create a normal PhraseQuery with PhraseQuery pq = new PhraseQuery() and then

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-27 Thread Michael McCandless
On Sat, Jul 27, 2013 at 3:20 AM, Ankit Murarka wrote: > Ok.I went through the Javadoc of PhraseQuery and tried using position > argument to phrasequery. > > Problem encountered: > > My text contains : Still it is not happening and generally i will be able to > complete it at the earliest. > > The

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-27 Thread Ankit Murarka
Ok.I went through the Javadoc of PhraseQuery and tried using position argument to phrasequery. Problem encountered: My text contains : Still it is not happening and generally i will be able to complete it at the earliest. The user enters search string : 1. still happening and 2. still it is

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-26 Thread Michael McCandless
Have a look at the position argument to PhraseQuery.add: it lets you control where this new term is in the phrase. So to search for "wizard of oz" when of is a stopword you would add "wizard" at position 0 and "oz" at position 2. This is different from slop, which allows for "fuzzy" matching of t

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-26 Thread Ankit Murarka
Hello, can you elaborate more on this? I seem to be lost over here. Since I am new to Lucene, yesterday I was going through ShingleFilter and its application. It seems to be a kind of N-gram thing that bloats the index, as Mike has mentioned. As of now I am only concerned with the app

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Michael McCandless
With PhraseQuery you can specify where each term must occur in the phrase. So X must occur in position 0, David in position 1, and then manager in position 4 (skipping 2 holes). QueryParser does this for you: when it analyzes the users phrase, if the resulting tokens have holes, then it sets the

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Dawn Zoë Raison
Did you consider using shingles? It solves the "to be or not to be" problem quite nicely. Dawn On 24/07/2013 12:34, Ankit Murarka wrote: I tried using Phrase Query with slops. Now since I am specifying the slop I also need to specify the 2nd term. In my case the 2nd term is not present. The w
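A minimal sketch of the shingle approach Dawn suggests, assuming the Lucene 4.x analysis-common module (`ShingleAnalyzerWrapper` wrapping a base analyzer; the `Version` constant and bigram-only settings are illustrative choices, not from the thread):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class ShingleExample {
    // Wraps a base analyzer so that "to be or not to be" is also indexed
    // as the bigrams "to be", "be or", "or not", ... Stopword-heavy
    // phrases then become searchable as single shingle tokens, at the
    // cost of a larger index (the bloat Mike mentions).
    static Analyzer bigramAnalyzer() {
        return new ShingleAnalyzerWrapper(
            new StandardAnalyzer(Version.LUCENE_43),
            2, 2);  // min and max shingle size: bigrams only
    }
}
```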

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Ankit Murarka
I tried using PhraseQuery with slops. Now since I am specifying the slop, I also need to specify the 2nd term. In my case the 2nd term is not present; the whole string to be searched is still one single term. How do I skip the holes created by stopwords? I do not know beforehand how many stop

Re: Search a Part of the Sentence/Complete sentence in lucene 4.3

2013-07-24 Thread Michael McCandless
PhraseQuery? You can skip the holes created by stopwords ... e.g. QueryParser does this. Ie, the PhraseQuery becomes "X David _ _ manager _ _ company" if is/a/of/the are stop words, which isn't perfect (could return false matches) but should work well in practice ... Mike McCandless http://blog

Re: Search for a token appearing after another

2013-07-09 Thread Alan Woodward
IIRC, SpanQueries try and match on the smallest interval possible. So if you've got T1 … T1 … T2, then SpanNear(T1, T2) will match from the second T1. Alan Woodward www.flax.co.uk On 9 Jul 2013, at 09:56, Sébastien Druon wrote: > Thanks Alan, > > Do you know if the search would exclude other
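The "T1 followed somewhere by T2" pattern from this thread can be sketched with span queries, assuming the Lucene 4.x `org.apache.lucene.search.spans` package (the field name is a parameter and the literal terms "t1"/"t2" are placeholders):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class OrderedSpanExample {
    // Matches T1 followed, at any later position in the field, by T2.
    // Per Alan's note, spans match the smallest interval possible, so
    // with T1 ... T1 ... T2 the match starts from the second T1.
    static SpanNearQuery t1ThenT2(String field) {
        return new SpanNearQuery(
            new SpanQuery[] {
                new SpanTermQuery(new Term(field, "t1")),
                new SpanTermQuery(new Term(field, "t2"))
            },
            Integer.MAX_VALUE,  // slop: allow any distance between terms
            true);              // inOrder: T1 must precede T2
    }
}
```

Positions of each match can then be retrieved by iterating the query's `Spans` against an index reader.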

Re: Search for a token appearing after another

2013-07-09 Thread Sébastien Druon
Thanks Alan, Do you know if the search would exclude other occurrences of T1 between T1 and T2? ex: T1 (...)* T1 (...)* T2 would not match? Thanks again Sébastien On 9 July 2013 09:48, Alan Woodward wrote: > You can use Integer.MAX_VALUE as the slop parameter. > > Alan Woodward > www.flax.co

Re: Search for a token appearing after another

2013-07-09 Thread Alan Woodward
You can use Integer.MAX_VALUE as the slop parameter. Alan Woodward www.flax.co.uk On 9 Jul 2013, at 07:55, Sébastien Druon wrote: > Hello, > > I am looking for a way to search for a token appearing after another and > retrieve their positions. > > ex: T1 (...)* T2 > > I know the SpanTermQuer

RE: search-time facetting in Lucene

2013-05-06 Thread Toke Eskildsen
kiwi clive [kiwi_cl...@yahoo.com]: > Thanks very much for the reply. I see there is not a quick win here but as > we are going through an index consolidation process, it may pay to make > the leap to 4.3 and put in facetting while I'm in there. We will get facetting > slowly through the back door w

Re: search-time facetting in Lucene

2013-05-06 Thread Shai Erera
he time to explain the situation. > > Clive > > > > > From: Shai Erera > To: "java-user@lucene.apache.org" ; kiwi > clive > Sent: Monday, May 6, 2013 5:56 AM > Subject: Re: search-time facetting in Lucene > > > Hi Clive, > &

Re: search-time facetting in Lucene

2013-05-06 Thread kiwi clive
i clive Sent: Monday, May 6, 2013 5:56 AM Subject: Re: search-time facetting in Lucene Hi Clive, In order to use Lucene facets you need to make indexing time decisions. It's not that you don't make these decisions anyway, even with Solr -- for example, you need to decide how to tokeniz
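The indexing-time decision Shai describes can be sketched as below. Note this uses the facet module API introduced in Lucene 4.7 (`FacetsConfig`/`FacetField` with a sidecar taxonomy index); the 4.3 release discussed in this thread used the older `CategoryPath`-based API, so treat this as a shape, not a drop-in, and the "Author" dimension is purely illustrative:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.facet.FacetField;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.taxonomy.TaxonomyWriter;
import org.apache.lucene.index.IndexWriter;

public class FacetIndexingExample {
    // Indexing-time side of Lucene faceting: facet fields are declared on
    // the document and rewritten by FacetsConfig, which also records the
    // category in a separate taxonomy index alongside the main index.
    static void addDoc(IndexWriter writer, TaxonomyWriter taxoWriter,
                       FacetsConfig config, String author) throws Exception {
        Document doc = new Document();
        doc.add(new FacetField("Author", author));
        writer.addDocument(config.build(taxoWriter, doc));
    }
}
```

At search time the counts are then gathered with a `FacetsCollector` and read back per dimension, which is why the decision cannot be deferred entirely to query time.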
