I have added a QnA
https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-DoesLucenesupportauto-suggest/autocomplete?
I will also try to provide an example, e.g. along the lines of
https://medium.com/@ekaterinamihailova/in-memory-search-and-autocomplete-with-lucene-8-5-f2df1bc71c36
On 08.10.21 at 18:49, Michael Sokolov wrote:
Thank you for offering to add to the FAQ! Indeed it should mention the
suggester capability. I think you have permissions to edit that wiki?
yes :-)
Please go ahead and I think add a link to the suggest module javadocs
ok, will do!
Thanks
M
On Thu, Oct 7, 2021 at 2:30 AM Michael Wechner
wrote:
Thanks very much for your feedback!
I will try it :-)
As I wrote I would like to add a summary to the Lucene FAQ
(https://cwiki.apache.org/confluence/display/lucene/lucenefaq)
Would the following questions make sense?
- "Does Lucene support incremental search?"
- "Does Lucene supp
TLDR: use the Lucene suggest/ package. Start by building a suggester
from your query logs (either from a file, or by indexing them).
These have a lot of flexibility about how the matches happen, for
example pure prefixes, edit distance typos, infix matching, analysis
chain, even now Japanese input-method integ
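A minimal sketch of the suggester approach described above, using
AnalyzingInfixSuggester; the directory path, the queries file, and the
analyzer choice are illustrative only, not prescribed by the thread:

import java.nio.file.Paths;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.store.FSDirectory;

public class SuggestDemo {
  public static void main(String[] args) throws Exception {
    // Separate index that backs the suggester, not your main search index.
    try (FSDirectory dir = FSDirectory.open(Paths.get("suggest-index"));
         StandardAnalyzer analyzer = new StandardAnalyzer();
         AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(dir, analyzer)) {
      // Build from a plain-text file with one suggestion (e.g. a past
      // user query) per line; "queries.txt" is a placeholder.
      suggester.build(new PlainTextDictionary(Paths.get("queries.txt")));
      // Top-5 completions for the partial input "luc"; with the infix
      // suggester the match may be anywhere in the suggestion, not
      // only at the start.
      List<Lookup.LookupResult> results = suggester.lookup("luc", false, 5);
      for (Lookup.LookupResult result : results) {
        System.out.println(result.key);
      }
    }
  }
}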
See https://issues.apache.org/jira/browse/LUCENE-9640
On Wed, Mar 17, 2021 at 4:02 PM Paul Libbrecht
wrote:
>
> Explain is a heavyweight thing. Maybe it helps you, maybe you need
> something high-performance.
>
Thanks for the response Paul, it would be great if you can point me to that
discussion.
--
Regards
-Siraj Haider
(212) 306-0154
-Original Message-
From: Paul Libbrecht
Sent: Wednesday, March 17, 2021 4:02 PM
To: java-user@lucene.apache.org; Diego Ceccarelli
Subject: Re: Search
Explain is a heavyweight thing. Maybe it helps you, maybe you need
something high-performance.
I was asking a similar question ~10 years ago and got a very interesting
answer on this list. If you want, I can try to dig it up. In the end, and
with some limitation in the number of queri
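For reference, the heavyweight tool Paul mentions is IndexSearcher.explain(),
which breaks down how a single hit's score was computed. A minimal sketch
("searcher" and "query" are assumed to already exist):

import java.io.IOException;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

// Print the score breakdown for each top hit. Fine for debugging and
// validating results; too slow to run for every hit in production.
static void explainTopHits(IndexSearcher searcher, Query query) throws IOException {
  TopDocs top = searcher.search(query, 10);
  for (ScoreDoc sd : top.scoreDocs) {
    Explanation explanation = searcher.explain(query, sd.doc);
    System.out.println(explanation);
  }
}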
To: java-user@lucene.apache.org
Subject: Re: Search in lines, so need to index lines?
Ira,
I do not understand your requirements, but essentially lucene is not for
regex searching.
There are tools for fast regular expression search, if you are not satisfied
with the Java standard library, for
> Sent: Wednesday, August 1, 2018 1:49 PM
> To: java-user@lucene.apache.org
> Subject: Re: Search in lines, so need to index lines?
>
> Hi Ira,
>
>> I am trying to implement regex search in file
>
> Why are you using Lucene for regular expression search?
> You can impleme
Hi Tomoko,
I need to search in many files and we use Lucene for this purpose.
Thanks,
Ira
-Original Message-
From: Tomoko Uchida
Sent: Wednesday, August 1, 2018 1:49 PM
To: java-user@lucene.apache.org
Subject: Re: Search in lines, so need to index lines?
Hi Ira,
> I am trying
Hi Uwe,
I am trying to implement regex search in file the same as in editors, in
Notepad++ for example.
Thanks,
Ira
-Original Message-
From: Uwe Schindler
Sent: Tuesday, July 31, 2018 6:12 PM
To: java-user@lucene.apache.org
Subject: RE: Search in lines, so need to index lines?
Hi
Hi,
you need to create your own tokenizer that splits tokens on \n or \r. Instead
of using WhitespaceTokenizer, you can use:
Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(ch -> ch == '\r' || ch == '\n');
But I would first think of how to implement the whole thing correctly. Using a
re
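A sketch of how such a tokenizer could be wrapped into an Analyzer
(assuming Lucene 7.x, where fromSeparatorCharPredicate lives in
org.apache.lucene.analysis.util.CharTokenizer):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.util.CharTokenizer;

// Every token is a whole line: line breaks are the only separators.
// Note CharTokenizer's default max token length (255 chars) caps the
// line length; longer lines get split.
Analyzer lineAnalyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(
        ch -> ch == '\r' || ch == '\n');
    return new TokenStreamComponents(tok);
  }
};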
You can add a new field called "full_text" and during the indexing time you
concatenate all the values of the other fields in it.
Do you think it's a good idea for this case?
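A minimal sketch of that indexing-time concatenation (the field name
"full_text" follows the suggestion above; storing is skipped since the
original fields already hold the values):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexableField;

// Copy every field's text into one catch-all field before adding the
// document to the writer, so one single-field query can match anything.
static void addFullTextField(Document doc) {
  StringBuilder all = new StringBuilder();
  for (IndexableField field : doc.getFields()) {
    if (field.stringValue() != null) {
      all.append(field.stringValue()).append(' ');
    }
  }
  doc.add(new TextField("full_text", all.toString(), Field.Store.NO));
}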
On Fri, Mar 17, 2017 at 6:27 PM Lokesh Madan wrote:
Maybe index the field names as a metadata file. Then when querying, first get
the list of all fields and then run the query. You can do this as a 2-hop
query, or else maintain a cache of the field names and then run the query.
> On Mar 17, 2017, at 11:53 AM, Cristian Lorenzetto
> wrote:
Hi,
You can retrieve the list of field names using LukeRequestHandler.
Ahmet
On Friday, March 17, 2017 9:53 PM, Cristian Lorenzetto
wrote:
Hi, I am not sure if there is a way to specify a search against all fields
in the index without knowing the fields.
WildcardQuery probably won't work since it does target a specific field
within the index. The specification of the index field comes in the
definition of the Term that is passed as
It permits searching in a predefined list of fields that you have to know
in advance. In my case I don't know the field name.
Maybe WildcardQuery?
2017-03-17 19:30 GMT+01:00 Corbin, J.D. :
You might take a look at MultiFieldQueryParser. I believe it allows you
to search multiple index fields at the same time.
J.D. Corbin
Senior Research Engineer
Advanced Computing & Data Science Lab
3075 W. Ray Road
Suite 200
Chandler, AZ 85226-2495
USA
M: (303) 912-0958
E: jd.cor...@pears
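A minimal sketch of MultiFieldQueryParser usage (the field names are
placeholders; note this still requires knowing the field list up front,
which is exactly the constraint being discussed):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.search.Query;

// Parse one user query against several fields at once.
static Query parseAcrossFields(String userInput) throws ParseException {
  String[] fields = {"title", "body", "author"}; // placeholder field names
  MultiFieldQueryParser parser =
      new MultiFieldQueryParser(fields, new StandardAnalyzer());
  return parser.parse(userInput);
}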
Hi Mike,
Thanks for the very prompt and clear response. We look forward to using the new
(new for us) Lucene goodies :-)
Clive
From: Michael McCandless
To: Lucene Users ; kiwi clive
Sent: Thursday, May 28, 2015 2:34 AM
Subject: Re: Search Performance with NRT
As long as you
As long as you call SM.maybeRefresh from a dedicated refresh thread
(not from a query's thread) it will work well.
You may want to use a warmer so that the new searcher is warmed before
becoming visible to incoming queries ... this ensures any lazy data
structures are initialized by the time a que
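A sketch of that setup against a recent Lucene API ("writer" is an existing
IndexWriter; the one-second refresh interval and the warming query are
illustrative):

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;

// SearcherManager whose SearcherFactory warms each new searcher before
// it becomes visible to queries.
SearcherManager manager = new SearcherManager(writer, new SearcherFactory() {
  @Override
  public IndexSearcher newSearcher(IndexReader reader, IndexReader previous)
      throws IOException {
    IndexSearcher searcher = new IndexSearcher(reader);
    searcher.search(new MatchAllDocsQuery(), 1); // touch lazy structures
    return searcher;
  }
});

// Dedicated refresh thread; never call maybeRefresh from query threads.
ScheduledExecutorService refresher = Executors.newSingleThreadScheduledExecutor();
refresher.scheduleWithFixedDelay(() -> {
  try {
    manager.maybeRefresh();
  } catch (IOException e) {
    // log and continue
  }
}, 1, 1, TimeUnit.SECONDS);

// Query threads acquire and release:
IndexSearcher searcher = manager.acquire();
try {
  // searcher.search(...)
} finally {
  manager.release(searcher);
}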
Thanks a lot for your answer.
I find the second way may be useful to me; it is new to me and I will
try it.
Thanks,
Andrew
On 2015/2/11 21:39, Ian Lea wrote:
If you only ever want to retrieve based on exact match you could index
the name field using org.apache.lucene.document.StringField. Do be
aware that it is exact: if you do nothing else, a search for "a" will
not match "A" or "A ".
Or you could do something with start and end markers, e.g. index yo
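In code, the exact-match pattern looks like this (a minimal sketch; the
field name "name" is from the question, the value is illustrative):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;

// StringField indexes the value as one un-analyzed token.
Document doc = new Document();
doc.add(new StringField("name", "Ada Lovelace", Field.Store.YES));

// Lookup must use exactly the same bytes: "ada lovelace" will not match.
TermQuery query = new TermQuery(new Term("name", "Ada Lovelace"));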
I will try it. Thanks.
Hi,
> 'Internally this is indexing every field a second time into the "_all"
> field.'
> This sentence mean second indexing has total different analyzer and
> indexing compared with my first indexing?
Exactly.
> So I need rewrite the second
> process to fix my problem?
In Elasticsearc
You should ask this on the elasticsearch mailing list.
BTW, look at elasticsearch copy_to feature. Better than _all field.
My 2 cents.
--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
> On 11 Oct 2014, at 11:31, "haiwei.xie-soulinfo" wrote:
Hi,
Thanks for your advice; the SimpleQueryParser API is not enough in my case.
Actually, I want to index data from a database with very many fields. I have
tested the "_all" parameter in an Elasticsearch system, but the results of '_all'
and 'fieldname' are different for Chinese terms.
'Interna
Hi,
by default there is no "_all" field. E.g., Elasticsearch adds this special
field depending on your index mapping at the time of indexing the data.
Internally this is indexing every field a second time into the "_all" field.
With Lucene you have to do this yourself. An alternative would
Hi Mike and Uwe,
Thank you for your answers. It is clear, now.
Regards,
Aurélien
On 10.10.2014 12:32, Uwe Schindler wrote:
Hi,
every segment is executed on its own (every segment is its own index). Every
segment returns its own document ids and the result is the union of them ranked
by score using a PriorityQueue. There is no cross-segment term dictionary and
posting lists in Lucene. It was like that before Lucene
By intersection, do you mean a MUST clause on a BooleanQuery?
Lucene uses "doc at a time" scoring, so for BooleanQuery, all MUST'd
clauses are visiting the same doc (if they match) at a time, so we do
the intersection for that document all at once, within each segment,
across the N clauses.
Mike
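In code, such an intersection is just a BooleanQuery with two MUST clauses
(shown with the Builder API of current Lucene releases; field and terms are
illustrative):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// Both clauses advance together, doc-at-a-time: only documents that
// contain both terms are scored.
BooleanQuery query = new BooleanQuery.Builder()
    .add(new TermQuery(new Term("body", "lucene")), BooleanClause.Occur.MUST)
    .add(new TermQuery(new Term("body", "search")), BooleanClause.Occur.MUST)
    .build();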
If you are using stored fields in your index, consider playing with
compression settings, or perhaps turning stored field compression off
altogether. Ways to do this have been discussed in this forum on numerous
occasions. This is highly use case dependent though, as your indexing
performance may o
Hi,
> Am I correct that using SearchManager can't be used with a MultiReader and
> NRT? I would appreciate all suggestions on how to optimize our search
> performance further. Search time has become a usability issue.
Just have a SearcherManager for every index. MultiReader construction is cheap
Greetings Lucene Users
As a follow-up to my earlier mail:
We are also using Lucene segment warmers. As per recommendation,
segments per tier is now set to five, and buffer memory is set to
(Runtime.getRuntime().totalMemory()*.08)/1024/1024;
See below for code used to instantiate writer:
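The code itself was cut off in the archive; a hypothetical reconstruction
of a writer configured with the settings described above might look like:

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.FSDirectory;

// Hypothetical reconstruction, not the poster's actual code: five
// segments per tier, RAM buffer at 8% of total heap, in MB.
double bufferMB = (Runtime.getRuntime().totalMemory() * .08) / 1024 / 1024;
TieredMergePolicy mergePolicy = new TieredMergePolicy();
mergePolicy.setSegmentsPerTier(5);
IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer())
    .setRAMBufferSizeMB(bufferMB)
    .setMergePolicy(mergePolicy);
IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get("index")), config);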
Hi All
Thank you for all your suggestions. Some of the recommendations hadn't
yet been implemented, as our code base was using older versions of
Lucene with reduced capabilities. Thus far, all the recommendations
for fast search have been implemented (e.g. using pagination with
searchAfter,
Hi!
We have switched from Lucene 3.6 to >=Lucene 4.7 (java7) and we are also
experiencing a distinct slowdown using the same dataset. We are running the
software under Windows 2008R2.
In our case, we have identified that there are a lot more IO calls (= number of
times the buffer is refilled in Ind
Jon
I ended up adapting your approach. The solution involves keeping an LRU
cache of page boundary scoredocs and their respective positions. New
positions are added to the cache as new pages are discovered. To cut
down on searches, when scrolling backwards and forwards, the search
begins from
Jamie [ja...@mailarchiva.com] wrote:
> It would be nice if, in future, the Lucene API could provide a
> searchAfter that takes a position (int).
It would not really help with large result sets. At least not with the current
underlying implementations. This is tied into your current performance pr
Thanks Jon
I'll investigate your idea further.
It would be nice if, in future, the Lucene API could provide a
searchAfter that takes a position (int).
Regards
Jamie
On 2014/06/03, 3:24 PM, Jon Stewart wrote:
With regards to pagination, is there a way for you to cache the
IndexSearcher, Query, and TopDocs between user pagination requests (a
lot of webapp frameworks have object caching mechanisms)? If so, you
may have luck with code like this:
void ensureTopDocs(final int rank) throws IOException {
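The method body is cut off above; a hypothetical sketch of the same idea
(cache the last page boundary and continue with searchAfter instead of
re-collecting from rank 0):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

class PagingSearch {
  private final IndexSearcher searcher;
  private final Query query;
  private final int pageSize;
  private ScoreDoc lastDoc; // bottom hit of the most recently served page

  PagingSearch(IndexSearcher searcher, Query query, int pageSize) {
    this.searcher = searcher;
    this.query = query;
    this.pageSize = pageSize;
  }

  // Each call collects only one page, starting after the cached boundary.
  TopDocs nextPage() throws IOException {
    TopDocs page = (lastDoc == null)
        ? searcher.search(query, pageSize)
        : searcher.searchAfter(lastDoc, query, pageSize);
    if (page.scoreDocs.length > 0) {
      lastDoc = page.scoreDocs[page.scoreDocs.length - 1];
    }
    return page;
  }
}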
Robert. Thanks, I've already done a similar thing. Results on my test
platform are encouraging.
On 2014/06/03, 2:41 PM, Robert Muir wrote:
Reopening for every search is not a good idea. this will have an
extremely high cost (not as high as what you are doing with "paging"
but still not good).
Instead consider making it near-realtime, by doing this every second
or so instead. Look at SearcherManager for code that helps you do
this.
Robert
FYI: I've modified the code to utilize the experimental function..
DirectoryReader dirReader = DirectoryReader.openIfChanged(cachedDirectoryReader, writer, true);
In this case, the IndexReader won't be opened on each search, unless
absolutely necessary.
Regards
Jamie
On 2014/06
Robert
Hmmm. Why did Mike go to all the trouble of implementing NRT search,
if we are not supposed to be using it?
The user simply wants the latest result set. To me, this doesn't appear
out of scope for the Lucene project.
Jamie
On 2014/06/03, 1:17 PM, Robert Muir wrote:
No, you are incorrect. The point of a search engine is to return top-N
most relevant.
If you insist you need to open an indexreader on every single search,
and then return huge amounts of docs, maybe you should use a database
instead.
On Tue, Jun 3, 2014 at 6:42 AM, Jamie wrote:
Vitaly / Robert
I wouldn't go so far as to call our pagination naive!? Sub-optimal, yes.
Unless I am mistaken, the Lucene library's pagination mechanism makes
the assumption that you will cache the scoredocs for the entire result
set. This is not practical when you have a result set that e
Jamie,
What if you were to forget for a moment the whole pagination idea, and
always capped your search at 1000 results for testing purposes only? This
is just to try and pinpoint the bottleneck here; if, regardless of the
query parameters, the search latency stays roughly the same and well below
Check and make sure you are not opening an indexreader for every
search. Be sure you don't do that.
On Mon, Jun 2, 2014 at 2:51 AM, Jamie wrote:
> Greetings
>
> Despite following all the recommended optimizations (as described at
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) , in so
Vitaly
See below:
On 2014/06/03, 12:09 PM, Vitaly Funstein wrote:
A couple of questions.
1. What are you trying to achieve by setting the current thread's priority
to max possible value? Is it grabbing as much CPU time as possible? In my
experience, mucking with thread priorities like this is at best futile, and
at worst quite detrimental to responsiveness and o
FYI: We are also using a multireader to search over multiple index readers.
Search under a million documents yields good response times. When you
get into the 60M territory, search slows to a crawl.
Sure... see below:
protected void search(Query query, Filter queryFilter, Sort sort)
    throws BlobSearchException {
  try {
    logger.debug("start search {searchquery='" + getSearchQuery()
        + "',query='" + query.toString() + "',filterQuery='" + queryFilter
        + "',sort='" + sort
Hi Jamie,
What is included in the 5 minutes?
Just the call to the searcher?
searcher.search(...) ?
Can you show a bit more of the code you use?
On Tue, Jun 3, 2014 at 11:32 AM, Jamie wrote:
Vitaly
Thanks for the contribution. Unfortunately, we cannot use Lucene's
pagination function, because in reality the user can skip pages to start
the search at any point, not just from the end of the previous search.
Even the
first search (without any pagination), with a max of 1000 hits, tak
Something doesn't quite add up.
> TopFieldCollector fieldCollector = TopFieldCollector.create(sort, max, true,
>     false, false, true);
>
> We use pagination, so only returning 1000 documents or so at a time.
>
You say you are using pagination, yet the API you are using to create your
collector isn't
Toke
Thanks for the contact. See below:
On 2014/06/03, 9:17 AM, Toke Eskildsen wrote:
Can you take thread stacktraces (repeatedly) during those 5 minute
searches? That might give you (or someone on the mailing list) a clue
where all that time is spent.
You could try using jstack for that:
http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html
Regards
Christoph
Toke
Thanks for the comment.
Unfortunately, in this instance, it is a live production system, so we
cannot conduct experiments. The number is definitely accurate.
We have many different systems with a similar load that observe the same
performance issue. To my knowledge, the Lucene integrati
On Mon, 2014-06-02 at 08:51 +0200, Jamie wrote:
[200GB, 150M documents]
> With NRT enabled, search speed is roughly 5 minutes on average.
> The server resources are:
> 2x6 Core Intel CPU, 128GB, 2 SSD for index and RAID 0, with Linux.
5 minutes is extremely long. Is that really the right number
This is an interesting performance problem, and I think there is probably not
a single answer here, so I'll just lay out the steps I would take to tackle this:
1. What is the variance of the query latency? You said the average is 5 minutes,
but is it due to some really bad queries, or do most queries h
I assume you meant 1000 documents. Yes, the page size is in fact
configurable. However, it only obtains the page size * 3. It preloads
the following and previous page too. The point is, it only obtains the
documents that are needed.
On 2014/06/02, 3:03 PM, Tincu Gabriel wrote:
My bad, it's using the RamDirectory as a cache and a delegate directory
that you pass in the constructor to do the disk operations, limiting the
use of the RamDirectory to files that fit a certain size. So I guess the
underlying Directory implementation will be whatever you choose it to be.
I'd sti
I was under the impression that NRTCachingDirectory will instantiate an
MMapDirectory if a 64 bit platform is detected? Is this not the case?
On 2014/06/02, 2:09 PM, Tincu Gabriel wrote:
MMapDirectory will do the job for you. RamDirectory has a big warning in
the class description stating that the performance will get killed by an
index larger than a few hundred MB, and NRTCachingDirectory is a wrapper
for RamDirectory and suitable for low update rates. MMap will use the
system RAM
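To make the relationship concrete: NRTCachingDirectory does not select
MMapDirectory itself, it wraps whatever delegate you construct it with
(FSDirectory.open() is what auto-selects MMapDirectory on 64-bit
platforms). A minimal sketch, with the size thresholds taken from the
class javadoc:

import java.nio.file.Paths;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NRTCachingDirectory;

// Small, freshly written files are cached in RAM up to the given
// thresholds; everything else goes to the mmap'd delegate.
MMapDirectory delegate = new MMapDirectory(Paths.get("index"));
NRTCachingDirectory dir =
    new NRTCachingDirectory(delegate, 5.0 /* maxMergeSizeMB */, 60.0 /* maxCachedMB */);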
Jack
First off, thanks for applying your mind to our performance problem.
On 2014/06/02, 1:34 PM, Jack Krupansky wrote:
Do you have enough system memory to fit the entire index in OS system memory
so that the OS can fully cache it instead of thrashing with I/O? Do you see
a lot of I/O or are the queries compute-bound?
You said you have a 128GB machine, so that sounds small for your index. Have
you tried a 256GB
Tom
Thanks for the offer of assistance.
On 2014/06/02, 12:02 PM, Tincu Gabriel wrote:
What kind of queries are you pushing into the index.
We are indexing regular emails + attachments.
Typical query is something like:
filter: to:mbox08 from:mbox08 cc:mbox08 bcc:mbox08
deliver
What kind of queries are you pushing into the index. Do they match a lot of
documents ? Do you do any sorting on the result set? What is the average
document size ? Do you have a lot of update traffic ? What kind of schema
does your index use ?
On Mon, Jun 2, 2014 at 6:51 AM, Jamie wrote:
> Gre
De Simone, Alessandro [alessandro.desim...@bvdinfo.com] wrote:
> We have stopped optimizing the index because everybody told us it was a bad
> idea.
> It makes sense if you think about it. When you reopen the index, not all
> segments must be reopened; then you have:
> (1) better reload time
>
big impact on performance.
-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: mardi 20 mai 2014 15:46
To: java-user@lucene.apache.org
Subject: Re: search time & number of segments
On Tue, 2014-05-20 at 15:04 +0200, De Simone, Alessandro wrote:
Toke:
> > Using the calculator, I must admit that it is puzzling that you have
> > 2432 / 143 = 17.001 times the amount of seeks with 16 segments.
>
> Do you have any clue? Is there something I could test?
If your segmented index was
On Mon, 2014-05-19 at 11:54 +0200, De Simone, Alessandro wrote:
[24GB index, 8GB disk cache, only indexed fields]
> The "IO calls" I was referring to is the number of time the
> "BufferedIndexInput.refill()" function is called. So it means that we
> have 16 times more bytes read when there are 16
De Simone, Alessandro [alessandro.desim...@bvdinfo.com] wrote:
> We have a performance issue ever since we stopped optimizing the index. We
> are using Lucene 4.8 (jvm 32bits for searching, 64bits for indexing) on
> Windows 2008R2.
How much RAM does your search machine have?
> For instance, a s
If you're using Solr you'd be better off asking this on the Solr list:
http://lucene.apache.org/solr/discussion.html.
You might also like to clarify what you want with regard to sentence
vs document. If you want to display the sentences of a matched doc,
surely you just do it: store what you need
On 17/09/2013 12:55, "David Miranda" wrote:
>
> Hi,
>
> I want to do a kind of 'facet search': an initial search in one field of
> all documents in the Lucene index, and a second search in another field of
> the documents returned by the first search.
>
> Currently I'm do th
Why not? You can use a standard query as a filter query
from the Solr side, so it's got to be possible in Lucene.
What about using filters doesn't seem to work for this case?
Best,
Erick
On Tue, Sep 17, 2013 at 6:54 AM, David Miranda wrote:
Bingo! Your solution worked for me.
Thanks a ton. I went through the query parser so many times and never
knew it could serve the purpose so easily.
I never figured out the true significance, as I thought I could always create
a normal PhraseQuery with PhraseQuery pq = new PhraseQuery() and then
On Sat, Jul 27, 2013 at 3:20 AM, Ankit Murarka
wrote:
Ok. I went through the Javadoc of PhraseQuery and tried using the position
argument to PhraseQuery.
Problem encountered:
My text contains : Still it is not happening and generally i will be
able to complete it at the earliest.
The user enters search string : 1. still happening and 2. still it is
Have a look at the position argument to PhraseQuery.add: it lets you
control where this new term is in the phrase.
So to search for "wizard of oz" when of is a stopword you would add
"wizard" at position 0 and "oz" at position 2.
This is different from slop, which allows for "fuzzy" matching of t
Hello, can you elaborate more on this? I seem to be lost over here.
Since I am new to Lucene, yesterday I was going through ShingleFilter
and its application. It seems like a kind of n-gram thing, and it
bloats the index as Mike has mentioned.
As of now I am only concerned with the app
With PhraseQuery you can specify where each term must occur in the phrase.
So X must occur in position 0, David in position 1, and then manager
in position 4 (skipping 2 holes).
QueryParser does this for you: when it analyzes the users phrase, if
the resulting tokens have holes, then it sets the
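In code, the "wizard of oz" case from earlier in the thread looks like this
(with the PhraseQuery.Builder of current Lucene releases; the 2013-era API
used PhraseQuery.add(Term, int) directly):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

// "wizard of oz" with "of" stopped out: "wizard" at position 0,
// "oz" at position 2, leaving a hole where the stopword was.
PhraseQuery.Builder builder = new PhraseQuery.Builder();
builder.add(new Term("body", "wizard"), 0);
builder.add(new Term("body", "oz"), 2);
PhraseQuery query = builder.build();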
Did you consider using shingles?
It solves the "to be or not to be" problem quite nicely.
Dawn
On 24/07/2013 12:34, Ankit Murarka wrote:
I tried using PhraseQuery with slops. Now, since I am specifying the
slop, I also need to specify the 2nd term.
In my case the 2nd term is not present. The whole string to be searched
is still 1 single term.
How do I skip the holes created by stopwords? I do not know beforehand
how many stop
PhraseQuery?
You can skip the holes created by stopwords ... e.g. QueryParser does
this. Ie, the PhraseQuery becomes "X David _ _ manager _ _ company"
if is/a/of/the are stop words, which isn't perfect (could return false
matches) but should work well in practice ...
Mike McCandless
http://blog
IIRC, SpanQueries try and match on the smallest interval possible. So if
you've got T1 … T1 … T2, then SpanNear(T1, T2) will match from the second T1.
Alan Woodward
www.flax.co.uk
On 9 Jul 2013, at 09:56, Sébastien Druon wrote:
Thanks Alan,
Do you know if the search would exclude other occurrences of T1 between T1
and T2?
ex: T1 (...)* T1 (...)* T2 would not match?
Thanks again
Sébastien
On 9 July 2013 09:48, Alan Woodward wrote:
You can use Integer.MAX_VALUE as the slop parameter.
Alan Woodward
www.flax.co.uk
On 9 Jul 2013, at 07:55, Sébastien Druon wrote:
> Hello,
>
> I am looking for a way to search for a token appearing after another and
> retrieve their positions.
>
> ex: T1 (...)* T2
>
> I know the SpanTermQuer
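Putting Alan's two answers together, the query itself is a minimal in-order
SpanNearQuery with an effectively unbounded slop (field and terms are
illustrative):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// T1 followed by T2 at any distance. inOrder=true enforces T1 before T2,
// and (per Alan's note above) matching starts from the latest T1, i.e.
// the smallest enclosing interval.
SpanNearQuery query = new SpanNearQuery(
    new SpanTermQuery[] {
        new SpanTermQuery(new Term("body", "t1")),
        new SpanTermQuery(new Term("body", "t2"))
    },
    Integer.MAX_VALUE, // slop: any distance
    true);             // inOrder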
kiwi clive [kiwi_cl...@yahoo.com]:
> Thanks very much for the reply. I see there is not a quick win here but as
> we are going through an index consolidation process, it may pay to make
> the leap to 4.3 and put in faceting while I'm in there. We will get faceting
> slowly through the back door w
From: Shai Erera
Sent: Monday, May 6, 2013 5:56 AM
Subject: Re: search-time facetting in Lucene
Hi Clive,
In order to use Lucene facets you need to make indexing time decisions.
It's not that you don't make these decisions anyway, even with Solr -- for
example, you need to decide how to tokeniz