Can the BooleanQuery execution be optimized with same term queries?

2023-09-18 Thread YouPeng Yang
Hi all, During my unemployment, the happiest thing has been diving into the Lucene source code. Thanks for all the work. About BooleanQuery: I have run into a question about its execution. Although BooleanQuery#rewrite already does some work to remove duplicate FILTER,

Question about threading in search

2017-09-03 Thread Peilin Yang
I was wondering if anyone can shed some light on an issue we're having: we're comparing two different indexes on the same collection - one with lots of different segments (default settings), and one force-merged into a single segment. It seems that search is sometimes faster with multiple segmen

Re: Upgrading from lucene 2.9.1 to 5.X

2015-07-17 Thread Shuangyang Yang
Erick, Thank you very much. Is there any performance comparison between these two versions? Is there any data on the performance? Thank you very much Best Regards --- Shuangyang Yang linkedin.com/in/everyoung On 7/17/15, 10:06 AM, "Erick Erickson" wrote: >Please look at th

Re: Upgrading from lucene 2.9.1 to 5.X

2015-07-17 Thread Shuangyang Yang
Erick, Amish, Thank you very much for your reply. They are really helpful. Can you tell me what are the compelling features from 2.9.1 to 5.X? I'm sure there are many. Thank you very much Best Regards --- Shuangyang Yang linkedin.com/in/everyoung On 7/16/15, 8:41 PM, "Erick Eri

Upgrading from lucene 2.9.1 to 5.X

2015-07-16 Thread Shuangyang Yang
Hi there, We are using lucene 2.9.1 in our product and thinking of upgrading to 5.X. Is it plausible? Is the file format compatible? Is there any guide document that can give tips on how to do it? Thank you very much Best Regards --- Shuangyang Yang linkedin.com/in/everyoung

Re: Read an solr index with two different lucene formats

2013-06-14 Thread Mingfeng Yang
(num, DateTools.Resolution.SECOND)); Then you get dt as a string in the right format. Ming- On Fri, Jun 14, 2013 at 4:24 PM, Mingfeng Yang wrote: > I did System.out.println(d.get("date")), and the output is > "stored,binary,omitNorms,indexOptions=DOCS_ONLY" > > Emmm

Re: Read an solr index with two different lucene formats

2013-06-14 Thread Mingfeng Yang
I did System.out.println(d.get("date")), and the output is "stored,binary,omitNorms,indexOptions=DOCS_ONLY" Emmm. On Fri, Jun 14, 2013 at 4:05 PM, Chris Hostetter wrote: > > : I used solr to query the index, and verified that each document does > have a > : non-blank date field. I suspect that i

Re: Read an solr index with two different lucene formats

2013-06-14 Thread Mingfeng Yang
Hoss, I checked in two ways. The first is 1) in your list: q=date:* matches q=*:*. And all fields are stored in the index. I took a doc id (say 3315) and ran q=id:3315; the output contains the date field and its value. Anyway, I am 100% sure every doc has a date field and value indexed and stored there.

Read an solr index with two different lucene formats

2013-06-14 Thread Mingfeng Yang
I have a Solr index built with Solr 1.4 a few years ago and later upgraded to Solr 3.6; the index now consists of 150 million documents. Now I want to read all values of a DateField from the index. But it turns out that for nearly 100 million documents, document.get("date") returns null

Re: OutOfMemoryError when opening the index ?

2012-06-13 Thread Yang
ok, found it: we are using Cloudera CDHu3u, and they change the ulimit for child jobs. But I still don't know how to change their default settings yet. On Wed, Jun 13, 2012 at 2:15 PM, Yang wrote: > I got the OutOfMemoryError when I tried to open a Lucene index. > > it's very

Re: lucene (search) performance tuning

2012-05-26 Thread Yang
I'm using a disjunction (OR) query. Unfortunately, all of the clauses are optional. On Sat, May 26, 2012 at 4:38 AM, Simon Willnauer < simon.willna...@googlemail.com> wrote: > On Sat, May 26, 2012 at 2:59 AM, Yang wrote: > > I tested with more threads / processes. indeed this i

Re: lucene (search) performance tuning

2012-05-25 Thread Yang
, so the search quality is not a huge issue here), so that, for example, fewer fields are evaluated or a simpler scoring function is used? thanks Yang On Fri, May 25, 2012 at 5:47 PM, Yang wrote: > thanks a lot guys > > > On Tue, May 22, 2012 at 1:34 AM, Ian Lea wrote: >

Re: lucene (search) performance tuning

2012-05-25 Thread Yang
ributed searching. > > if the cpu is not fully used, you can do this in one physical machine > > > > On 2012-5-22 at 8:50 AM, "Li Li" wrote: > >> > >> > >> On 2012-5-22 at 4:59 AM, "Yang" wrote: > >> > >> > > >> > I

use index, big or small?

2012-05-04 Thread Yang
o no copy overhead is incurred. the only argument in favor of sharding is that a smaller index might be faster. but since index search is only O(lg(n)) time, maybe this time saving is very small. so will sharding be worth the effort? thanks yang --
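A rough back-of-the-envelope sketch of the argument above, taking the poster's O(lg(n)) lookup cost at face value: splitting n documents across k shards shrinks each lookup from about lg(n) steps to lg(n/k), a saving of only lg(k) steps. The class name, the method names, and the document and shard counts below are made up for illustration. The model covers term lookup only and says nothing about time spent walking postings lists.

```java
public class ShardDepth {
    // Base-2 logarithm, used here as the assumed per-lookup step count.
    public static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Steps saved per lookup when totalDocs are split evenly across `shards`,
    // under the assumed O(lg n) cost model: lg(n) - lg(n / k) = lg(k).
    public static double savedSteps(long totalDocs, int shards) {
        return log2((double) totalDocs) - log2((double) totalDocs / shards);
    }

    public static void main(String[] args) {
        long totalDocs = 100_000_000L; // hypothetical index size
        int shards = 16;               // hypothetical shard count
        System.out.printf("lg(n)     = %.2f%n", log2((double) totalDocs));
        System.out.printf("lg(n/k)   = %.2f%n", log2((double) totalDocs / shards));
        System.out.printf("saved     = %.2f%n", savedSteps(totalDocs, shards));
    }
}
```

Under this model, 16 shards save about 4 steps out of roughly 27, which is the "very small" saving the message suspects.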

Re: lucene algorithm ?

2012-04-27 Thread Yang
yes, that's why many search engines will not allow users to visit a page > number greater than a threshold. For most applications, users usually > only visit the top results. That's why the ranking algorithm is important. If > you find your users always turn to the next page, I think you should > consider your appli

Re: lucene algorithm ?

2012-04-27 Thread Yang
Thanks Ralf. basically you are talking about selectivity of columns in a JOIN, right? but in my above example, "yellow dog", both terms are very common, and both have long postings lists. Yang On Thu, Apr 26, 2012 at 12:17 AM, Ralf Heyde wrote: > Hi, > > i do

Re: lucene algorithm ?

2012-04-25 Thread Yang
ingle term query would quickly return the top-k, but if it's multi-term, they would have to traverse the entire lists to find the intersection set, because the lists are not sorted by docId, as in the Lucene paper case) On Wed, Apr 25, 2012 at 2:13 PM, Yang wrote: > I read the paper b
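To make the docId-sorted case concrete, here is a minimal sketch of intersecting two postings lists with a two-pointer merge. It assumes each list is a plain ascending array of docIds. The class name and the sample docIds are invented for illustration, and the term names just echo the "yellow dog" example from this thread. It is not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class PostingsIntersection {
    // Intersects two postings lists that are each sorted by ascending docId.
    // Each step advances at least one cursor, so the cost is O(a.length + b.length).
    public static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]); // docId present in both lists
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;           // a is behind, move it forward
            } else {
                j++;           // b is behind, move it forward
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] yellow = {2, 5, 9, 14, 21, 30}; // hypothetical docIds for "yellow"
        int[] dog = {1, 5, 14, 15, 30, 42};   // hypothetical docIds for "dog"
        System.out.println(intersect(yellow, dog)); // prints [5, 14, 30]
    }
}
```

Because both lists share the same docId order, one pass over each is enough. With lists ordered by score instead, there is no common order to merge on, which is the difficulty the message describes.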

Re: some basic questions on how Lucene/search engines work

2011-04-13 Thread Yang
thanks a lot for the detailed info! On Wed, Apr 13, 2011 at 4:43 AM, Grant Ingersoll wrote: > > On Apr 7, 2011, at 9:17 PM, Yang wrote: > >> I'm new to lucene/search engine , and have been struggling with these >> questions recently. >> I'd appreciate a lot

some basic questions on how Lucene/search engines work

2011-04-07 Thread Yang
me good articles on how lucene/search engines work? I've read the "anatomy of a search engine" (Google paper by Sergey Brin & Larry Page), "Introduction to Information Retrieval" (Manning et al.), and "Lucene in Action". Thanks, Yang

Re: minimum string length for fuzzy search

2011-03-30 Thread Andy Yang
"~5 work? If not, why not? > > You might get some use from > http://lucene.apache.org/java/2_4_0/queryparsersyntax.html > > Or if that's not germane, perhaps you can explain your use case. > > Best > Erick > > On Wed, Mar 30, 2011 at 5:49 PM, Andy Yang wrote:

Re: minimum string length for proximity search

2011-03-30 Thread Andy Yang
p://lucene.apache.org/java/2_4_0/queryparsersyntax.html > > Or if that's not germane, perhaps you can explain your use case. > > Best > Erick > > On Wed, Mar 30, 2011 at 5:49 PM, Andy Yang wrote: >> Is there a minimum string length requirement for proximity search?

minimum string length for proximity search

2011-03-30 Thread Andy Yang
Is there a minimum string length requirement for proximity search? For example, would "a~" or "an~" trigger proximity search? The result would be horrible if there is no such requirement. Thanks, Andy - To unsubscribe, e-mail: ja
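For context, in the classic query parser syntax a trailing `~` on a single term (as in "a~" or "an~") requests a fuzzy search based on edit distance, while proximity search applies `~N` to a quoted phrase. The sketch below is a plain Levenshtein distance, not Lucene's scoring code, and the class name and word list are invented. It only illustrates why very short terms are a concern: almost any other short word is within one edit.

```java
public class EditDistance {
    // Classic Levenshtein distance using two rolling rows of the DP table.
    public static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of t takes j inserts
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // turning the first i chars of s into "" takes i deletes
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary; every short word here is one edit from "an".
        String[] words = {"a", "in", "on", "and", "ant", "lucene"};
        for (String w : words) {
            System.out.println("an -> " + w + ": " + levenshtein("an", w));
        }
    }
}
```

How a given Lucene version turns this distance into a match threshold differs between releases, so whether "an~" actually matches all of these depends on the version and the similarity or edit-distance setting in use.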

What is the difference between the "AND" and "+" operator?

2010-11-29 Thread yang Yang
What is the difference between the "AND" and "+" operator? Also, what is the difference between a query and a filter? For example String[] fields={"name","address","classId"}; If I want to search the document whose classId is '4' and whose name or address contains "Zhongzhou Road No 200", I can use t

Re: IndexWriter Class

2010-11-28 Thread jiandong yang
function? After the add is done, should the reader be reopened? Or could someone show me a simple example? Thanks a lot!!! -- Wishing you all the best~ Best Regards, 杨建东 = Jiandong Yang Mobile Phone:15921536660 email: yangjiand...@snda.com Architect management office

Re: lucene anchor-distance based search

2010-11-18 Thread yang Yang
BTW, for a search with a required condition, I can turn the condition into a filter and then filter the result. I can also build a BooleanQuery from the condition, just like the code in the range search. I wonder which is better? 2010/11/18 yang Yang > Thank you very much!!! :) > > I wi

Re: lucene anchor-distance based search

2010-11-17 Thread yang Yang
k at the various approaches > there are for the same. Have a look at the contrib module in lucene. > http://wiki.apache.org/lucene-java/SpatialSearch > For your understanding, you could have a look at the bounding box approach. > > -- > Anshum Gupta > http://ai-cafe.blogspot.com

lucene anchor-distance based search

2010-11-17 Thread yang Yang
We are using Hibernate Search, which is based on Lucene, as the search engine to build a full-text search for our position-related data in the MySQL db. This is the main structure of the table (it saves the id, coordinate and name of one Surface_Feature): +++-++ | id

combining MultiFieldQueryParser with FuzzyQuery

2010-10-18 Thread Andy Yang
I would like to use MultiFieldQueryParser to search multiple fields, then in each field, I want to use fuzzy search. How can that be done? Any example will be appreciated. Thanks, Andy

does lucene support Database full text search

2010-09-10 Thread yang Yang
Hi: I am using MySQL, but its full-text search is rather weak. So I use Sphinx; however, I found it cannot support Chinese word searching perfectly. So I wonder if Lucene can work better?

A question about IndexReader.termPositions()

2008-01-15 Thread Terry Yang
Hi all, Playing with an algorithm (Summarize/Highlight Based on Sliding Windows), I find that IndexReader.termPositions(Term term) does not support wildcard terms. Would it be meaningful to write a patch to support wildcard terms?

Re: using custom sort method

2006-04-18 Thread Yang Sun
e new classes to address the problem. To be able to customize ranking is very important to a search engine. Yang Urvashi Gadi wrote: No...the information is available only at search time Quoting Erik Hatcher <[EMAIL PROTECTED]>: Could your computation be done at indexing time rather tha

Multiple Indexes Search

2006-04-06 Thread Yang Sun
rch keyword1 in content and keyword2 in record and they should also have the same pid. Is there any way to do this? Or is there any relational database that can be integrated with Lucene? Thanks, Yang

RE: Lucene Ranking/scoring

2006-03-08 Thread Yang Sun
y. Don't know if I can figure out something. Any suggestions? Thanks. Yang -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: March 8, 2006 21:35 To: java-user@lucene.apache.org Subject: Re: Lucene Ranking/scoring Hi Yang, Boosting works at query time as well as

Lucene Ranking/scoring

2006-03-08 Thread Yang Sun
nd an answer. Implementing the SortDocComparator seems the closest, but it can only sort the results by one field. The Field boost does not work because the boosting factor has to be set at index time. What I need is to set the weight at query time. Please help. Thanks.