Re: (~) opertor query....

2007-12-18 Thread Chris Hostetter
: You can look at org.apache.lucene.search.MultiPhraseQuery which does : something similar to what you ask. From its javadoc: good call .. funny thing, I was just pointing out MultiPhraseQuery as a way to meet this need only a few days ago, but for some reason it didn't occur to me in the threa

Re: Can I do boosting based on term postions?

2007-12-18 Thread Peter Keegan
This is a nice alternative to using payloads and BoostingTermQuery. Is there any reason not to make this change to SpanFirstQuery, in particular: >This modification to SpanFirstQuery would be that the Spans >returned by SpanFirstQuery.getSpans() must always return 0 >from its start() method. Shou

Re: lucene-core-2.2.0.jar broken? CorruptIndexException?

2007-12-18 Thread Grant Ingersoll
Hey Bill, Any status on this? On Dec 2, 2007, at 10:37 PM, Bill Janssen wrote: Hmmm, it still sounds like you are hitting a threading issue that is probably exacerbated by the multicore platform of the newer machine. Exactly what I was thinking. What are the details of the CPUs of these two

Infrastructure Question

2007-12-18 Thread v k
Hello, I am using Lucene to build an index from roughly 10 million documents. The documents are about 4 TB in total. After some trial runs indexing a subset of the documents, I am trying to figure out a hosting service configuration to create a full index from the entire 10 TB of data

Re: Infrastructure Question

2007-12-18 Thread Grant Ingersoll
Hi Venkat, There is no need to post your question multiple times or cross-post. People are distributed all around the world on this list and aren't always available or able to answer your question. Having to wait 11 hours for an answer on a free mailing list is not at all unreasonabl

Re: Infrastructure Question

2007-12-18 Thread v k
Sorry about that. For some reason, my post did not show up in the mailing list and I still cannot see it (maybe a settings issue). I don't mean to barrage the mailing list with the same question. Thanks for the advice. On Dec 18, 2007 11:43 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Hi V

RE: Phrase Query Problem

2007-12-18 Thread Sirish Vadala
Yes... If my query phrase is "Health Safety", docs with "Health and Safety", "Health or Safety" are being returned... So... Is there any other way to handle this situation... Especially in the above mentioned case, the user is expecting around 5 records and the query is fetching more than 550 rec

Re: Can I do boosting based on term postions?

2007-12-18 Thread Paul Elschot
On Tuesday 18 December 2007 14:59:45 Peter Keegan wrote: > > Should I open a Jira issue? > What shall I say? http://www.apache.org/foundation/how-it-works.html Regards, Paul Elschot

Re: Phrase Query Problem

2007-12-18 Thread mark harwood
You could write a custom analyzer that drops stopwords but adds an extra 1 to the "positionIncrement" property for the next valid Token after each omitted stop word. This would retain the benefit of removing stopwords from your index and yet prevent your example phrases matching (because the re
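
The position-gap idea above can be sketched in plain Java without any Lucene dependency. This is only an illustration under stated assumptions: the class `PositionGapSketch`, its `Tok` type, the three-word stop list, and the whitespace tokenization are all hypothetical, and the matcher only mimics an exact (slop 0) phrase match. It is not how Lucene's analyzers or PhraseQuery are implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java simulation of stop word removal with position gaps (hypothetical, no Lucene). */
final class PositionGapSketch {

    /** Hypothetical stop list, for illustration only. */
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("and", "or", "the"));

    /** A token that survived stop word removal, with its absolute position. */
    static final class Tok {
        final String text;
        final int position;
        Tok(String text, int position) { this.text = text; this.position = position; }
    }

    /**
     * Lowercases, splits on whitespace, drops stop words and assigns positions.
     * With keepGaps=true every dropped stop word adds 1 to the position
     * increment of the next kept token; with keepGaps=false the gap is closed.
     */
    static List<Tok> analyze(String text, boolean keepGaps) {
        List<Tok> out = new ArrayList<>();
        int position = -1;
        int pendingGap = 0;
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            if (STOP_WORDS.contains(word)) {
                if (keepGaps) pendingGap++;
                continue;
            }
            position += 1 + pendingGap;
            pendingGap = 0;
            out.add(new Tok(word, position));
        }
        return out;
    }

    /** Exact phrase match: every query token must appear at the same relative position. */
    static boolean phraseMatches(List<Tok> doc, List<Tok> query) {
        if (query.isEmpty()) return false;
        Tok first = query.get(0);
        for (Tok start : doc) {
            if (!start.text.equals(first.text)) continue;
            boolean all = true;
            for (Tok q : query) {
                int wanted = start.position + (q.position - first.position);
                boolean found = false;
                for (Tok d : doc) {
                    if (d.position == wanted && d.text.equals(q.text)) { found = true; break; }
                }
                if (!found) { all = false; break; }
            }
            if (all) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        for (boolean keepGaps : new boolean[] {false, true}) {
            List<Tok> doc = analyze("Health and Safety", keepGaps);
            List<Tok> query = analyze("Health Safety", keepGaps);
            System.out.println("keepGaps=" + keepGaps + " -> match=" + phraseMatches(doc, query));
        }
    }
}
```

Without gaps, "Health and Safety" collapses to adjacent positions and matches the phrase "Health Safety"; with gaps, "safety" sits two positions after "health" and the exact phrase no longer matches, while the stop word itself still stays out of the index.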

RE: Phrase Query Problem

2007-12-18 Thread Zhang, Lisheng
Hi, In case you do not want to toss away any stop words and even want to preserve case, WhitespaceAnalyzer can be used; using WhitespaceTokenizer would also serve as a test (but you need to reindex the whole data set first), to make sure there are no other problems. Best regards, Lisheng -Original Message-

Lucene multifield query problem

2007-12-18 Thread Rakesh Shete
Hi all, I am facing a problem with the following multifield query: i_title:indoor* i_description:indoor* -i_published:false +i_topicsClasses.id:1_1_*_* The above query even returns results which should not be there. Ideally I would like the query results as: (i_title:indoor* i_description:

RE: Phrase Query Problem

2007-12-18 Thread Sirish Vadala
ok, thnx... I will implement using the WhitespaceAnalyzer... Let me check the indexing speed... I mean the time taken to index my data set... If that takes too long then probably I will look into implementing a custom analyzer... Zhang, Lisheng wrote: > > Hi, > > In case you do not want to toss awa

RE: Lucene multifield query problem

2007-12-18 Thread Steven A Rowe
Hi Rakesh, Set the default QueryParser operator to AND (default default operator :) is OR): Steve On 12/18/2007 at 1:22 PM, Rakesh Shet

Re: Indexing Wikipedia dumps

2007-12-18 Thread Marcelo Ochoa
Hi All: Just to add a simple hack, I posted an entry on my blog named "Uploading WikiPedia Dumps to Oracle databases": http://marceloochoa.blogspot.com/2007_12_01_archive.html with instructions to upload WikiPedia Dumps to Oracle XMLDB, that is, transforming an XML file to an object-relationa

RE: Phrase Query Problem

2007-12-18 Thread Zhang, Lisheng
Hi, 1) Whenever we change to a different analyzer, we need to reindex the whole dataset, whether or not we use WhitespaceAnalyzer. 2) Using WhitespaceAnalyzer may increase disk space and slow down indexing because more tokens are indexed; by how much I do not know. 3) WhitespaceAnaly

RE: Lucene multifield query problem

2007-12-18 Thread Rakesh Shete
Thanks for the suggestion, Steve. My problem is with getting the correct results. Let me put the query in words: Fetch all documents such that the search string "indoor*" is either part of the 'i_title' field or 'i_description' field, eliminate if not published (-i_published:false) but should

RE: Lucene multifield query problem

2007-12-18 Thread Steven A Rowe
Hi Rakesh, This doesn't look like a user-generated query. Have you considered building the Query via the API instead of using QueryParser? With QueryParser, you should get the results you want with syntax like: +(i_title:indoor* OR i_description:indoor*) -i_published:false +i_topicsClasses.id

RE: Phrase Query Problem

2007-12-18 Thread Sirish Vadala
Hmmm... I have come up with a temporary solution for the time being. This is how I am initializing the StandardAnalyzer to fix my problem. String[] STOP_WORDS = {}; this.analyzer = new StandardAnalyzer(STOP_WORDS); This now indexes all my stop words, and happily it didn't increase my indexing time

Re: Analyzer to use with MultiSearcher using various indexes for multiple languages

2007-12-18 Thread Daniel Naber
On Tuesday, 18 December 2007, Jay Hill wrote: > We > have a requirement to search across multiple languages, so I'm planning > to use MultiSearcher, passing an array of all IndexSearchers for each > language. You will need to analyze the query once per language and then build a new BooleanQuer

Re: Lucene multifield query problem

2007-12-18 Thread Doron Cohen
Hi Rakesh, Perhaps the confusion comes from the asymmetry between +X and -X. I.e., for the query: A B -C +D one might think that, similar to how -C only disqualifies docs containing C (but not qualifying docs not containing C), also +D only disqualifies docs not containing D. But this is i
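
The asymmetry between +X and -X can be made concrete with a small plain-Java model of boolean clause matching. This is a sketch, not Lucene code: `BooleanClauseSketch` and `matches` are hypothetical names, documents are reduced to sets of terms, and scoring is ignored. It assumes the usual rule for a boolean query with no minimum on optional clauses: prohibited clauses exclude, required clauses must all match, and optional clauses decide matching only when there is no required clause at all.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java model of optional / required (+) / prohibited (-) clauses (hypothetical). */
final class BooleanClauseSketch {

    /**
     * Decides whether a document (as a set of terms) matches.
     * Scoring is ignored; only the match/no-match decision is modelled.
     */
    static boolean matches(Set<String> docTerms,
                           List<String> optional,
                           List<String> required,
                           List<String> prohibited) {
        // -X: any prohibited term disqualifies the document.
        for (String term : prohibited) {
            if (docTerms.contains(term)) return false;
        }
        // +X: every required term must be present.
        for (String term : required) {
            if (!docTerms.contains(term)) return false;
        }
        // With at least one required clause, optional clauses no longer
        // affect matching (in Lucene they would only influence the score).
        if (!required.isEmpty()) return true;
        // With no required clause, at least one optional clause must match.
        for (String term : optional) {
            if (docTerms.contains(term)) return true;
        }
        return false;
    }

    private static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    public static void main(String[] args) {
        List<String> optional = Arrays.asList("A", "B");
        List<String> required = Collections.singletonList("D");
        List<String> prohibited = Collections.singletonList("C");

        // Query: A B -C +D
        System.out.println(matches(doc("D"), optional, required, prohibited));           // D alone is enough
        System.out.println(matches(doc("A", "B"), optional, required, prohibited));      // missing D
        System.out.println(matches(doc("A", "C", "D"), optional, required, prohibited)); // contains C
    }
}
```

In this model a document containing only D matches `A B -C +D`, which is why the optional title and description clauses in the multifield query do not restrict the result set once a required clause is present. Grouping them under their own required clause, as in `+(A B) -C +D`, is what makes at least one of them mandatory.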

Re: Phrase Query Problem

2007-12-18 Thread Erick Erickson
This will, indeed, NOT remove stop words. If that is all you need, you're done. But you will now have useless words in your index like the, is, etc. Making your own analyzer by subclassing a suitable existing analyzer, or composing one will fix you right up if having the extra words in your index

RE: Lucene multifield query problem

2007-12-18 Thread Rakesh Shete
Hi Doron, Steve. Your suggestions make sense but don't give me the desired results. Here is how I generate the query: String searchCriteria = "Indoor*"; QueryParser queryparser = new QueryParser("i_title", new StandardAnalyzer()); Query q1 = queryparser.parse(searchCri

Re: Lucene multifield query problem

2007-12-18 Thread Doron Cohen
Hi Rakesh, It just occurred to me that your code has String searchCriteria = "Indoor*"; Assuming StandardAnalyzer was used at indexing time, all text words were lowercased. Now, QueryParser by default does not lowercase wildcard queries. You can however instruct it to do so by calling: myQu
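
The case mismatch described above can be illustrated with a plain-Java stand-in for a term dictionary. This is a sketch under assumptions: `PrefixCaseSketch` and `expandPrefix` are hypothetical names, the sorted set merely imitates an index whose terms were lowercased at indexing time, and nothing here reproduces QueryParser's actual behaviour or settings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Plain-Java illustration of prefix expansion against lowercased terms (hypothetical). */
final class PrefixCaseSketch {

    /** Returns every indexed term that starts with the given prefix, in sorted order. */
    static List<String> expandPrefix(TreeSet<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order, so scan from
        // the first term >= prefix and stop at the first non-matching term.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) break;
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Terms as they would look after a lowercasing analyzer ran at indexing time.
        TreeSet<String> indexedTerms = new TreeSet<>(Arrays.asList("indoor", "indoors", "outdoor"));

        String raw = "Indoor"; // prefix part of the user's "Indoor*"
        System.out.println(expandPrefix(indexedTerms, raw));                          // no terms
        System.out.println(expandPrefix(indexedTerms, raw.toLowerCase(Locale.ROOT))); // indoor, indoors
    }
}
```

The mixed-case prefix finds nothing because no indexed term begins with an uppercase "I"; lowercasing the prefix before the lookup recovers both terms.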

RE: Lucene multifield query problem

2007-12-18 Thread Steven A Rowe
Hi Rakesh, Here's a version that should work (warning: untested): TermQuery notPublishedQuery = new TermQuery(new Term("i_published", "false")); PrefixQuery topicQuery = new PrefixQuery(new Term("i_topicsClasses.id", "1_1_*_*")); String searchCriteria = "Indoor*"; Term titleTerm = new Term