Re: full text search using Lucene

2006-04-13 Thread Chris Lu
I don't want to advertise DBSight here and pollute the mailing list, but this recurring Lucene database question comes up often, so here is the reply. And Tony, you can contact us directly later with these questions. What you asked is exactly what DBSight does. And all you need to do to get it up and r

Re: full text search using Lucene

2006-04-13 Thread Tony Qian
Chris, Thanks for your reply. I'm currently evaluating different products for full text search. I visited DBSight web site briefly. Basically, I have a main table which has a number of fields for Ids from look-up tables, for example category_id, brand_id, product_id etc. Main table also has a

Re: I just don't get wildcards at all.

2006-04-13 Thread Erik Hatcher
Wildcard stuff gets fun and interesting. Have a look at SpanRegexQuery in the contrib/regex codebase. It is a more generalized version of WildcardQuery, but within the SpanQuery family. Erik On Apr 13, 2006, at 11:22 AM, Erick Erickson wrote: More of the same So, now all I

Re: Boosting Fields (in index) or Queries

2006-04-13 Thread Erik Hatcher
On Apr 13, 2006, at 8:55 PM, Jeremy Hanna wrote: Looking at the results, the first document in the results should hopefully be near the bottom and the Explanation for this document has a Description/Details (using the toString() on the Explanation) of: product of: 0.0 = sum of: 0.0 =

Re: Boosting Fields (in index) or Queries

2006-04-13 Thread Jeremy Hanna
Thanks for the tip. I'm trying to decipher what the explanation tells me right now. Btw, here is the code that I'm currently running: QueryParser nameParser = new QueryParser("name", analyzer); QueryParser categoryParser = new QueryParser("category", analyzer); QueryParser descripti

Re: Boosting Fields (in index) or Queries

2006-04-13 Thread Erik Hatcher
The best recommendation is to have a look at the Explanation returned from IndexSearcher.explain() for a specific query and document to trace how things are being scored. Is it possible you're boosting all documents by the same amount? Erik On Apr 13, 2006, at 6:29 PM, Jeremy Han

Boosting Fields (in index) or Queries

2006-04-13 Thread Jeremy Hanna
I have a situation where I'm indexing database entries and have fields such as: name sku model category name description features specifications I am trying to set a priority higher for the name, category name, and description. I've tried setting the fields' boost values as I've indexed th

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Ananth T. Sarathy
Erick, Don't get me wrong. I agree with you 100 percent on everything you just said, and have been advocating what you are saying. I turned to the forum to get other people's thoughts on the issue, feeling that my perspective may be a little warped, and wanted to see what the community thinks. I thi

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Erick Erickson
On 4/13/06, Ananth T. Sarathy <[EMAIL PROTECTED]> wrote: > > No we do have drop downs selects that would allow for the substitution, > but > we also have a free text fields to allow the user to search. That solution > would I think work for the DB query replacement, but you would need a > regular n

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Chris Lu
Depending on what you are trying to search. Let's say your table A is the table with the "real" content, and B and C are your lookup table. You should build one index in this case. And select A's content together with B and C's lookup value into the documents. I strongly recommend you take a look
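
The one-index approach described above (select A's content together with the lookup values from B and C into each document) can be sketched in plain Java. This is only an illustration under stated assumptions: the class name `FlattenLookups`, the column names `b_id` and `c_id`, and the field names `b_value` and `c_value` are all hypothetical, and a `Map` stands in for whatever document type the indexing code actually uses.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class FlattenLookups {
    // Build one flat "document" for a row of table A, resolving its
    // foreign keys against the lookup tables B and C (hypothetical names).
    static Map<String, String> toDocument(Map<String, String> rowA,
                                          Map<String, String> lookupB,
                                          Map<String, String> lookupC) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", rowA.get("id"));
        doc.put("content", rowA.get("content"));
        // Store the human-readable lookup value, not the numeric key,
        // so that it can be searched as text. Unknown keys become "".
        doc.put("b_value", lookupB.getOrDefault(rowA.get("b_id"), ""));
        doc.put("c_value", lookupC.getOrDefault(rowA.get("c_id"), ""));
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> rowA = new HashMap<>();
        rowA.put("id", "1");
        rowA.put("content", "red shoes");
        rowA.put("b_id", "10");
        rowA.put("c_id", "20");

        Map<String, String> lookupB = new HashMap<>();
        lookupB.put("10", "Footwear");
        Map<String, String> lookupC = new HashMap<>();
        lookupC.put("20", "Acme");

        System.out.println(toDocument(rowA, lookupB, lookupC));
    }
}
```

Each resulting map would then be turned into one indexed document, so a single search covers the content of A and the lookup values at once.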

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Chris Hostetter
: Also we need to address the Join Between A and B and C, which I don't know : see how with out taking out values out of the hit list. When discussing Index structure strategies, speaking in generalities like A B and C is hard, because there is no 100% generic solution about how to "join" X an

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Ananth T. Sarathy
No we do have drop downs selects that would allow for the substitution, but we also have a free text fields to allow the user to search. That solution would I think work for the DB query replacement, but you would need a regular non underscored field to allow for free text. On 4/13/06, Erick Erick

Re: I just don't get wildcards at all.

2006-04-13 Thread Erick Erickson
OK, we've defined the problem away for the present. Which is probably a very good thing. Scratch that, there's no doubt that it's a very good thing. I'm still interested in anything you have to say, of course, but for now I'm off and running. Thanks again for all your help. Erick

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Erick Erickson
Well, that's a problem you must already be solving. Somewhere, you have to construct your DB query and recognize what constitutes a "term". From your previous mail, you imply you can construct this query... select count(distinct Crew_ID) from Crew_TItles where Title="Producer" Where did you get "

RE: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Pasha Bizhan
Hi, > From: Ananth T. Sarathy [mailto:[EMAIL PROTECTED] > > How would a second index solve the 1:N relationship issue? > For the record I agree that Lucene is Document Centric. Why Lucene should solve this problem? What is your search document? Or what is your search result? What are you goin

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Ananth T. Sarathy
How would that work if someone types Assistant Producer on a form? How would that match the indexed Field if the value added is Assistant_Producer? On 4/13/06, Erick Erickson <[EMAIL PROTECTED]> wrote: > > Warning: I'm quite new to lucene, so this may not be very accurate > > What analyzers are yo

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Ananth T. Sarathy
Also, we need to address the join between A, B, and C, which I don't see how to do without taking values out of the hit list. On 4/13/06, Satuluri, Venu_Madhav <[EMAIL PROTECTED]> wrote: > > I am not sure having an index each for each table solves the problem. > > (Going by the schema I pu

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Erick Erickson
Warning: I'm quite new to lucene, so this may not be very accurate What analyzers are you using for indexing and searching? StandardAnalyzer (like in most of the examples)? Because it looks like you're having a tokenizer problem. That is, when you index "Assistant producer", you actually index
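
To illustrate the tokenizer issue raised here, the following plain-Java sketch contrasts two ways of turning a job title into index terms. It does not use Lucene: `wordTokens` is a simplified stand-in for a word-splitting analyzer, and `singleTerm` is a hypothetical normalization that keeps the whole title as one term by joining words with underscores. The point is that the free-text form input would need to pass through the same normalization as the indexed value for the two to match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TitleTokens {
    // Simplified stand-in for a word-splitting analyzer:
    // lowercase the value and split it on whitespace.
    static List<String> wordTokens(String value) {
        String trimmed = value.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Hypothetical normalization: keep the whole value as a single term,
    // lowercased, with runs of whitespace replaced by one underscore.
    static String singleTerm(String value) {
        return value.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", "_");
    }

    public static void main(String[] args) {
        // Two separate terms: a search for "producer" alone would also hit this title.
        System.out.println(wordTokens("Assistant Producer"));
        // One term: only the full title matches.
        System.out.println(singleTerm("Assistant Producer"));
        // Form input matches the indexed term when both sides are normalized the same way.
        System.out.println(singleTerm("Assistant Producer").equals(singleTerm("assistant  producer")));
    }
}
```

Under this assumption, a value typed as "Assistant Producer" and a value stored as Assistant_Producer reduce to the same term, while a field meant for general free-text search would still be indexed with word tokens.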

RE: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Satuluri, Venu_Madhav
I am not sure having an index for each table solves the problem. (Going by the schema I put in the earlier mail) You have an index each for tables A, B and C. What is the Lucene equivalent of the db query A.field1 == value1 and B.field2 == value2 and C.field3 == value3? You can't use MultiSe

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Ananth T. Sarathy
How would a second index solve the 1:N relationship issue? For the record I agree that Lucene is Document Centric. On 4/13/06, Chris Lu <[EMAIL PROTECTED]> wrote: > > I agree with Jelda. > > Lucene is more document-centric. Storing the relationship is not a > good idea. It's better to simply have

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Chris Lu
I agree with Jelda. Lucene is more document-centric. Storing the relationship is not a good idea. It's better to simply have 2 indexes. Usually when users search, they can choose which index they want. Of course, building the indexes will take more time to process-data. Lucene can not replace re

RE: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Ramana Jelda
No, I don't think your solution is performant. If each Lucene Document corresponds to a row in 'A join B', then the index explodes: its size increases drastically. Why not create two indexes, A and B, then search A and use the information from the A documents obtained to search B? It seems for me

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Ananth T. Sarathy
Yes, this is pretty much what I was trying to do. I am not sure how you solve this without the maintenance and the removal of duplicate hits logic taking up so much time that any performance benefit from querying Lucene is lost. On 4/13/06, Satuluri, Venu_Madhav <[EMAIL PROTECTED]> wrote: > > I th

RE: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Satuluri, Venu_Madhav
I think you are asking if we can retain 1:n relationships in lucene. Ok, I'll go out on a limb and give my solution. Say you have a table A and table B with B having multiple rows associated to each row in A. Also your documents are centered around A, i.e. all your queries return some row(s) of A,

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Ananth T. Sarathy
Sorry, hit submit in mid email Ok, Some of the stuff makes some sense. I was a little loopy from lack of sleep and some of these solutions don't really cover my concerns Let's take this movie example. If each member of a production Crew can have multiple titles that come from a lookup tabl

Re: Lucene Seaches VS. Relational database Queries

2006-04-13 Thread Ananth T. Sarathy
Ok, Some of the stuff makes some sense. I was a little loopy from lack of sleep and some of these solutions don't really cover my concerns Let's take this movie example. If each member of a production Crew can have multiple titles that come from a lookup table of Distinct Jobs Titles Assis

Re: I just don't get wildcards at all.

2006-04-13 Thread Erick Erickson
More of the same So, now all I want to do is a SpanNearQuery on a wildcard. No problem if I can use a wildcard query, but I can't because of the "TooManyClauses" issue. We have two types of wildcards, truncation (which I can handle by indexing successively shorter terms and a special characte

Re: Lucene Help

2006-04-13 Thread Erick Erickson
I really, really recommend buying a copy of "Lucene In Action", especially if you don't already have a good grasp of what indexing and searching is all about. It's well worth the effort to read. Sure, you'll have a ton of questions after you're done with it (I certainly had/do), but at least you'll

RE: Index all the files in a directory

2006-04-13 Thread Nick Vincent
Hi Kostas, I am assuming you need to find all of the text files in the directory. Assuming you're using JDK1.5 you need to do something like this: File directory = new File("c:\\my\\directory"); FilenameFilter findTextFiles = new FilenameFilter() { public boolean accept(File dir, String nam
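
The snippet above is cut off, so here is a minimal self-contained sketch of the same idea using only `java.io.File` and `java.io.FilenameFilter`. The class name `TextFileLister` and the case-insensitive `.txt` check are assumptions for illustration, not a reconstruction of the original code.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Locale;

class TextFileLister {
    // Accepts any file name ending in ".txt", ignoring case (an assumed rule).
    static final FilenameFilter TEXT_FILES = new FilenameFilter() {
        @Override
        public boolean accept(File dir, String name) {
            return name.toLowerCase(Locale.ROOT).endsWith(".txt");
        }
    };

    // Returns the matching files, or an empty array if the path
    // is not a readable directory (listFiles returns null in that case).
    static File[] listTextFiles(File directory) {
        File[] files = directory.listFiles(TEXT_FILES);
        return files != null ? files : new File[0];
    }

    public static void main(String[] args) {
        // Directory comes from the first argument, defaulting to the current one.
        File directory = new File(args.length > 0 ? args[0] : ".");
        for (File file : listTextFiles(directory)) {
            System.out.println(file.getPath());
        }
    }
}
```

Each returned `File` can then be handed to the existing indexing code one at a time, instead of hard-coding the file names.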

Index all the files in a directory

2006-04-13 Thread Kostas Vel
Hi, I have a problem that has to do more with java than with lucene. I have a folder that has about 524 text files (.txt) that I want to index. I have made a program that works very well. It does indexing searching etc... ... IndexWriter writer; GreekAnalyzer anal = ne

RE: Lucene Help

2006-04-13 Thread Krovi, DVSR_Sarma
You can use text extractors for the document formats you mentioned. Lucene as such does not deal with this text extraction process. Following are the extractors we generally use: PDF -> PDFBox: Java API to read PDF documents http://www.pdfbox.org. WORD-> Antiword: http://www

Lucene Help

2006-04-13 Thread Shajahan
Hi all, I am new to Lucene. I want to index PDF, Word, and txt files. Can anyone tell me how to do indexing with Lucene? Please give some information. Thanking you, shaik