Re: Lucene Searches VS. Relational database Queries

2006-04-14 Thread Paul Elschot
Gentlemen, A join-like operation between Lucene indexes can be done with (at least) reasonable performance by using a few standard methods from RDBs: sort before going to disk, and cache whenever possible. The steps are: - query the first Lucene index with the low-level search API to get the Lu

Max Frequency and Tf/Idf

2006-04-14 Thread Danilo Cicognani
Hello everybody. We are building a complex automatic classification system using Lucene. We need to manage normalized Tf/Idf (Term Frequency / Inverse Document Frequency). We understood that Lucene can give us Tf and Df and we are using these values to calculate the normalized Tf/Idf but we would l
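A minimal sketch of one common normalisation, assuming the scheme meant here is "term frequency divided by the largest term frequency in the same document, times log(N/df)". The excerpt does not say which formula the poster uses, so that formula, the class name NormalizedTfIdf, and the method weigh are all assumptions for illustration. The input counts stand for the Tf and Df values obtained from Lucene; no Lucene calls are made.

```java
import java.util.HashMap;
import java.util.Map;

public class NormalizedTfIdf {

    /**
     * Returns a tf-idf weight per term for one document, where the raw term
     * frequency is divided by the largest term frequency in that document
     * (an assumed normalisation, not something the thread specifies).
     */
    public static Map<String, Double> weigh(Map<String, Integer> termFreqs,
                                            Map<String, Integer> docFreqs,
                                            int numDocs) {
        // Find the largest raw frequency in this document.
        int maxFreq = 0;
        for (int tf : termFreqs.values()) {
            maxFreq = Math.max(maxFreq, tf);
        }

        Map<String, Double> weights = new HashMap<>();
        if (maxFreq == 0) {
            return weights; // empty document, nothing to weigh
        }
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            Integer df = docFreqs.get(e.getKey());
            if (df == null || df == 0) {
                continue; // term has no document frequency, skip it
            }
            double normTf = (double) e.getValue() / maxFreq;
            double idf = Math.log((double) numDocs / df);
            weights.put(e.getKey(), normTf * idf);
        }
        return weights;
    }
}
```

With this scheme the most frequent term of a document always has a normalised Tf of 1.0, and a term that occurs in every document gets weight 0.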

Re: Max Frequency and Tf/Idf

2006-04-14 Thread Grant Ingersoll
The Term Vector code can be used to get the term frequencies from a specific document. Search this list, see the Lucene in Action book, or look at http://www.cnlp.org/apachecon2005 for examples on how to use Term Vectors. Danilo Cicognani wrote: Hello everybody. We are building a complex autom

query analysis

2006-04-14 Thread karl wettin
Hello list, I want to know if a human-written query passed through the QueryParser is "clean" from fields, boolean clauses and query indicators. Easy way out would of course be to add a boolean that resets at ReInit(), but maybe there is a smart way to do it. Perhaps it is possible to treat

Syntax help

2006-04-14 Thread Bill Snyder
Hello, We are using Lucene to facilitate searching of our application's log files. I am noticing some inconsistencies in result sets when searching on certain fields. One field we index is the file path. I am using a simple query like "location:Z:\logs\someLogFile.log". However, I can never get pat

Re: Syntax help

2006-04-14 Thread Rajesh Munavalli
It would be helpful to download Luke (http://www.getopt.org/luke/) and analyze what's getting indexed. Have you tried that? On 4/14/06, Bill Snyder <[EMAIL PROTECTED]> wrote: > > Hello, > > We are using Lucene to facilitate searching of our application's log files. > I > am noticing some inconsistenc

Re: Syntax help

2006-04-14 Thread karl wettin
On 14 Apr 2006, at 16:37, Bill Snyder wrote: One field we index is the file path. I am using a simple query like "location:Z:\logs\someLogFile.log". However, I can never get path searches like this to come back with any results. Tried escaping the backslashes and colon. Nothing seems to work.

Re: Syntax help

2006-04-14 Thread Bill Snyder
Oh, cool. Look at that. A neat tool made with thinlets. I had not heard of this...I'll see if it helps me figure out what's going on. --Bill On 4/14/06, Rajesh Munavalli <[EMAIL PROTECTED]> wrote: > > It would be helpful to download Luke (http://www.getopt.org/luke/) and > analyze what's getting in

Re: Syntax help

2006-04-14 Thread Bill Snyder
AHA! I am using the Search tab and have entered the query : location:Z:\install\logs\archive.log.D20060406.T141958 the query details say the query was parsed to location:z so if I escape the colon I see the new parsed query as location:"z installlogsarchive.log.d20060406.t141958" So, lucenc

Re: Syntax help

2006-04-14 Thread karl wettin
On 14 Apr 2006, at 17:11, Bill Snyder wrote: so if I escape the colon I see the new parsed query as location:"z installlogsarchive.log.d20060406.t141958" So, Lucene does not store the file path exactly?! It converts it all to lower case! Is there some property I should turn on? It is the Anal

Re: Syntax help

2006-04-14 Thread Rajesh Munavalli
On 4/14/06, Bill Snyder <[EMAIL PROTECTED]> wrote: > > AHA! I am using the Search tab and have entered the query : > > location:Z:\install\logs\archive.log.D20060406.T141958 > > the query details say the query was parsed to > > location:z > > so if I escape the colon I see the new parsed query as

Using Lucene for searching tokens, not storing them.

2006-04-14 Thread karl wettin
I would like to store all in my application rather than using the Lucene persistency mechanism for tokens. I only want the search mechanism. I do not need the IndexReader and IndexWriter as that will be a natural part of my application. I only want to use the Searchable. So I looked at exte

Re: Syntax help

2006-04-14 Thread Bill Snyder
Thanks! OK, how do I get the file separator to be part of the term? Luke shows the parsed query as ignoring the file separator. so location:Z\:\\/install/logs\\jetspeedservices.log becomes location:"z install logs jetspeedservices.log" --Bill On 4/14/06, Rajesh Munavalli <[EMAIL PROTECTED]> w

Re: query analysis

2006-04-14 Thread Chris Lu
Have you tried MultiFieldQueryParser? Chris Lu --- Full-Text Lucene Search on Any Databases http://www.dbsight.net Faster to Setup than reading marketing materials! On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote: > Hello list, > > I want to know if a human-written qu

Re: Syntax help

2006-04-14 Thread karl wettin
On 14 Apr 2006, at 17:22, karl wettin wrote: It is the Analyzer that does that. Try creating your IndexSearcher with a KeywordAnalyzer (I think). err It is the Analyzer that does that. Try using a KeywordAnalyzer (I think).
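To illustrate why the analyzer matters for the path query in this thread, here is a rough sketch that imitates the two behaviours with plain string handling. It does not use Lucene and does not reproduce any real analyzer's rules; the class PathAnalysisSketch and both methods are hypothetical stand-ins. The first method splits on path punctuation and lowercases, which is roughly the effect Luke reported; the second keeps the whole value as a single term, which is the idea behind a keyword-style analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PathAnalysisSketch {

    /** Rough stand-in for a tokenizing, lowercasing analyzer (not Lucene code). */
    public static List<String> tokenizing(String path) {
        List<String> terms = new ArrayList<>();
        // Split on backslash, slash, colon and whitespace, then lowercase each piece.
        for (String piece : path.split("[\\\\/:\\s]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Rough stand-in for a keyword-style analyzer: the whole value is one term. */
    public static List<String> keyword(String path) {
        List<String> terms = new ArrayList<>();
        terms.add(path); // case and separators are left untouched
        return terms;
    }
}
```

Under the first behaviour a path such as Z:\install\logs\jetspeedservices.log becomes the separate terms z, install, logs and jetspeedservices.log, so an exact-path term never exists in the index. Under the second it stays one term, but the query must then be analyzed the same way at search time.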

Re: query analysis

2006-04-14 Thread karl wettin
On 14 Apr 2006, at 17:41, Chris Lu wrote: On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote: I want to know if a human-written query passed through the QueryParser is "clean" from fields, boolean clauses and query indicators. Easy way out would of course be to add a boolean that resets at ReInit(),

Re: Using Lucene for searching tokens, not storing them.

2006-04-14 Thread Chris Lu
Use Store.NO when creating the Field. Chris Lu --- Full-Text Lucene Search on Any Databases http://www.dbsight.net Faster to Setup than reading marketing materials! On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote: > I would like to store all in my application rat

Re: Using Lucene for searching tokens, not storing them.

2006-04-14 Thread karl wettin
On 14 Apr 2006, at 17:46, Chris Lu wrote: On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote: I would like to store all in my application rather than using the Lucene persistency mechanism for tokens. I only want the search mechanism. I do not need the IndexReader and IndexWriter as that will be a

Re: Using Lucene for searching tokens, not storing them.

2006-04-14 Thread Christophe
On 14 Apr 2006, at 08:51, karl wettin wrote: You misunderstand all my questions. I must admit I was not sure I understood your question, either. In order to search, Lucene needs an index. That index is maintained by the IndexReader and IndexWriter classes. Are you contemplating having

Re: Using Lucene for searching tokens, not storing them.

2006-04-14 Thread karl wettin
On 14 Apr 2006, at 17:51, karl wettin wrote: On 14 Apr 2006, at 17:46, Chris Lu wrote: On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote: I would like to store all in my application rather than using the Lucene persistency mechanism for tokens. I only want the search mechanism. I do not need the In

Re: Using Lucene for searching tokens, not storing them.

2006-04-14 Thread karl wettin
On 14 Apr 2006, at 17:51, Christophe wrote: Are you contemplating having your own index and index format? In that case, it's not clear to me how much leverage you will be getting using Lucene at all. Could you explain in more detail what you are trying to do? I want to use the parts of Luc

Re: Syntax help

2006-04-14 Thread Bill Snyder
oops, thought that you were just referring to the lowercase... :) On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote: > > > On 14 Apr 2006, at 17:22, karl wettin wrote: > > > > It is the Analyzer that does that. Try creating your IndexSearcher > > with a KeywordAnalyzer (I think). > > err > > It is th

Re: Using Lucene for searching tokens, not storing them.

2006-04-14 Thread Christophe
On 14 Apr 2006, at 08:55, karl wettin wrote: I don't want to use Lucene for persistence. I do not want to store tokens nor field text in a FSDirectory or in a RAMDirectory. I want to store the tokens in my application. If I understand your question, I think that the first answer wa

Re: query analysis

2006-04-14 Thread Chris Lu
Sorry, I really misunderstood you. And you already know a lot about Lucene. :) Basically you want to restore the original query from the Query object. But it may have already passed through a lot of composition, like Boolean, Span, Wildcard. I don't feel it's possible to reconstruct the original human query. Ch

Re: Using Lucene for searching tokens, not storing them.

2006-04-14 Thread karl wettin
On 14 Apr 2006, at 17:56, karl wettin wrote: On 14 Apr 2006, at 17:51, Christophe wrote: Are you contemplating having your own index and index format? In that case, it's not clear to me how much leverage you will be getting using Lucene at all. Could you explain in more detail what you are tr

Re: Using Lucene for searching tokens, not storing them.

2006-04-14 Thread Chris Lu
Thanks, Christophe. Hi, Kevin, I think your question means you want to store the analyzed tokens yourself? If so, you can use the Analyzer to process the text directly and save the analyzed results in your application, maybe to use them later in some RDBMS or Berkeley DB? Chris Lu ---

Re: Using Lucene for searching tokens, not storing them.

2006-04-14 Thread karl wettin
On 14 Apr 2006, at 18:01, Christophe wrote: On 14 Apr 2006, at 08:55, karl wettin wrote: I don't want to use Lucene for persistence. I do not want to store tokens nor field text in a FSDirectory or in a RAMDirectory. I want to store the tokens in my application. If I understand your

Lucene probabilistic

2006-04-14 Thread Malcolm Clark
Hi all, I came across an old mail list item from 2003 exploring the possibilities of a more probabilistic approach to using Lucene. Do the online experts know if anyone achieved this since? Thanks for any advice, Malc

Re: Using Lucene for searching tokens, not storing them.

2006-04-14 Thread Doug Cutting
karl wettin wrote: I would like to store all in my application rather than using the Lucene persistency mechanism for tokens. I only want the search mechanism. I do not need the IndexReader and IndexWriter as that will be a natural part of my application. I only want to use the Searchable.

Re: Boosting Fields (in index) or Queries

2006-04-14 Thread Jeremy Hanna
Wow, I finally found out why I was getting results in the wrong order - I got the results in the correct order from the Lucene index. I got the explanation of each of the results along with their database id and found the ordering mismatch. The problem is in the database call. I am calling

Re: Using Lucene for searching tokens, not storing them.

2006-04-14 Thread karl wettin
On 14 Apr 2006, at 18:31, Doug Cutting wrote: karl wettin wrote: I would like to store all in my application rather than using the Lucene persistency mechanism for tokens. I only want the search mechanism. I do not need the IndexReader and IndexWriter as that will be a natural part of my a

RE: Boosting Fields (in index) or Queries

2006-04-14 Thread Bryzek.Michael
We tried two approaches: 1) Pull data from the db in arbitrary order and then sort in the application AFTER the retrieve. This will require two passes over the results. 2) Add an order by clause to the select. In Oracle, you could do something like "order by decode(444,1,333,2,555,3,888,4,
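Approach 1 above (fetch in arbitrary order, then sort in the application) can be sketched as follows. This is a minimal illustration, not the poster's code: the class HitOrderSketch, the Row type and its fields are hypothetical, and rankedIds stands for the database ids in the order Lucene returned them. The two loops correspond to the two passes over the results mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HitOrderSketch {

    /** Hypothetical row type: a database id plus one selected column. */
    public static class Row {
        public final long id;
        public final String name;

        public Row(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /**
     * Returns the rows re-ordered to follow rankedIds (the ids in Lucene's
     * hit order). Ids that have no matching row are skipped.
     */
    public static List<Row> inHitOrder(List<Long> rankedIds, List<Row> rows) {
        // Pass 1: index the fetched rows by id.
        Map<Long, Row> byId = new HashMap<>();
        for (Row row : rows) {
            byId.put(row.id, row);
        }
        // Pass 2: walk the ranked ids and emit the rows in that order.
        List<Row> ordered = new ArrayList<>(rows.size());
        for (Long id : rankedIds) {
            Row row = byId.get(id);
            if (row != null) {
                ordered.add(row);
            }
        }
        return ordered;
    }
}
```

The database query can then use a plain "where id in (...)" with no order by clause, which keeps the SQL portable across database vendors.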

Re: Using Lucene for searching tokens, not storing them.

2006-04-14 Thread Yonik Seeley
On 4/14/06, karl wettin <[EMAIL PROTECTED]> wrote: > Do I have to worry about passing a null Directory to the default > constructor? That's not an easy road you are trying to take, but it should be doable. There are some final methods you can't override, but just set directoryOwner=false and close

Re: Syntax help

2006-04-14 Thread Erick Erickson
Something that took me a while to get was that the analyzer is important BOTH in the indexing phase and in the searching phase (assuming you're using the QueryParser). For your experiment, you probably want to use the WhitespaceAnalyzer. See page 119 of "Lucene in Action". The other three most-comm

Re: Syntax help

2006-04-14 Thread Bill Snyder
On 4/14/06, Erick Erickson <[EMAIL PROTECTED]> wrote: > > Something that took me a while to get was that the analyzer is important > BOTH in the indexing phase and in the searching phase (assuming you're > using > the QueryParser). For your experiment, you probably want to use the > WhitespaceAnalyz

Re: Boosting Fields (in index) or Queries

2006-04-14 Thread Jeremy Hanna
I would use a database function to force the ordering like the one you provided that works in Oracle, but it doesn't look like MySQL 5 supports that. If anyone else knows of a way to force the ordering using MySQL 5 queries, please respond. I think I'll just re-sort them when they get bac

Re: Using Lucene for searching tokens, not storing them.

2006-04-14 Thread Doug Cutting
karl wettin wrote: Do I have to worry about passing a null Directory to the default constructor? A null Directory should not cause you problems. Doug

Re: Boosting Fields (in index) or Queries

2006-04-14 Thread Michael D. Curtin
Jeremy Hanna wrote: I would use a database function to force the ordering like the one you provided that works in Oracle, but it doesn't look like MySQL 5 supports that. If anyone else knows of a way to force the ordering using MySQL 5 queries, please respond. I think I'll just re-sort th

Re: Boosting Fields (in index) or Queries

2006-04-14 Thread Jeremy Hanna
I still have a similar problem with the boost factor. I change the name to have the AND operator and set that query's boost to a very high value in relation to the others. I also have a regular OR based name so that it doesn't rule those out. However whenever I change the boost values wi

Re: Lucene Searches VS. Relational database Queries

2006-04-14 Thread Jeryl Cook
I'm the co-worker who suggested this to Ananth (I think we have been debating this for 3 days now; from the post it seems he is winning :)...). Anyway, as Ananth stated, I suggested this because I am wondering if Lucene could solve a bottleneck query that is taking a deathly long time to complete (rea

Catching BooleanQuery.TooManyClauses

2006-04-14 Thread bb
Hi Lucene Users, I would like to catch BooleanQuery.TooManyClauses exception for certain wildcard searches and display a 'subset' of results. I have used the WildcardTermEnum to give me the first X documents matching the wildcard query. Below is the code I use to implement the solution. Witho