Count the total # of docs in the index?
Hi,
Is it possible to count the total number of documents in the index without running a search? I would also like to count the documents in the index that fall within a date range.
Thanks, Ben
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Count the total # of docs in the index?
You've asked two different questions. For the total number of documents, you can use IndexReader.numDocs().
Within a date range: how did you index the dates? If the date terms sort lexicographically in date order, you can walk all the terms in that range using the TermEnum returned by IndexReader.terms(Term t), where t is the first term in the date range. You will then need to get the termDocs(t) for each of the matching terms. So it is possible without a search.
Erik
On Aug 7, 2005, at 7:47 AM, Ben wrote:
> Hi Is it possible to count the total number of documents in the index without requesting a search? I would like to count the total documents in the index within a date range.
> Thanks, Ben
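The term walk Erik describes can be modelled without Lucene at all. The sketch below is not Lucene's API: the TreeMap is a stand-in for the sorted term dictionary, each int[] is a stand-in for the documents containing that term, and the yyyyMMdd date format and the names DateRangeCount, countInRange and sampleIndex are assumptions made for illustration. It also shows one detail worth keeping in mind: if a document can carry more than one date term, summing per-term counts would count it twice, so the sketch collects document numbers in a BitSet.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Library-free model of the term walk. NOT Lucene's API: the sorted map stands
// in for the term dictionary, each int[] for the documents containing a term.
class DateRangeCount {

    // Counts distinct documents that have at least one date term in [from, to].
    static int countInRange(SortedMap<String, int[]> postings, String from, String to) {
        BitSet seen = new BitSet();
        // tailMap(from) starts at the first term >= from, like seeking an enumerator.
        for (Map.Entry<String, int[]> entry : postings.tailMap(from).entrySet()) {
            if (entry.getKey().compareTo(to) > 0) {
                break; // terms are sorted, so nothing later can be in range
            }
            for (int doc : entry.getValue()) {
                seen.set(doc); // a document with two dates in range counts once
            }
        }
        return seen.cardinality();
    }

    // Hypothetical sample data: dates as yyyyMMdd strings, which sort in date order.
    static SortedMap<String, int[]> sampleIndex() {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("20050730", new int[] {0});
        postings.put("20050801", new int[] {1, 2});
        postings.put("20050805", new int[] {2, 3});
        postings.put("20050901", new int[] {4});
        return postings;
    }

    public static void main(String[] args) {
        // Documents 1, 2 and 3 have a date in August 2005.
        System.out.println(countInRange(sampleIndex(), "20050801", "20050831"));
    }
}
```

With a real index, the same loop shape would apply to the TermEnum and TermDocs objects mentioned above, stopping as soon as the enumerated term leaves the date field or passes the upper bound.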
binary of highlighting?
Where can I get the binary of all the classes for highlighting? Thanks.
-- Riccardo Daviddi, University of Siena - Information Engineering [EMAIL PROTECTED]
Re: setBoost(float) in org.apache.lucene.document.Field cannot be applied to (double)???
I don't know where I am going wrong. I just do this:
    IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), !IndexReader.indexExists(indexDir));
    writer.setUseCompoundFile(true);
    Document document = new Document();
    document.add(Field.Keyword("DocId", Integer.toString(docId)));
    Field f = Field.Text("boostfield", "text");
    f.setBoost(3.0f);
    document.add(f);
    writer.addDocument(document);
    writer.optimize();
    writer.close();
If I then try to get the boost factor of the boostfield:
    System.out.println(IndexReader.open(indexDir).document(0).getField("boostfield").getBoost());
for the only document indexed I get 1.0 instead of 3.0! Where is the error?
Thanks.
On 8/4/05, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Yes, use 1.2f there. That method accepts floats, not doubles. That could be an error in the Lucene book.
> Otis
>
> --- Riccardo Daviddi <[EMAIL PROTECTED]> wrote:
> > Why do I get this error when writing, for example:
> >     Field senderNameField = Field.Text("senderName", senderName);
> >     Field subjectField = Field.Text("subject", subject);
> >     subjectField.setBoost(1.2);
> > as in the manual Lucene in Action?
> > 1.2 is a double, but the method wants a float?
> > -- Riccardo Daviddi
-- Riccardo Daviddi, University of Siena - Information Engineering [EMAIL PROTECTED]
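The float/double point in Otis's quoted reply is a plain Java rule rather than anything Lucene-specific: a literal such as 1.2 is a double, and Java does not narrow a double to a float implicitly in a method call. The sketch below uses a hypothetical stand-in class, BoostDemo, with a float-only setter; it is not Lucene's Field class.

```java
// Hypothetical stand-in for a class with a float-only setter (not Lucene's Field).
class BoostDemo {
    private float boost = 1.0f;

    void setBoost(float boost) {
        this.boost = boost;
    }

    float getBoost() {
        return boost;
    }

    public static void main(String[] args) {
        BoostDemo field = new BoostDemo();
        // field.setBoost(1.2);       // does not compile: 1.2 is a double literal
        field.setBoost(1.2f);         // the f suffix makes it a float literal
        field.setBoost((float) 1.2);  // an explicit narrowing cast also works
        System.out.println(field.getBoost());
    }
}
```

This only covers the compile error from the earlier message. It says nothing about why the boost read back from the index differs from the one that was set.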
Re: setBoost(float) in org.apache.lucene.document.Field cannot be applied to (double)???
A Lucene Highlighter Jar is included in the Lucene in Action code. The link to downloadable code is at http://lucenebook.com/
Otis
Re: setBoost(float) in org.apache.lucene.document.Field cannot be applied to (double)???
: Field f = Field.Text("boostfield", "text");
: f.setBoost(3.0f);
: document.add(f);
:
: if then i try to get the boost factor of the boostfield
:
: System.out.println(IndexReader.open(indexDir).document(0).getField("boostfield").getBoost());
:
: for the only one document indexed I get 1.0 instead of 3.0!
:
: where is the error?
Did you read the documentation for getBoost?
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Document.html#getBoost()
If you search past messages for getBoost and setBoost, you should be able to find some explanations of how Document-based boosts (as opposed to Query boosts) are used at indexing time.
-Hoss
Re: setBoost(float) in org.apache.lucene.document.Field cannot be applied to (double)???
Ah, ok. So what I am doing is correct; only the way I was checking the boost factor was incorrect. Sorry if I ask newbie questions...
On 8/7/05, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> Did you read the documentation for getBoost?
> http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Document.html#getBoost()
> if you search past messages for getBoost and setBoost you should be able to find some explanations of how Document based boosts (as opposed to Query boosts) are used at indexing time.
> -Hoss
-- Riccardo Daviddi, University of Siena - Information Engineering [EMAIL PROTECTED]
Re: binary of highlighting?
On Aug 7, 2005, at 12:17 PM, Riccardo Daviddi wrote:
> Where can I get the binary of all the classes for highlighting?
There have never been any official releases of the Sandbox/contrib pieces (though that will change with Lucene 1.9/2.0 and beyond). A Lucene 1.4.3-compatible binary exists within the Lucene in Action download available from http://www.lucenebook.com
Erik
New Site Live Using Lucene
Not sure if this is appropriate or not, but I just put live a web site that I have been working on for over a year, and it uses Lucene for all its searching.
I have 46 million documents in 15 Lucene indexes, although the vast majority of those consist of only a few words. The Lucene indexes take up about 6GB of space.
I wrote a Java daemon that listens on a socket and accepts connections from my PHP scripts in order to do the searching. The results from Lucene include ID numbers that are linked up with MySQL records, which are then used to build the resulting web page.
You can see the site here: http://csourcesearch.net
It's a website that allows you to search over 99 million lines of open source C/C++ code :)
Anyway, just wanted to say thanks a lot for such a great product (even if it is java *snicker*)
Thanks again Lucene! :)
Re: New Site Live Using Lucene
This is cool! It seems you parsed the C/C++ code. Is this easy to extend to other languages, like Java?
You also chose to display the data stored in the database. Is there any reason for that, compared to reading it from the Lucene index itself?
I feel that using Lucene's highlighter may make the search results easier to read.
-- Chris Lu, Lucene Search RAD on Any Database http://www.dbsight.net
On 8/7/05, Robert Schultz <[EMAIL PROTECTED]> wrote:
> Not sure if this is appropriate or not, but I just put live a web site that I have been working on for over a year, and it uses Lucene for all its searching.
> I have 46 million documents in 15 Lucene indexes, although the vast majority of those consist of only a few words. The Lucene indexes take up about 6GB of space.
> I wrote a Java daemon to listen on a socket, and accept connections from my PHP scripts in order to do the searching.
> The results from Lucene include ID numbers that are linked up with MySQL records thus forming the resulting web page.
> You can see the site here: http://csourcesearch.net
> It's a website that allows you to search over 99 million lines of open source C/C++ code :)
> Anyways, just wanted to say thanks a lot for such a great product (even if it is java *snicker*)
> Thanks again Lucene! :)
Re: New Site Live Using Lucene
Yup, the C/C++ code is parsed using some templates I wrote utilizing CodeWorker. It would be possible to do the same thing for any other language, such as Java, PHP or Perl, although you'd need an expert understanding of that language's syntax in order to parse it correctly :)
Initially Lucene was never part of the site. I was using MySQL to store the data, and used MySQL's FULLTEXT searching. However, once I reached 25 million+ rows in a single table, MySQL's FULLTEXT searching ground to a halt. After speaking with the MySQL folks, they told me to use Lucene, as their FULLTEXT support doesn't scale well and Lucene is supposed to be one of the best engines around for that.
Since I was already several months into the project, with the vast majority of the website written to use the MySQL database, converting entirely over to Lucene would have meant a complete code re-write. I didn't want to do that, so I combined MySQL and Lucene and used both.
It took over 5 FULL MONTHS of 24/7 100% CPU time to PARSE the C/C++ code and insert it into the database. And I only did 3,200 of the more than 25,000 projects I still need to parse.
In hindsight I might have chosen to house everything in Lucene. However, it would be a major re-write at this point, and I'm happy enough right now with my 'merged' approach of PHP, MySQL and Lucene.
Reply Split Search Word
Hi Luceners,
Apologies. As I have already replied, using the Analysis code I have tried all the Analyzers (including StandardAnalyzer), but I am not able to achieve the required split into COMPLETE WORDS.
My input string would be a lengthy one, as below:
    String sKey = "\"" + "Dough Cutting" + "\"" + " " + "Otis Gospodnetic" + " " + "\"" + "Erik Hatcher" + "\"" + " " + "Authors of " + "\"" + "Lucene In Action" + "\"";
The required split into complete words should return:
1) "Dough Cutting"
2) Otis Gospodnetic
3) "Erik Hatcher"
4) Authors of
5) "Lucene In Action"
Please note: words enclosed in "\"" are complete split words.
I am sure some Analyzer code inside Lucene is handling this task. How can one achieve this?
With regards, Karthik
-Original Message-
From: Mordo, Aviran (EXP N-NANNATEK) [mailto:[EMAIL PROTECTED]
Sent: Friday, August 05, 2005 7:58 PM
To: java-user@lucene.apache.org
Subject: RE: Split Search Word
The StandardAnalyzer should work just fine with it. It will break the search string into 5 search terms.
HTH
Aviran http://www.aviransplace.com
_
From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: Friday, August 05, 2005 1:57 AM
To: LUCENE
Subject: Split Search Word
Hi Luceners,
Apologies. I have a long search string, as given below:
    SearchWord = "\"" + "Dough Cutting" + "\"" + " " + "Otis Gospodnetic" + " " + "\"" + "Erik Hatcher" + "\"" + " " + "Authors of " + "\"" + "Lucene In Action" + "\"";
Prior to searching the index, I need the words to be split:
SearchWord =
1) "\"" + "Dough Cutting" + "\""
2) "Otis Gospodnetic"
3) "\"" + "Erik Hatcher" + "\""
4) "Authors of "
5) "\"" + "Lucene In Action" + "\""
I am sure some Analyzer within Lucene is performing this task, so could somebody please tell me how to do it? [ I already used the Analysis/Paralysis code to check, but it was no help ]
WITH WARM REGARDS, HAVE A NICE DAY [ N.S.KARTHIK]
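The split Karthik asks for, quoted phrases kept whole and each unquoted run between them kept as one piece, can be done before the string ever reaches Lucene. The following is a minimal sketch with plain java.util.regex; the class name PhraseSplitter and its split method are hypothetical, and nothing here is part of Lucene or of any Analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: splits a search string into
// double-quoted phrases and the unquoted runs between them.
class PhraseSplitter {
    // Either a double-quoted phrase, or a run of characters containing no quote.
    private static final Pattern PART = Pattern.compile("\"[^\"]*\"|[^\"]+");

    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = PART.matcher(input);
        while (matcher.find()) {
            String part = matcher.group().trim();
            if (!part.isEmpty()) { // skip whitespace-only gaps between phrases
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String sKey = "\"Dough Cutting\" Otis Gospodnetic \"Erik Hatcher\" Authors of \"Lucene In Action\"";
        for (String part : split(sKey)) {
            System.out.println(part);
        }
    }
}
```

For the sample string from the message this yields the five pieces listed there, with the quotes kept on the quoted ones. An unbalanced trailing quote is simply skipped, which may or may not be the behaviour wanted in a real application.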
Re: New Site Live Using Lucene
: I feel using Lucene's highlighter may make it easier to read the search
: results.
I'm of the opinion that since the result pages are all source code, syntax highlighting is definitely the way to go. Given the existing presentation, though, it does seem like it would make sense to "highlight" the lines containing results by emphasising those line numbers, perhaps by bolding or changing the color of the line number (since that doesn't affect the syntax highlighting of the code).
I would also suggest listing the line number(s) of matches at the top of the page as links to local (named) anchors (one per line number with a match).
-Hoss