Count the total # of docs in the index?

2005-08-07 Thread Ben
Hi

Is it possible to count the total number of documents in the index
without requesting a search? I would like to count the total documents
in the index within a date range.

Thanks,
Ben

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Count the total # of docs in the index?

2005-08-07 Thread Erik Hatcher
You've asked two different questions. You can use IndexReader.numDocs()
to find the total number of documents.


Within a date range - how did you index the dates?  If the dates are  
in lexicographical order, you can walk all the terms in that range  
using TermEnum from IndexReader.terms(Term t) where t is the first  
term in the date range.  You will then need to get the termDocs(t)  
for each of the matching terms.  So it is possible without a search.


Erik
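
The term walk described above can be modelled in plain Java. This is a minimal sketch, not Lucene code: a TreeMap stands in for the index's sorted term dictionary, the class DateRangeCount and its methods are hypothetical, and dates are assumed to be indexed as yyyyMMdd strings so that lexicographic order matches chronological order. Against a real index, the same loop would use the TermEnum from IndexReader.terms(Term) and IndexReader.termDocs(Term) mentioned in the message.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy model of a term dictionary: date term -> ids of the documents containing it. */
class DateRangeCount {
    // Stand-in for the index (assumption); sorted by term, like a term dictionary.
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    void add(String dateTerm, Integer... docIds) {
        postings.put(dateTerm, Arrays.asList(docIds));
    }

    /** Counts distinct documents whose date term lies in [from, to], inclusive. */
    int countInRange(String from, String to) {
        BitSet seen = new BitSet();
        // subMap yields terms in sorted order starting at 'from', which mirrors
        // seeking a term enumeration to the first term of the range.
        SortedMap<String, List<Integer>> range = postings.subMap(from, true, to, true);
        for (List<Integer> docs : range.values()) {
            for (int doc : docs) {
                seen.set(doc); // a document with two dates in range is counted once
            }
        }
        return seen.cardinality();
    }

    public static void main(String[] args) {
        DateRangeCount index = new DateRangeCount();
        index.add("20050801", 0, 1);
        index.add("20050805", 2);
        index.add("20050807", 1, 3);
        index.add("20050901", 4);
        System.out.println(index.countInRange("20050801", "20050831"));
    }
}
```

Collecting ids in a BitSet rather than summing per-term counts matters when a document can carry more than one date term; summing would count such a document twice.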





binary of highlighting?

2005-08-07 Thread Riccardo Daviddi
Where can I get the binary of all the classes for highlighting?

thx
-- 
Riccardo Daviddi
University of Siena - Information Engineering
[EMAIL PROTECTED]




Re: setBoost(float) in org.apache.lucene.document.Field cannot be applied to (double)???

2005-08-07 Thread Riccardo Daviddi
I don't know what I am doing wrong...

This is all I do:

IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(),
   !IndexReader.indexExists(indexDir));
writer.setUseCompoundFile(true);
Document document = new Document();
document.add(Field.Keyword("DocId", Integer.toString(docId)));
Field f = Field.Text("boostfield", "text");
f.setBoost(3.0f);
document.add(f);
writer.addDocument(document);
writer.optimize();
writer.close();

If I then try to read the boost factor of the boostfield:

System.out.println(IndexReader.open(indexDir).document(0).getField("boostfield").getBoost());

for the only document indexed, I get 1.0 instead of 3.0!

Where is the error?

Thanks

On 8/4/05, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Yes. use 1.2f there.  That method accepts floats, not doubles.  That
> could be an error in the Lucene book.
> 
> Otis
> 
> 
> --- Riccardo Daviddi <[EMAIL PROTECTED]> wrote:
> 
> > Why do I get this error when writing, for example:
> >
> > Field senderNameField = Field.Text("senderName", senderName);
> > Field subjectField = Field.Text("subject", subject);
> > subjectField.setBoost(1.2);
> >
> > as in the book Lucene in Action?
> >
> > Is it because 1.2 is a double, but the method wants a float?
> > --
> > Riccardo Daviddi


-- 
Riccardo Daviddi
University of Siena - Information Engineering
[EMAIL PROTECTED]
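
On the compile error quoted above: in Java a literal such as 1.2 has type double, and a double is not implicitly narrowed to float when passed to a method. A minimal sketch of the fix, using a hypothetical BoostHolder class as a stand-in for any class with a setBoost(float) method (it is not Lucene's Field):

```java
class FloatLiteralDemo {
    /** Hypothetical stand-in for a class exposing setBoost(float). */
    static final class BoostHolder {
        private float boost = 1.0f;

        void setBoost(float boost) {
            this.boost = boost;
        }

        float getBoost() {
            return boost;
        }
    }

    public static void main(String[] args) {
        BoostHolder holder = new BoostHolder();
        // holder.setBoost(1.2);      // does not compile: 1.2 is a double literal
        holder.setBoost(1.2f);         // float literal, note the 'f' suffix
        holder.setBoost((float) 1.2);  // an explicit narrowing cast also compiles
        System.out.println(holder.getBoost());
    }
}
```

The suffix form is usually preferable for constants; the cast is mainly useful when the value comes from a double variable.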




Re: setBoost(float) in org.apache.lucene.document.Field cannot be applied to (double)???

2005-08-07 Thread Otis Gospodnetic
A Lucene Highlighter Jar is included in the Lucene in Action code.
The link to downloadable code is at http://lucenebook.com/

Otis




Re: setBoost(float) in org.apache.lucene.document.Field cannot be applied to (double)???

2005-08-07 Thread Chris Hostetter
: Field f = Field.Text("boostfield", "text");
: f.setBoost(3.0f);
: document.add(f);

: if then i try to get the boost factor of the boostfield
:
: System.out.println(IndexReader.open(indexDir).document(0).getField("boostfield").getBoost());
:
: for the only one document indexed I get 1.0 instead of 3.0!
:
: where is the error?

Did you read the documentation for getBoost?

http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Document.html#getBoost()

If you search past messages for getBoost and setBoost, you should be able
to find some explanations of how Document-based boosts (as opposed to
Query boosts) are used at indexing time.



-Hoss
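
To illustrate why the retrieved field reports 1.0: as the getBoost documentation describes it, the boost is consumed while indexing and is not stored with the document, so a field read back from the index carries the default value. The following is a toy model of that behaviour under simplifying assumptions, not Lucene code; the class names and the norm formula (boost divided by the square root of the term count) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class BoostAtIndexTime {
    /** Hypothetical field: name, text and an index-time boost. */
    static final class Field {
        final String name;
        final String text;
        float boost = 1.0f;

        Field(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    /** What this toy index keeps per field: the stored text and a norm. */
    private static final class Entry {
        final String name;
        final String text;
        final float norm;

        Entry(String name, String text, float norm) {
            this.name = name;
            this.text = text;
            this.norm = norm;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Indexing folds the boost into the norm; the boost itself is then discarded. */
    int add(Field field) {
        int terms = field.text.trim().split("\\s+").length;
        float norm = field.boost * (float) (1.0 / Math.sqrt(terms));
        entries.add(new Entry(field.name, field.text, norm));
        return entries.size() - 1;
    }

    /** Retrieval rebuilds the field from stored text only, so its boost is the default. */
    Field stored(int id) {
        Entry e = entries.get(id);
        return new Field(e.name, e.text);
    }

    /** The value scoring would use; this is where the boost ended up. */
    float norm(int id) {
        return entries.get(id).norm;
    }

    public static void main(String[] args) {
        BoostAtIndexTime index = new BoostAtIndexTime();
        Field f = new Field("boostfield", "text");
        f.boost = 3.0f;
        int id = index.add(f);
        System.out.println("stored boost: " + index.stored(id).boost);
        System.out.println("norm:         " + index.norm(id));
    }
}
```

In this model the way to check that a boost took effect is to look at how documents score, not to read the boost back from a retrieved document.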





Re: setBoost(float) in org.apache.lucene.document.Field cannot be applied to (double)???

2005-08-07 Thread Riccardo Daviddi
Ah, ok.

So what I am doing is correct; only the way I tried to read back the
boost factor was incorrect.

Sorry if I ask newbie questions...



-- 
Riccardo Daviddi
University of Siena - Information Engineering
[EMAIL PROTECTED]




Re: binary of highlighting?

2005-08-07 Thread Erik Hatcher


On Aug 7, 2005, at 12:17 PM, Riccardo Daviddi wrote:

> Where can I get the binary of all the classes for highlighting?


There have never been any official releases of the Sandbox/contrib  
pieces (though that will change with Lucene 1.9/2.0 and beyond).  A  
Lucene 1.4.3 compatible binary exists within the Lucene in Action  
download available from http://www.lucenebook.com


Erik





New Site Live Using Lucene

2005-08-07 Thread Robert Schultz
Not sure if this is appropriate or not, but I just put live a web site 
that I have been working on for over a year, and it uses Lucene for all 
its searching.


I have 46 million documents in 15 Lucene indexes, although the vast 
majority of those consist of only a few words.

The Lucene indexes take up about 6GB of space.

I wrote a Java daemon to listen on a socket, and accept connections from 
my PHP scripts in order to do the searching.


The results from Lucene include ID numbers that are linked up with MySQL 
records thus forming the resulting web page.


You can see the site here: http://csourcesearch.net

It's a website that allows you to search over 99 million lines of open 
source C/C++ code :)


Anyways, just wanted to say thanks a lot for such a great product (even 
if it is java *snicker*)


Thanks again Lucene! :)
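
For readers curious about the daemon arrangement described above, here is a minimal sketch of the idea, not the site's actual code. Everything in it is an assumption: the one-line-query, one-line-reply protocol, the class name SearchDaemonSketch, and the in-memory map that stands in for the Lucene indexes. The in-process client in main plays the role of the PHP script, which would then look the returned ids up in MySQL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class SearchDaemonSketch {
    // Hypothetical stand-in for the indexes: record id -> lower-cased text.
    private final Map<Integer, String> docs = new TreeMap<>();

    void index(int id, String text) {
        docs.put(id, text.toLowerCase(Locale.ROOT));
    }

    /** Placeholder search: ids of records whose text contains the query. */
    List<Integer> search(String query) {
        List<Integer> ids = new ArrayList<>();
        String q = query.trim().toLowerCase(Locale.ROOT);
        for (Map.Entry<Integer, String> e : docs.entrySet()) {
            if (!q.isEmpty() && e.getValue().contains(q)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    /** Handles one client: reads one query line, replies with space-separated ids. */
    void serveOne(ServerSocket server) throws IOException {
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String query = in.readLine();
            StringBuilder reply = new StringBuilder();
            for (int id : search(query == null ? "" : query)) {
                if (reply.length() > 0) {
                    reply.append(' ');
                }
                reply.append(id);
            }
            out.println(reply);
        }
    }

    public static void main(String[] args) throws Exception {
        SearchDaemonSketch daemon = new SearchDaemonSketch();
        daemon.index(101, "int main(void)");
        daemon.index(102, "static void usage(void)");
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) { // 0 = any free port
            Thread worker = new Thread(() -> {
                try {
                    daemon.serveOne(server);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            worker.start();
            // In-process client standing in for the PHP script.
            try (Socket s = new Socket(loopback, server.getLocalPort());
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
                out.println("void");
                System.out.println("ids: " + in.readLine());
            }
            worker.join();
        }
    }
}
```

A long-running daemon would call serveOne in a loop, typically handing each connection to a worker thread; the single round trip here just keeps the example self-terminating.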





Re: New Site Live Using Lucene

2005-08-07 Thread Chris Lu
This is cool!

Seems you parsed the C/C++ code. Is this easy to extend to other
languages, like Java?

And you chose to display the data stored in the database; any reason for
that, compared to reading it from the Lucene index itself?

I feel using Lucene's highlighter may make it easier to read the search results.

-- 
Chris Lu

Lucene Search RAD on Any Database
http://www.dbsight.net




Re: New Site Live Using Lucene

2005-08-07 Thread Robert Schultz
Yup, the C/C++ code is parsed using some templates I wrote utilizing 
CodeWorker.
It would be possible to do the same thing to any other language such as 
Java or PHP or Perl.
Although you'd need an expert understanding of that language's syntax in 
order to parse it correctly :)


Initially Lucene was never part of the site.
I was using MySQL to store the data, and used MySQL's FULLTEXT searching.
However once I reached 25 million+ rows in a single table, MySQL's 
FULLTEXT searching ground to a halt.
After speaking with the MySQL folks, they told me to use Lucene as their 
FULLTEXT support doesn't scale well and Lucene is supposed to be one of 
the best engines around for that.


Since I was already several months into the project with the vast 
majority of the website written to use the MySQL database, converting 
entirely over to Lucene would have meant a complete code re-write.


I didn't want to do that so I combined both MySQL and Lucene and used both.

It took over 5 FULL MONTHS of 24/7 100% CPU time to PARSE the C/C++ code 
and insert it into the database.

And I only did 3,200 of the more than 25,000 projects I still need to parse.

In hindsight I might have chosen to house everything in Lucene, however 
it would be a major re-write at this point and I'm happy enough right 
now with my 'merged' approach of PHP, MySQL and Lucene.





Reply Split Search Word

2005-08-07 Thread Karthik N S
Hi

Luceners

Apologies.

As I have already replied, I have run the analysis code against all the
Analyzers (including StandardAnalyzer), but I am not able to achieve the
required split into complete words.

My input string would be a lengthy one, as below:

String sKey = "\"" + "Dough Cutting" + "\"" + "  " + "Otis Gospodnetic" + "   "
    + "\"" + "Erik Hatcher" + "\"" + "  " + "Authors of "
    + "\"" + "Lucene In Action" + "\"";

The required split of complete words should return

   1) "Dough Cutting"
   2) Otis Gospodnetic
   3) "Erik Hatcher"
   4) Authors of
   5) "Lucene In Action"

Please note: words enclosed in "\"" are complete split words.

I am sure some Analyzer code inside Lucene is handling this task.


So how can one achieve this?

with regards
Karthik
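
For reference, the split asked for above can be done in plain Java before the string ever reaches Lucene. This is a minimal sketch under stated assumptions, not an Analyzer and not Lucene code: the class PhraseSplitter is hypothetical, quotes are assumed to be balanced and not nested, and each run of unquoted text between two quoted phrases is kept as a single piece.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseSplitter {
    // Either a double-quoted phrase, or a run of characters containing no quote.
    private static final Pattern PART = Pattern.compile("\"[^\"]*\"|[^\"]+");

    /** Splits the input into quoted phrases (quotes kept) and trimmed unquoted runs. */
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = PART.matcher(input);
        while (m.find()) {
            String part = m.group().trim();
            if (!part.isEmpty()) { // skip runs that were only whitespace
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String sKey = "\"Dough Cutting\"  Otis Gospodnetic   "
                + "\"Erik Hatcher\"  Authors of \"Lucene In Action\"";
        int n = 1;
        for (String part : split(sKey)) {
            System.out.println(n++ + ") " + part);
        }
    }
}
```

Each piece could then be turned into a query separately, a quoted piece as a phrase and an unquoted piece as ordinary terms.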

-Original Message-
From: Mordo, Aviran (EXP N-NANNATEK) [mailto:[EMAIL PROTECTED]
Sent: Friday, August 05, 2005 7:58 PM
To: java-user@lucene.apache.org
Subject: RE: Split Search Word


The StandardAnalyzer should work just fine with it. It will break the
search string into 5 search terms.

HTH

Aviran
http://www.aviransplace.com

From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: Friday, August 05, 2005 1:57 AM
To: LUCENE
Subject: Split Search Word



Hi Luceners

Apologies.

I have a long search string, as given below...



SearchWord = "\"" + "Dough Cutting" + "\"" + "  " + "Otis Gospodnetic" + "   "
    + "\"" + "Erik Hatcher" + "\"" + "  " + "Authors of "
    + "\"" + "Lucene In Action" + "\"";

And prior to searching the index, I need the words to be split.

SearchWord   =

   1)   "\"" + "Dough Cutting" + "\""
   2)   "Otis Gospodnetic"
   3)  "\"" + "Erik Hatcher" + "\""
   4)  "Authors of "
   5) "\"" +"Lucene In Action" +"\""

I am sure some Analyzer within Lucene performs this task.
So could somebody please tell me how to do it?

[ I already used the Analysis/Paralysis code to check, but it was no help ]




WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]






Re: New Site Live Using Lucene

2005-08-07 Thread Chris Hostetter

: I feel using Lucene's highlighter may make it easier to read the search
: results.

I'm of the opinion that since the result pages are all source code, syntax
highlighting is definitely the way to go, but given the existing
presentation, it does seem like it would make sense to "highlight" the
lines containing results by emphasising those line numbers ... perhaps by
bolding or changing the color of the line number (since that doesn't affect
the syntax highlighting of the code).  I would also suggest listing the
line number(s) of matches at the top of the page as links to local (named)
anchors (one per line number with a match).


-Hoss
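
The suggestion above could be sketched as follows. This is only an illustration under assumptions: the class MatchLineRenderer, the L<n> anchor naming and the exact markup are hypothetical, and syntax highlighting of the code itself is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class MatchLineRenderer {
    /** Escapes the characters that would otherwise be read as HTML markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders numbered source lines; matching line numbers are bold and anchored. */
    static String render(List<String> lines, Set<Integer> matches) {
        StringBuilder html = new StringBuilder("<p>Matches on lines:");
        for (int n = 1; n <= lines.size(); n++) {
            if (matches.contains(n)) {
                // Link at the top of the page to the local anchor for this line.
                html.append(" <a href=\"#L").append(n).append("\">").append(n).append("</a>");
            }
        }
        html.append("</p>\n<pre>\n");
        for (int n = 1; n <= lines.size(); n++) {
            if (matches.contains(n)) {
                // One named anchor per matching line, with the number emphasised.
                html.append("<a name=\"L").append(n).append("\"><b>").append(n).append("</b></a>");
            } else {
                html.append(n);
            }
            html.append(' ').append(escape(lines.get(n - 1))).append('\n');
        }
        return html.append("</pre>\n").toString();
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("int main(void) {", "  return x < 1;", "}");
        System.out.print(render(source, new TreeSet<>(Arrays.asList(2))));
    }
}
```

Only the line number is wrapped in markup, so whatever highlighting is applied to the code text is left untouched.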

