scalability limit in terms of numbers of large documents

2010-08-13 Thread andynuss
Hi, Let's say that I am indexing large book documents broken into chapters, the kind of typical book that you buy at Amazon. What would be the approximate limit to the number of books that can be indexed slowly and searched quickly? The search unit would be a chapter, so assume that a book is divided int

Re: word frequency counting

2010-08-13 Thread Greg Gershman
Absolutely! Index your documents, then open an IndexReader and take a look at the terms() method. You can grab each term, and pass it to the IndexReader using the docFreq(Term t) method and get back the number of documents that term appears in. Greg From: S
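
To make concrete what the per-term number described above means, here is a minimal stdlib-only sketch. It does not use Lucene; the class name DocFreqSketch, the method documentFrequencies, and the whitespace/punctuation tokenizer are all hypothetical stand-ins. It computes, for each term, the number of documents that contain it at least once, which is the quantity the reply says docFreq(Term t) returns, as opposed to the total number of occurrences.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only: a plain-Java model of document frequency,
// not the Lucene IndexReader API mentioned in the reply.
class DocFreqSketch {

    // For each term, count the documents that contain it at least once.
    static Map<String, Integer> documentFrequencies(List<String> docs) {
        Map<String, Integer> df = new TreeMap<>();
        for (String doc : docs) {
            // Track terms already counted for this document so that
            // repeated occurrences within one document count only once.
            Set<String> seen = new HashSet<>();
            // Assumed tokenizer: lowercase, split on non-word characters.
            for (String token : doc.toLowerCase().split("\\W+")) {
                if (!token.isEmpty() && seen.add(token)) {
                    df.merge(token, 1, Integer::sum);
                }
            }
        }
        return df;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Lucene index lucene",
                "index reader",
                "index");
        // Print each term with its document frequency, in sorted term order.
        documentFrequencies(docs)
                .forEach((term, count) -> System.out.println(term + " " + count));
    }
}
```

Note that "lucene" occurs twice in the first sample document but still contributes only one to its document frequency; counting total occurrences would need a separate tally.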

Re: [ANN] Free technical webinar: Mastering the Lucene Index: Wednesday, August 11, 2010 11:00 AM PST / 2:00 PM EST / 20:00 CET

2010-08-13 Thread Erik Hatcher
I have passed this report on to the folks that manage our webinars. Erik On Aug 13, 2010, at 4:51 AM, Stefan Trcek wrote: On Monday 09 August 2010 21:16:30 Mark Miller wrote: Lucid Imagination Presents a free technical webinar: Mastering the Lucene Index Wednesday, August 11, 2010 11

Re: [ANN] Free technical webinar: Mastering the Lucene Index: Wednesday, August 11, 2010 11:00 AM PST / 2:00 PM EST / 20:00 CET

2010-08-13 Thread Stefan Trcek
On Monday 09 August 2010 21:16:30 Mark Miller wrote: > Lucid Imagination Presents a free technical webinar:  Mastering the > Lucene Index > Wednesday, August 11, 2010 11:00 AM PST / 2:00 PM EST / 20:00 CET > > Sign up here: > http://www.eventsvc.com/lucidimagination/081110?trk-AP Did this work for

[ANN]VTD-XML 2.9

2010-08-13 Thread Jimmy Zhang
VTD-XML 2.9, the next generation XML Processing API for SOA and Cloud computing, has been released. Please visit https://sourceforge.net/projects/vtd-xml/files/ to download the latest version. Strict Conformance: VTD-XML now fully conforms to XML namespace 1.0 spec. Performan