So you want to index several fields alongside the content, search on those fields, and are asking whether Lucene can do that? The answer is yes.
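A rough sketch of how that might look, written against the Lucene 3.0.x API mentioned in this thread. The field names ("contents", "typeId", "date"), the yyyyMMdd integer encoding for dates, the sample documents and the class name are all assumptions made for illustration, and a RAMDirectory stands in for your real index directory. The key fields are indexed unanalyzed so they match exactly, and the date is a NumericField so it can be queried by range:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class FieldSearchSketch {

    // Adds one document: analyzed full text plus two key fields.
    // Field names and the yyyyMMdd date encoding are hypothetical.
    static void addDoc(IndexWriter writer, String text, String typeId, int date)
            throws Exception {
        Document doc = new Document();
        // Full text: tokenized by the analyzer, not stored.
        doc.add(new Field("contents", text, Field.Store.NO, Field.Index.ANALYZED));
        // Key value: indexed as a single exact term.
        doc.add(new Field("typeId", typeId, Field.Store.YES, Field.Index.NOT_ANALYZED));
        // Date as an int so that range queries work.
        doc.add(new NumericField("date", Field.Store.YES, true).setIntValue(date));
        writer.addDocument(doc);
    }

    // Builds a tiny in-memory index and counts documents that contain
    // the word AND have the given type AND fall inside the date range.
    static int countHits(String word, String typeId, int from, int to)
            throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30), true,
                IndexWriter.MaxFieldLength.LIMITED);
        addDoc(writer, "quarterly invoice summary", "INV", 20120215);
        addDoc(writer, "invoice reminder", "MEMO", 20120220);
        addDoc(writer, "old invoice", "INV", 20110105);
        writer.close();

        BooleanQuery query = new BooleanQuery();
        // TermQuery is not analyzed, so the word must already be lowercase.
        query.add(new TermQuery(new Term("contents", word)), BooleanClause.Occur.MUST);
        query.add(new TermQuery(new Term("typeId", typeId)), BooleanClause.Occur.MUST);
        query.add(NumericRangeQuery.newIntRange("date", from, to, true, true),
                BooleanClause.Occur.MUST);

        IndexSearcher searcher = new IndexSearcher(dir, true);
        try {
            return searcher.search(query, 10).totalHits;
        } finally {
            searcher.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countHits("invoice", "INV", 20120101, 20121231));
    }
}
```

With the three sample documents above, only the first one satisfies all three clauses. Done this way, the type and date restrictions are applied inside Lucene, so the separate database lookup you describe would probably not be needed for filtering.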
I still think you should look at Solr but if you are determined to use Lucene, get hold of a copy of the second edition of Lucene In Action http://www.manning.com/hatcher3/.

--
Ian.

On Wed, Mar 7, 2012 at 11:13 AM, Prasad KVSH <prasad.kokep...@ness.com> wrote:
> Hi Ian,
>
> Thanks for your quick reply.
>
> Our documents will have the following common key information like
>
> 1. Document Type ID,
> 2. Document Date,
> 3. Document Author ID,
> 4. Document Status
> 5. Document Group ID.
>
> While creating the indexing, we would like to add the above key values
> along the content index. So that it will not read entire index and
> search on Document Type ID or Date Range. Can we implement this
> approach?
>
> Currently search text is being performed on indexing, then we are
> filtering the documents by reading document record from database table
> for the above key values.
>
> Thanks
> Prasad
>
> -----Original Message-----
> From: Ian Lea [mailto:ian....@gmail.com]
> Sent: Wednesday, March 07, 2012 4:03 PM
> To: java-user@lucene.apache.org
> Subject: Re: Help on DOCX and XLSX
>
> You'll have to find something that parses the formats you are interested
> in and extracts the text you want. Apache Tika comes to mind.
>
> Why are you using such an old version of Lucene? Why aren't you using
> Solr? That might just work for you out of the box. See also
> http://www.lucidimagination.com/devzone/technical-articles/content-extraction-tika
>
> As for the size, I wouldn't worry about it. Disk space is cheap. If
> you really do care, scan the FAQ at
> http://wiki.apache.org/lucene-java/LuceneFAQ. Lots of useful info on
> all sorts of things.
>
> --
> Ian.
>
> On Wed, Mar 7, 2012 at 9:40 AM, Prasad KVSH <prasad.kokep...@ness.com> wrote:
>> Dear All,
>>
>> We started using Lucene version 3.0.3, we have different types of
>> documents like PDF, XLS, XLSX, DOC, DOCX,TXT etc., at a specified
>> folder.
>>
>> We have created index on these files(using IndexFiles.java), Indexing
>> has took 17.2 MB for 69.4MB Documents. This index created using
>> Standard Analyzer with limited index fields. And able to search a
>> given text in PDF(text content only), *.doc and *.xls(MS Word
>> 1997-2003) versions only.
>>
>> Now I need help on .docx and .xlsx files indexing. How I can run
>> indexing on these files. These files are ignored when we do a string
>> search
>>
>> Writer is defined as below:
>>
>> IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR), new
>> StandardAnalyzer(Version.LUCENE_CURRENT), true,
>> IndexWriter.MaxFieldLength.LIMITED);
>>
>> Another question is on the size of index folder, whether we can
>> optimize the size
>>
>> Thanks
>>
>> Prasad

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org