--- Begin Message ---
Hello sir,
Thank you for replying.
But my query is not about updating indexes, optimizing indexes or similar
operations.
 
Sorry, sir, maybe my earlier question was not clear. Let me explain my
problem in more detail.
I have to develop a search engine for my website. We have some data on the
local common drive, which is accessible to everyone. The drive contains
directories, which in turn contain HTML pages, PDF files, PowerPoint
presentations, XML files, Word documents, XLS files, etc. The requirement is
that the people in my department want a website with a search engine, so
that they can search for any file on this drive.
So I created a website, and I am trying to build a search engine for it.
 
I have used Lucene to create the search engine. As you probably know, Lucene
first indexes all the files, and then we can search them. Up to this point
everything works fine. But when I integrate Lucene with my web server, i.e.
Apache Tomcat, it indexes only HTML documents and text documents. My
requirement is to index all files on the common drive, i.e. .pdf, .xml, .ppt,
.xls, etc. To solve this problem I referred to the Lucene FAQ at the following
link: http://wiki.apache.org/jakarta-lucene/LuceneFAQ. There I searched for
"How to index PDF files", PowerPoint presentations, etc. Question 34 is the
solution for PDF files. Please do refer to it; maybe you will understand my
problem better.
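For illustration, the crawling side of this requirement (walk every directory on the shared drive and choose a text extractor per file type) might look like the sketch below. It is a standard-library-only Java sketch, not Lucene code: `ShareCrawler`, `TextExtractor` and `EXTRACTORS` are hypothetical names, the regex that strips HTML/XML tags is a crude assumption, and .pdf, .ppt, .doc and .xls files are deliberately not handled because they need a real parser for each format (the FAQ entry mentioned above points to one for PDF).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Stream;

class ShareCrawler {

    /** Hypothetical hook: turns one file into plain text for indexing. */
    interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    // Text-based formats can be read directly (tags crudely replaced by spaces).
    // Binary formats (.pdf, .ppt, .doc, .xls) would need a real parser added here.
    static final Map<String, TextExtractor> EXTRACTORS = Map.of(
        "txt",  f -> Files.readString(f, StandardCharsets.UTF_8),
        "html", f -> Files.readString(f, StandardCharsets.UTF_8).replaceAll("<[^>]*>", " "),
        "xml",  f -> Files.readString(f, StandardCharsets.UTF_8).replaceAll("<[^>]*>", " ")
    );

    /** Lower-cased extension without the dot, or "" if the name has none. */
    static String extension(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Walks the shared drive and returns the files we have an extractor for. */
    static List<Path> indexableFiles(Path root) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> EXTRACTORS.containsKey(extension(p)))
                 .forEach(result::add);
        }
        return result;
    }
}
```

Each extracted string would then be passed to the indexer together with the path of the file it came from.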
 
It says that a parser is required to extract the text from PDF files and
convert it into plain text. After the text has been extracted, Lucene indexes
it. Now when I search from my browser, the matching file is found but is
displayed as a text document, which is not what I want. I want it to be
displayed as the original PDF document. Please tell me how this problem can
be solved.
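A common way to get the PDF itself (rather than the extracted text) back in the results is to index the extracted text but also keep the path of the original file, and have the results page link to that path. The toy in-memory index below illustrates the idea with the Java standard library only; `PathIndex` is a hypothetical class, not a Lucene API, and its tokenizer (lower-casing and splitting on non-word characters) is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PathIndex {
    // term -> paths of the ORIGINAL files (e.g. the .pdf), never of a text copy
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes text extracted from a file, remembering where the original lives. */
    void add(String originalPath, String extractedText) {
        for (String token : extractedText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) continue; // split can yield a leading empty token
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(originalPath);
        }
    }

    /** Returns the paths of the original documents that contain the term. */
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
    }
}
```

In Lucene itself the same idea is usually expressed by adding the file path as a stored field alongside the indexed text field (the Lucene demo's indexing code stores a `path` field for this purpose), and then building a link to the original file from that stored value when the hits are rendered in Tomcat.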
 
That is: how can I index and search .pdf, .ppt, .xml, .doc, etc. documents
with Lucene?
I will be really thankful if you can solve my problem.
 
 
Regards,
Himani Tandon
Project Engineer
Wipro Technologies(Gurgaon)
Email : [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]> 

________________________________

From: Chuck Williams [mailto:[EMAIL PROTECTED]
Sent: Wed 4/6/2005 11:49 AM
To: java-user@lucene.apache.org
Subject: Re: wildcarded phrase queries



Erik Hatcher writes (4/5/2005 5:57 PM):

> I have a need to implement wildcarded phrase queries, such as this:
>
>     "apach? luc*"
>
> which would match "apache lucene", for example.  This needs to also
> support ordered and unordered proximity like SpanNearQuery does:
>
>     "apach? luc*"~10
>
> I presume I'm going to have to key off of SpanQuery with some
> specialized subclasses.
>
> What approach do you recommend for implementing something like this?

Hi Erik,

Might it be as easy as creating a SpanWildcardQuery that transforms into
a SpanOrQuery of SpanTermQuerys, and then using a SpanNearQuery of
SpanWildcardQuerys?  You could use a WildcardTermEnum to generate the
list of terms for the SpanOrQuery.  This would have some issues, like
computing the idf as the sum over all the pattern-matched terms, but it
looks like that issue still exists with WildcardQuery too.  I haven't
done much with SpanQuerys, so this might not work out so simply, or be
acceptably efficient.
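Chuck's suggestion can be illustrated outside Lucene. Each wildcard is expanded into the set of matching terms (the role WildcardTermEnum and SpanOrQuery would play), and a document is accepted when a match for each clause lies within the allowed slop (the role of SpanNearQuery). The sketch below does this over a single tokenized document using only the Java standard library. `WildcardNear` and its methods are hypothetical, only two clauses are handled, and slop is counted as the number of positions strictly between the two terms. That counting is intended to mirror how SpanNearQuery counts slop for single-term clauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class WildcardNear {

    /** Converts a wildcard (? = exactly one char, * = any run) into a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') sb.append('.');
            else if (c == '*') sb.append(".*");
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    /** Positions of every token matching the pattern (the "SpanOr" expansion). */
    static List<Integer> positions(String[] tokens, Pattern p) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (p.matcher(tokens[i]).matches()) result.add(i);
        }
        return result;
    }

    /**
     * Two-clause "near" check: true if a match for a and a match for b have at
     * most slop positions between them. With inOrder, the a-match must come first.
     */
    static boolean near(String[] tokens, String a, String b, int slop, boolean inOrder) {
        List<Integer> pa = positions(tokens, toRegex(a));
        List<Integer> pb = positions(tokens, toRegex(b));
        for (int i : pa) {
            for (int j : pb) {
                if (i == j) continue;             // one token cannot satisfy both clauses
                if (inOrder && j < i) continue;   // ordered match requires a before b
                int gap = Math.abs(j - i) - 1;    // positions strictly between the terms
                if (gap <= slop) return true;
            }
        }
        return false;
    }
}
```

For example, with the tokens of "we use apache lucene for search", `near(tokens, "apach?", "luc*", 0, true)` is true, while swapping the two patterns is false unless `inOrder` is false. Scoring is ignored here, so the idf question raised above does not arise in the toy. Later Lucene releases added SpanMultiTermQueryWrapper, which wraps a WildcardQuery as a SpanQuery so it can be used as a clause of a SpanNearQuery.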

Chuck


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





--- End Message ---
