Hi,
Lucene is a software component (a library) that indexes data and lets you search that indexed data. To use it you need to be able to program in Java (ports to other languages are available), or you need to find a crawler you can adapt to download the required data off the internet, which still requires basic knowledge of Java.

From what I can tell, you want a tool that downloads files, indexes them and lets you search them. For that you should use Nutch. Nutch is an application, unlike Lucene, which is a component that the programmer's code calls to provide a search facility of some sort in their own application.

_gk

----- Original Message -----
From: "Babu, KameshNarayana (GE, Research, consultant)" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Wednesday, March 29, 2006 11:14 AM
Subject: RE: Hi Experts


Thanks Aditya,
Lucene is used only to search on the local machine, right? How can Lucene search the internet? Are there any tools that can index the internet themselves and display the results? I know this is a very silly question.

-----Original Message-----
From: Aditya Liviandi [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 29, 2006 11:34 AM
To: java-user@lucene.apache.org
Subject: RE: Hi Experts


The way Lucene works is that you need to have the index first.
Only then can you search it.
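
As a rough illustration of "index first, then search", here is a minimal
sketch in plain Java. It is not Lucene's API: the class TinyIndex and its
methods add and search are made up for this example, and the tokenizing is
deliberately naive. It only shows that documents have to be fed into an
index before any query can find them.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy inverted index; NOT Lucene's API.
class TinyIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Indexing step: split the text into lowercase terms and record the doc id.
    public void add(String docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty token
            }
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search step: only finds what was indexed earlier.
    public Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(),
                                     Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("page1", "Lucene indexes data");
        index.add("page2", "Nutch crawls and indexes the web");
        System.out.println(index.search("indexes")); // [page1, page2]
        System.out.println(index.search("crawls"));  // [page2]
    }
}
```

A page that was never passed to add can never show up in a result, which
is why the pages have to be collected somehow before searching.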

So if you want to search within a given URL, you need to somehow create
the index of all the webpages within that URL. If the webserver linked
to that URL is also yours, then that would not be a big deal.


But if it is an external URL, then you would need a crawler, which
basically collects all the documents linked from that URL. However, you
will not be able to get every document under the URL. Documents that no
other document links to will not be reached by the crawler unless you
manually supply their URLs to it; otherwise I don't see how the crawler
could figure out that those documents exist.
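
The reachability point can be sketched with a toy crawl over an in-memory
link graph. This is an assumption-laden illustration, not how Nutch or any
real crawler is implemented: CrawlSketch, crawl and the example paths are
hypothetical, and no network access is involved.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy crawler over an in-memory link graph (no HTTP).
class CrawlSketch {
    // Breadth-first walk: start from the seeds and follow links.
    public static Set<String> crawl(Map<String, List<String>> links,
                                    Collection<String> seeds) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(seeds);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // already crawled
            }
            for (String next : links.getOrDefault(url,
                    Collections.<String>emptyList())) {
                if (!visited.contains(next)) {
                    queue.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("/index", Arrays.asList("/about", "/docs"));
        links.put("/docs", Arrays.asList("/index"));
        links.put("/orphan", Collections.<String>emptyList()); // nothing links here

        // Crawling from the front page never reaches the unlinked page.
        System.out.println(
            crawl(links, Arrays.asList("/index")).contains("/orphan")); // false
        // Supplying its URL by hand is the only way to include it.
        System.out.println(
            crawl(links, Arrays.asList("/index", "/orphan")).contains("/orphan")); // true
    }
}
```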


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



