Hi,
Lucene is a software component that indexes data and lets you search that
indexed data. To use it you need to be able to program in Java (ports for
various other languages are available), and to get web content into an index
you would have to write, or find and adapt, a crawler that downloads the
required data off the internet (which still requires basic knowledge of Java).
From what I can tell, what you want is a tool that downloads files, indexes
them and lets you search them. For that you should use Nutch: it is an
application, unlike Lucene, which is a software component that interfaces with
the programmer's code to provide a search facility of some sort for their
application.
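To make the "index first, then search" idea concrete, here is a toy in-memory inverted index in plain Java. It is only a sketch of the concept under simplifying assumptions (split on non-word characters, lowercase terms, single-term lookups); `ToyIndex`, `add` and `search` are hypothetical names, not Lucene's API. Real Lucene adds analyzers, relevance scoring and a persistent index on top of this basic idea.

```java
import java.util.*;

// Toy inverted index (NOT Lucene's API): illustrates building an index
// up front and then answering queries from it.
public class ToyIndex {
    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Indexing step: split text into lowercase terms and record which document holds each term.
    public int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Search step: a lookup in the prebuilt index; documents are not rescanned at query time.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("Lucene is a search library");
        index.add("Nutch is a web crawler and search application");
        index.add("Java programming");
        System.out.println(index.search("search"));  // [0, 1]
        System.out.println(index.search("crawler")); // [1]
        System.out.println(index.search("python"));  // []
    }
}
```

A document that was never passed to `add` can never be returned by `search`, which is exactly why something (your own code, or a crawler such as Nutch's) has to feed the documents in first.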
_gk
----- Original Message -----
From: "Babu, KameshNarayana (GE, Research, consultant)"
<[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Wednesday, March 29, 2006 11:14 AM
Subject: RE: Hi Experts
Thanks Aditya,
Lucene is used only to search on the local machine, right? How can Lucene
search the internet?
Are there any tools that can index content on the internet by themselves and
display the results? I know this is a very silly question.
-----Original Message-----
From: Aditya Liviandi [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 29, 2006 11:34 AM
To: java-user@lucene.apache.org
Subject: RE: Hi Experts
The way Lucene works is that you need to have the index first.
Only then can you search it.
So if you want to search within a given URL, you need to somehow create
the index of all the webpages within that URL. If the web server hosting
that URL is also yours, then that would not be a big deal.
But if it is an external URL, then you would need a crawler
(which basically follows the links from that URL and collects the documents
they point to). However, you will not be able to get all the documents under
the URL: documents that are not linked from any other document will not be
reached by the crawler unless you manually supply their URLs to it, since
otherwise I don't see how you could discover that they exist.
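The point about unreachable documents can be illustrated with a toy breadth-first crawl. This is a minimal sketch under a loud assumption: the "web" is a hypothetical in-memory map from each page to the links on it, so there is no real HTTP fetching or HTML parsing. `ToyCrawler` and `crawl` are illustrative names, not part of Lucene or Nutch.

```java
import java.util.*;

// Toy breadth-first "crawler" over a hypothetical in-memory link graph
// (page -> outgoing links). No network access: it only shows why a page
// that nothing links to is never discovered unless it is given as a seed.
public class ToyCrawler {
    public static Set<String> crawl(Map<String, List<String>> links, Collection<String> seeds) {
        Set<String> visited = new LinkedHashSet<>(); // keeps discovery order
        Deque<String> frontier = new ArrayDeque<>(seeds);
        while (!frontier.isEmpty()) {
            String page = frontier.poll();
            if (!visited.add(page)) {
                continue; // already fetched
            }
            // "Fetch" the page and queue every link found on it.
            for (String next : links.getOrDefault(page, List.of())) {
                if (!visited.contains(next)) {
                    frontier.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = new HashMap<>();
        site.put("/index.html", List.of("/a.html", "/b.html"));
        site.put("/a.html", List.of("/b.html"));
        site.put("/b.html", List.of("/index.html"));
        site.put("/orphan.html", List.of("/a.html")); // no page links TO this one

        // Prints [/index.html, /a.html, /b.html]; /orphan.html is never reached.
        System.out.println(crawl(site, List.of("/index.html")));
        // Prints [/index.html, /orphan.html, /a.html, /b.html] once the orphan is seeded manually.
        System.out.println(crawl(site, List.of("/index.html", "/orphan.html")));
    }
}
```

The crawl only ever knows about pages it was seeded with or found links to, which is why manually supplying URLs is the only way to reach orphaned documents.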
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]