Joel Bernstein created SOLR-10017:
-------------------------------------
Summary: Add the crawl Streaming Expression
Key: SOLR-10017
URL: https://issues.apache.org/jira/browse/SOLR-10017
Project: Solr
Issue Type: New Feature
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Joel Bernstein
The crawl Streaming Expression will wrap a stream that emits root URLs to
crawl. It will then crawl those URLs using a library such as crawler4j and
emit tuples that can be indexed into a SolrCloud collection using the update
function. Solr's classifier can be used to curate content as it is being
crawled, or to classify sites based on the content they contain. The links
between pages and sites can be indexed as graphs and then explored and
visualized with graph expressions.
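
To make the proposed data flow concrete, here is a minimal sketch in Python of
a crawl step that wraps a stream of root-URL tuples and emits one tuple per
page, including the outgoing links that could later be indexed as graph edges.
It is not Solr code and not the design of this issue: the function name crawl,
the tuple fields (url, id, body, links), the fetch callback, and the max_pages
limit are all assumptions made for illustration, and an in-memory dictionary
stands in for real HTTP fetching.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href values of anchor tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(root_stream, fetch, max_pages=100):
    """Breadth-first crawl starting from the URLs emitted by root_stream.

    root_stream: iterable of dicts with a "url" field (hypothetical shape).
    fetch: callable returning the page body for a URL, or None if unavailable.
    Yields one dict per fetched page with hypothetical fields id, body, links.
    """
    queue = deque(t["url"] for t in root_stream)
    seen = set()
    emitted = 0
    while queue and emitted < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        body = fetch(url)
        if body is None:
            continue
        parser = LinkParser()
        parser.feed(body)
        # Resolve relative links against the page URL so they can act as edges.
        links = [urljoin(url, href) for href in parser.links]
        yield {"id": url, "body": body, "links": links}
        emitted += 1
        queue.extend(links)


# Hypothetical in-memory "web" used in place of network access.
PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a>',
    "http://example.com/b": "no links here",
}

tuples = list(crawl([{"url": "http://example.com/"}], PAGES.get))
for t in tuples:
    print(t["id"], "->", t["links"])
```

In this sketch each emitted tuple carries both the page content, which a
classifier could score, and the page's outgoing links, which is the shape a
downstream update step would need in order to index the link graph.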
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)