Joel Bernstein created SOLR-10017:
-------------------------------------

             Summary: Add the crawl Streaming Expression
                 Key: SOLR-10017
                 URL: https://issues.apache.org/jira/browse/SOLR-10017
             Project: Solr
          Issue Type: New Feature
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Joel Bernstein


The crawl Streaming Expression will wrap a stream that emits root URLs to 
crawl. It will then crawl those URLs using a library such as crawler4j and 
emit tuples that can be indexed into a SolrCloud collection using the update 
function. Solr's classifier can be used to curate content as it is being 
crawled, or to classify sites based on the content they contain. The links 
between pages and sites can be indexed as graphs and then explored and 
visualized with graph expressions.
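
The data flow described above can be sketched roughly as follows. This is only 
an illustration in Python, not the proposed Solr implementation: the crawl 
function, the fetch helper, the in-memory PAGES map, the max_depth parameter 
and the tuple field names (id, body, links, depth) are all hypothetical, and a 
real implementation would fetch pages over the network through a crawler 
library.

```python
from collections import deque

# Hypothetical stand-in for the web: URL -> (page text, outgoing links).
PAGES = {
    "http://a.example/": ("alpha home", ["http://a.example/1", "http://b.example/"]),
    "http://a.example/1": ("alpha detail", ["http://a.example/"]),
    "http://b.example/": ("beta home", []),
}


def fetch(url):
    """Hypothetical fetcher: returns (text, links), or None if the page is unknown."""
    return PAGES.get(url)


def crawl(root_tuples, fetch_page, max_depth=1):
    """Wrap a stream of tuples carrying root URLs and emit one tuple per crawled page.

    Pages are visited breadth-first, each URL at most once, and links are
    followed up to max_depth hops away from a root URL.
    """
    seen = set()
    queue = deque((t["url"], 0) for t in root_tuples)
    while queue:
        url, depth = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        page = fetch_page(url)
        if page is None:
            continue
        text, links = page
        # The "links" field holds the outgoing edges, so the emitted tuples
        # could later be indexed and walked as a graph.
        yield {"id": url, "body": text, "links": list(links), "depth": depth}
        if depth < max_depth:
            for link in links:
                queue.append((link, depth + 1))


if __name__ == "__main__":
    roots = [{"url": "http://a.example/"}]
    for doc in crawl(roots, fetch, max_depth=1):
        print(doc["id"], "->", doc["links"])
```

In this sketch each emitted tuple is shaped like a document that an indexing 
step could consume, and the links field is what would let the page graph be 
explored afterwards.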



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
