[ https://issues.apache.org/jira/browse/SOLR-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joel Bernstein updated SOLR-10017:
----------------------------------
Description: The crawl Streaming Expression will wrap a stream that emits
root URLs to crawl. It will then crawl the URLs using a library such as
Crawler4j. It will emit tuples that can be indexed into a SolrCloud collection
using the update function. Solr's classifier can be used to curate content as
it is being crawled or to classify sites based on the content they contain.
The links between pages and sites can be indexed as graphs and then explored
and visualized with graph expressions. (was: The crawl Streaming Expression
will wrap a stream that emits root URLs to crawl. It will then crawl the URLs
using a library such as Crawl4j. It will emit tuples that can be indexed into a
SolrCloud collection using the update function. Solr's classifier can be used
to curate content as it is being crawled or to classify sites based on the
content they contain. The links between pages and sites can be indexed as
graphs and then explored and visualized with graph expressions.)
> Add the crawl Streaming Expression
> ----------------------------------
>
> Key: SOLR-10017
> URL: https://issues.apache.org/jira/browse/SOLR-10017
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Joel Bernstein
>
> The crawl Streaming Expression will wrap a stream that emits root URLs to
> crawl. It will then crawl the URLs using a library such as Crawler4j. It
> will emit tuples that can be indexed into a SolrCloud collection using the
> update function. Solr's classifier can be used to curate content as it is
> being crawled or to classify sites based on the content they contain. The
> links between pages and sites can be indexed as graphs and then explored and
> visualized with graph expressions.
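
The data flow described above (wrap a stream of root URLs, crawl outward, emit
one tuple per page) can be sketched roughly as follows. This is only an
illustration in Python, not a Streaming Expression: the issue does not define
the expression's syntax. The names crawl, fetch, and FAKE_WEB, and the tuple
fields id, body, and links, are all hypothetical. A real implementation would
presumably delegate fetching to a library such as Crawler4j and pass the
emitted tuples to the update function.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical fetcher: maps a URL to (page text, outgoing links), or None
# if the page cannot be retrieved. A real crawler library would go here.
Fetcher = Callable[[str], Optional[Tuple[str, List[str]]]]


def crawl(
    root_stream: Iterable[Dict[str, str]],
    fetch: Fetcher,
    max_pages: int = 100,
) -> Iterator[Dict[str, object]]:
    """Wrap a stream of tuples carrying root URLs; emit one tuple per page.

    Pages are visited breadth-first. Each emitted tuple keeps the page's
    outgoing links so they could later be indexed as graph edges.
    """
    seen = set()
    queue = deque()
    for tup in root_stream:
        url = tup["url"]  # assumed field name for the root URL
        if url not in seen:
            seen.add(url)
            queue.append(url)

    emitted = 0
    while queue and emitted < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue  # unreachable page: skip it, emit nothing
        text, links = page
        yield {"id": url, "body": text, "links": list(links)}
        emitted += 1
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)


# A tiny in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://a.example/": ("page a", ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("page b", ["http://a.example/"]),
    "http://c.example/": ("page c", ["http://missing.example/"]),
}

if __name__ == "__main__":
    roots = [{"url": "http://a.example/"}]
    for doc in crawl(roots, FAKE_WEB.get):
        print(doc["id"], "->", doc["links"])
```

Keeping the links field on each tuple is what would let the page-to-page
relationships be indexed and later walked with graph expressions. The
max_pages cap is an assumed safeguard and is not part of the issue text.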