[jira] [Commented] (SOLR-5244) Exporting Full Sorted Result Sets

Erik Hatcher (JIRA) Wed, 13 Aug 2014 12:07:09 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095946#comment-14095946
 ]


Erik Hatcher commented on SOLR-5244:
------------------------------------

[~joel.bernstein] could/should we go even further and hide the xport query 
parser and xsort response writer so that they aren't registered at all and 
/export may be a SearchHandler subclass that hides that stuff completely 
internally to it?   Or, if this feature is only useful for the "query" 
component, maybe it could be a non-SearchHandler request handler?   For a 
couple of reasons I bring this up: it seems awkward that a "query parser" is 
used to trigger this sort of thing and that neither of these components would 
be useful for anything but exporting.   Just thinking outloud.  At the very 
least, instead of them being defined in /export "defaults" they should be in 
"invariants", no?  

> Exporting Full Sorted Result Sets
> ---------------------------------
>
>                 Key: SOLR-5244
>                 URL: https://issues.apache.org/jira/browse/SOLR-5244
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 5.0
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0, 4.10
>
>         Attachments: 0001-SOLR_5244.patch, SOLR-5244.patch, SOLR-5244.patch, 
> SOLR-5244.patch, SOLR-5244.patch, SOLR-5244.patch, SOLR-5244.patch
>
>
> This ticket allows Solr to export full sorted result sets. A new export 
> request handler has been created that sets up the default writer type 
> (SortingResponseWriter) and the required rank query (ExportQParserPlugin). 
> The syntax is:
> {code}
> /solr/collection1/export?q=*:*&fl=a,b,c&sort=a desc,b desc
> {code}
> This capability will open up Solr for a whole range of uses that were 
> typically done using aggregation engines like Hadoop. For example:
> *Large Distributed Joins*
> A client outside of Solr calls two different Solr collections and returns the 
> results sorted by a join key. The client iterates through both streams and 
> performs a merge join.
> *Fully Distributed Field Collapsing/Grouping*
> A client outside of Solr makes individual calls to all the servers in a 
> single collection and returns results sorted by the collapse key. The client 
> merge joins the sorted lists on the collapse key to perform the field 
> collapse.
> *High Cardinality Distributed Aggregation*
> A client outside of Solr makes individual calls to all the servers in a 
> single collection and sorts on a high cardinality field. The client then 
> merge joins the sorted lists to perform the high cardinality aggregation.
> *Large Scale Time Series Rollups*
> A client outside Solr makes individual calls to all servers in a collection 
> and sorts on time dimensions. The client merge joins the sorted result sets 
> and rolls up the time dimensions as it iterates through the data.
> In these scenarios Solr is being used as a distributed sorting engine. 
> Developers can write clients that take advantage of this sorting capability 
> in any way they wish.
> *Session Analysis and Aggregation*
> A client outside Solr makes individual calls to all servers in a collection 
> and sorts on the sessionID. The client merge joins the sorted results and 
> aggregates sessions as it iterates through the results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SOLR-5244) Exporting Full Sorted Result Sets

Reply via email to