[jira] [Updated] (SOLR-5244) Exporting Full Sorted Result Sets

Joel Bernstein (JIRA) Thu, 24 Jul 2014 13:59:00 -0700

     [ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Joel Bernstein updated SOLR-5244:
---------------------------------

    Description: 
This ticket allows Solr to export full sorted result sets. The proposed syntax 
is:

{code}
q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
{code}
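
As a rough illustration, a client could assemble the proposed request as sketched below. The base URL, the collection name, and the helper name build_export_url are assumptions made for this example; only the parameter names and values come from the syntax above.

```python
from urllib.parse import urlencode


def build_export_url(base_url, query="*:*", fields=("a", "b", "c"),
                     sort=(("a", "desc"), ("b", "desc"))):
    """Build a URL using the proposed export syntax (hypothetical helper)."""
    params = {
        "q": query,
        "rows": -1,          # rows=-1 asks for the full result set
        "wt": "xsort",       # proposed sorting response writer
        "fl": ",".join(fields),
        "sort": ",".join(f"{field} {direction}" for field, direction in sort),
    }
    # urlencode percent-encodes reserved characters such as '*', ':' and ','.
    return base_url + "?" + urlencode(params)


# Assumed endpoint, for illustration only.
url = build_export_url("http://localhost:8983/solr/collection1/select")
```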

Under the covers, the rows=-1 parameter will signal Solr to use the 
ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the 
results. The SortingResponseWriter will sort the results based on the sort 
criteria and stream the results out.
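
The two phases described above (collect a bit set of matches, then sort and stream) can be pictured with the following minimal Python sketch. It is not the Solr implementation: the in-memory docs mapping, the use of a Python int as the bit set, and all function names are assumptions chosen to keep the example self-contained.

```python
def collect_bitset(matching_doc_ids):
    """Phase 1: record each matching doc id as a set bit (int used as a bit set)."""
    bits = 0
    for doc_id in matching_doc_ids:
        bits |= 1 << doc_id
    return bits


def iter_set_bits(bits):
    """Yield the doc ids whose bit is set, in increasing order."""
    doc_id = 0
    while bits:
        if bits & 1:
            yield doc_id
        bits >>= 1
        doc_id += 1


def export_sorted(bits, docs, sort_spec, field_list):
    """Phase 2: sort the collected docs by sort_spec and stream them out.

    sort_spec is a list of (field, descending) pairs, most significant first.
    """
    doc_ids = list(iter_set_bits(bits))
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a correct multi-field order.
    for field, descending in reversed(sort_spec):
        doc_ids.sort(key=lambda d, f=field: docs[d][f], reverse=descending)
    for doc_id in doc_ids:
        yield {f: docs[doc_id][f] for f in field_list}


docs = {
    0: {"a": 1, "b": 2, "c": "x"},
    1: {"a": 2, "b": 1, "c": "y"},
    2: {"a": 2, "b": 3, "c": "z"},
    3: {"a": 9, "b": 9, "c": "not matched"},
}
bits = collect_bitset([0, 1, 2])
rows = list(export_sorted(bits, docs, [("a", True), ("b", True)], ["a", "b", "c"]))
```

Here rows is ordered by a descending, then b descending, mirroring sort=a desc,b desc in the request.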

This capability will open up Solr for a whole range of uses that were typically 
done using aggregation engines like Hadoop. For example:

*Large Distributed Joins*

A client outside of Solr queries two different Solr collections and retrieves 
the results sorted by a join key. The client iterates through both streams and 
performs a merge join.
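
A client-side merge join over two such streams might look like the sketch below. It assumes both streams are sorted ascending on the join key (for a descending sort the two comparisons would be flipped) and that rows arrive as dictionaries; the field names in the sample data are hypothetical.

```python
def merge_join(left, right, key):
    """Inner merge join of two row streams sorted ascending on `key`."""
    left_it, right_it = iter(left), iter(right)
    l_row = next(left_it, None)
    r_row = next(right_it, None)
    while l_row is not None and r_row is not None:
        if l_row[key] < r_row[key]:
            l_row = next(left_it, None)      # left is behind, advance it
        elif l_row[key] > r_row[key]:
            r_row = next(right_it, None)     # right is behind, advance it
        else:
            current = l_row[key]
            # Buffer all right rows sharing this key so that duplicate keys
            # on both sides produce every matching pair.
            right_group = []
            while r_row is not None and r_row[key] == current:
                right_group.append(r_row)
                r_row = next(right_it, None)
            while l_row is not None and l_row[key] == current:
                for match in right_group:
                    yield l_row, match
                l_row = next(left_it, None)


people = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"},
          {"id": 2, "name": "c"}, {"id": 4, "name": "d"}]
orders = [{"id": 2, "price": 10}, {"id": 3, "price": 5}, {"id": 4, "price": 7}]
joined = [(l["name"], r["price"]) for l, r in merge_join(people, orders, "id")]
```

Only one group of equal-key rows from the right side is held in memory at a time, which is what makes the approach practical for large streams.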

*Fully Distributed Field Collapsing/Grouping*

A client outside of Solr makes individual calls to all the servers in a single 
collection and retrieves results sorted by the collapse key. The client merge 
joins the sorted lists on the collapse key to perform the field collapse.
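
One way to sketch this collapse on the client is to merge the per-server lists and keep a single document per collapse key. The sketch assumes each list is sorted ascending on the collapse key; choosing the document with the highest score as the group representative, and the field names used, are assumptions for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def collapse(server_streams, collapse_key, score_field="score"):
    """Merge sorted per-server streams and keep one document per key."""
    by_key = itemgetter(collapse_key)
    # heapq.merge lazily interleaves the already sorted inputs.
    merged = heapq.merge(*server_streams, key=by_key)
    # After merging, equal keys are adjacent, so groupby sees whole groups.
    for _, group in groupby(merged, key=by_key):
        yield max(group, key=itemgetter(score_field))


server_1 = [{"id": 1, "group": "a", "score": 1.0},
            {"id": 2, "group": "b", "score": 0.5}]
server_2 = [{"id": 3, "group": "a", "score": 2.0},
            {"id": 4, "group": "c", "score": 0.1}]
collapsed_ids = [doc["id"] for doc in collapse([server_1, server_2], "group")]
```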

*High Cardinality Distributed Aggregation*

A client outside of Solr makes individual calls to all the servers in a single 
collection and sorts on a high cardinality field. The client then merge joins 
the sorted lists to perform the high cardinality aggregation.
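
A streaming aggregation over the merged lists could be sketched as follows. Because equal keys become adjacent after the merge, only one running total is kept at a time regardless of how many distinct keys exist. The count and sum aggregates and the field names are assumptions, and the inputs are assumed to be sorted ascending on the aggregation field.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def aggregate(server_streams, key_field, value_field):
    """Yield (key, count, total) per distinct key from sorted streams."""
    by_key = itemgetter(key_field)
    merged = heapq.merge(*server_streams, key=by_key)
    for key_value, group in groupby(merged, key=by_key):
        count = 0
        total = 0
        for row in group:
            count += 1
            total += row[value_field]
        yield key_value, count, total


server_1 = [{"user": "u1", "bytes": 10}, {"user": "u3", "bytes": 5}]
server_2 = [{"user": "u1", "bytes": 7}, {"user": "u2", "bytes": 1},
            {"user": "u3", "bytes": 2}]
totals = list(aggregate([server_1, server_2], "user", "bytes"))
```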

In these scenarios Solr is being used as a distributed sorting engine. 
Developers can write clients that take advantage of this sorting capability in 
any way they wish.






  was:
This ticket allows Solr to export full sorted result sets. The proposed syntax 
is:

{code}
q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
{code}

Under the covers, the rows=-1 parameter will signal Solr to use the 
ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the 
results. The SortingResponseWriter will sort the results based on the sort 
criteria and stream the results out.

This capability will open up Solr for a whole range of uses that were typically 
done using aggregation engines like Hadoop. For example:

*Large Distributed Joins*

A client outside of Solr calls two different Solr collections and returns the 
results sorted by a join key. The client iterates through both streams and 
performs a merge join.

*Fully Distributed Field Collapsing/Grouping*

A client outside of Solr makes individual calls to all the servers in a single 
collection and returns results sort by the collapse key. The client merge joins 
the sorted lists on the collapse key to perform the field collapse.

*High Cardinality Distributed Aggregation*

A client outside of Solr makes individual calls to all the servers in a single 
collection and sorts on a high cardinality field. The client then merge joins 
the sorted lists to perform the high cardinality aggregation.

In these scenarios Solr is being used as a distributed sorting engine. 
Developers can write clients that take advantage of this sorting capability in 
any way they wish.







> Exporting Full Sorted Result Sets
> ---------------------------------
>
>                 Key: SOLR-5244
>                 URL: https://issues.apache.org/jira/browse/SOLR-5244
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 5.0
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0, 4.10
>
>         Attachments: 0001-SOLR_5244.patch, SOLR-5244.patch, SOLR-5244.patch
>
>
> This ticket allows Solr to export full sorted result sets. The proposed 
> syntax is:
> {code}
> q=*:*&rows=-1&wt=xsort&fl=a,b,c&sort=a desc,b desc
> {code}
> Under the covers, the rows=-1 parameter will signal Solr to use the 
> ExportQParserPlugin as a RankQuery, which will simply collect a BitSet of the 
> results. The SortingResponseWriter will sort the results based on the sort 
> criteria and stream the results out.
> This capability will open up Solr for a whole range of uses that were 
> typically done using aggregation engines like Hadoop. For example:
> *Large Distributed Joins*
> A client outside of Solr queries two different Solr collections and retrieves 
> the results sorted by a join key. The client iterates through both streams 
> and performs a merge join.
> *Fully Distributed Field Collapsing/Grouping*
> A client outside of Solr makes individual calls to all the servers in a 
> single collection and retrieves results sorted by the collapse key. The 
> client merge joins the sorted lists on the collapse key to perform the field 
> collapse.
> *High Cardinality Distributed Aggregation*
> A client outside of Solr makes individual calls to all the servers in a 
> single collection and sorts on a high cardinality field. The client then 
> merge joins the sorted lists to perform the high cardinality aggregation.
> In these scenarios Solr is being used as a distributed sorting engine. 
> Developers can write clients that take advantage of this sorting capability 
> in any way they wish.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

