[jira] [Commented] (SOLR-5244) Full Search Result Export

Mikhail Khludnev (JIRA) Tue, 24 Dec 2013 13:28:56 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856461#comment-13856461
 ]


Mikhail Khludnev commented on SOLR-5244:
----------------------------------------

bq. Does it cause any issues with the normal response writer flow?
I don't think so. It hits dedicated handlers, so it is well separated from 
the regular flow.
bq. More testing of this feature shows
I wonder if you could post the numbers and a profiler stack trace. 
How many fields are dumped in your test case? 
I have one thought: _BinaryDocValuesImpl.get(int, BytesRef)_ reads 
_docToOffset_ and then _bytes_ for every given docnum. Assuming that 
sequential reads are faster than random ones, it makes sense to buffer an 
array of offsets first and then walk through it to read _bytes_. Also, 
looping over _binaryFieldWriters_ for every doc looks like a performance 
killer for columnar data. 
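To make the idea concrete, here is a minimal sketch of the two-pass read over a toy column. It is not Lucene code: the class _OffsetBatchSketch_ and the methods _readBatch_ and _exportByField_ are hypothetical names, and the in-memory arrays only stand in for _docToOffset_ and _bytes_. Whether this is actually faster on real doc values would have to be measured.

```java
import java.nio.charset.StandardCharsets;

// Toy stand-in for one binary doc-values column. This is NOT Lucene's real
// layout or API; the class and method names are hypothetical.
final class OffsetBatchSketch {
    // The value of doc d occupies bytes[docToOffset[d] .. docToOffset[d + 1]).
    private final int[] docToOffset;
    private final byte[] bytes;

    OffsetBatchSketch(String[] values) {
        byte[][] encoded = new byte[values.length][];
        docToOffset = new int[values.length + 1];
        for (int d = 0; d < values.length; d++) {
            encoded[d] = values[d].getBytes(StandardCharsets.UTF_8);
            docToOffset[d + 1] = docToOffset[d] + encoded[d].length;
        }
        bytes = new byte[docToOffset[values.length]];
        for (int d = 0; d < values.length; d++) {
            System.arraycopy(encoded[d], 0, bytes, docToOffset[d], encoded[d].length);
        }
    }

    // Reads a batch of docs (ascending doc ids) in two passes.
    String[] readBatch(int[] ascendingDocs) {
        int n = ascendingDocs.length;
        int[] starts = new int[n];
        int[] lengths = new int[n];
        // Pass 1: touch only the offsets array and buffer what we need.
        for (int i = 0; i < n; i++) {
            int doc = ascendingDocs[i];
            starts[i] = docToOffset[doc];
            lengths[i] = docToOffset[doc + 1] - starts[i];
        }
        // Pass 2: touch only the bytes, in ascending offset order.
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            out[i] = new String(bytes, starts[i], lengths[i], StandardCharsets.UTF_8);
        }
        return out;
    }

    // Field-major export: finish one column for the whole batch before
    // moving to the next, instead of looping over all fields per doc.
    static String[][] exportByField(OffsetBatchSketch[] fields, int[] ascendingDocs) {
        String[][] columns = new String[fields.length][];
        for (int f = 0; f < fields.length; f++) {
            columns[f] = fields[f].readBatch(ascendingDocs);
        }
        return columns;
    }

    public static void main(String[] args) {
        OffsetBatchSketch name = new OffsetBatchSketch(new String[] {"a", "bb", "", "dddd"});
        OffsetBatchSketch tag = new OffsetBatchSketch(new String[] {"x", "y", "z", "w"});
        String[][] columns =
            exportByField(new OffsetBatchSketch[] {name, tag}, new int[] {1, 3});
        System.out.println(String.join(",", columns[0]) + " | " + String.join(",", columns[1]));
    }
}
```

The batch would then be interleaved back into per-document records before writing, so the output format does not have to change.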
bq. I think we can build segment level caches..
Can you highlight how it differs from the good old FieldCaches (I mean what's 
produced by FieldCacheImpl.BinaryDocValuesCache)?
bq. I'm shooting to achieve an export rate of 5+ million small records 
That sounds really ambitious to me. My expectation for the average IO rate is 
100-200 MB/sec (and I might be wrong here), so a few million records might 
hit the ceiling. 
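A back-of-the-envelope check of that ceiling, as a rough sketch. It assumes the target rate is meant per second, that 1 MB is 10^6 bytes, and that records are 20, 50 or 100 bytes; those sizes are made up for illustration and are not measurements. _ExportRateEstimate_ is a hypothetical helper.

```java
// Rough estimate of how many records per second fit into a given IO rate.
// Record sizes below are illustrative assumptions, not measured values.
final class ExportRateEstimate {
    static final long MB = 1_000_000L; // assuming decimal megabytes

    // Upper bound on records per second for a given byte throughput.
    static long maxRecordsPerSecond(long bytesPerSecond, int bytesPerRecord) {
        return bytesPerSecond / bytesPerRecord;
    }

    public static void main(String[] args) {
        long[] ioRates = {100 * MB, 200 * MB}; // the 100-200 MB/sec range
        int[] recordSizes = {20, 50, 100};     // hypothetical record sizes
        for (long rate : ioRates) {
            for (int size : recordSizes) {
                System.out.println((rate / MB) + " MB/sec, " + size + " bytes/record: "
                    + maxRecordsPerSecond(rate, size) + " records/sec");
            }
        }
    }
}
```

Under these assumptions, 5 million records per second at 100 MB/sec leaves only 20 bytes per record, and 100-byte records top out at 1-2 million per second.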


> Full Search Result Export
> -------------------------
>
>                 Key: SOLR-5244
>                 URL: https://issues.apache.org/jira/browse/SOLR-5244
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 5.0
>            Reporter: Joel Bernstein
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: SOLR-5244.patch
>
>
> It would be great if Solr could efficiently export entire search result sets 
> without scoring or ranking documents. This would allow external systems to 
> perform rapid bulk imports from Solr. It also provides a possible platform 
> for exporting results to support distributed join scenarios within Solr.
> This ticket provides a patch that has two pluggable components:
> 1) ExportQParserPlugin: which is a post filter that gathers a BitSet with 
> document results and does not delegate to ranking collectors. Instead it puts 
> the BitSet on the request context.
> 2) BinaryExportWriter: an output writer that iterates the BitSet and prints 
> the entire result as a binary stream. A header is provided at the beginning 
> of the stream so external clients can self configure.
> Note:
> These two components will be sufficient for a non-distributed environment. 
> For distributed export a new Request handler will need to be developed.
> After applying the patch and building the dist or example, you can register 
> the components through the following changes to solrconfig.xml
> Register export contrib libraries:
> <lib dir="../../../dist/" regex="solr-export-\d.*\.jar" />
>  
> Register the "export" queryParser with the following line:
>  
> <queryParser name="export" 
> class="org.apache.solr.export.ExportQParserPlugin"/>
>  
> Register the "xbin" writer:
>  
> <queryResponseWriter name="xbin" 
> class="org.apache.solr.export.BinaryExportWriter"/>
>  
> The following query will perform the export:
> {code}
> http://localhost:8983/solr/collection1/select?q=*:*&fq={!export}&wt=xbin&fl=join_i
> {code}
> Initial patch supports export of four data-types:
> 1) Single value trie int, long and float
> 2) Binary doc values.
> The numerics are currently exported from the FieldCache and the Binary doc 
> values can be in memory or on disk.
> Since this is designed to export very large result sets efficiently, stored 
> fields are not used for the export.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
