One way to adapt Solrj would be to keep its current in-memory structure for facets etc., and then have it return a TupleStream for documents.
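A rough sketch of that split, in Python rather than Solrj, assuming the response puts the summarized data ahead of the documents. The wire format here (one JSON header line, then one JSON document per line) and the name `read_response` are made up for illustration and are not Solr's actual response format or API; a plain iterator stands in for the TupleStream.

```python
import io
import json
from typing import Any, Dict, Iterator, TextIO, Tuple


def read_response(stream: TextIO) -> Tuple[Dict[str, Any], Iterator[Dict[str, Any]]]:
    # Hypothetical wire format: the first line carries all summarized data
    # (facets, counts), so it can be read fully into memory up front.
    header = json.loads(stream.readline())

    def docs() -> Iterator[Dict[str, Any]]:
        # Documents are decoded one at a time as the caller iterates,
        # so the full result set is never materialized here.
        for line in stream:
            if line.strip():
                yield json.loads(line)

    return header, docs()


# Usage: facets are available immediately, documents are pulled lazily.
wire = io.StringIO(
    '{"numFound": 3, "facets": {"cat": {"a": 2, "b": 1}}}\n'
    '{"id": "1", "cat": "a"}\n'
    '{"id": "2", "cat": "a"}\n'
    '{"id": "3", "cat": "b"}\n'
)
summary, doc_iter = read_response(wire)
ids = [doc["id"] for doc in doc_iter]
```

The caller only pays memory for the summaries, and can stop iterating over the documents at any point.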
Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Nov 17, 2016 at 9:51 PM, Joel Bernstein <[email protected]> wrote:

> It's possible that we could find a design where /select could behave like
> /export. I think Noble's design of treating a Stream as an iterator is
> promising. We could change all document result sets to iterators and hide
> the implementation of how the docs are materialized. This would also impact
> how output from other search components would be handled. Since result sets
> aren't limited to top N, all summarized data, such as facets, would need to
> come before the documents. Then Solrj would need to be able to read the
> summarized data into memory, and stream the documents. It's a nice design,
> but quite a bit of work.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Nov 17, 2016 at 9:26 PM, Yonik Seeley <[email protected]> wrote:
>
>> On Thu, Nov 17, 2016 at 9:16 PM, Joel Bernstein <[email protected]>
>> wrote:
>> > There were two issues that make the regular /select handler
>> > problematic for large result sets:
>> >
>> > 1) Use of stored fields, which require lots of disk access. I believe
>> > this has been resolved now that the field list can be pulled from the
>> > docValues.
>> >
>> > 2) The /select handler sorts by loading the top N docs into a priority
>> > queue.
>>
>> That feels like it could be optional though. PQ makes sense for small
>> top-N that will go in the cache, but makes less sense when you want
>> all documents back.
>>
>> Look at it from the other perspective: if one is retrieving all
>> documents that match a query (and let's assume that the number of
>> matches is large), is /export ever less efficient in that case? If
>> /export is always better in that scenario, that sounds like an
>> optimization, not a tradeoff or different design goal, and /select
>> should always be using the superior algorithm/mechanism for that case.
>>
>> -Yonik
>>
>> > This approach becomes untenable at a certain point. The export
>> > handler iterates over a bitset of collected docs in multiple passes.
>> > This keeps constant performance as the result set grows. This is
>> > harder to make work without avoiding the current select logic.
>> >
>> > I'm not in full agreement that /select and /export need to come
>> > together. They really do have different design goals. /select tries
>> > to be very efficient and fast to support high QPS. /export tries to
>> > maintain constant memory use and performance as the result set size
>> > increases. Trying to find a way to accomplish both may just end up
>> > compromising the design so it doesn't serve either use case.
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Thu, Nov 17, 2016 at 9:05 PM, Yonik Seeley <[email protected]>
>> > wrote:
>> >>
>> >> On Thu, Nov 17, 2016 at 6:54 PM, Kevin Risden <[email protected]>
>> >> wrote:
>> >> > For reference, the SQL/JDBC piece needed the ability to specify
>> >> > wildcards and figure out the "schema" of the collection including
>> >> > defined dynamic fields.
>> >>
>> >> Out of curiosity, how is this used (and in what contexts)?
>> >> I'm wondering about the implications of new fields appearing when
>> >> new documents are added. Will this mess up the JDBC driver?
>> >>
>> >> > When testing lately with supporting "select *" type semantics, it
>> >> > would be nice to be able to limit to only DocValues fields.
>> >>
>> >> I'm not sure we should be segregating stored fields this way (by
>> >> whether they are column/docValues or not).
>> >> By default, all of our non-text fields already have docvalues enabled.
>> >> If someone wants to retrieve or operate on row-stored text fields, it
>> >> seems like they should be able to do so via the streaming API (or
>> >> SQL).
>> >>
>> >> I guess we could also go the other direction and *only* support
>> >> docValues (i.e. scrap row-stored fields). But that seems a little
>> >> more extreme, and I'm also not sure if binary docValues would work as
>> >> well or could hold text fields of the same size as row-stored fields
>> >> can.
>> >>
>> >> -Yonik
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
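To make the priority-queue vs. multi-pass contrast from the thread concrete, here is a minimal Python sketch. It is an illustration only, not the actual /select or /export code: the function names, the `batch` parameter, and the use of a Python set in place of the bitset of collected docs are all assumptions.

```python
import heapq
from typing import Dict, Iterator, List, Set


def top_n(matches: Set[int], sort_value: Dict[int, int], n: int) -> List[int]:
    # /select-style: a single bounded selection of the best n doc ids.
    # Cheap for small n, but the working set grows with n.
    return heapq.nsmallest(n, matches, key=lambda d: (sort_value[d], d))


def export_all(matches: Set[int], sort_value: Dict[int, int], batch: int) -> Iterator[int]:
    # /export-style sketch: repeated passes over the docs not yet emitted.
    # Each pass selects only the next `batch` docs in sort order, so the
    # sorting structure stays the same size however many docs match.
    remaining = set(matches)  # stands in for the bitset of collected docs
    while remaining:
        page = heapq.nsmallest(batch, remaining, key=lambda d: (sort_value[d], d))
        for doc in page:
            remaining.discard(doc)  # "clear the bit" so later passes skip it
            yield doc
```

With `batch` fixed, `export_all` trades extra passes over the matches for a bounded sort buffer, which is the tradeoff being debated above.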
