On Thu, Nov 17, 2016 at 9:16 PM, Joel Bernstein <[email protected]> wrote:
> There were two issues that make the regular /select handler problematic for
> large result sets:
>
> 1) Use of stored fields, which require lots of disk access. I believe this
> has been resolved now that the field list can be pulled from the docValues.
>
> 2) The /select handler sorts by loading the top N docs into a priority
> queue.
That feels like it could be optional though. PQ makes sense for a small
top-N that will go in the cache, but makes less sense when you want all
documents back.

Look at it from the other perspective: if one is retrieving all
documents that match a query (and let's assume that the number of
matches is large), is /export ever less efficient in that case?
If /export is always better in that scenario, that sounds like an
optimization, not a tradeoff or different design goal, and /select
should always be using the superior algorithm/mechanism for that case.

-Yonik

> This approach becomes untenable at a certain point. The export
> handler iterates over a bitset of collected docs in multiple passes. This
> keeps constant performance as the result set grows. This is harder to make
> work without avoiding the current select logic.
>
> I'm not in full agreement that /select and /export need to come together.
> They really do have different design goals. /select tries to be very
> efficient and fast to support high QPS. /export tries to maintain constant
> memory use and performance as the result set size increases. Trying to find
> a way to accomplish both may just end up compromising the design so it
> doesn't serve either use case.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Nov 17, 2016 at 9:05 PM, Yonik Seeley <[email protected]> wrote:
>>
>> On Thu, Nov 17, 2016 at 6:54 PM, Kevin Risden <[email protected]>
>> wrote:
>> > For reference, the SQL/JDBC piece needed the ability to specify a
>> > wildcard and figure out the "schema" of the collection including
>> > defined dynamic fields.
>>
>> Out of curiosity, how is this used (and in what contexts)?
>> I'm wondering about the implications of new fields appearing when new
>> documents are added. Will this mess up the JDBC driver?
>>
>> > When testing lately with supporting "select *" type semantics, it would
>> > be nice to be able to limit to only DocValues fields.
>>
>> I'm not sure we should be segregating stored fields this way (by
>> whether they are column/docValues or not).
>> By default, all of our non-text fields already have docvalues enabled.
>> If someone wants to retrieve or operate on row-stored text fields, it
>> seems like they should be able to do so via the streaming API (or
>> SQL).
>>
>> I guess we could also go the other direction and *only* support
>> docValues (i.e. scrap row-stored fields). But that seems a little
>> more extreme, and I'm also not sure if binary docValues would work as
>> well or could hold text fields of the same size as row-stored fields
>> can.
>>
>> -Yonik
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>
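
For anyone following the thread, the two sorting strategies discussed above
(top-N in a priority queue versus multiple passes over a bitset of collected
docs) can be sketched roughly as follows. This is a simplified Python
illustration under stated assumptions, not Solr's actual code: the names
top_n_select, export_in_passes, sort_values and batch_size are made up for
the example, a bytearray stands in for the bitset, and docs are sorted
ascending by a single value with the doc id as tie-breaker.

```python
import heapq


def top_n_select(sort_values, matches, n):
    """Select-style sketch: keep only the best n matching docs.

    heapq.nsmallest holds at most n entries at a time, so memory grows
    with n. Asking for every match means n equals the full result size.
    """
    return heapq.nsmallest(n, matches, key=lambda doc: (sort_values[doc], doc))


def export_in_passes(sort_values, matches, batch_size):
    """Export-style sketch: yield every matching doc in sort order.

    Each pass scans the flags of docs not yet emitted, picks the next
    batch_size docs in sort order, emits them, and clears their flags.
    The heap never holds more than batch_size entries.
    """
    remaining = bytearray(len(sort_values))  # one flag per doc id (hypothetical bitset)
    count = 0
    for doc in matches:
        if not remaining[doc]:
            remaining[doc] = 1
            count += 1

    while count > 0:
        # One full pass over the flags; nsmallest consumes the generator
        # completely before any flag is cleared below.
        candidates = (doc for doc, flag in enumerate(remaining) if flag)
        batch = heapq.nsmallest(
            batch_size, candidates, key=lambda doc: (sort_values[doc], doc)
        )
        for doc in batch:
            remaining[doc] = 0
            count -= 1
            yield doc


if __name__ == "__main__":
    values = [5, 3, 9, 1, 7, 3]   # sort value per doc id
    matched = [0, 1, 2, 3, 5]     # doc ids matching the query
    print(top_n_select(values, matched, 2))
    print(list(export_in_passes(values, matched, 2)))
```

In this sketch the per-pass memory is bounded by batch_size no matter how
many docs match, and the price is one scan of the flags per batch. Whether
that trade is always a win for full-result retrieval is exactly the question
raised in the thread; the sketch does not settle it.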
