Re: wildcards for /export

Yonik Seeley Thu, 17 Nov 2016 18:26:48 -0800

On Thu, Nov 17, 2016 at 9:16 PM, Joel Bernstein <[email protected]> wrote:
> There were two issues that make the regular /select handler problematic for
> large result sets:
>
> 1) Use of stored fields, which require lots of disk access. I believe this
> has been resolved now that the field list can be pulled from the docValues.
>
> 2) The /select handler sorts by loading the top N docs into a priority
> queue.


That feels like it could be optional though.  PQ makes sense for small
top-N that will go in the cache, but makes less sense when you want
all documents back.
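
For illustration, here is a minimal sketch of the top-N priority-queue pattern in Python. It is not Solr code: `top_n` and the synthetic scores are made up for this example. With a small N the heap stays tiny; asking for every match makes the heap as large as the result set.

```python
import heapq

def top_n(matches, n):
    """Return the n best (score, doc_id) pairs, best first, using a bounded min-heap."""
    heap = []  # holds at most n entries; heap[0] is the weakest entry kept so far
    for doc_id, score in matches:
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif (score, doc_id) > heap[0]:
            # The new doc beats the weakest kept entry, so swap it in.
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)

# Synthetic (doc_id, score) pairs, purely hypothetical data.
matches = [(doc_id, (doc_id * 37) % 101) for doc_id in range(1000)]
small = top_n(matches, 10)                 # heap never exceeds 10 entries
everything = top_n(matches, len(matches))  # heap grows to hold every match
```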

Look at it from the other perspective: if one is retrieving all
documents that match a query (and let's assume that the number of
matches is large), is /export ever less efficient in that case?  If
/export is always better in that scenario, that sounds like an
optimization, not a tradeoff or different design goal, and /select
should always be using the superior algorithm/mechanism for that case.
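
To make the comparison concrete, here is a rough Python sketch of the multi-pass idea described in the quoted text below: walk a bitset of matched docs repeatedly, emit the next batch in sort order on each pass, and clear those bits. This is an assumption-laden illustration, not the actual /export implementation; `export_in_batches`, `batch_size`, and the bounded heap used for batch selection are hypothetical names and choices.

```python
import heapq

def export_in_batches(matched, sort_key, batch_size):
    """Yield all matched doc ids in ascending sort_key order, batch_size docs per pass."""
    remaining = bytearray(matched)  # 1 = matched and not yet exported
    left = sum(remaining)
    while left > 0:
        heap = []  # keys are negated, so heap[0] is the worst doc kept in this pass
        for doc_id, bit in enumerate(remaining):
            if not bit:
                continue
            entry = (-sort_key[doc_id], -doc_id)
            if len(heap) < batch_size:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)
        batch = sorted((-k, -d) for k, d in heap)
        for _, doc_id in batch:
            remaining[doc_id] = 0  # clear the bit so the next pass skips this doc
            yield doc_id
        left -= len(batch)

# Hypothetical data: 1000 docs, every third one matches the query.
sort_key = [(d * 37) % 101 for d in range(1000)]
matched = [1 if d % 3 == 0 else 0 for d in range(1000)]
exported = list(export_in_batches(matched, sort_key, batch_size=50))
```

In this sketch memory is bounded by `batch_size` plus the bitset, at the cost of roughly (matches / batch_size) passes over the bitset.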

-Yonik


> This approach becomes untenable at a certain point. The export
> handler iterates over a bitset of collected docs in multiple passes. This
> keeps constant performance as the result set grows. This is hard to make
> work without bypassing the current select logic.
>
> I'm not in full agreement that /select and /export need to come together.
> They really do have different design goals. /select tries to be very
> efficient and fast to support high QPS. /export tries to maintain constant
> memory use and performance as the result set size increases. Trying to find
> a way to accomplish both may just end up compromising the design so it doesn't
> serve either use case.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Nov 17, 2016 at 9:05 PM, Yonik Seeley <[email protected]> wrote:
>>
>> On Thu, Nov 17, 2016 at 6:54 PM, Kevin Risden <[email protected]>
>> wrote:
>> > For reference, the SQL/JDBC piece needed the ability to specify wildcard and
>> > figure out the "schema" of the collection including defined dynamic
>> > fields.
>>
>> Out of curiosity, how is this used (and in what contexts)?
>> I'm wondering about the implications of new fields appearing when new
>> documents are added.  Will this mess up the JDBC driver?
>>
>> > When testing lately with supporting "select *" type semantics, it would be
>> > nice to be able to limit to only DocValues fields.
>>
>> I'm not sure we should be segregating stored fields this way (by
>> whether they are column/docValues or not).
>> By default, all of our non-text fields already have docValues enabled.
>> If someone wants to retrieve or operate on row-stored text fields, it
>> seems like they should be able to do so via the streaming API (or
>> SQL).
>>
>> I guess we could also go the other direction and *only* support
>> docValues (i.e. scrap row-stored fields).  But that seems a little
>> more extreme, and I'm also not sure if binary docValues would work as
>> well or could hold text fields of the same size as row-stored fields
>> can.
>>
>> -Yonik
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>
