Re: wildcards for /export

Joel Bernstein Thu, 17 Nov 2016 19:01:35 -0800

One way to adapt Solrj would be to keep its current in-memory structure for
facets etc. and then have it return a TupleStream for documents.
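
As a rough illustration of that split, here is a minimal sketch in Python
rather than Solrj. It assumes a hypothetical wire format (not Solr's actual
response format) in which the first line is a JSON object holding the
summarized data, such as facets, and every later line is one document. The
function name read_streaming_response is made up for the example.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def read_streaming_response(
    lines: Iterable[str],
) -> Tuple[Dict[str, Any], Iterator[Dict[str, Any]]]:
    """Read the summary eagerly and return the documents as a lazy iterator.

    Hypothetical format, assumed for illustration only: line 1 is the
    summarized data (facets etc.), each following line is one document.
    """
    it = iter(lines)
    first = next(it, None)
    if first is None:
        raise ValueError("empty response: expected a summary line first")
    # The summary is small, so it is parsed and held in memory.
    summary = json.loads(first)
    # The documents may be huge, so each one is parsed only when requested.
    docs = (json.loads(line) for line in it if line.strip())
    return summary, docs
```

The caller can use the facets right away and then loop over the documents
one at a time, so memory use does not grow with the size of the result set.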


Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Nov 17, 2016 at 9:51 PM, Joel Bernstein <[email protected]> wrote:

> It's possible that we could find a design where /select could behave like
> /export. I think Noble's design of treating a Stream as an iterator is
> promising. We could change all document result sets to iterators and hide
> the implementation of how the docs are materialized. This would also impact
> how output from other search components would be handled. Since result sets
> aren't limited to top N, all summarized data, such as facets, would need to
> come before the documents. Then Solrj would need to be able to read the
> summarized data into memory, and stream the documents. It's a nice design,
> but quite a bit of work.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Nov 17, 2016 at 9:26 PM, Yonik Seeley <[email protected]> wrote:
>
>> On Thu, Nov 17, 2016 at 9:16 PM, Joel Bernstein <[email protected]>
>> wrote:
>> > There were two issues that make the regular /select handler problematic
>> > for large result sets:
>> >
>> > 1) Use of stored fields, which require lots of disk access. I believe
>> > this has been resolved now that the field list can be pulled from the
>> > docValues.
>> >
>> > 2) The /select handler sorts by loading the top N docs into a priority
>> > queue.
>>
>> That feels like it could be optional though.  PQ makes sense for small
>> top-N that will go in the cache, but makes less sense when you want
>> all documents back.
>>
>> Look at it from the other perspective: if one is retrieving all
>> documents that match a query (and let's assume that the number of
>> matches is large), is /export ever less efficient in that case?  If
>> /export is always better in that scenario, that sounds like an
>> optimization, not a tradeoff or different design goal, and /select
>> should always be using the superior algorithm/mechanism for that case.
>>
>> -Yonik
>>
>>
>> > This approach becomes untenable at a certain point. The export handler
>> > iterates over a bitset of collected docs in multiple passes. This keeps
>> > performance constant as the result set grows. This is hard to make work
>> > without bypassing the current select logic.
>> >
>> > I'm not in full agreement that /select and /export need to come together.
>> > They really do have different design goals. /select tries to be very
>> > efficient and fast to support high QPS. /export tries to maintain constant
>> > memory use and performance as the result set size increases. Trying to
>> > find a way to accomplish both may just end up compromising the design so
>> > it doesn't serve either use case.
>> >
>> >
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Thu, Nov 17, 2016 at 9:05 PM, Yonik Seeley <[email protected]> wrote:
>> >>
>> >> On Thu, Nov 17, 2016 at 6:54 PM, Kevin Risden <[email protected]> wrote:
>> >> > For reference, the SQL/JDBC piece needed the ability to specify
>> >> > wildcards and figure out the "schema" of the collection, including
>> >> > defined dynamic fields.
>> >>
>> >> Out of curiosity, how is this used (and in what contexts)?
>> >> I'm wondering about the implications of new fields appearing when new
>> >> documents are added.  Will this mess up the JDBC driver?
>> >>
>> >> > When testing lately with supporting "select *" type semantics, it
>> >> > would be nice to be able to limit to only DocValues fields.
>> >>
>> >> I'm not sure we should be segregating stored fields this way (by
>> >> whether they are column/docValues or not).
>> >> By default, all of our non-text fields already have docvalues enabled.
>> >> If someone wants to retrieve or operate on row-stored text fields, it
>> >> seems like they should be able to do so via the streaming API (or
>> >> SQL).
>> >>
>> >> I guess we could also go the other direction and *only* support
>> >> docValues (i.e. scrap row-stored fields).  But that seems a little
>> >> more extreme, and I'm also not sure if binary docValues would work as
>> >> well or could hold text fields of the same size as row-stored fields
>> >> can.
>> >>
>> >> -Yonik
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
>> >>
>> >
>>
>>
>>
>
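
The two sorting strategies discussed in this thread can be contrasted with a
small sketch. This is not Solr's code: it assumes that each export pass picks
the next batch of lowest sort keys with a bounded heap and then clears those
docs from the set of matches. The names top_n and export_all, the batch_size
parameter, and the use of a Python set in place of a bitset are all
illustrative.

```python
import heapq
from typing import Callable, Iterable, Iterator, List


def top_n(matches: Iterable[int], sort_key: Callable[[int], int], n: int) -> List[int]:
    """/select-style: a single pass that keeps only the n best docs in a heap."""
    return heapq.nsmallest(n, matches, key=sort_key)


def export_all(
    matches: Iterable[int], sort_key: Callable[[int], int], batch_size: int
) -> Iterator[int]:
    """/export-style sketch: repeated passes over the set of matching docs.

    Each pass selects the next batch_size docs in sort order, emits them,
    and removes them from the set, so the sorting heap never holds more
    than batch_size entries however many docs match.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    remaining = set(matches)  # stands in for the bitset of collected docs
    while remaining:
        batch = heapq.nsmallest(batch_size, remaining, key=sort_key)
        yield from batch
        remaining.difference_update(batch)
```

Asking top_n for every matching doc makes its heap as large as the result
set, which is the case the thread calls untenable. export_all trades extra
passes over the matches for a heap of fixed size.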
