Re: Combine Response Writers and cursorMark

James Baster Thu, 12 Jun 2025 07:12:03 -0700

If we are talking about adding new things to achieve this, would it be
possible to add the nextCursorMark data as an HTTP response header? That
would leave the CSV content unchanged, so it would be backwards compatible.


> It should be easy, ha?

:-) It should be possible, but if there is already a well-tested CSV output
in Solr I'd like to use it.
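
For reference, the client-side route could look roughly like the sketch
below: one JSON query per page, following nextCursorMark, with the CSV written
locally so the cursor and the rows come from the same response. This is only a
sketch under assumptions: fetch_page is a hypothetical callable (not part of
Solr or any client library) that takes a cursorMark and returns the parsed
JSON body of one wt=json response, and joining multi-valued fields with a
comma is an arbitrary choice of mine, not a copy of Solr's CSV writer.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical type: takes a cursorMark, returns the parsed JSON body of one
# Solr response requested with wt=json and that cursorMark.
FetchPage = Callable[[str], Dict]


def iter_cursor_docs(fetch_page: FetchPage) -> Iterator[Dict]:
    """Yield every document, following nextCursorMark until it stops changing."""
    cursor = "*"  # "*" starts a new cursor
    while True:
        body = fetch_page(cursor)
        yield from body["response"]["docs"]
        next_cursor = body["nextCursorMark"]
        if next_cursor == cursor:  # same mark returned: no more results
            return
        cursor = next_cursor


def docs_to_csv(docs: Iterable[Dict], fields: List[str]) -> str:
    """Render documents as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=fields,
        restval="",             # fields missing from a document become empty cells
        extrasaction="ignore",  # fields not listed in `fields` are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    for doc in docs:
        # Assumption: multi-valued fields are joined with "," into one cell.
        row = {
            key: ",".join(map(str, value)) if isinstance(value, list) else value
            for key, value in doc.items()
        }
        writer.writerow(row)
    return out.getvalue()
```

A real fetch_page would send the usual query parameters plus wt=json,
cursorMark and a sort that includes the uniqueKey field, which cursorMark
requires. It avoids the second query per page, but it does mean maintaining
my own CSV code.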

Thanks,
James


On Tue, 10 Jun 2025 at 11:48, Mikhail Khludnev <m...@apache.org> wrote:

> It seems cursorMark is not supported in CSV format; support could be
> developed after agreeing on a particular format.
> Another approach is to develop a result transformer that prints the cursor
> mark in every CSV row as a field value, though I'm not 100% sure it would work.
>
> Btw, couldn't you just export docs to json lines via
>
> https://solr.apache.org/guide/solr/latest/deployment-guide/solr-control-script-reference.html#exporting-documents-to-a-file
>
> and then transform them to CSV?
> It should be easy, ha?
>
>
> On Tue, Jun 10, 2025 at 12:01 PM James Baster <
> james.bas...@opendataservices.coop> wrote:
>
> > Hello Rahul,
> >
> > If I do that all I get back is the CSV. There is no "nextCursorMark" data
> > available. If I want to get the "nextCursorMark" data I seem to have to use
> > JSON output. This makes it impossible to combine the 2 features and get
> > more than 1 page of information.
> >
> > (I did think of a slightly better workaround right after posting, but am
> > curious if there is any way to combine these 2 features I've just missed?)
> >
> > Thanks,
> > James
> >
> >
> >
> > On Wed, 4 Jun 2025 at 17:41, Rahul Goswami <rahul196...@gmail.com> wrote:
> >
> > > Can you please explain why the 2 calls? Are you not able to get the result
> > > the first time with wt=csv and cursorMark=* ?
> > >
> > > Rahul
> > >
> > >
> > > On Wed, Jun 4, 2025 at 10:45 AM James Baster <
> > > james.bas...@opendataservices.coop> wrote:
> > >
> > > > I know that when paging through a big set of results, using cursorMark is
> > > > better than using start/rows pagination because cursorMark works better
> > > > when data may be inserted/updated/deleted during pagination and it can have
> > > > better performance.
> > > >
> > > > https://solr.apache.org/guide/solr/latest/query-guide/pagination-of-results.html
> > > >
> > > > I know that there are Response Writers, so that if I want to get my results
> > > > in CSV I can, just by changing the wt parameter.
> > > >
> > > > https://solr.apache.org/guide/solr/latest/query-guide/response-writers.html
> > > >
> > > > So my question is, what if I want to combine them? Get a bunch of CSVs
> > > > nicely paginated with cursorMark?
> > > >
> > > > I can't see any options to do this - are there any?
> > > >
> > > > Are there any good workarounds?
> > > >
> > > > I could just page with start/rows and accept the problems with that. However
> > > > if a row is inserted/deleted/moved above my current position, my data will
> > > > shift by 1 and that's not great.
> > > >
> > > > I could use cursorMark with 2 queries per page, like:
> > > >
> > > > * set cursorMark to last known cursorMark or "*" if it's the start.
> > > > * call API once with JSON response writer. Note the value of
> > > > nextCursorMark.
> > > > * call API a second time with CSV response writer. Save my CSV result
> > > > somewhere.
> > > > * maybe pause a second to avoid rate limiting.
> > > > * If nextCursorMark is different from last cursorMark there are more
> > > > results so loop over again.
> > > >
> > > > With this system, if a row is inserted/deleted/moved above my current
> > > > position, my data will not shift - great. However if a row is
> > > > inserted/deleted/moved in my current page between the 2 queries, I may miss
> > > > a row or double count a row.
> > > >
> > > > Any better options?
> > > >
> > > > Thank you in advance,
> > > > James
> > > >
> > >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
