According to RFC 9113, GOAWAY
<https://datatracker.ietf.org/doc/html/rfc9113#name-goaway> just means that
the server wants to close the HTTP/2 connection. Solr doesn't write its own
HTTP handling code, and I expect that the libraries we use (the JDK, based
on the class you say you are using) follow the spec, so there should be an
associated error code
<https://datatracker.ietf.org/doc/html/rfc9113#NO_ERROR> (the defined codes
include NO_ERROR for a graceful shutdown). I haven't looked in detail at
how we use those libraries, but looking at the error code will hopefully
give you more information about why the connection is being closed. One
code that sounds like it *might* relate to your description of the
conditions under which you see this is

ENHANCE_YOUR_CALM (0x0b):
The endpoint detected that its peer is exhibiting a behavior that might be
generating excessive load
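For reference, here is a minimal Java sketch of the connection error codes
RFC 9113 defines (the names and hex values come straight from the spec; the
decode helper is my own illustration, not anything in Solr or the JDK):

```java
// A few of the HTTP/2 error codes defined in RFC 9113, Section 7.
// The decode() helper is illustrative only.
import java.util.Map;

public class Http2ErrorCodes {
    static final Map<Integer, String> CODES = Map.of(
        0x0, "NO_ERROR",            // graceful shutdown
        0x1, "PROTOCOL_ERROR",
        0x2, "INTERNAL_ERROR",
        0x3, "FLOW_CONTROL_ERROR",
        0x7, "REFUSED_STREAM",      // stream was not processed
        0xb, "ENHANCE_YOUR_CALM");  // peer generating excessive load

    static String decode(int code) {
        return CODES.getOrDefault(code,
            "UNKNOWN (0x" + Integer.toHexString(code) + ")");
    }

    public static void main(String[] args) {
        System.out.println(decode(0xb)); // prints ENHANCE_YOUR_CALM
    }
}
```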

If our classes (HttpJdkSolrClient) are failing to report, misrepresenting,
or otherwise obfuscating the error code, then that might need to be
addressed.

With respect to the question of "will things get added?", there are two
points to consider. First, per the spec, the GOAWAY frame SHOULD specify
the last processed stream ID, so any higher-numbered streams should be
considered not processed. Second, within the streams that were received: if
Solr returned a 200 OK for a request, that's Solr's commitment to you that
the items have been added to a transaction log and will make it to the
index. If you get a response other than 200 OK, the documents should not
have been added, and if you don't get a response at all before the
connection closes, there is no way to know. However, sending the same
document (with the same ID) twice is safe in basic use cases, since it
merely causes a delete/re-add. Of course, Solr is highly customizable, so
there could be some impact if you have installed custom classes that do
other persistent work, are getting fancy with UpdateRequestProcessors, or
are leaning on the version field for business logic, etc.
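The "safe to resend" point can be shown with a toy index keyed by unique
ID (a plain map standing in for the index; this is my simplification, not
Solr's actual storage, but the per-ID last-write-wins behavior is the same):

```java
// Toy illustration: re-adding a document with the same ID is a
// delete/re-add, so a retry after an ambiguous failure cannot create
// duplicates. The map is a stand-in for an index keyed by unique ID.
import java.util.LinkedHashMap;
import java.util.Map;

public class IdempotentAdd {
    static final Map<String, String> index = new LinkedHashMap<>();

    static void add(String id, String body) {
        index.put(id, body); // overwrite == delete + re-add
    }

    public static void main(String[] args) {
        add("doc1", "v1");
        add("doc1", "v1"); // resent after an ambiguous connection close
        System.out.println(index.size()); // still 1 document
    }
}
```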

Assuming no special use cases, the usual pseudocode logic is "if
(got200OK(request)) { successfulIndexing(docs) } else {
scheduleResend(docs) }".
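Fleshed out slightly, that logic might look like the sketch below. The
sendBatch() call, the queue of pending batches, and the retry limit are all
assumptions about your infrastructure (this is not the SolrJ API); the only
real rule here is "a batch is done only when you see 200 OK":

```java
// Sketch of the resend-until-200 loop described above. Sender.sendBatch()
// is a hypothetical stand-in for whatever client call performs the update
// and returns the HTTP status code.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class IndexingLoop {
    interface Sender { int sendBatch(List<String> docs); }

    static int indexAll(Deque<List<String>> pending, Sender sender,
                        int maxAttempts) {
        int indexed = 0;
        while (!pending.isEmpty()) {
            List<String> batch = pending.peekFirst();
            int attempts = 0;
            int status = -1;
            while (attempts++ < maxAttempts && status != 200) {
                status = sender.sendBatch(batch); // resending is safe: same IDs
            }
            if (status == 200) {
                pending.removeFirst(); // only now is the batch "done"
                indexed += batch.size();
            } else {
                break; // give up for now; batch stays queued for later
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        Deque<List<String>> q = new ArrayDeque<>();
        q.add(List.of("doc1", "doc2"));
        int[] calls = {0};
        // A sender that fails once (simulating a dropped connection),
        // then succeeds on the retry.
        int n = indexAll(q, docs -> calls[0]++ == 0 ? 503 : 200, 3);
        System.out.println(n); // prints 2: both docs indexed, queue drained
    }
}
```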

A well-designed indexing infrastructure needs to be able to initiate new
connections as needed, and to retain data to be indexed until a 200 OK
response is received from Solr. Designs that must absorb large bursts
either need large amounts of available memory or, more often, an
intermediate persistence layer and/or queue. Maintaining a Solr cluster
that can handle the indexing burst may be MUCH more expensive than one that
can handle the average throughput. This depends on index latency
requirements of course. If the business case makes significant money by
minimizing index latency, that might justify a large Solr cluster that can
accept all bursts as fast as possible. With enough money amazing things are
possible. I have seen 100-node clusters accept > 1 million small documents
per second. It wasn't cheap, but it was necessary to build an index with
450 billion documents in less than a week. Just be sure your business has a
real need for it first.

-Gus

-- 
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)
