[
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863798#comment-13863798
]
Hoss Man commented on SOLR-5477:
--------------------------------
A few small suggestions from someone who hasn't thought through much of this
but has done similar async setups in other systems in another lifetime...
1) on where the (core task) queues should live...
bq. I'm still debating between having even the CoreAdmin to use zk (which means
it'd only work in SolrCloud mode) or just have a local map of running tasks.
I think it would be wise to keep them in ZK -- if for no other reason than
because the primary use case you expect is for the async core calls to be made
by the async overseer calls; and by keeping the async core queues in zk, the
overseer can watch those queues directly for "completed" instead of needing to
wake up, poll every replica, and go back to sleep.
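To illustrate the "watch instead of poll" idea, here is a minimal in-memory
sketch. It does not use ZooKeeper at all: SharedCompletedQueue and
CompletionTracker are hypothetical names invented for this example, and the
callback only loosely mimics a ZK watch (real ZooKeeper watches have
traditionally been one-shot and need to be re-registered).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical in-memory stand-in for a shared "completed" queue kept in ZK. */
class SharedCompletedQueue {
    private final Map<String, String> results = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, String>> watchers = new ArrayList<>();

    /** Register a callback fired on each new completion (assumed watch-like hook). */
    synchronized void watch(BiConsumer<String, String> watcher) {
        watchers.add(watcher);
    }

    /** Called by a replica when its core task ends, successfully or not. */
    synchronized void markCompleted(String replica, String status) {
        // Only the first report from a replica notifies the watchers.
        if (results.putIfAbsent(replica, status) == null) {
            for (BiConsumer<String, String> w : watchers) {
                w.accept(replica, status);
            }
        }
    }

    String statusOf(String replica) {
        return results.getOrDefault(replica, "unknown");
    }
}

/** Hypothetical overseer-side helper: waits for N replicas without a polling loop. */
class CompletionTracker {
    private final CountDownLatch latch;

    CompletionTracker(SharedCompletedQueue queue, int replicaCount) {
        this.latch = new CountDownLatch(replicaCount);
        queue.watch((replica, status) -> latch.countDown());
    }

    /** Returns true once every expected replica has reported, false on timeout. */
    boolean await(long timeoutMs) {
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The overseer would create one CompletionTracker per collection command and
block in await(...); replicas call markCompleted(...) as they finish, so the
overseer never has to ask each replica individually.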
However, a secondary concern (i think) is what should happen if/when a node
gets rebooted -- if the core admin task queues are in RAM then you could
easily get into a situation where the overseer asks 10 replicas to do
something, replicaA succeeds or fails quickly and then reboots, the overseer
checks back once all replicas are done and finds that replicaA can't say one
way or another whether it succeeded or failed -- its queues are totally empty.
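The reboot scenario can be sketched as follows. This is only a toy model:
ReplicaNode is a hypothetical class, and the map passed to its constructor is
an assumed stand-in for state that lives outside the node (for example in ZK)
and therefore outlives a restart.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical replica node that records the outcome of core admin tasks. */
class ReplicaNode {
    // Lives only in this node's heap, so it is gone after a reboot.
    private final Map<String, String> localStatus = new HashMap<>();
    // Stand-in for an external store such as ZK; it survives a reboot.
    private final Map<String, String> sharedStatus;

    ReplicaNode(Map<String, String> sharedStatus) {
        this.sharedStatus = sharedStatus;
    }

    /** Record the final status of a task in both places. */
    void finishTask(String taskId, String status) {
        localStatus.put(taskId, status);
        sharedStatus.put(taskId, status);
    }

    /** What the node can answer from RAM alone. */
    String localLookup(String taskId) {
        return localStatus.getOrDefault(taskId, "unknown");
    }

    /** What can still be answered from the external store. */
    String sharedLookup(String taskId) {
        return sharedStatus.getOrDefault(taskId, "unknown");
    }
}
```

If replicaA calls finishTask("task-1", "completed") and is then "rebooted"
(modelled by constructing a new ReplicaNode over the same shared map),
localLookup("task-1") returns "unknown" while sharedLookup("task-1") still
returns "completed".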
2) on generating the task/request IDs.
in my experience, when implementing an async callback API like this, it can be
handy to require the *client* to specify the magical id that you use to keep
track of things -- you just ensure it's unique among the existing async jobs
you know about (either in the queue, or in the recently completed/failed
queues). Sometimes single-threaded (or centrally managed) client apps can
generate a unique id more easily than your distributed system, and/or they may
already have a one-to-one mapping between some id they've already got and the
task they are asking you to do, and re-using that id makes the client's life
easier for debugging/audit-logs.
in the case of async collection commands -> async core commands, it would also
mean the overseer could reuse whatever id the client passed in for the
collection commands when talking to each of the replicas.
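A rough sketch of client-supplied ids, under the assumption that each node
(overseer or replica) keeps its own registry of running and recently finished
tasks. AsyncTaskRegistry and all of its methods are hypothetical names for
this example, not anything in Solr.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry of async tasks keyed by an id chosen by the client. */
class AsyncTaskRegistry {
    private final Map<String, String> running = new HashMap<>();  // id -> command
    private final Map<String, String> finished = new HashMap<>(); // id -> final status

    /** Accept the task only if the id is unused among running and recently finished tasks. */
    synchronized boolean submit(String clientId, String command) {
        if (running.containsKey(clientId) || finished.containsKey(clientId)) {
            return false;
        }
        running.put(clientId, command);
        return true;
    }

    /** Move a running task to the recently finished set. */
    synchronized void finish(String clientId, String status) {
        if (running.remove(clientId) != null) {
            finished.put(clientId, status);
        }
    }

    synchronized String status(String clientId) {
        if (running.containsKey(clientId)) {
            return "running";
        }
        return finished.getOrDefault(clientId, "unknown");
    }

    /** Overseer side: reuse the collection-level id for every per-replica core command. */
    static boolean submitToReplicas(String clientId, String command,
                                    Collection<AsyncTaskRegistry> replicas) {
        boolean allAccepted = true;
        for (AsyncTaskRegistry replica : replicas) {
            allAccepted &= replica.submit(clientId, command);
        }
        return allAccepted;
    }
}
```

Because the same id is used at the collection level and on every replica, a
client that submitted "job-42" can grep for that one string in the overseer
and replica logs.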
> Async execution of OverseerCollectionProcessor tasks
> ----------------------------------------------------
>
> Key: SOLR-5477
> URL: https://issues.apache.org/jira/browse/SOLR-5477
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Noble Paul
> Assignee: Anshum Gupta
> Attachments: SOLR-5477-CoreAdminStatus.patch
>
>
> Typical collection admin commands are long running and it is very common to
> have the requests get timed out. It is more of a problem if the cluster is
> very large. Add an option to run these commands asynchronously
> add an extra param async=true for all collection commands
> the task is written to ZK and the caller is returned a task id.
> a separate collection admin command will be added to poll the status of the
> task
> command=status&id=7657668909
> if id is not passed all running async tasks should be listed
> A separate queue is created to store in-process tasks. After the tasks are
> completed the queue entry is removed. OverseerCollectionProcessor will perform
> these tasks in multiple threads