[ 
https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998679#comment-13998679
 ] 

Shalin Shekhar Mangar commented on SOLR-5681:
---------------------------------------------

Thanks Anshum. Comments on your last patch:

# The processedZKTasks, runningTasks and collectionWip sets are wrapped with 
Collections.synchronizedSet and yet are synchronised again before use. One of 
the two synchronisations should be removed.
# MultiThreadedOverseerCollectionProcessor.updateStats has a TODO about 
removing the synchronisation, which is correct: that synchronisation is not 
required here.
# MultiThreadedOverseerCollectionProcessor is just a Runnable and not really 
multi-threaded by itself. We should rename it to something more suitable, e.g. 
CollectionActionRunner, OverseerCollectionProcessorThread or maybe just 
Runner.
# Any exception thrown during processMessage inside 
MultiThreadedOverseerCollectionProcessor is never logged. In fact, if an 
exception is thrown by processMessage, the "response" will be null and will 
cause an NPE at head.setBytes(SolrResponse.serializable(response)); we should 
log such exceptions at the very minimum.
# There is a behaviour change between the earlier OCP and the multi-threaded 
one. If processMessage results in an exception, the previous OCP will not 
remove the queue entry and will continue to retry the task, but the 
multi-threaded OCP will mark the task as complete and remove it from the queue.
# The stats collection in MultiThreadedOverseerCollectionProcessor.run is not 
correct; the right way to time the operation is to call timerContext.stop() in 
a finally block right after processMessage.
# There’s an empty catch (KeeperException e) block in 
MultiThreadedOverseerCollectionProcessor.run()
# Wrong code formatting in OCP.close()
# OCP.close() should use a shutdown(), awaitTermination(), shutdownNow() call 
pattern. The shutdown method by itself will not interrupt tasks in progress.
# There are still unrelated changes in OCP.prioritizeOverseerNodes
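
On point 1, a minimal sketch (with an assumed runningTasks field standing in 
for the OCP's) of dropping the redundant explicit lock, since 
Collections.synchronizedSet already makes individual operations atomic:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class TaskSets {
    // Assumed stand-in for the OCP's runningTasks set.
    static final Set<String> runningTasks =
        Collections.synchronizedSet(new HashSet<>());

    static boolean markRunning(String taskId) {
        // add() is already atomic on a synchronizedSet, so no extra
        // synchronized (runningTasks) { ... } block is needed here.
        return runningTasks.add(taskId);
    }

    public static void main(String[] args) {
        System.out.println(markRunning("task-1")); // true: newly added
        System.out.println(markRunning("task-1")); // false: already present
    }
}
```

Note that explicit synchronisation is still required when iterating over such 
a set, so whichever of the two mechanisms is kept must cover any iteration 
sites.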
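
On point 6, a sketch of the suggested timing pattern, with System.nanoTime() 
standing in for the timer context used in the OCP, and a hypothetical 
processMessage stand-in:

```java
public class TimedProcess {
    static long elapsedNanos = -1;

    // Hypothetical stand-in for the OCP's processMessage; may throw.
    static void processMessage(boolean fail) {
        if (fail) throw new RuntimeException("simulated failure");
    }

    static void run(boolean fail) {
        long start = System.nanoTime(); // analogue of timerContext = timer.time()
        try {
            processMessage(fail);
        } finally {
            // analogue of timerContext.stop(): records even on exception
            elapsedNanos = System.nanoTime() - start;
        }
    }

    public static void main(String[] args) {
        try {
            run(true);
        } catch (RuntimeException expected) {
            // the failure propagates, but the timing was still recorded
        }
        System.out.println(elapsedNanos >= 0); // true
    }
}
```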
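
And on point 9, the standard two-phase shutdown sequence described in the 
ExecutorService javadoc, sketched here with an assumed pool and timeout:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownPattern {
    static void shutdownAndAwaitTermination(ExecutorService pool) {
        pool.shutdown(); // stop accepting new tasks; running tasks continue
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt tasks still in progress
                if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                    System.err.println("Pool did not terminate");
                }
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("task ran"));
        shutdownAndAwaitTermination(pool);
        System.out.println("terminated=" + pool.isTerminated());
    }
}
```

Calling only shutdown(), as close() does now, lets in-progress tasks run to 
completion; the shutdownNow() fallback is what actually interrupts them.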

I'm still reviewing the changes.

> Make the OverseerCollectionProcessor multi-threaded
> ---------------------------------------------------
>
>                 Key: SOLR-5681
>                 URL: https://issues.apache.org/jira/browse/SOLR-5681
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Anshum Gupta
>            Assignee: Anshum Gupta
>         Attachments: SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, 
> SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, 
> SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, 
> SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, 
> SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, 
> SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, 
> SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch
>
>
> Right now, the OverseerCollectionProcessor is single-threaded, i.e. 
> submitting anything long-running would have it block the processing of other 
> mutually exclusive tasks.
> When OCP tasks become optionally async (SOLR-5477), it'd be good to have 
> truly non-blocking behavior by multi-threading the OCP itself.
> For example, a ShardSplit call on Collection1 would block the thread and 
> thereby prevent processing of a create-collection task (which would stay 
> queued in zk), even though the two tasks are mutually exclusive.
> Here are a few of the challenges:
> * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An 
> easy way to handle that is to only let 1 task per collection run at a time.
> * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. 
> A task is only removed from the workQueue on completion so that, in case of 
> a failure, the new Overseer can re-consume the same task and retry. A queue 
> is not the right data structure in the first place for looking ahead, i.e. 
> getting the 2nd task from the queue while the 1st one is in process. Also, 
> deleting tasks which are not at the head of a queue is not really an 
> 'intuitive' thing.
> Proposed solutions for task management:
> * Task funnel and peekAfter(): The parent thread is responsible for getting 
> and passing the request to a new thread (or one from the pool). The parent 
> method uses a peekAfter(last element) instead of a peek(). The peekAfter 
> returns the task after the 'last element'. Maintain this request information 
> and use it for deleting/cleaning up the workQueue.
> * Another (almost duplicate) queue: While offering tasks to the workQueue, 
> also offer them to a new queue (call it volatileWorkQueue?). The difference 
> is that as soon as a task from this queue is picked up for processing by a 
> thread, it is removed from the queue. At the end, the cleanup is done from 
> the workQueue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

