[
https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001496#comment-14001496
]
Shalin Shekhar Mangar commented on SOLR-5681:
---------------------------------------------
Thanks Anshum.
# bq. The new createCollection method in CollectionAdminRequest is not
required. In fact we should clean up the existing methods which hard code
“implicit” router. I opened SOLR-6073 for it.
This patch still has the new createCollection method in CollectionAdminRequest.
Please remove that.
# bq. Another patch, integrates the patch for SOLR-6075. Will remove this
before committing once that goes into trunk.
Okay.
# Let's remove this DEBUG instance. I see this in a lot of other tests but I
cannot find where this instance is set to true. It's a bug in the other tests
too and we should fix them. I opened SOLR-6090.
{code}
private static final boolean DEBUG = false;
{code}
# Should we rename processedZkTasks to runningZkTasks? They are 'processed' by
the main OCP thread but they are still 'running' so it may give a wrong
impression to someone reading the code.
# Let's document the purpose of each of the sets/maps we've introduced such as
completedTasks, processedZkTasks, runningTasks, collectionWip as a code comment.
# I think we should use the return value of
collectionWip.add(collectionName) as a fail-safe and throw an exception if it
ever returns false.
# The OCP.Runner must call either markTaskComplete or resetTaskWithException
upon exit; otherwise we'll have items in the queue which will never be processed
and we'll never know why. It is not enough to call resetTaskWithException upon a
KeeperException or InterruptedException only.
# Similar to above, we should have debug level logging on items in our various
data structures before cleanUpWorkQueue, after cleanUpWorkQueue, before the
peekTopN call and on the items returned by the peekTopN method. Also we should
log the item skipped by 'checkExclusivity' at debug level. Without this logging,
it'd be almost impossible to debug problems in production.
# If maxParallelThreads is supposed to be a constant, then it should be renamed
accordingly to MAX_PARALLEL_THREADS.
Let's make it a constant.
# We can improve MultiThreadedOCPTest.testTaskExclusivity by sending a shard
split for shard1_0 as the third collection action.
# There are still formatting problems in Overseer.Stats.success, error, time
methods.
# ZkStateReader has a new MAX_COLL_PROCESSOR_THREADS instance variable which is
never used.
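
To make the two fail-safe points above concrete (checking the return value of collectionWip.add, and having the Runner always end in markTaskComplete or resetTaskWithException), here is a minimal sketch. It is not the code from the patch: the TaskTracker class, the claim and runner helpers, the failedTasks map and all method bodies are hypothetical stand-ins; only the names collectionWip, completedTasks, markTaskComplete and resetTaskWithException are borrowed from the discussion.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the OCP bookkeeping; not the actual patch.
class TaskTracker {
    // Collections that currently have a task in flight (one task per collection).
    final Set<String> collectionWip = ConcurrentHashMap.newKeySet();
    // Ids of tasks that finished successfully.
    final Set<String> completedTasks = ConcurrentHashMap.newKeySet();
    // Ids of tasks that failed, mapped to a description of the failure (assumed structure).
    final Map<String, String> failedTasks = new ConcurrentHashMap<>();

    // Fail-safe: add() returning false means a task for this collection is
    // already running, so the exclusivity check upstream must have been wrong.
    void claim(String collectionName) {
        if (!collectionWip.add(collectionName)) {
            throw new IllegalStateException(
                "A task is already running for collection: " + collectionName);
        }
    }

    void markTaskComplete(String taskId, String collectionName) {
        completedTasks.add(taskId);
        collectionWip.remove(collectionName);
    }

    void resetTaskWithException(String taskId, String collectionName, Throwable failure) {
        // failure may be null if the task died with an Error rather than an exception.
        failedTasks.put(taskId, String.valueOf(failure));
        collectionWip.remove(collectionName);
    }

    // Claims the collection on the calling thread and returns the work wrapped
    // so that exactly one of the two exit methods runs, whatever the work does.
    Runnable runner(String taskId, String collectionName, Runnable work) {
        claim(collectionName);
        return () -> {
            boolean succeeded = false;
            RuntimeException failure = null;
            try {
                work.run();
                succeeded = true;
            } catch (RuntimeException e) {
                failure = e; // recorded in the finally block below
            } finally {
                if (succeeded) {
                    markTaskComplete(taskId, collectionName);
                } else {
                    resetTaskWithException(taskId, collectionName, failure);
                }
            }
        };
    }
}
```

The finally block is the point of the sketch: the bookkeeping is released on every exit path, not only on the exception types that were anticipated.
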
> Make the OverseerCollectionProcessor multi-threaded
> ---------------------------------------------------
>
> Key: SOLR-5681
> URL: https://issues.apache.org/jira/browse/SOLR-5681
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Anshum Gupta
> Assignee: Anshum Gupta
> Attachments: SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch,
> SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch,
> SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch, SOLR-5681-2.patch,
> SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch,
> SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch,
> SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch,
> SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch,
> SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch
>
>
> Right now, the OverseerCollectionProcessor is single threaded i.e submitting
> anything long running would have it block processing of other mutually
> exclusive tasks.
> When OCP tasks become optionally async (SOLR-5477), it'd be good to have
> truly non-blocking behavior by multi-threading the OCP itself.
> For example, a ShardSplit call on Collection1 would block the thread and
> thereby, not processing a create collection task (which would stay queued in
> zk) though both the tasks are mutually exclusive.
> Here are a few of the challenges:
> * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An
> easy way to handle that is to only let 1 task per collection run at a time.
> * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue.
> The task from the workQueue is only removed on completion so that in case of
> a failure, the new Overseer can re-consume the same task and retry. A queue
> is not the right data structure in the first place to look ahead i.e. get the
> 2nd task from the queue when the 1st one is in process. Also, deleting tasks
> which are not at the head of a queue is not really an 'intuitive' thing.
> Proposed solutions for task management:
> * Task funnel and peekAfter(): The parent thread is responsible for getting
> and passing the request to a new thread (or one from the pool). The parent
> method uses a peekAfter(last element) instead of a peek(). The peekAfter
> returns the task after the 'last element'. Maintain this request information
> and use it for deleting/cleaning up the workQueue.
> * Another (almost duplicate) queue: While offering tasks to workQueue, also
> offer them to a new queue (call it volatileWorkQueue?). The difference is, as
> soon as a task from this is picked up for processing by the thread, it's
> removed from the queue. At the end, the cleanup is done from the workQueue.