[
https://issues.apache.org/jira/browse/IGNITE-7456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329234#comment-16329234
]
Oleg Ignatenko edited comment on IGNITE-7456 at 1/18/18 12:06 PM:
------------------------------------------------------------------
As of now it looks good to go to master and to the 2.4 branch: the code changes are OK,
unit tests pass and the examples run correctly. One thing worth doing prior to
merge is to get rid of the double whitespace in the text of the message in
MLPGroupTrainerExample: {{">>> Distributed multilayer perceptron example
started."}} (it is between the words "Distributed" and "multilayer").
Another interesting thing I noticed when I tried the MLP group example with a number
of steps larger than the one set in the code (100 instead of 20): there were confusing
exceptions in the example log. Note that I only managed to reproduce it on my machine;
when Artem tried it on his machine, the example ran without exceptions. Because of that I
attached my execution log here:
[^IGNITE-7456.NPE.MLPGroupTrainerExample.tweaked.log]. This issue is out of the
scope of this ticket, since it only appeared with settings that are not in the code, but after
this change is merged to master we had better open a separate ticket to investigate
what could go wrong with my trial change.
*Update* as of now both issues mentioned above have been fixed.
> Fix wrong batch logic in distributed MLP training.
> --------------------------------------------------
>
> Key: IGNITE-7456
> URL: https://issues.apache.org/jira/browse/IGNITE-7456
> Project: Ignite
> Issue Type: Bug
> Components: ml
> Affects Versions: 2.4
> Reporter: Artem Malykh
> Assignee: Artem Malykh
> Priority: Major
> Fix For: 2.4
>
> Attachments: IGNITE-7456.NPE.MLPGroupTrainerExample.tweaked.log
>
>
> The batch for training is created outside of the training loop, so every
> local step works with the same batch instead of a freshly drawn one.
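The bug pattern described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual Ignite trainer code: the class `BatchSketch` and its methods are hypothetical names, and real MLP training draws random mini-batches from a cache rather than slicing an array.

```java
import java.util.Arrays;

// Hypothetical sketch contrasting the buggy and the fixed batch logic.
public class BatchSketch {

    // Buggy pattern: the batch is built once, outside the loop,
    // so every local step trains on the same data.
    static int[][] buggyBatches(int[] data, int batchSize, int steps) {
        int[] batch = Arrays.copyOfRange(data, 0, batchSize); // created outside the loop
        int[][] seen = new int[steps][];
        for (int step = 0; step < steps; step++) {
            seen[step] = batch; // identical batch on each step
        }
        return seen;
    }

    // Fixed pattern: a fresh batch is drawn inside the loop on each step.
    static int[][] fixedBatches(int[] data, int batchSize, int steps) {
        int[][] seen = new int[steps][];
        for (int step = 0; step < steps; step++) {
            int offset = (step * batchSize) % data.length;
            seen[step] = Arrays.copyOfRange(data, offset, offset + batchSize);
        }
        return seen;
    }
}
```

With the buggy variant, every gradient step sees the same sample; moving the batch construction inside the loop restores the stochastic behavior the trainer expects.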
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)