[
https://issues.apache.org/jira/browse/IGNITE-7456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329234#comment-16329234
]
Oleg Ignatenko edited comment on IGNITE-7456 at 1/18/18 12:06 PM:
------------------------------------------------------------------
As of now it looks good to go to master and to the 2.4 branch: the code changes are OK,
unit tests pass and the examples run correctly. One thing worth doing prior to
merge is to get rid of the double whitespace in the text of the message in
MLPGroupTrainerExample: {{">>> Distributed multilayer perceptron example
started."}} (it is between the words "Distributed" and "multilayer").
Another interesting thing I noticed when I tried the MLP group example with a number
of steps larger than the one set in the code (100 instead of 20): there were confusing
exceptions in the example log. Note that I only managed to reproduce it on my machine;
when Artem tried it on his machine, the example ran without exceptions. Because of that I
attached my execution log here:
[^IGNITE-7456.NPE.MLPGroupTrainerExample.tweaked.log]. This issue is out of the
scope of this ticket, since it only appeared with settings that are not in the code, but after
this change is merged to master we had better open a separate ticket to investigate
what could go wrong with my trial change.
*Update* as of now both issues mentioned above have been fixed.
> Fix wrong batch logic in distributed MLP training.
> --------------------------------------------------
>
> Key: IGNITE-7456
> URL: https://issues.apache.org/jira/browse/IGNITE-7456
> Project: Ignite
> Issue Type: Bug
> Components: ml
> Affects Versions: 2.4
> Reporter: Artem Malykh
> Assignee: Artem Malykh
> Priority: Major
> Fix For: 2.4
>
> Attachments: IGNITE-7456.NPE.MLPGroupTrainerExample.tweaked.log
>
>
> The batch for training is created outside of the training loop, so every
> local step works with the same batch instead of a freshly drawn one.
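The bug pattern described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual Ignite trainer code: the class `BatchSketch` and its methods are hypothetical names, and real MLP training draws random mini-batches from a cache rather than slicing an array.

```java
import java.util.Arrays;

// Hypothetical sketch contrasting the buggy and the fixed batch logic.
public class BatchSketch {

    // Buggy pattern: the batch is built once, outside the loop,
    // so every local step trains on the same data.
    static int[][] buggyBatches(int[] data, int batchSize, int steps) {
        int[] batch = Arrays.copyOfRange(data, 0, batchSize); // created outside the loop
        int[][] seen = new int[steps][];
        for (int step = 0; step < steps; step++) {
            seen[step] = batch; // identical batch on each step
        }
        return seen;
    }

    // Fixed pattern: a fresh batch is drawn inside the loop on each step.
    static int[][] fixedBatches(int[] data, int batchSize, int steps) {
        int[][] seen = new int[steps][];
        for (int step = 0; step < steps; step++) {
            int offset = (step * batchSize) % data.length;
            seen[step] = Arrays.copyOfRange(data, offset, offset + batchSize);
        }
        return seen;
    }
}
```

With the buggy variant, every gradient step sees the same sample; moving the batch construction inside the loop restores the stochastic behavior the trainer expects.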
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)