[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421135#comment-15421135 ] Josep Rubió commented on FLINK-1707:

Hi [~vkalavri], I'm redoing the implementation and have some questions about it:
- I will remove the damping functionality to avoid keeping the old sent values for each node. Is that right?
- I'm thinking of implementing a method inside the AffinityPropagation class to transform a similarity matrix into a graph. Is that right? Maybe this could be done in the Gelly library instead of the AP class.
- Each of the I nodes should have the similarities with the other points. Is that feasible from a memory perspective?

Thanks!

> Add an Affinity Propagation Library Method
> --
>
> Key: FLINK-1707
> URL: https://issues.apache.org/jira/browse/FLINK-1707
> Project: Flink
> Issue Type: New Feature
> Components: Gelly
> Reporter: Vasia Kalavri
> Assignee: Josep Rubió
> Priority: Minor
> Labels: requires-design-doc
> Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf
>
> This issue proposes adding an implementation of the Affinity Propagation
> algorithm as a Gelly library method and a corresponding example.
> The algorithm is described in paper [1] and a description of a vertex-centric
> implementation can be found in [2].
> [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
> [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
> Design doc:
> https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing
> Example spreadsheet:
> https://docs.google.com/spreadsheets/d/1CurZCBP6dPb1IYQQIgUHVjQdyLxK0JDGZwlSXCzBcvA/edit?usp=sharing

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421317#comment-15421317 ] Josep Rubió commented on FLINK-1707:

Hi [~vkalavri],

No worries :) No, it is not: the damping factor is there to avoid oscillation in some cases. As for convergence, it actually makes the algorithm converge more slowly, but the authors recommend it. I will look into implementing a solution where both options are available, e.g. a damping factor of 0 would not create the array with the old values.

Makes sense. I may need some help to improve what is done now to convert the input graph into the binary AP graph. I will continue with it.

Thanks!
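Damping, as described above, blends each newly computed message with the previously sent one, which is why a nonzero damping factor forces the old values to be kept while a factor of 0 does not. A minimal sketch of the standard AP damped update (the class and method names here are hypothetical, not the actual PR code):

```java
// Hypothetical sketch of the damped message update discussed above.
public class DampingSketch {

    // Classic AP damping: blend the newly computed message with the
    // previously sent one. With damping == 0 the new value is used
    // directly, so no old value needs to be stored at all.
    public static double dampedUpdate(double oldValue, double newValue, double damping) {
        return damping * oldValue + (1.0 - damping) * newValue;
    }

    public static void main(String[] args) {
        // With damping 0 the old value is irrelevant.
        System.out.println(dampedUpdate(5.0, 2.0, 0.0)); // 2.0
        // With damping 0.5 the result is the average of old and new.
        System.out.println(dampedUpdate(5.0, 2.0, 0.5)); // 3.5
    }
}
```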
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425389#comment-15425389 ] Josep Rubió commented on FLINK-1707:

Hi [~vkalavri], I changed the implementation. The E vertices now have these members:

    private HashMap weights;
    private HashMap oldValues;
    private long exemplar;
    private int convergenceFactor;
    private int convergenceFactorCounter;

where before it was:

    private HashMap weights;
    private HashMap values;
    private HashMap oldValues;
    private long exemplar;

So:
* The values hash map is no longer needed.
* If a damping factor different from 0 is provided to the AP constructor, the oldValues HashMap is populated and the algorithm works the same way as before.
* If a damping factor of 0 is provided to the constructor:
  - The convergence condition changes from "no modifications in messages" to "no local modifications to the vertices that decide to be exemplars", which must hold for a certain number of steps defined in the constructor.
  - No damping factor is applied to messages.
  - The oldValues HashMap is not used.

Now I need to change the initialization of the graph; I will post the question to the dev list and see if someone can help me find the best way to do it. Once I finish, I will clean up and comment the code, update the document, and submit a new pull request.

Thanks!
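The zero-damping convergence rule described above (exemplar choices unchanged for a given number of supersteps) can be sketched roughly as follows; the member names mirror the comment, but the class and its update method are illustrative only, not the PR's actual API:

```java
// Hypothetical sketch of the "no damping" convergence rule: a vertex
// counts consecutive supersteps in which its exemplar choice is
// unchanged and declares convergence once the counter reaches the
// threshold passed to the constructor.
public class ConvergenceSketch {
    private long exemplar = -1;                 // no exemplar chosen yet
    private final int convergenceFactor;        // required stable supersteps
    private int convergenceFactorCounter = 0;

    public ConvergenceSketch(int convergenceFactor) {
        this.convergenceFactor = convergenceFactor;
    }

    // Called once per superstep with the exemplar the vertex currently picks;
    // returns true once the choice has been stable long enough.
    public boolean update(long newExemplar) {
        if (newExemplar == exemplar) {
            convergenceFactorCounter++;
        } else {
            exemplar = newExemplar;
            convergenceFactorCounter = 0;       // reset on any change
        }
        return convergenceFactorCounter >= convergenceFactor;
    }

    public static void main(String[] args) {
        ConvergenceSketch v = new ConvergenceSketch(2);
        System.out.println(v.update(7));  // false: exemplar just changed
        System.out.println(v.update(7));  // false: stable for 1 superstep
        System.out.println(v.update(7));  // true: stable for 2 supersteps
    }
}
```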
[jira] [Commented] (FLINK-3298) Streaming connector for ActiveMQ
[ https://issues.apache.org/jira/browse/FLINK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430445#comment-15430445 ] Jean-Baptiste Onofré commented on FLINK-3298:

By the way guys, why only ActiveMQ when we can do a generic JMS connector? FYI, it's what I did in Beam (JMSIO).

> Streaming connector for ActiveMQ
> --
>
> Key: FLINK-3298
> URL: https://issues.apache.org/jira/browse/FLINK-3298
> Project: Flink
> Issue Type: New Feature
> Components: Streaming Connectors
> Reporter: Mohit Sethi
> Assignee: Ivan Mushketyk
> Priority: Minor
[jira] [Commented] (FLINK-4326) Flink start-up scripts should optionally start services on the foreground
[ https://issues.apache.org/jira/browse/FLINK-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434446#comment-15434446 ] Ismaël Mejía commented on FLINK-4326:

Just to make some progress here, I see three options:
1. I reopen my PR and we leave it as it was (it already works and achieves the intended goal, but has the weirdness of calling flink-daemon.sh start-foreground ...).
2. I reopen my PR but rename flink-daemon to flink-service to avoid mixing daemon+foreground (the downside is that it may break some scripts for people using flink-daemon directly).
3. We create an additional flink-console script, with the issue of copy-pasting a small chunk of code at the beginning and maintaining both in the future (we could create some sort of common bash code library, but remember bash is not the nicest language for reuse).

I prefer option 1 because it reuses all the code and is already done and tested. I could probably do option 2 or 3 if most people agree on one of those, but I will still need help from your side for the new review plus some testing (in particular for case 3). WDYT, or do you have other suggestions?

> Flink start-up scripts should optionally start services on the foreground
> --
>
> Key: FLINK-4326
> URL: https://issues.apache.org/jira/browse/FLINK-4326
> Project: Flink
> Issue Type: Improvement
> Components: Startup Shell Scripts
> Affects Versions: 1.0.3
> Reporter: Elias Levy
>
> This has previously been mentioned in the mailing list, but has not been
> addressed. Flink start-up scripts start the job and task managers in the
> background. This makes it difficult to integrate Flink with most process
> supervisory tools and init systems, including Docker. One can get around
> this by hacking the scripts or manually starting the right classes via Java,
> but it is a brittle solution.
> In addition to starting the daemons in the foreground, the start-up scripts
> should use exec instead of running the commands, so as to avoid forks. Many
> supervisory tools assume the PID of the process to be monitored is that of
> the process they first execute, and fork chains make it difficult for the
> supervisor to figure out which process to monitor. Specifically,
> jobmanager.sh and taskmanager.sh should exec flink-daemon.sh, and
> flink-daemon.sh should exec java.
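The exec chain described in the issue can be sketched as below. The function names and bodies are simplified stand-ins for the real Flink scripts, not their actual contents; the point is only how `exec` avoids leaving intermediate shell processes between the supervisor and the JVM.

```shell
#!/usr/bin/env bash
# Sketch of the exec chain discussed above: jobmanager.sh would exec
# flink-daemon.sh, which would in turn `exec java ...`, so the PID the
# supervisor started ends up being the PID of the JVM itself.

# Stand-in for the flink-daemon.sh layer: exec replaces the current
# process with the command instead of forking a child for it.
run_daemon() {
    exec "$@"
}

# Run it in a subshell here only so this demo script survives the exec;
# the real scripts would exec directly, with no surviving parent shell.
out=$( (run_daemon echo "started in foreground") )
echo "$out"
```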
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15442548#comment-15442548 ] Josep Rubió commented on FLINK-1707:

Hi [~vkalavri],

I've pushed a new version of AP with the following changes: https://github.com/joseprupi/flink/blob/vertexcentric/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/AffinityPropagation.java
- I've changed the model to the vertex-centric one, avoiding keeping the values in the vertices.
- I've added the option of no damping. When a damping factor of 0 is passed to the constructor, the convergence condition is no changes to the exemplars over a certain number of iterations; this number of iterations is the last parameter of the constructor. If a damping factor different from 0 is used, it keeps working as before, having to hold the old sent values in the vertex.

To do:
- I have not changed the initialization of the graph yet. I posted a question on the dev thread without much luck. Maybe I will implement an initialization from a similarity matrix for now and see later how I can do it using Gelly functionality.
- I will try to move the weight values to the edges instead of the vertices (I've created a new version of the design document with the diagrams too). This way vertices will only hold the old values when the damping factor has to be used.
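The interim initialization mentioned in the to-do list, building edges from a similarity matrix from which a Gelly graph could then be created (e.g. via Graph.fromCollection), might look roughly like this. The helper name and the plain Edge record are hypothetical, for illustration only:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turn a dense similarity matrix into
// (source, target, similarity) edges that a Gelly Graph could be
// built from. Not the actual AffinityPropagation code.
public class SimilarityGraphSketch {
    public record Edge(long src, long trg, double similarity) {}

    public static List<Edge> fromSimilarityMatrix(double[][] s) {
        List<Edge> edges = new ArrayList<>();
        for (int i = 0; i < s.length; i++) {
            for (int j = 0; j < s[i].length; j++) {
                // In AP the diagonal s(k,k) holds the preference value,
                // so self-edges are kept as well.
                edges.add(new Edge(i, j, s[i][j]));
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        double[][] s = {{-1.0, 0.5}, {0.5, -1.0}};
        // 2x2 matrix yields 4 edges, self-edges included.
        System.out.println(fromSimilarityMatrix(s).size()); // 4
    }
}
```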
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15443415#comment-15443415 ] Josep Rubió commented on FLINK-1707: Hi [~vkalavri], Yeps, I'll do that. I thought you said you did not want it that way :( Thanks!
[jira] [Updated] (FLINK-2662) CompilerException: "Bug: Plan generation for Unions picked a ship strategy between binary plan operators."
[ https://issues.apache.org/jira/browse/FLINK-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lorenz Bühmann updated FLINK-2662: -- Attachment: FlinkBug.scala A minimal example that leads to the reported exception. In keeping it as minimal as possible, the original logic of the program was lost, so please don't look for its meaning. > CompilerException: "Bug: Plan generation for Unions picked a ship strategy > between binary plan operators." > -- > > Key: FLINK-2662 > URL: https://issues.apache.org/jira/browse/FLINK-2662 > Project: Flink > Issue Type: Bug > Components: Optimizer >Affects Versions: 0.9.1, 0.10.0 >Reporter: Gabor Gevay >Priority: Critical > Fix For: 1.0.0 > > Attachments: FlinkBug.scala > > > I have a Flink program which throws the exception in the jira title. Full > text: > Exception in thread "main" org.apache.flink.optimizer.CompilerException: Bug: > Plan generation for Unions picked a ship strategy between binary plan > operators. > at > org.apache.flink.optimizer.traversals.BinaryUnionReplacer.collect(BinaryUnionReplacer.java:113) > at > org.apache.flink.optimizer.traversals.BinaryUnionReplacer.postVisit(BinaryUnionReplacer.java:72) > at > org.apache.flink.optimizer.traversals.BinaryUnionReplacer.postVisit(BinaryUnionReplacer.java:41) > at > org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:170) > at > org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199) > at > org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:163) > at > org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:163) > at > org.apache.flink.optimizer.plan.WorksetIterationPlanNode.acceptForStepFunction(WorksetIterationPlanNode.java:194) > at > org.apache.flink.optimizer.traversals.BinaryUnionReplacer.preVisit(BinaryUnionReplacer.java:49) > at > org.apache.flink.optimizer.traversals.BinaryUnionReplacer.preVisit(BinaryUnionReplacer.java:41) > at > org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:162) > at > org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199) > at > org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199) > at > org.apache.flink.optimizer.plan.OptimizedPlan.accept(OptimizedPlan.java:127) > at org.apache.flink.optimizer.Optimizer.compile(Optimizer.java:520) > at org.apache.flink.optimizer.Optimizer.compile(Optimizer.java:402) > at > org.apache.flink.client.LocalExecutor.getOptimizerPlanAsJSON(LocalExecutor.java:202) > at > org.apache.flink.api.java.LocalEnvironment.getExecutionPlan(LocalEnvironment.java:63) > at malom.Solver.main(Solver.java:66) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140) > The execution plan: > http://compalg.inf.elte.hu/~ggevay/flink/plan_3_4_0_0_without_verif.txt > (I obtained this by commenting out the line that throws the exception) > The code is here: > https://github.com/ggevay/flink/tree/plan-generation-bug > The class to run is "Solver". It needs a command line argument, which is a > directory where it would write output. (On first run, it generates some > lookup tables for a few minutes, which are then placed in /tmp/movegen)
[jira] [Comment Edited] (FLINK-2662) CompilerException: "Bug: Plan generation for Unions picked a ship strategy between binary plan operators."
[ https://issues.apache.org/jira/browse/FLINK-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464487#comment-15464487 ] Lorenz Bühmann edited comment on FLINK-2662 at 9/5/16 8:53 AM: --- [~fhueske] I attached a file. It contains (hopefully) a minimal example that leads to the reported exception. In keeping it as minimal as possible, the original logic of the program was lost, so please don't look for its meaning.
[jira] [Comment Edited] (FLINK-2662) CompilerException: "Bug: Plan generation for Unions picked a ship strategy between binary plan operators."
[ https://issues.apache.org/jira/browse/FLINK-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464487#comment-15464487 ] Lorenz Bühmann edited comment on FLINK-2662 at 9/5/16 8:55 AM: --- [~fhueske] I attached a file. It (hopefully) contains a minimal example that leads to the reported exception. In keeping it as minimal as possible, the original logic of the program was lost, so please don't look for its meaning. The Flink version used was 1.1.0, via Maven.
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15466242#comment-15466242 ] Josep Rubió commented on FLINK-1707: Hi [~vkalavri], I have updated the code if you want to take a look at it. I have added a member function to create the AP graph from an array. I'll try to update the document this week. https://github.com/joseprupi/flink/blob/master/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/AffinityPropagation.java Thanks!!
[jira] [Comment Edited] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425389#comment-15425389 ] Josep Rubió edited comment on FLINK-1707 at 9/8/16 3:49 AM: Hi [~vkalavri], I changed the implementation, so that now: * The convergence condition is based on local changes to the vertices that decide to be exemplars, instead of no modifications in messages. This condition should hold for a certain number of steps, defined in the constructor * The values hash map is no longer needed * If a damping factor different from 0 is provided to the AP constructor, the oldValues HashMap will be populated and messages will be damped * If a damping factor of 0 is provided to the constructor, the oldValues HashMap will not be created I will update the document. Thanks! was (Author: joseprupi): Hi [~vkalavri], I changed the implementation; the E vertices now have these members: private HashMap weights; private HashMap oldValues; private long exemplar; private int convergenceFactor; private int convergenceFactorCounter; where before it was: private HashMap weights; private HashMap values; private HashMap oldValues; private long exemplar; So: * The values hash map is no longer needed * If a damping factor different from 0 is provided to the AP constructor, the oldValues HashMap will be populated and the algorithm will work the same way as before. * If a damping factor of 0 is provided to the constructor: -- The convergence condition changes to local changes to the vertices that decide to be exemplars, instead of no modifications in messages. This condition should hold for a certain number of steps, defined in the constructor -- No damping factor is applied to messages -- The oldValues HashMap is not used Now I need to change the initialization of the graph; I will post the question to the dev group and see if someone can help me do it the best way.
Once I finish it I will clean and comment the code, update the document, and submit a new pull request. Thanks!
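The zero-damping convergence rule described in this comment (the exemplar assignment must stay unchanged for a fixed number of steps, counted via something like the `convergenceFactorCounter` member) can be sketched as a small tracker. This is an illustrative Python sketch of the idea, not the Flink vertex-centric code; the class and method names are invented:

```python
class ConvergenceTracker:
    """Declare convergence once the exemplar assignment has been
    identical for `required_stable_steps` consecutive iterations."""

    def __init__(self, required_stable_steps):
        self.required = required_stable_steps
        self.stable = 0       # consecutive steps without change
        self.last = None      # exemplar assignment seen last step

    def update(self, exemplars):
        """Feed this step's {vertex: exemplar} map; return True if converged."""
        if exemplars == self.last:
            self.stable += 1
        else:
            self.stable = 0
            self.last = exemplars
        return self.stable >= self.required
```

With damping enabled, convergence would instead be detected through the damped messages themselves, which is why the old values must then be retained.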
[jira] [Commented] (FLINK-2662) CompilerException: "Bug: Plan generation for Unions picked a ship strategy between binary plan operators."
[ https://issues.apache.org/jira/browse/FLINK-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15473506#comment-15473506 ] Lorenz Bühmann commented on FLINK-2662: --- No problem, just let me know if you need something more [~fhueske]
[jira] [Created] (FLINK-4613) Extend ALS to handle implicit feedback datasets
Gábor Hermann created FLINK-4613: Summary: Extend ALS to handle implicit feedback datasets Key: FLINK-4613 URL: https://issues.apache.org/jira/browse/FLINK-4613 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Gábor Hermann Assignee: Gábor Hermann The Alternating Least Squares implementation should be extended to handle _implicit feedback_ datasets. These datasets do not contain explicit ratings by users; they are instead built by collecting user behavior (e.g. user listened to artist X for Y minutes), and they require a slightly different optimization objective. See [Hu et al|http://dx.doi.org/10.1109/ICDM.2008.22] for details. We do not need to modify much of the original ALS algorithm. See the [Spark ALS implementation|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala], which could be a basis for this extension. Only the factor-updating part is modified, and most of the changes are in the local parts of the algorithm (i.e. UDFs). In fact, the only modification that is not local is precomputing the matrix product Y^T * Y and broadcasting it to all the nodes, which we can do with broadcast DataSets.
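The "slightly different optimization objective" from Hu et al. replaces raw counts with a binary preference plus a confidence weight: p_ui = 1 if r_ui > 0 else 0, and c_ui = 1 + alpha * r_ui. A minimal Python sketch of that transform (an illustration of the math from the paper, not FlinkML or Spark code; `alpha` is the confidence weight from Hu et al.):

```python
def to_implicit(ratings, alpha=40.0):
    """Map raw implicit-feedback counts r_ui to (preference, confidence)
    pairs: p_ui = 1 if r_ui > 0 else 0, c_ui = 1 + alpha * r_ui."""
    return {(u, i): (1.0 if r > 0 else 0.0, 1.0 + alpha * r)
            for (u, i), r in ratings.items()}
```

The confidences then enter the per-user least-squares solve, where the shared Y^T * Y term is what the issue proposes to precompute once and broadcast.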
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15488555#comment-15488555 ] Josep Rubió commented on FLINK-1707: Hi [~vkalavri], I'm not sure I understand what you mean. Won't the matrix need to be read anyway? Would providing a DataSet of Tuple3 (id1, id2, similarity) and creating the vertices and edges through transformations work? The function should check that no duplicates exist and fill non-existing similarities with a value of 0.
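The transformation this comment proposes, turning (id1, id2, similarity) triples into a deduplicated, zero-filled similarity structure, could look roughly like the following. This is an illustrative Python sketch of the checks described, not the Gelly/Flink DataSet API; the function name and shapes are invented:

```python
def build_ap_input(triples):
    """Collect vertices and a symmetric similarity map from
    (id1, id2, similarity) triples, rejecting duplicate pairs and
    filling missing pairs with 0.0."""
    sims = {}
    vertices = set()
    for a, b, s in triples:
        key = (min(a, b), max(a, b))     # treat similarities as symmetric
        if key in sims:
            raise ValueError("duplicate similarity for pair %s" % (key,))
        sims[key] = s
        vertices.update((a, b))
    # fill non-existing similarities with 0
    for a in vertices:
        for b in vertices:
            if a < b:
                sims.setdefault((a, b), 0.0)
    return vertices, sims
```

In Flink these steps would be DataSet transformations instead of a local loop, but the duplicate check and zero-fill are the same logically.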
[jira] [Created] (FLINK-4632) when yarn nodemanager lost, flink hung
刘喆 created FLINK-4632: - Summary: when yarn nodemanager lost, flink hung Key: FLINK-4632 URL: https://issues.apache.org/jira/browse/FLINK-4632 Project: Flink Issue Type: Bug Components: Distributed Coordination, Streaming Affects Versions: 1.1.2, 1.2.0 Environment: cdh5.5.1 jdk1.7 flink1.1.2 1.2-snapshot kafka0.8.2 Reporter: 刘喆 When running Flink streaming on YARN, using Kafka as the source, it runs well at first. But after a long run, for example 8 hours, dealing with 60,000,000+ messages, it hangs: no messages are consumed, one TaskManager is CANCELING, and the exception shows: org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: connection timeout at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.exceptionCaught(PartitionRequestClientHandler.java:152) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) at io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) at 
io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:835) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:87) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: 连接超时 (connection timed out) at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:192) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311) at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881) at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119) ... 
6 more
After applying https://issues.apache.org/jira/browse/FLINK-4625, it shows: java.lang.Exception: TaskManager was lost/killed: ResourceID{resourceId='container_1471620986643_744852_01_001400'} @ 38.slave.adh (dataPort=45349) at org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:162) at org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533) at org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:138) at org.apache.flink.runtime.instance.Instance.markDead(Instance.java:167) at org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:224) at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$handleTaskManagerTerminated(JobManager.scala:1054) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:458) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505243#comment-15505243 ] 刘喆 commented on FLINK-4632: --- Yes. The job's status is canceling, then hung. Web page is ok, client process is ok, but kafka consumer canceled. I invoke 700 mapper, 698 canceled, 2 is canceling. > when yarn nodemanager lost, flink hung > --- > > Key: FLINK-4632 > URL: https://issues.apache.org/jira/browse/FLINK-4632 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, Streaming >Affects Versions: 1.2.0, 1.1.2 > Environment: cdh5.5.1 jdk1.7 flink1.1.2 1.2-snapshot kafka0.8.2 >Reporter: 刘喆 > > When run flink streaming on yarn, using kafka as source, it runs well when > start. But after long run, for example 8 hours, dealing 60,000,000+ > messages, it hung: no messages consumed, one taskmanager is CANCELING, the > exception show: > org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: > connection timeout > at > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.exceptionCaught(PartitionRequestClientHandler.java:152) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:835) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:87) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: 连接超时 > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:192) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311) > at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881) > at > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119) > ... 
6 more > after applying https://issues.apache.org/jira/browse/FLINK-4625 > it shows: > java.lang.Exception: TaskManager was lost/killed: > ResourceID{resourceId='container_1471620986643_744852_01_001400'} @ > 38.slave.adh (dataPort=45349) > at > org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:162) > at > org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533)
[jira] [Comment Edited] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505243#comment-15505243 ] 刘喆 edited comment on FLINK-4632 at 9/20/16 1:26 AM: Yes. The job's status is CANCELING, then it hangs. The web page is OK and the client process is OK, but the Kafka consumer is canceled. I invoked 700 mappers: 698 are canceled, 2 are still canceling. When I log on to the YARN NodeManager, it is still running. Some related log lines are below: 2016-09-11 22:21:17,061 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - State backend is set to heap memory (checkpoint to jobmanager) 2016-09-12 02:40:10,357 INFO org.apache.flink.yarn.YarnTaskManagerRunner - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested. 2016-09-12 02:40:10,533 INFO org.apache.flink.runtime.blob.BlobCache - Shutting down BlobCache 2016-09-12 02:40:10,821 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager removed spill file directory /data1/yarn/usercache/spa/appcache/application_1471620986643_563465/flink-io-ec6db7ad-e263-49fe-84a6-3dce983dc033 2016-09-12 02:40:10,895 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager removed spill file directory /data2/yarn/usercache/spa/appcache/application_1471620986643_563465/flink-io-4efbde55-1f64-44d2-87b8-85ed686c73c9 On the ResourceManager, some log lines: [r...@4.master.adh hadooplogs]# grep _1471620986643_563465_01_90 yarn-yarn-resourcemanager-4.master.adh.log.31 2016-09-12 02:40:07,988 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1471620986643_563465_01_90 Container Transitioned from RUNNING to KILLED 2016-09-12 02:40:07,988 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Completed container: container_1471620986643_563465_01_90 in state: KILLED event:KILL 2016-09-12 02:40:07,988 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=spa OPERATION=AM Released Container TARGET=SchedulerApp 
RESULT=SUCCESS APPID=application_1471620986643_563465 CONTAINERID=container_1471620986643_563465_01_90 2016-09-12 02:40:07,988 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_1471620986643_563465_01_90 of capacity on host 105.slave.adh:39890, which currently has 2 containers, used and available, release resources=true 2016-09-12 02:40:07,988 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application attempt appattempt_1471620986643_563465_01 released container container_1471620986643_563465_01_90 on node: host: 105.slave.adh:39890 #containers=2 available=45904 used=4096 with event: KILL was (Author: liuzhe): Yes. The job's status is canceling, then hung. Web page is ok, client process is ok, but kafka consumer canceled. I invoke 700 mapper, 698 canceled, 2 is canceling.
[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508441#comment-15508441 ] 刘喆 commented on FLINK-4632: --- The container is killed for one of two reasons: 1) YARN preemption; 2) the YARN NodeManager becomes unhealthy and kills all of its Java children. I killed one TaskManager manually; sometimes this reproduced the hang, but sometimes the job restarted successfully.
[jira] [Created] (FLINK-4652) Don't pass credentials explicitly to AmazonClient - use credentials provider instead
Kristian Frøhlich Hansen created FLINK-4652: --- Summary: Don't pass credentials explicitly to AmazonClient - use credentials provider instead Key: FLINK-4652 URL: https://issues.apache.org/jira/browse/FLINK-4652 Project: Flink Issue Type: Bug Components: Kinesis Connector Affects Versions: 1.2.0 Reporter: Kristian Frøhlich Hansen Priority: Minor Fix For: 1.2.0 By using the credentials explicitly we are responsible for checking and refreshing the credentials before they expire. If no refresh is done we will encounter an AmazonServiceException: 'The security token included in the request is expired'. To utilize automatic refreshing of credentials, pass the AWSCredentialsProvider directly to the AmazonClient by removing the getCredentials() call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
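The expiry problem described above can be sketched with simplified stand-in types (hypothetical, not the real AWS SDK classes): a client that snapshots credentials once via getCredentials() keeps a stale token forever, while a client that holds the provider can fetch a fresh token on each request.

```java
// Simplified stand-ins for the AWS SDK types (illustrative only).
interface CredentialsProvider {
    String getCredentials(); // a real provider may refresh the token on each call
}

public class CredentialsSketch {
    // Simulates a provider whose token rotates over time.
    static class RotatingProvider implements CredentialsProvider {
        private int version = 0;
        public String getCredentials() { return "token-v" + (++version); }
    }

    public static void main(String[] args) {
        RotatingProvider provider = new RotatingProvider();

        // Anti-pattern: snapshot the credentials once; later requests reuse a stale token.
        String snapshot = provider.getCredentials();

        // Pattern: keep the provider and ask it again at request time.
        String fresh = provider.getCredentials();

        System.out.println(snapshot); // token-v1 (stale after rotation)
        System.out.println(fresh);    // token-v2 (refreshed)
    }
}
```

The proposed fix follows the second pattern: hand the provider itself to the client, so refreshing stays the provider's job.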
[jira] [Updated] (FLINK-4663) Flink JDBCOutputFormat logs wrong WARN message
[ https://issues.apache.org/jira/browse/FLINK-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Márton Balassi updated FLINK-4663: -- Assignee: Swapnil Chougule > Flink JDBCOutputFormat logs wrong WARN message > -- > > Key: FLINK-4663 > URL: https://issues.apache.org/jira/browse/FLINK-4663 > Project: Flink > Issue Type: Bug > Components: Batch Connectors and Input/Output Formats >Affects Versions: 1.1.1, 1.1.2 > Environment: Across Platform >Reporter: Swapnil Chougule >Assignee: Swapnil Chougule > Fix For: 1.1.3 > > > Flink JDBCOutputFormat logs a wrong WARN message, > "Column SQL types array doesn't match arity of passed Row! Check the passed > array..." > even when there is no mismatch between the SQL types array and the arity of the passed Row. > > (flink/flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCOutputFormat.java) > It logs a lot of unnecessary warning messages (one per row passed) in the log files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15516530#comment-15516530 ] 刘喆 commented on FLINK-4632: --- It is (2). I set up a new, idle Hadoop YARN cluster using the source from GitHub as of September 22, and the application ran well there, so the order of events must be: kill the JVM (YARN preemption or an unhealthy node), then cancel the application. But when I run it on a busy YARN cluster, the problem comes up again.
[jira] [Commented] (FLINK-3026) Publish the flink docker container to the docker registry
[ https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903272#comment-15903272 ] Ismaël Mejía commented on FLINK-3026: - Great, taking a look right now. I will rebase my PR, adding the fix I see there. Can I push directly to the docker-flink repo? > Publish the flink docker container to the docker registry > - > > Key: FLINK-3026 > URL: https://issues.apache.org/jira/browse/FLINK-3026 > Project: Flink > Issue Type: Task > Components: Build System, Docker >Reporter: Omer Katz >Assignee: Patrick Lucas > Labels: Deployment, Docker > > There's a dockerfile that can be used to build a docker container already in > the repository. It'd be awesome to just be able to pull it instead of > building it ourselves. > The dockerfile can be found at > https://github.com/apache/flink/tree/master/flink-contrib/docker-flink > It also doesn't point to the latest version of Flink which I fixed in > https://github.com/apache/flink/pull/1366 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-3026) Publish the flink docker container to the docker registry
[ https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906513#comment-15906513 ] Ismaël Mejía commented on FLINK-3026: - Ok, we are progressing fast, that's great. I agree with you: I am 100% convinced now that those should not live in the Apache repo, but I still want to guarantee that the official image does not differ radically from the reference one in the apache repo. More concretely, my proposal is that we (the docker image maintainers) guarantee that future changes flow from the apache/flink repo towards the docker-flink repo. If you look at my refactorings in the docker-flink repo, they go in this direction; they are mostly there to make this job easier (though currently they come from the future, with your unmerged changes and my non-alpine version). I agree with you that in the future these will be minor changes; if you look at the history of the image, we have had changes at each release and some small fixes in between. So changing the template should be rare. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-3026) Publish the flink docker container to the docker registry
[ https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906514#comment-15906514 ] Ismaël Mejía commented on FLINK-3026: - One thing we need to discuss is the inclusion of the start-foreground scripts (btw, sorry if I removed that entrypoint from the docker-flink repo). I think we must not change any file of the release, because this can become a maintenance burden in the future; I prefer to make the least amount of changes possible. Concretely, for the start-foreground case, I prefer to include this only if it is part of the official distribution, e.g. by backporting it to Flink 1.2.1, or to start supporting it once it is released in Flink 1.3. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-3026) Publish the flink docker container to the docker registry
[ https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906515#comment-15906515 ] Ismaël Mejía commented on FLINK-3026: - I saw the images you built; it is probably a better idea to enable automated builds for the repo so the images are rebuilt once we push changes to GitHub. Have you tried this? And I have another proposal: maybe it is better to create a GitHub organization for this and move the repo there, ending up with something like docker-flink/docker-flink instead of your personal account or the dataArtisans one. That way we can share the maintenance, and I or another future maintainer can trigger the builds from Docker Hub too, given the permissions. Currently I can't trigger the builds of docker-flink because Docker Hub only allows your own account and the organizations you belong to. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-6003) Add docker image based on the openjdk one (debian+alpine)
[ https://issues.apache.org/jira/browse/FLINK-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940298#comment-15940298 ] Ismaël Mejía commented on FLINK-6003: - Just for reference, the current work on this JIRA is pending because we are prioritizing, for the moment, the official Docker image work ongoing in FLINK-3026: https://github.com/docker-flink/docker-flink/ We will probably backport changes here once they are stable in the other repo. > Add docker image based on the openjdk one (debian+alpine) > - > > Key: FLINK-6003 > URL: https://issues.apache.org/jira/browse/FLINK-6003 > Project: Flink > Issue Type: Improvement > Components: Build System, Docker >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > > The base docker image 'java' has been deprecated since the end of last year. > This issue will also re-introduce a Dockerfile for the debian-based openjdk > one in addition to the current one based on alpine. > Additional refactorings included: > - Move default version to 1.2.0 > - Refactor Dockerfile for consistency reasons (between both versions) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-6203) DataSet Transformations
苏拓 created FLINK-6203: - Summary: DataSet Transformations Key: FLINK-6203 URL: https://issues.apache.org/jira/browse/FLINK-6203 Project: Flink Issue Type: Bug Components: DataSet API Affects Versions: 1.2.0 Reporter: 苏拓 Priority: Minor Fix For: 1.2.0 The example of GroupReduce on sorted groups can't remove duplicate Strings in a DataSet; it needs "prev = t" added, such as:

val output = input.groupBy(0).sortGroup(1, Order.ASCENDING).reduceGroup {
  (in, out: Collector[(Int, String)]) =>
    var prev: (Int, String) = null
    for (t <- in) {
      if (prev == null || prev != t) out.collect(t)
      prev = t
    }
}

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
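The fix works because sortGroup makes duplicates adjacent within each group, so remembering only the previous element is enough. A minimal sketch of the same dedup logic in plain Java (illustrative, not the Gelly/DataSet API):

```java
import java.util.ArrayList;
import java.util.List;

public class SortedDedup {
    // Removes adjacent duplicates from a sorted group, mirroring the corrected
    // reduceGroup logic: emit t only when it differs from the previous element,
    // and always advance prev afterwards.
    static List<String> dedup(List<String> sortedGroup) {
        List<String> out = new ArrayList<>();
        String prev = null;
        for (String t : sortedGroup) {
            if (prev == null || !prev.equals(t)) out.add(t);
            prev = t; // the missing statement in the original example
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(dedup(List.of("a", "a", "b", "b", "c"))); // [a, b, c]
    }
}
```

Without the `prev = t` line, prev stays null and every element is emitted, which is exactly the bug the report describes.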
[jira] [Assigned] (FLINK-4900) Implement Docker image support
[ https://issues.apache.org/jira/browse/FLINK-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mischa Krüger reassigned FLINK-4900: Assignee: Mischa Krüger (was: Eron Wright ) > Implement Docker image support > -- > > Key: FLINK-4900 > URL: https://issues.apache.org/jira/browse/FLINK-4900 > Project: Flink > Issue Type: Sub-task > Components: Cluster Management >Reporter: Eron Wright >Assignee: Mischa Krüger > > Support the use of a docker image, with both the unified containerizer and > the Docker containerizer. > Use a configuration setting to explicitly configure which image and > containerizer to use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-4900) Implement Docker image support
[ https://issues.apache.org/jira/browse/FLINK-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mischa Krüger updated FLINK-4900: - Labels: release (was: ) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-4900) Implement Docker image support
[ https://issues.apache.org/jira/browse/FLINK-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mischa Krüger updated FLINK-4900: - Labels: release review (was: release) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-4900) Implement Docker image support
[ https://issues.apache.org/jira/browse/FLINK-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mischa Krüger updated FLINK-4900: - Labels: release (was: release review) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5109) Invalid Content-Encoding Header in REST API responses
Móger Tibor László created FLINK-5109: - Summary: Invalid Content-Encoding Header in REST API responses Key: FLINK-5109 URL: https://issues.apache.org/jira/browse/FLINK-5109 Project: Flink Issue Type: Bug Components: Web Client, Webfrontend Affects Versions: 1.2.0 Reporter: Móger Tibor László On REST API calls the Flink runtime responds with the Content-Encoding header containing the value "utf-8". According to the HTTP/1.1 standard this header value is invalid. ( https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.5 ) Acceptable values are gzip, compress, and deflate; otherwise the header should be omitted. The invalid header may cause malfunctions in projects building against Flink. The invalid header may be present in earlier versions as well. Proposed solution: Remove the lines in the project where the CONTENT_ENCODING header is set to "utf-8". (I could do this in a PR.) Possible solution, but it may need more knowledge and skills than mine: Introduce a content-encoding. Doing so may need some configuration because then Flink would have to encode the responses properly (even paying attention to the request's Accept-Encoding headers). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
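To illustrate the distinction the report relies on (this is an illustration, not Flink's actual code): "utf-8" names a charset, which belongs in the Content-Type header, while Content-Encoding names a content-coding such as gzip. A check against the RFC 2616 codings might look like:

```java
import java.util.Locale;
import java.util.Set;

public class ContentEncodingCheck {
    // Content-codings defined for HTTP/1.1 (RFC 2616 section 3.5), plus "identity".
    static final Set<String> VALID_CODINGS = Set.of("gzip", "compress", "deflate", "identity");

    static boolean isValidContentEncoding(String value) {
        return VALID_CODINGS.contains(value.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isValidContentEncoding("utf-8")); // false: utf-8 is a charset, not a coding
        System.out.println(isValidContentEncoding("gzip"));  // true
    }
}
```

A correct response would therefore carry something like "Content-Type: application/json; charset=utf-8" and either omit Content-Encoding or set it to one of the codings above.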
[jira] [Updated] (FLINK-5109) Invalid Content-Encoding Header in REST API responses
[ https://issues.apache.org/jira/browse/FLINK-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Móger Tibor László updated FLINK-5109: -- Description: On REST API calls the Flink runtime responds with the Content-Encoding header containing the value "utf-8". According to the HTTP/1.1 standard this header value is invalid. ( https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.5 ) Acceptable values are gzip, compress, and deflate; otherwise the header should be omitted. The invalid header may cause malfunctions in projects building against Flink. The invalid header may be present in earlier versions as well. Proposed solution: Remove the lines in the project where the CONTENT_ENCODING header is set to "utf-8". (I could do this in a PR.) Possible solution, but it may need more knowledge and skills than mine: Introduce content-encoding. Doing so may need some configuration because then Flink would have to encode the responses properly (even paying attention to the request's Accept-Encoding headers). was: On REST API calls the Flink runtime responds with the Content-Encoding header containing the value "utf-8". According to the HTTP/1.1 standard this header value is invalid. ( https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.5 ) Acceptable values are gzip, compress, and deflate; otherwise the header should be omitted. The invalid header may cause malfunctions in projects building against Flink. The invalid header may be present in earlier versions as well. Proposed solution: Remove the lines in the project where the CONTENT_ENCODING header is set to "utf-8". (I could do this in a PR.) Possible solution, but it may need more knowledge and skills than mine: Introduce a content-encoding. Doing so may need some configuration because then Flink would have to encode the responses properly (even paying attention to the request's Accept-Encoding headers). 
> Invalid Content-Encoding Header in REST API responses > - > > Key: FLINK-5109 > URL: https://issues.apache.org/jira/browse/FLINK-5109 > Project: Flink > Issue Type: Bug > Components: Web Client, Webfrontend >Affects Versions: 1.2.0 >Reporter: Móger Tibor László > Labels: http-headers, rest_api > > On REST API calls the Flink runtime responds with the header > Content-Encoding, containing the value "utf-8". According to the HTTP/1.1 > standard this header is invalid. ( > https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.5 ) > Possible acceptable values are: gzip, compress, deflate. Or it should be > omitted. > The invalid header may cause malfunction in projects building against Flink. > The invalid header may be present in earlier versions as well. > Proposed solution: Remove lines from the project, where CONTENT_ENCODING > header is set to "utf-8". (I could do this in a PR.) > Possible solution but may need further knowledge and skills than mine: > Introduce content-encoding. Doing so may need some configuration because then > Flink would have to encode the responses properly (even paying attention to > the request's Accept-Encoding headers). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-5109) Invalid Content-Encoding Header in REST API responses
[ https://issues.apache.org/jira/browse/FLINK-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Móger Tibor László updated FLINK-5109: -- Affects Version/s: 1.1.0 1.1.1 1.1.2 1.1.3 > Invalid Content-Encoding Header in REST API responses > - > > Key: FLINK-5109 > URL: https://issues.apache.org/jira/browse/FLINK-5109 > Project: Flink > Issue Type: Bug > Components: Web Client, Webfrontend >Affects Versions: 1.1.0, 1.2.0, 1.1.1, 1.1.2, 1.1.3 >Reporter: Móger Tibor László > Labels: http-headers, rest_api > > On REST API calls the Flink runtime responds with the header > Content-Encoding, containing the value "utf-8". According to the HTTP/1.1 > standard this header is invalid. ( > https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.5 ) > Possible acceptable values are: gzip, compress, deflate. Or it should be > omitted. > The invalid header may cause malfunction in projects building against Flink. > The invalid header may be present in earlier versions as well. > Proposed solution: Remove lines from the project, where CONTENT_ENCODING > header is set to "utf-8". (I could do this in a PR.) > Possible solution but may need further knowledge and skills than mine: > Introduce content-encoding. Doing so may need some configuration because then > Flink would have to encode the responses properly (even paying attention to > the request's Accept-Encoding headers). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
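To make the report concrete: "utf-8" is a charset token, which belongs in the Content-Type header's charset parameter, while Content-Encoding only accepts the content codings registered by RFC 2616. A minimal standalone sketch (a hypothetical helper, not Flink code):

```java
import java.util.Set;

// Hypothetical validator illustrating the bug report: "utf-8" names a
// charset, not a content coding, so it is never a legal Content-Encoding
// value under HTTP/1.1.
class ContentEncodingCheck {
    // Content codings defined by HTTP/1.1 (RFC 2616, section 3.5)
    static final Set<String> VALID_CODINGS =
            Set.of("gzip", "compress", "deflate", "identity");

    static boolean isValidContentEncoding(String value) {
        return VALID_CODINGS.contains(value.toLowerCase());
    }
}
```

The charset would instead be declared on the media type, e.g. `Content-Type: application/json; charset=utf-8`, matching the proposed fix of simply dropping the invalid header.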
[jira] [Created] (FLINK-5156) Consolidate streaming FieldAccessor functionality
Márton Balassi created FLINK-5156: - Summary: Consolidate streaming FieldAccessor functionality Key: FLINK-5156 URL: https://issues.apache.org/jira/browse/FLINK-5156 Project: Flink Issue Type: Task Reporter: Márton Balassi The streaming FieldAccessors (keyedStream.keyBy(...)) have slightly different semantics compared to their batch counterparts. Currently the streaming ones allow selecting a field within an array (which might be dangerous as the array typeinfo does not contain the length of the array, thus leading to a potential index out of bounds) and accept not only "*", but also "0" to select a whole type. This functionality should be either removed or documented. The latter can be achieved by effectively reverting [1]. Note that said commit was squashed before merging. [1] https://github.com/mbalassi/flink/commit/237f07eb113508703c980b14587d66970e7f6251 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
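The array-selection danger described above can be sketched in a few lines (illustrative names, not Flink's FieldAccessor API): the array type information carries only the component type, never the length, so an out-of-range index can only fail per record at runtime.

```java
// Illustrative sketch, not Flink's API: an accessor that selects array
// element `index`. Nothing at type-info time bounds the index, so an
// out-of-range selection surfaces only when a short record arrives.
class ArrayElementAccessor {
    private final int index;

    ArrayElementAccessor(int index) {
        this.index = index;
    }

    int get(int[] record) {
        if (index >= record.length) {
            // First point at which the mistake can be detected, since the
            // type information never knew the array length.
            throw new IndexOutOfBoundsException(
                    "index " + index + " >= record length " + record.length);
        }
        return record[index];
    }
}
```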
[jira] [Resolved] (FLINK-3702) DataStream API PojoFieldAccessor doesn't support nested POJOs
[ https://issues.apache.org/jira/browse/FLINK-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Márton Balassi resolved FLINK-3702. --- Resolution: Fixed Fix Version/s: 1.2.0 Fixed via 1f04542 and 870e219. > DataStream API PojoFieldAccessor doesn't support nested POJOs > - > > Key: FLINK-3702 > URL: https://issues.apache.org/jira/browse/FLINK-3702 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Affects Versions: 1.0.0 >Reporter: Robert Metzger >Assignee: Gabor Gevay > Fix For: 1.2.0 > > > The {{PojoFieldAccessor}} (which is used by {{.sum(String)}} and similar > methods) doesn't support nested POJOs right now. > As part of FLINK-3697 I'll add a check for a nested POJO and fail with an > exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
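The dotted nested-field semantics that this fix enables (e.g. aggregating on "inner.value") can be illustrated with plain reflection; this is a sketch of the path-descent idea with made-up class names, not the actual PojoFieldAccessor implementation.

```java
import java.lang.reflect.Field;

// Sketch of dotted nested-field access on POJOs ("inner.value"), the
// behaviour FLINK-3702 adds to the DataStream API; all names here are
// illustrative.
class NestedFieldAccess {
    static class Inner { int value; Inner(int v) { value = v; } }
    static class Outer { Inner inner; Outer(Inner i) { inner = i; } }

    static Object get(Object pojo, String dottedPath) {
        try {
            Object current = pojo;
            for (String name : dottedPath.split("\\.")) {
                Field f = current.getClass().getDeclaredField(name);
                f.setAccessible(true);
                current = f.get(current); // descend one nesting level
            }
            return current;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```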
[jira] [Updated] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josep Rubió updated FLINK-1707: --- Description: This issue proposes adding an implementation of the Affinity Propagation algorithm as a Gelly library method and a corresponding example. The algorithm is described in paper [1] and a description of a vertex-centric implementation can be found in [2]. [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf Design doc: https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing Example spreadsheet: https://docs.google.com/spreadsheets/d/1CurZCBP6dPb1IYQQIgUHVjQdyLxK0JDGZwlSXCzBcvA/edit?usp=sharing Graph: https://docs.google.com/drawings/d/1PC3S-6AEt2Gp_TGrSfiWzkTcL7vXhHSxvM6b9HglmtA/edit?usp=sharing was: This issue proposes adding an implementation of the Affinity Propagation algorithm as a Gelly library method and a corresponding example. The algorithm is described in paper [1] and a description of a vertex-centric implementation can be found in [2]. 
[1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf Design doc: https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing Example spreadsheet: https://docs.google.com/spreadsheets/d/1CurZCBP6dPb1IYQQIgUHVjQdyLxK0JDGZwlSXCzBcvA/edit?usp=sharing > Add an Affinity Propagation Library Method > -- > > Key: FLINK-1707 > URL: https://issues.apache.org/jira/browse/FLINK-1707 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Josep Rubió >Priority: Minor > Labels: requires-design-doc > Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf > > > This issue proposes adding an implementation of the Affinity Propagation > algorithm as a Gelly library method and a corresponding example. > The algorithm is described in paper [1] and a description of a vertex-centric > implementation can be found in [2]. > [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf > [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf > Design doc: > https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing > Example spreadsheet: > https://docs.google.com/spreadsheets/d/1CurZCBP6dPb1IYQQIgUHVjQdyLxK0JDGZwlSXCzBcvA/edit?usp=sharing > Graph: > https://docs.google.com/drawings/d/1PC3S-6AEt2Gp_TGrSfiWzkTcL7vXhHSxvM6b9HglmtA/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
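The similarity-matrix-to-graph conversion discussed in the comments on this issue can be sketched as follows. This is a standalone illustration under the assumption of a dense double[][] similarity matrix; in Gelly the helper would emit Edge objects into a DataSet rather than a Java list, and the names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch: turn a dense similarity matrix s into a list of
// weighted directed edges (i -> j with weight s[i][j]), skipping the
// diagonal. Representation is illustrative, not Gelly's API.
class SimilarityToEdges {
    static final class Edge {
        final int src, dst;
        final double weight;
        Edge(int src, int dst, double weight) {
            this.src = src;
            this.dst = dst;
            this.weight = weight;
        }
    }

    static List<Edge> toEdges(double[][] s) {
        List<Edge> edges = new ArrayList<>();
        for (int i = 0; i < s.length; i++) {
            for (int j = 0; j < s[i].length; j++) {
                if (i != j) {
                    edges.add(new Edge(i, j, s[i][j]));
                }
            }
        }
        return edges;
    }
}
```

For n points this materializes n*(n-1) edges, which is the memory concern raised in the comments about each node holding similarities to all other points.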
[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714684#comment-15714684 ] 刘喆 commented on FLINK-4632: --- It appears again. This time I get the logs. On JobManager: 2016-12-02 16:27:17,275 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Sink: Unnamed (1/10) (ca18e99f339595bf987913f543ef3f54) switched from DEPLOYING to RUNNING 2016-12-02 16:27:17,292 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Sink: Unnamed (5/10) (733ebe58c797aa6e6ed9e85004d6c201) switched from DEPLOYING to RUNNING 2016-12-02 16:27:48,523 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Sink: Unnamed (9/10) (fa903c45717171026fae9997ec22b594) switched from RUNNING to FAILED 2016-12-02 16:27:48,523 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Job Flink Streaming Java API Skeleton (c7b96c59cb04fd2cd096b23f85cebdb4) switched from state RUNNING to FAILING. org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: Connection timed out at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.exceptionCaught(PartitionRequestClientHandler.java:152) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) at 
io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) at io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) at io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:835) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:87) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Connection timed out at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:192) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311) at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881) at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119) ... 
6 more 2016-12-02 16:27:48,524 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Source: Custom Source -> Map (1/10) (d99ba0f0a16c31e717c8044e92e0e8c4) switched from RUNNING to CANCELING 2016-12-02 16:27:48,524 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Source: Custom Source -> Map (2/10) (294b46febec5e49a8a859d98ceca2a9e) switched from RUNNING to CANCELING on TaskManager: 2016-12-02 16:27:18,177 INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 10.10.10.203/10.10.10.203:2181, initiating session 2016-12-02 16:27:18,183 INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server 10.10.10.203/10.10.10.203:2181, sess
[jira] [Comment Edited] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714684#comment-15714684 ] 刘喆 edited comment on FLINK-4632 at 12/2/16 10:32 AM: - This time, there is no container lost, and the job is not hung but restarted. flink-1.2-SNAPSHOT commit : dbe70732434ff1396e1829080cd14f26b691489a scala 2.11 hadoop 2.6.0-cdh5.5.1 java jdk1.8.0_112 This time I get the logs. On JobManager: {noformat} 2016-12-02 16:27:17,275 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Sink: Unnamed (1/10) (ca18e99f339595bf987913f543ef3f54) switched from DEPLOYING to RUNNING 2016-12-02 16:27:17,292 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Sink: Unnamed (5/10) (733ebe58c797aa6e6ed9e85004d6c201) switched from DEPLOYING to RUNNING 2016-12-02 16:27:48,523 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Sink: Unnamed (9/10) (fa903c45717171026fae9997ec22b594) switched from RUNNING to FAILED 2016-12-02 16:27:48,523 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Job Flink Streaming Java API Skeleton (c7b96c59cb04fd2cd096b23f85cebdb4) switched from state RUNNING to FAILING. 
org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: Connection timed out at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.exceptionCaught(PartitionRequestClientHandler.java:152) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) at io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) at io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:835) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:87) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at 
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Connection timed out at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:192) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311) at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881) at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119) ... 6 more 2016-12-02 16:27:48,524 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Source: Custom Source -> Map (1/10) (d99ba0f0a16c31e717c8044e92e0e8c4) switched from RUNNING to CANCELING 2016-12-02 16:27:48,524 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- Source: Custom Source -> Map (2/10) (294b46febec5e49a8a859d98ceca2a9e) switched from RUNNING to CANCELING {noformat} on TaskManager: {noformat} 2016-12-02 16:27:18,177 INFO org.apache.zookeeper.ClientCnxn
[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15715048#comment-15715048 ] 刘喆 commented on FLINK-4632: --- Thank you, Ufuk Celebi. It depends on the restart strategy. Here I use the automatic restart strategy: after restarting it works well for a while, and then restarts again. Since it runs on a busy YARN cluster, the network may sometimes be under stress. It seems that Flink is very fragile on this YARN cluster, while at the same time some Spark Streaming jobs on my cluster run well. > when yarn nodemanager lost, flink hung > --- > > Key: FLINK-4632 > URL: https://issues.apache.org/jira/browse/FLINK-4632 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, Streaming >Affects Versions: 1.2.0, 1.1.2 > Environment: cdh5.5.1 jdk1.7 flink1.1.2 1.2-snapshot kafka0.8.2 >Reporter: 刘喆 > > When running flink streaming on yarn, using kafka as source, it runs well at > start. But after a long run, for example 8 hours, dealing with 60,000,000+ > messages, it hung: no messages consumed, one taskmanager is CANCELING, the > exception shows: > org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: > connection timeout > at > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.exceptionCaught(PartitionRequestClientHandler.java:152) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > 
io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:835) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:87) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: 连接超时 > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:192) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311) > at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881) > at > 
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119) > ... 6 more > after applying https://issues.apache.org/jira/browse/FLINK-4625 > it shows: > java.lang.Exception: TaskManager was lost/killed: > ResourceID{resourceId='container_1471620986643_744852_01_001400'} @ > 38.slave.adh
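The repeated automatic restarts described above are governed by Flink's configured restart strategy. A minimal sketch of capping them in flink-conf.yaml, assuming the fixed-delay strategy keys of the Flink 1.1/1.2 line (the values are illustrative, not a recommendation):

```yaml
# Illustrative flink-conf.yaml fragment: bound automatic restarts so a
# flapping YARN container surfaces as a job failure instead of an endless
# restart loop.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```

With a bounded strategy, the job fails permanently after the configured attempts, which makes the underlying NodeManager/network problem visible rather than masked by restarts.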
[jira] [Issue Comment Deleted] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 刘喆 updated FLINK-4632: -- Comment: was deleted (was: Thank you Ufuk Celebi. It depends on the strategy of restarting. Here I use auto restarting stategy, after restarting it works well for a while, and then restarts again. For it's running on a busy YARN cluster, the network may sometime under stress. It seams than Flink is very weak on this YARN cluster. At the same time, some Spark streaming job on my cluster work well.) > when yarn nodemanager lost, flink hung > --- > > Key: FLINK-4632 > URL: https://issues.apache.org/jira/browse/FLINK-4632 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, Streaming >Affects Versions: 1.2.0, 1.1.2 > Environment: cdh5.5.1 jdk1.7 flink1.1.2 1.2-snapshot kafka0.8.2 >Reporter: 刘喆 > > When run flink streaming on yarn, using kafka as source, it runs well when > start. But after long run, for example 8 hours, dealing 60,000,000+ > messages, it hung: no messages consumed, one taskmanager is CANCELING, the > exception show: > org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: > connection timeout > at > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.exceptionCaught(PartitionRequestClientHandler.java:152) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > 
io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:835) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:87) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: 连接超时 > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:192) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311) > at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881) > at > 
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119) > ... 6 more > after applying https://issues.apache.org/jira/browse/FLINK-4625 > it shows: > java.lang.Exception: TaskManager was lost/killed: > ResourceID{resourceId='container_1471620986643_744852_01_001400'} @ > 38.slave.adh (dataPort=45349) > at > org.apache.flink.runtime.instance.SimpleSl
[jira] [Commented] (FLINK-4300) Improve error message for different Scala versions of JM and Client
[ https://issues.apache.org/jira/browse/FLINK-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15721686#comment-15721686 ] Sergio Fernández commented on FLINK-4300: - Any progress on this? Although [docker-flink|https://github.com/apache/flink/tree/master/flink-contrib/docker-flink] uses {{SCALA_VERSION=2.11}} and I locally have {{2.11.8}}, I still have issues. > Improve error message for different Scala versions of JM and Client > --- > > Key: FLINK-4300 > URL: https://issues.apache.org/jira/browse/FLINK-4300 > Project: Flink > Issue Type: Improvement > Components: Client, JobManager >Affects Versions: 1.1.0 >Reporter: Timo Walther > > If a user runs a job (e.g. via RemoteEnvironment) with different Scala > versions of JobManager and Client, the job is not executed and has no proper > error message. > The Client fails only with a meaningless warning: > {code} > 16:59:58,677 WARN akka.remote.ReliableDeliverySupervisor >- Association with remote system [akka.tcp://flink@127.0.0.1:6123] has > failed, address is now gated for [5000] ms. Reason is: [Disassociated]. > {code} > JobManager log only contains the following warning: > {code} > 2016-08-01 16:59:58,664 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink@192.168.1.142:63372] has failed, address is now gated for > [5000] ms. Reason is: [scala.Option; local class incompatible: stream > classdesc serialVersionUID = -2062608324514658839, local class > serialVersionUID = -114498752079829388]. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4982) quickstart example: Can't see the output
[ https://issues.apache.org/jira/browse/FLINK-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725705#comment-15725705 ] Mischa Krüger commented on FLINK-4982: -- Ping, still can't find the output^^ > quickstart example: Can't see the output > > > Key: FLINK-4982 > URL: https://issues.apache.org/jira/browse/FLINK-4982 > Project: Flink > Issue Type: Bug > Components: Quickstarts >Reporter: Mischa Krüger > > When compiling and running the quickstart WordCount example on Flink, I can't > find the output anywhere. The Java file explicitly writes the counted words to > stdout; I searched (I hope) every log file, but couldn't find it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
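A likely explanation for the missing output: when the job runs on a cluster rather than locally, `print()` goes to a TaskManager's stdout (`*.out`) file, not to the client console. A hedged shell sketch for locating it; the log directory is an assumption and must be adjusted to the actual installation:

```shell
# Search the TaskManager stdout files for word-count tuples like "(hello,1)".
# Assumption: logs live under $FLINK_LOG_DIR; adjust to your installation.
FLINK_LOG_DIR=${FLINK_LOG_DIR:-/opt/flink/log}
grep -h "(" "$FLINK_LOG_DIR"/*.out 2>/dev/null || echo "no .out output found"
```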
[jira] [Created] (FLINK-5267) TaskManager logs not scrollable
Mischa Krüger created FLINK-5267: Summary: TaskManager logs not scrollable Key: FLINK-5267 URL: https://issues.apache.org/jira/browse/FLINK-5267 Project: Flink Issue Type: Bug Components: Webfrontend Environment: DC/OS 1.8 Reporter: Mischa Krüger Latest master build: I ran the quickstart example successfully and wanted to see the TM logs, but they can't be scrolled. Downloading the log is luckily still possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5298) Web UI crashes when TM log not existant
Mischa Krüger created FLINK-5298: Summary: Web UI crashes when TM log not existant Key: FLINK-5298 URL: https://issues.apache.org/jira/browse/FLINK-5298 Project: Flink Issue Type: Bug Reporter: Mischa Krüger {code} java.io.FileNotFoundException: flink-taskmanager.out (No such file or directory) at java.io.FileInputStream.open0(Native Method) at java.io.FileInputStream.open(FileInputStream.java:195) at java.io.FileInputStream.(FileInputStream.java:138) at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleRequestTaskManagerLog(TaskManager.scala:833) at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:340) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:122) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at 
akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) at akka.dispatch.Mailbox.run(Mailbox.scala:221) at akka.dispatch.Mailbox.exec(Mailbox.scala:231) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 2016-12-08 16:45:14,995 INFO org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager - Stopping TaskManager akka://flink/user/taskmanager#1361882659. 2016-12-08 16:45:14,995 INFO org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager - Disassociating from JobManager 2016-12-08 16:45:14,997 INFO org.apache.flink.runtime.blob.BlobCache - Shutting down BlobCache 2016-12-08 16:45:15,006 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager removed spill file directory /tmp/flink-io-e61f717b-630c-4a2a-b3e3-62ccb40aa2f9 2016-12-08 16:45:15,006 INFO org.apache.flink.runtime.io.network.NetworkEnvironment- Shutting down the network environment and its components. 2016-12-08 16:45:15,008 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful shutdown (took 1 ms). 2016-12-08 16:45:15,009 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful shutdown (took 0 ms). 2016-12-08 16:45:15,020 INFO org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager - Task manager akka://flink/user/taskmanager is completely shut down. 2016-12-08 16:45:15,023 ERROR org.apache.flink.runtime.taskmanager.TaskManager - Actor akka://flink/user/taskmanager#1361882659 terminated, stopping process... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
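The stack trace shows the TaskManager actor dying on an unguarded `FileInputStream` open when the `.out` file is absent. A minimal sketch of the kind of guard that would avoid this; the class and method names are illustrative, not Flink's actual code:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class SafeLogRead {
    // Return the log contents, or a placeholder when the file is missing, so a
    // request for a nonexistent .out file cannot bring down the handling actor.
    static String readLogSafely(File logFile) throws IOException {
        if (!logFile.exists()) {
            return "Log file " + logFile.getName() + " is not available on this TaskManager.";
        }
        return new String(Files.readAllBytes(logFile.toPath()));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readLogSafely(new File("flink-taskmanager.out")));
    }
}
```

Returning a placeholder message to the web UI instead of throwing keeps a missing log a cosmetic problem rather than a process-terminating one.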
[jira] [Updated] (FLINK-5298) TaskManager crashes when TM log not existant
[ https://issues.apache.org/jira/browse/FLINK-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mischa Krüger updated FLINK-5298: - Summary: TaskManager crashes when TM log not existant (was: Web UI crashes when TM log not existant) > TaskManager crashes when TM log not existant > > > Key: FLINK-5298 > URL: https://issues.apache.org/jira/browse/FLINK-5298 > Project: Flink > Issue Type: Bug >Reporter: Mischa Krüger > > {code} > java.io.FileNotFoundException: flink-taskmanager.out (No such file or > directory) > at java.io.FileInputStream.open0(Native Method) > at java.io.FileInputStream.open(FileInputStream.java:195) > at java.io.FileInputStream.(FileInputStream.java:138) > at > org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleRequestTaskManagerLog(TaskManager.scala:833) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:340) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > at > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) > at 
akka.actor.Actor$class.aroundReceive(Actor.scala:465) > at > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:122) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at akka.actor.ActorCell.invoke(ActorCell.scala:487) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 2016-12-08 16:45:14,995 INFO > org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager - Stopping > TaskManager akka://flink/user/taskmanager#1361882659. > 2016-12-08 16:45:14,995 INFO > org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager - > Disassociating from JobManager > 2016-12-08 16:45:14,997 INFO org.apache.flink.runtime.blob.BlobCache > - Shutting down BlobCache > 2016-12-08 16:45:15,006 INFO > org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager > removed spill file directory > /tmp/flink-io-e61f717b-630c-4a2a-b3e3-62ccb40aa2f9 > 2016-12-08 16:45:15,006 INFO > org.apache.flink.runtime.io.network.NetworkEnvironment- Shutting down > the network environment and its components. > 2016-12-08 16:45:15,008 INFO > org.apache.flink.runtime.io.network.netty.NettyClient - Successful > shutdown (took 1 ms). > 2016-12-08 16:45:15,009 INFO > org.apache.flink.runtime.io.network.netty.NettyServer - Successful > shutdown (took 0 ms). > 2016-12-08 16:45:15,020 INFO > org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager - Task > manager akka://flink/user/taskmanager is completely shut down. 
> 2016-12-08 16:45:15,023 ERROR > org.apache.flink.runtime.taskmanager.TaskManager - Actor > akka://flink/user/taskmanager#1361882659 terminated, stopping process... > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5298) TaskManager crashes when TM log not existant
[ https://issues.apache.org/jira/browse/FLINK-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732728#comment-15732728 ] Mischa Krüger commented on FLINK-5298: -- This happened when the log was requested via the web UI. > TaskManager crashes when TM log not existant > > > Key: FLINK-5298 > URL: https://issues.apache.org/jira/browse/FLINK-5298 > Project: Flink > Issue Type: Bug >Reporter: Mischa Krüger > > {code} > java.io.FileNotFoundException: flink-taskmanager.out (No such file or > directory) > at java.io.FileInputStream.open0(Native Method) > at java.io.FileInputStream.open(FileInputStream.java:195) > at java.io.FileInputStream.(FileInputStream.java:138) > at > org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleRequestTaskManagerLog(TaskManager.scala:833) > at > org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:340) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > at > 
org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > at > org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:122) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at akka.actor.ActorCell.invoke(ActorCell.scala:487) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 2016-12-08 16:45:14,995 INFO > org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager - Stopping > TaskManager akka://flink/user/taskmanager#1361882659. > 2016-12-08 16:45:14,995 INFO > org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager - > Disassociating from JobManager > 2016-12-08 16:45:14,997 INFO org.apache.flink.runtime.blob.BlobCache > - Shutting down BlobCache > 2016-12-08 16:45:15,006 INFO > org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager > removed spill file directory > /tmp/flink-io-e61f717b-630c-4a2a-b3e3-62ccb40aa2f9 > 2016-12-08 16:45:15,006 INFO > org.apache.flink.runtime.io.network.NetworkEnvironment- Shutting down > the network environment and its components. > 2016-12-08 16:45:15,008 INFO > org.apache.flink.runtime.io.network.netty.NettyClient - Successful > shutdown (took 1 ms). > 2016-12-08 16:45:15,009 INFO > org.apache.flink.runtime.io.network.netty.NettyServer - Successful > shutdown (took 0 ms). > 2016-12-08 16:45:15,020 INFO > org.apache.flink.mesos.runtime.clusterframework.MesosTaskManager - Task > manager akka://flink/user/taskmanager is completely shut down. 
> 2016-12-08 16:45:15,023 ERROR > org.apache.flink.runtime.taskmanager.TaskManager - Actor > akka://flink/user/taskmanager#1361882659 terminated, stopping process... > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-5301) Can't upload job via Web UI when using a proxy
Mischa Krüger created FLINK-5301: Summary: Can't upload job via Web UI when using a proxy Key: FLINK-5301 URL: https://issues.apache.org/jira/browse/FLINK-5301 Project: Flink Issue Type: Bug Components: Webfrontend Reporter: Mischa Krüger Using DC/OS with the Flink service in current development (https://github.com/mesosphere/dcos-flink-service). For reproduction: 1. Install a DC/OS cluster 2. Follow the instructions in the mentioned repo for setting up a universe server with the Flink app. 3. Install the Flink app via the universe 4. Access the Web UI 5. Upload a job Experience: The upload reaches 100%, and then says "Saving..." forever. Upload works when using ssh forwarding to access the node directly serving the Flink Web UI. DC/OS uses a proxy to access the Web UI. The webpage is delivered by a component called the "Admin Router". Side note: interestingly, the new favicon does not appear either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4118) The docker-flink image is outdated (1.0.2) and can be slimmed down
Ismaël Mejía created FLINK-4118: --- Summary: The docker-flink image is outdated (1.0.2) and can be slimmed down Key: FLINK-4118 URL: https://issues.apache.org/jira/browse/FLINK-4118 Project: Flink Issue Type: Improvement Reporter: Ismaël Mejía Priority: Minor This issue is to upgrade the docker image and polish some details in it (e.g. it can be slimmed down if we remove some unneeded dependencies, and the code can be polished). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4118) The docker-flink image is outdated (1.0.2) and can be slimmed down
[ https://issues.apache.org/jira/browse/FLINK-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15349470#comment-15349470 ] Ismaël Mejía commented on FLINK-4118: - I am working on this and have a PR ready for review. Can somebody please assign this to me? Thanks. > The docker-flink image is outdated (1.0.2) and can be slimmed down > -- > > Key: FLINK-4118 > URL: https://issues.apache.org/jira/browse/FLINK-4118 > Project: Flink > Issue Type: Improvement >Reporter: Ismaël Mejía >Priority: Minor > > This issue is to upgrade the docker image and polish some details in it (e.g. > it can be slimmed down if we remove some unneeded dependencies, and the code > can be polished). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
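A hedged sketch of the kind of slimming meant here, assuming a JRE-only Alpine base; the Flink version, download URL, and start command are illustrative, not the project's actual Dockerfile:

```dockerfile
# Illustrative only: smaller base image, single download layer, no extra deps.
FROM openjdk:8-jre-alpine
ENV FLINK_VERSION=1.1.3 FLINK_HOME=/opt/flink
RUN apk add --no-cache bash snappy \
 && wget -qO- "https://archive.apache.org/dist/flink/flink-${FLINK_VERSION}/flink-${FLINK_VERSION}-bin-hadoop27-scala_2.11.tgz" \
    | tar -xz -C /opt \
 && mv /opt/flink-${FLINK_VERSION} ${FLINK_HOME}
WORKDIR ${FLINK_HOME}
CMD ["bin/jobmanager.sh", "start"]
```

Combining the download and extraction into one RUN layer, and starting from a JRE-only base, are the two main levers for image size.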
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15349699#comment-15349699 ] Josep Rubió commented on FLINK-1707: Hi Vasia, Maybe it does not make sense to continue with this implementation. Even though it is a "graph" algorithm, it does not seem to fit distributed graph platforms well. I know there are some implementations of the original AP that should work well (I guess you know them); maybe this is what you need for Gelly. I also think performance should be tested, but I don't have access to a real cluster. I've done some tests before for Hadoop with a cluster set up on my laptop, but 4 nodes with 3 GB of memory each is the maximum I can reach. Not very useful :( By the way, before doing anything I'll document an example with some iterations and ask some concrete questions about the implementation. Thanks Vasia! > Add an Affinity Propagation Library Method > -- > > Key: FLINK-1707 > URL: https://issues.apache.org/jira/browse/FLINK-1707 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Josep Rubió >Priority: Minor > Labels: requires-design-doc > Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf > > > This issue proposes adding an implementation of the Affinity Propagation > algorithm as a Gelly library method and a corresponding example. > The algorithm is described in paper [1] and a description of a vertex-centric > implementation can be found in [2]. > [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf > [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf > Design doc: > https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4120) Lightweight fault tolerance through recomputing lost state
Dénes Vadász created FLINK-4120: --- Summary: Lightweight fault tolerance through recomputing lost state Key: FLINK-4120 URL: https://issues.apache.org/jira/browse/FLINK-4120 Project: Flink Issue Type: New Feature Components: Core Reporter: Dénes Vadász The current fault tolerance mechanism requires that stateful operators write their internal state to stable storage during a checkpoint. This proposal aims at optimizing out this operation in the cases where the operator state can be recomputed from a finite (practically: small) set of source records, and those records are already on checkpoint-aware persistent storage (e.g. in Kafka). The rationale behind the proposal is that the cost of reprocessing is paid only on recovery from (presumably rare) failures, whereas the cost of persisting the state is paid on every checkpoint. Eliminating the need for persistent storage will also simplify system setup and operation. In the cases where this optimisation is applicable, the state of the operators can be restored by restarting the pipeline from a checkpoint taken before the pipeline ingested any of the records required for the state re-computation of the operators (call this the "null-state checkpoint"), as opposed to a restart from the "latest checkpoint". The "latest checkpoint" is still relevant for the recovery: the barriers belonging to that checkpoint must be inserted into the source streams in the position they were originally inserted. Sinks must discard all records until this barrier reaches them. Note the inherent relationship between the "latest" and the "null-state" checkpoints: the pipeline must be restarted from the latter to restore the state at the former. For the stateful operators for which this optimization is applicable we can define the notion of "current null-state watermark" as the watermark such that the operator can correctly (re)compute its current state merely from records after this watermark. 
For the checkpoint-coordinator to be able to compute the null-state checkpoint, each stateful operator should report its "current null-state watermark" as part of acknowledging the ongoing checkpoint. The null-state checkpoint of the ongoing checkpoint is the most recent checkpoint preceding all the received null-state watermarks (assuming the pipeline preserves the relative order of barriers and watermarks). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
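The coordinator-side selection described above can be sketched as follows. This is an illustrative sketch only, not Flink API: the function name, the (checkpoint id, stream position) representation, and the use of plain integers for watermarks are all assumptions made for the example.

```python
# Illustrative sketch (not Flink API): the checkpoint coordinator keeps the
# stream positions of completed checkpoints and, for the ongoing checkpoint,
# picks the most recent completed checkpoint whose position precedes every
# null-state watermark reported by the stateful operators.

def null_state_checkpoint(completed_checkpoints, reported_watermarks):
    """completed_checkpoints: list of (checkpoint_id, position), ascending by
    position. reported_watermarks: null-state watermarks (stream positions)
    reported by the stateful operators while acknowledging the ongoing
    checkpoint. Returns the checkpoint_id to restart from, or None if no
    completed checkpoint precedes all reported watermarks."""
    bound = min(reported_watermarks)  # must precede ALL reported watermarks
    candidate = None
    for cp_id, position in completed_checkpoints:
        if position < bound:
            candidate = cp_id  # most recent checkpoint before the bound so far
        else:
            break
    return candidate
```

For example, with completed checkpoints at positions 10, 20 and 30 and reported null-state watermarks at 25 and 27, the null-state checkpoint is the one at position 20.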
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351211#comment-15351211 ] Josep Rubió commented on FLINK-1707: Agreed; I put this same statement in the design doc. However, the paper does not explain that computing the E and I vertices of the binary model in parallel gives different intermediate results, since it amounts to a different scheduling; I think it should still give the same clusters (but I'm not an expert in Bayesian networks). About Capacitated Affinity Propagation: one of the advantages of the binary model is that adding constraints to clusters only means different calculations for the E and I messages. In the CAP case only the α(i,j) calculation is different, so you just need to modify the functions that update the E vertices. By the way, I'm not sure I understand your answer. Do you mean we should work with the original AP algorithm? Thanks! > Add an Affinity Propagation Library Method > -- > > Key: FLINK-1707 > URL: https://issues.apache.org/jira/browse/FLINK-1707 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Josep Rubió >Priority: Minor > Labels: requires-design-doc > Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf > > > This issue proposes adding the an implementation of the Affinity Propagation > algorithm as a Gelly library method and a corresponding example. > The algorithm is described in paper [1] and a description of a vertex-centric > implementation can be found is [2]. > [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf > [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf > Design doc: > https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4120) Lightweight fault tolerance through recomputing lost state
[ https://issues.apache.org/jira/browse/FLINK-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351739#comment-15351739 ] Dénes Vadász commented on FLINK-4120: - Whether the state is re-computed or restored from stable storage should be a trade-off controlled by the designer of the pipeline. Whether the input records are persisted before the pipeline ingests them, and how long they are preserved there, should again be at the discretion of the designer of the pipeline. I am arguing in favour of making the re-computation of the state possible, not in favour of mandating it. The use case I have in mind is the correlation of log records to restore the message flow in a distributed computer system. In such a case the typical lifetime of operator state is a few minutes at most. (In fact we wanted to port a solution using this optimization, written long before Spark and Flink were born, to Flink, and discovered that Flink currently lacks this.) Note that if you want to mix recomputing and restoring operators in a pipeline, the restoring operators have to restore the state they saved during the null-state checkpoint and participate in the re-computation phase in the same way they behave during normal operation, unaffected by the latest checkpoint barrier. This barrier influences the behaviour of sinks only: before the barrier, sinks consume and discard their input (as these records were already emitted before the failure); after the barrier they emit normally. I do not quite understand the statement about barriers being non-deterministic; if, in view of my explanations, this is still relevant, can you please elaborate on it? 
Regards Dénes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
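The recovery behaviour of sinks described in the comment above (consume and discard everything until the latest-checkpoint barrier arrives, then emit normally) can be sketched as follows. This is an illustrative sketch, not Flink API; the barrier marker and generator shape are assumptions made for the example.

```python
# Illustrative sketch (not Flink API) of the sink behaviour during recovery:
# the pipeline replays from the null-state checkpoint, and the sink discards
# every record until the barrier of the "latest checkpoint" reaches it, since
# those records were already emitted before the failure.

LATEST_BARRIER = "barrier:latest"  # marker re-injected at its original position

def replay_to_sink(stream):
    """Yield only the records the sink should actually emit during recovery."""
    emitting = False
    for record in stream:
        if record == LATEST_BARRIER:
            emitting = True      # state is now rebuilt up to the latest checkpoint
            continue             # the barrier itself is not a data record
        if emitting:
            yield record         # post-barrier records were never emitted before
        # pre-barrier records are consumed and discarded
```

For example, replaying `["a", "b", "barrier:latest", "c", "d"]` makes the sink emit only `"c"` and `"d"`.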
[jira] [Created] (FLINK-4136) Add a JobHistory Server service
Márton Balassi created FLINK-4136: - Summary: Add a JobHistory Server service Key: FLINK-4136 URL: https://issues.apache.org/jira/browse/FLINK-4136 Project: Flink Issue Type: New Feature Components: Cluster Management Reporter: Márton Balassi When running on YARN, the lack of a JobHistory server is very inconvenient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4209) Docker image breaks with multiple NICs
Ismaël Mejía created FLINK-4209: --- Summary: Docker image breaks with multiple NICs Key: FLINK-4209 URL: https://issues.apache.org/jira/browse/FLINK-4209 Project: Flink Issue Type: Improvement Components: flink-contrib Reporter: Ismaël Mejía Priority: Minor Today the docker image scripts resolve the host by IP; this is an issue when the system has multiple network cards. Resolving the host by name fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4208) Support Running Flink processes in foreground mode
Ismaël Mejía created FLINK-4208: --- Summary: Support Running Flink processes in foreground mode Key: FLINK-4208 URL: https://issues.apache.org/jira/browse/FLINK-4208 Project: Flink Issue Type: Improvement Reporter: Ismaël Mejía Priority: Minor Flink clusters are started in daemon mode by default; however, if we want to start containers based on Flink, the execution context gets lost. Running Flink as foreground processes can fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-4208) Support Running Flink processes in foreground mode
[ https://issues.apache.org/jira/browse/FLINK-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía closed FLINK-4208. --- Resolution: Fixed I decided to drop the request for foreground mode and instead change the docker script to use an extra wait, to avoid losing the process in the background. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-4209) Fix issue on docker with multiple NICs and remove supervisord dependency
[ https://issues.apache.org/jira/browse/FLINK-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated FLINK-4209: Summary: Fix issue on docker with multiple NICs and remove supervisord dependency (was: Docker image breaks with multiple NICs) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-4259) Unclosed FSDataOutputStream in FileCache#copy()
[ https://issues.apache.org/jira/browse/FLINK-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Márton Balassi closed FLINK-4259. - Resolution: Fixed Fix Version/s: 1.2.0 Fixed via 5d7e949. > Unclosed FSDataOutputStream in FileCache#copy() > --- > > Key: FLINK-4259 > URL: https://issues.apache.org/jira/browse/FLINK-4259 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: Neelesh Srinivas Salian >Priority: Minor > Fix For: 1.2.0 > > > {code} > try { > FSDataOutputStream lfsOutput = tFS.create(targetPath, false); > FSDataInputStream fsInput = sFS.open(sourcePath); > IOUtils.copyBytes(fsInput, lfsOutput); > {code} > The FSDataOutputStream lfsOutput should be closed upon exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2662) CompilerException: "Bug: Plan generation for Unions picked a ship strategy between binary plan operators."
[ https://issues.apache.org/jira/browse/FLINK-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401814#comment-15401814 ] Lorenz Bühmann commented on FLINK-2662: --- I do have the same problem:
{code}
Exception in thread "main" org.apache.flink.optimizer.CompilerException: Bug: Plan generation for Unions picked a ship strategy between binary plan operators.
	at org.apache.flink.optimizer.traversals.BinaryUnionReplacer.collect(BinaryUnionReplacer.java:113)
	at org.apache.flink.optimizer.traversals.BinaryUnionReplacer.postVisit(BinaryUnionReplacer.java:73)
	at org.apache.flink.optimizer.traversals.BinaryUnionReplacer.postVisit(BinaryUnionReplacer.java:41)
	at org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:170)
	at org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:163)
	at org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:163)
	at org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
	at org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
	at org.apache.flink.optimizer.plan.OptimizedPlan.accept(OptimizedPlan.java:128)
	at org.apache.flink.optimizer.Optimizer.compile(Optimizer.java:516)
	at org.apache.flink.optimizer.Optimizer.compile(Optimizer.java:398)
	at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:184)
	at org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:90)
	at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:855)
	at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
	at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
	at org.apache.flink.api.scala.DataSet.print(DataSet.scala:1615)
	at org.sansa.inference.flink.conformance.FlinkBug$.main(FlinkBug.scala:31)
	at org.sansa.inference.flink.conformance.FlinkBug.main(FlinkBug.scala)
{code}
I tried both Flink 1.0.3 and Flink 1.1.0-SNAPSHOT. 
Any progress on it? > CompilerException: "Bug: Plan generation for Unions picked a ship strategy > between binary plan operators." > -- > > Key: FLINK-2662 > URL: https://issues.apache.org/jira/browse/FLINK-2662 > Project: Flink > Issue Type: Bug > Components: Optimizer >Affects Versions: 0.9.1, 0.10.0 >Reporter: Gabor Gevay > Fix For: 1.0.0 > > > I have a Flink program which throws the exception in the jira title. Full > text: > Exception in thread "main" org.apache.flink.optimizer.CompilerException: Bug: > Plan generation for Unions picked a ship strategy between binary plan > operators. > at > org.apache.flink.optimizer.traversals.BinaryUnionReplacer.collect(BinaryUnionReplacer.java:113) > at > org.apache.flink.optimizer.traversals.BinaryUnionReplacer.postVisit(BinaryUnionReplacer.java:72) > at > org.apache.flink.optimizer.traversals.BinaryUnionReplacer.postVisit(BinaryUnionReplacer.java:41) > at > org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:170) > at > org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199) > at > org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:163) > at > org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:163) > at > org.apache.flink.optimizer.plan.WorksetIterationPlanNode.acceptForStepFunction(WorksetIterationPlanNode.java:194) > at > org.apache.flink.optimizer.traversals.BinaryUnionReplacer.preVisit(BinaryUnionReplacer.java:49) > at > org.apache.flink.optimizer.traversals.BinaryUnionReplacer.preVisit(BinaryUnionReplacer.java:41) > at > org.apache.flink.optimizer.plan.DualInputPlanNode.accept(DualInputPlanNode.java:162) > at > org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199) > at > org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199) > at > org.apache.flink.optimizer.plan.OptimizedPlan.accept(OptimizedPlan.java:127) > at 
org.apache.flink.optimizer.Optimizer.compile(Optimizer.java:520) > at org.apache.flink.optimizer.Optimizer.compile(Optimizer.java:402) > at > org.apache.flink.client.LocalExecutor.getOptimizerPlanAsJSON(LocalExecutor.java:202) > at > org.apache.flink.api.java.LocalEnvironment.getExecutionPlan(LocalE
[jira] [Updated] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josep Rubió updated FLINK-1707: --- Description: This issue proposes adding an implementation of the Affinity Propagation algorithm as a Gelly library method and a corresponding example. The algorithm is described in paper [1] and a description of a vertex-centric implementation can be found in [2]. [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf Design doc: https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing Example spreadsheet: https://docs.google.com/spreadsheets/d/1CurZCBP6dPb1IYQQIgUHVjQdyLxK0JDGZwlSXCzBcvA/edit?usp=sharing was: This issue proposes adding the an implementation of the Affinity Propagation algorithm as a Gelly library method and a corresponding example. The algorithm is described in paper [1] and a description of a vertex-centric implementation can be found is [2]. [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf Design doc: https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406651#comment-15406651 ] Josep Rubió commented on FLINK-1707: Hi all, Sorry for my late response, but I did not have much time to work on this. I have created a spreadsheet executing the algorithm on the same example I have in GitHub. This is a 3-node graph with the following similarities:
1 -> 1: 1.0
1 -> 2: 1.0
1 -> 3: 5.0
2 -> 1: 1.0
2 -> 2: 1.0
2 -> 3: 3.0
3 -> 1: 5.0
3 -> 2: 3.0
3 -> 3: 1.0
The execution in the spreadsheet contains the calculations for the intermediate results of the algorithm (the eta and beta values, shown as n and b respectively). These calculations are implicit in alfa and rho in the implementation. You can see that the values of the rho messages in the spreadsheet are the values sent from I to E nodes in the implementation, and the alfa messages are the values sent from E to I nodes. The damping factor can be set in the second row. https://docs.google.com/spreadsheets/d/1CurZCBP6dPb1IYQQIgUHVjQdyLxK0JDGZwlSXCzBcvA/edit?usp=sharing About the other issues:
- Although it is recommended in the paper to use damping, the algorithm can be implemented without it, which avoids keeping in the vertices the old values sent to each destination. I don't see any other way to implement damping.
- Calculating values in parallel is not an issue; it is just a different scheduling of the algorithm. It could be done in a serial mode by doing the calculations only for odd or even vertices in each superstep.
- I will review the other implementation issues you commented on.
Thanks! 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
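As a rough illustration of the message passing and the role of the damping factor discussed in the comment above, here is a sketch on the same 3-node similarity graph. It uses the standard responsibility/availability formulation of affinity propagation rather than the binary E/I formulation of the design doc, and all names are assumptions made for the example; it only shows how damping blends each new message with the previously sent one (damping 0 means no old values need to be kept).

```python
import numpy as np

# Similarity matrix of the 3-node example from the comment above
# (diagonal entries act as the self-preferences).
S = np.array([[1.0, 1.0, 5.0],
              [1.0, 1.0, 3.0],
              [3.0, 1.0, 3.0],
              ])[:3]
S = np.array([[1.0, 1.0, 5.0],
              [1.0, 1.0, 3.0],
              [5.0, 3.0, 1.0]])

def affinity_propagation(S, damping=0.5, iterations=100):
    """Standard (non-binary) AP sketch: returns each point's exemplar index."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} [ a(i,k') + s(i,k') ]
        M = A + S
        new_R = np.empty_like(R)
        for i in range(n):
            for k in range(n):
                new_R[i, k] = S[i, k] - np.delete(M[i], k).max()
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(new_R, 0)
        new_A = np.empty_like(A)
        for i in range(n):
            for k in range(n):
                if i == k:
                    new_A[k, k] = Rp[:, k].sum() - Rp[k, k]
                else:
                    new_A[i, k] = min(0.0, new_R[k, k] + Rp[:, k].sum()
                                      - Rp[i, k] - Rp[k, k])
        # damping: keep a fraction of the old messages; with damping=0 the
        # new messages replace the old ones, so no old values are stored
        R = damping * R + (1 - damping) * new_R
        A = damping * A + (1 - damping) * new_A
    # each point i picks argmax_k [ a(i,k) + r(i,k) ] as its exemplar
    return (A + R).argmax(axis=1)

exemplars = affinity_propagation(S)
```

The scheduling remark above also applies here: updating all responsibilities and then all availabilities per iteration is just one possible schedule, and a different (e.g. alternating odd/even) schedule changes the intermediate values but not the fixed points.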
[jira] [Assigned] (FLINK-3026) Publish the flink docker container to the docker registry
[ https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned FLINK-3026: --- Assignee: Ismaël Mejía > Publish the flink docker container to the docker registry > - > > Key: FLINK-3026 > URL: https://issues.apache.org/jira/browse/FLINK-3026 > Project: Flink > Issue Type: Task >Reporter: Omer Katz >Assignee: Ismaël Mejía > Labels: Deployment, Docker > > There's a dockerfile that can be used to build a docker container already in > the repository. It'd be awesome to just be able to pull it instead of > building it ourselves. > The dockerfile can be found at > https://github.com/apache/flink/tree/master/flink-contrib/docker-flink > It also doesn't point to the latest version of Flink which I fixed in > https://github.com/apache/flink/pull/1366 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3026) Publish the flink docker container to the docker registry
[ https://issues.apache.org/jira/browse/FLINK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410906#comment-15410906 ] Ismaël Mejía commented on FLINK-3026: - I am assigning myself this issue following the earlier discussion with [~aljoscha]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3155) Update Flink docker version to latest stable Flink version
[ https://issues.apache.org/jira/browse/FLINK-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated FLINK-3155: Affects Version/s: 1.1.0 > Update Flink docker version to latest stable Flink version > -- > > Key: FLINK-3155 > URL: https://issues.apache.org/jira/browse/FLINK-3155 > Project: Flink > Issue Type: Task > Components: flink-contrib >Affects Versions: 1.0.0, 1.1.0 >Reporter: Maximilian Michels > Fix For: 1.0.0 > > > It would be nice to always set the Docker Flink binary URL to point to the > latest Flink version. Until then, this JIRA keeps track of the updates for > releases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3155) Update Flink docker version to latest stable Flink version
[ https://issues.apache.org/jira/browse/FLINK-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated FLINK-3155: Fix Version/s: 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3155) Update Flink docker version to latest stable Flink version
[ https://issues.apache.org/jira/browse/FLINK-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated FLINK-3155: Fix Version/s: (was: 1.2.0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3155) Update Flink docker version to latest stable Flink version
[ https://issues.apache.org/jira/browse/FLINK-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated FLINK-3155: Priority: Minor (was: Major) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4327) Error in link to download version for hadoop 2.7 and scala 2.11
Ismaël Mejía created FLINK-4327: --- Summary: Error in link to download version for hadoop 2.7 and scala 2.11 Key: FLINK-4327 URL: https://issues.apache.org/jira/browse/FLINK-4327 Project: Flink Issue Type: Bug Components: Project Website Affects Versions: 1.1.0 Reporter: Ismaël Mejía On the download page https://flink.apache.org/downloads.html the link to download the release for hadoop 2.7.0 and scala 2.11 points to the previous version's (1.0.3) URL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4326) Flink start-up scripts should optionally start services on the foreground
[ https://issues.apache.org/jira/browse/FLINK-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411605#comment-15411605 ] Ismaël Mejía commented on FLINK-4326: - For reference, this is the commit of my proposed solution: https://github.com/iemejia/flink/commit/af9ac6cdb3f6601d6248abe82df4fd44de4453e5 Note that I finally closed the pull request, since we could work around this via the && wait in the docker script, which was my purpose, but I still agree that having foreground processes has value. For reference, the JIRA where we discussed this earlier: https://issues.apache.org/jira/browse/FLINK-4208 Is there an alternative PR for this? > Flink start-up scripts should optionally start services on the foreground > - > > Key: FLINK-4326 > URL: https://issues.apache.org/jira/browse/FLINK-4326 > Project: Flink > Issue Type: Improvement > Components: Startup Shell Scripts >Affects Versions: 1.0.3 >Reporter: Elias Levy > > This has previously been mentioned in the mailing list, but has not been > addressed. Flink start-up scripts start the job and task managers in the > background. This makes it difficult to integrate Flink with most process > supervisory tools and init systems, including Docker. One can get around > this via hacking the scripts or manually starting the right classes via Java, > but it is a brittle solution. > In addition to starting the daemons in the foreground, the start up scripts > should use exec instead of running the commands, so as to avoid forks. Many > supervisory tools assume the PID of the process to be monitored is that of > the process it first executes, and fork chains make it difficult for the > supervisor to figure out what process to monitor. Specifically, > jobmanager.sh and taskmanager.sh should exec flink-daemon.sh, and > flink-daemon.sh should exec java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4326) Flink start-up scripts should optionally start services on the foreground
[ https://issues.apache.org/jira/browse/FLINK-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413252#comment-15413252 ] Ismaël Mejía commented on FLINK-4326: - Well, it is good to know that there is interest in the foreground mode. It is probably a good idea to invite [~greghogan] to the discussion, since he reviewed my previous PR. What do you think? Should I rebase my previous patch and create a PR for this, or does any of you have a better idea of how to do it?
[jira] [Commented] (FLINK-4326) Flink start-up scripts should optionally start services on the foreground
[ https://issues.apache.org/jira/browse/FLINK-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414182#comment-15414182 ] Ismaël Mejía commented on FLINK-4326: - Well, that's the question I was wondering about before my previous PR, but then I realized that having a centralized point for all the changes (the current flink-daemon.sh) was less error-prone. That's the reason I ended up mixing flink-daemon with an action like 'start-foreground'. On the other hand, we could rename flink-daemon to flink-service; it would do the same, but with a less confusing name.
[jira] [Commented] (FLINK-4326) Flink start-up scripts should optionally start services on the foreground
[ https://issues.apache.org/jira/browse/FLINK-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417255#comment-15417255 ] Ismaël Mejía commented on FLINK-4326: - A separation of (daemon/console) scripts would be the nicest, no doubt. However, I am not sure that removing the PID code and output would be appropriate when we run daemons and foreground processes at the same time: how do we count the running instances if somebody starts a new process in foreground mode, and what would the logic be if we call stop-all? Must we kill all the processes, even the foreground ones? In these cases I think we need the PID/output references, but I am not really sure; maybe we can do such things without them. Independently of this, we must not forget that we should preserve at least the same options (start|stop|stop-all) for both jobmanager.sh and taskmanager.sh, because they do their magic (building the runtime options) and in the end call the (daemon/console) script. I suppose we will need the new start-foreground option in these scripts too, or are there any other ideas of how to do it best?
[jira] [Commented] (FLINK-4319) Rework Cluster Management (FLIP-6)
[ https://issues.apache.org/jira/browse/FLINK-4319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961748#comment-15961748 ] Markus Müller commented on FLINK-4319: -- For a reference implementation on Bluemix with Kubernetes support in beta, please have a look at github.com/sedgewickmm18/After-HadoopsummitMeetupIoT - it should work on all platforms with Kubernetes support. I'm waiting for FLIP-6 to implement multi-tenancy support, i.e. to have task managers dedicated to tenants and thus isolate tenant workloads. > Rework Cluster Management (FLIP-6) > -- > > Key: FLINK-4319 > URL: https://issues.apache.org/jira/browse/FLINK-4319 > Project: Flink > Issue Type: Improvement > Components: Cluster Management >Affects Versions: 1.1.0 >Reporter: Stephan Ewen > > This is the root issue to track progress of the rework of cluster management > (FLIP-6) > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-6317) History server - wrong default directory
Lorenz Bühmann created FLINK-6317: - Summary: History server - wrong default directory Key: FLINK-6317 URL: https://issues.apache.org/jira/browse/FLINK-6317 Project: Flink Issue Type: Bug Components: Web Client Affects Versions: 1.2.0 Reporter: Lorenz Bühmann Priority: Minor When the history server is started without a directory specified in the configuration file, it uses a randomly named directory under the Java temp directory. Unfortunately, a file separator is missing: {code:title=HistoryServer.java@L139-L143|borderStyle=solid} String webDirectory = config.getString(HistoryServerOptions.HISTORY_SERVER_WEB_DIR); if (webDirectory == null) { webDirectory = System.getProperty("java.io.tmpdir") + "flink-web-history-" + UUID.randomUUID(); } webDir = new File(webDirectory); {code} It should be {code} webDirectory = System.getProperty("java.io.tmpdir") + File.separator + "flink-web-history-" + UUID.randomUUID(); {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-6317) History server - wrong default directory
[ https://issues.apache.org/jira/browse/FLINK-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lorenz Bühmann updated FLINK-6317: -- added the label 'easyfix'; the description is unchanged from the report above. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
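The effect of the missing separator can be seen in a small standalone sketch (not Flink code; the class name here is illustrative). On Linux, java.io.tmpdir is typically "/tmp" without a trailing separator, so the prefix fuses with the directory name; java.nio's Paths.get is a separator-safe way to join segments, assuming the fix keeps the same generated name:

```java
import java.io.File;
import java.nio.file.Paths;
import java.util.UUID;

public class TempDirJoin {
    public static void main(String[] args) {
        String tmp = System.getProperty("java.io.tmpdir");
        String name = "flink-web-history-" + UUID.randomUUID();

        // Buggy: if tmpdir has no trailing separator, this yields
        // e.g. "/tmpflink-web-history-<uuid>" instead of a dir under /tmp.
        String buggy = tmp + name;

        // Fixed: insert the platform separator explicitly...
        String fixed = tmp + File.separator + name;

        // ...or let java.nio join the segments; Paths.get inserts the
        // separator itself and tolerates a trailing one in tmpdir.
        String nio = Paths.get(tmp, name).toString();

        System.out.println(buggy);
        System.out.println(fixed);
        System.out.println(nio);
    }
}
```

Note that on platforms where java.io.tmpdir happens to end with a separator the bug is masked, which may be why it slipped through.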
[jira] [Commented] (FLINK-2147) Approximate calculation of frequencies in data streams
[ https://issues.apache.org/jira/browse/FLINK-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981414#comment-15981414 ] Gábor Hermann commented on FLINK-2147: -- I would prefer emitting updates on a window basis, as Flink has quite rich options for triggering. With overpartitioning, there could be more _count-min sketch partitions_ (CMS-partitions) than actual partitions (i.e. subtasks). We could assign a CMS-partition to each input record (based on hashing) and keyBy on the CMS-partition. Then we could fold over the CMS-partition (a compact array), which is (AFAIK) internally stored as keyed state. This way, we would not keep state for every key separately (saving memory), while still allowing operators to scale (increasing/decreasing parallelism). Does that make sense? Using windows makes it easier to define _when to delete old data_ and _when to emit results_, and to deal with _out-of-orderness_. However, windows have slightly more memory overhead compared to e.g. storing one count-min sketch array per partition. A question is then *what API should we provide?* The user could specify the key, window assigner, trigger, evictor, allowedLateness, and the count-min sketch properties (size, hash functions). The window could then be translated into another window keyed by the CMS-partition (as described above). But should it simply be a function that takes a DataStream as input and returns a DataStream with the results? Or should we add a special countMinSketch function to KeyedDataStream? Alternatively, we could implement count-min sketch without windows. The user would specify two streams: one that queries the count-min sketch and another that writes to it. So the "triggering" is done by a stream. The problem is then how to specify when to delete old data and how to deal with out-of-orderness. Another question is *where should we place the API?* In the flink-streaming-java module? Or flink-streaming-contrib? This, of course, highly depends on what API we provide. > Approximate calculation of frequencies in data streams > -- > > Key: FLINK-2147 > URL: https://issues.apache.org/jira/browse/FLINK-2147 > Project: Flink > Issue Type: New Feature > Components: DataStream API >Reporter: Gabor Gevay > Labels: approximate, statistics > > Count-Min sketch is a hashing-based algorithm for approximately keeping track > of the frequencies of elements in a data stream. It is described by Cormode > et al. in the following paper: > http://dimacs.rutgers.edu/~graham/pubs/papers/cmsoft.pdf > Note that this algorithm can be conveniently implemented in a distributed > way, as described in section 3.2 of the paper. > The paper > http://www.vldb.org/conf/2002/S10P03.pdf > also describes algorithms for approximately keeping track of frequencies, but > here the user can specify a threshold below which she is not interested in > the frequency of an element. The error bounds are also different from those of the > Count-min sketch algorithm. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
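The "compact array" discussed above can be sketched as a minimal count-min sketch in plain Java (class and method names are illustrative, not the Flink API; the hashing is a simple seed-XOR scheme for demonstration only). The key property for the partitioned design is that merging two sketches of equal shape is just element-wise addition of the counter arrays:

```java
import java.util.Random;

/** Minimal count-min sketch: a small, mergeable 2-D array of counters. */
public class CountMinSketch {
    private final long[][] counts;   // depth rows x width counters
    private final int[] hashSeeds;   // one hash seed per row
    private final int width;

    public CountMinSketch(int depth, int width) {
        this.width = width;
        this.counts = new long[depth][width];
        this.hashSeeds = new int[depth];
        Random rnd = new Random(42);   // fixed seed: sketches stay mergeable
        for (int i = 0; i < depth; i++) {
            hashSeeds[i] = rnd.nextInt();
        }
    }

    private int bucket(Object item, int row) {
        int h = item.hashCode() ^ hashSeeds[row];
        h ^= (h >>> 16);               // mix high bits into low bits
        return Math.floorMod(h, width);
    }

    /** Record one occurrence: bump one counter per row. */
    public void add(Object item) {
        for (int row = 0; row < counts.length; row++) {
            counts[row][bucket(item, row)]++;
        }
    }

    /** Estimated frequency: the row minimum, always >= the true count. */
    public long estimate(Object item) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < counts.length; row++) {
            min = Math.min(min, counts[row][bucket(item, row)]);
        }
        return min;
    }

    /** Merging equal-shaped sketches is element-wise addition, which is
     *  what makes the per-CMS-partition fold and a final combine work. */
    public void merge(CountMinSketch other) {
        for (int row = 0; row < counts.length; row++) {
            for (int col = 0; col < width; col++) {
                counts[row][col] += other.counts[row][col];
            }
        }
    }
}
```

Folding such an array per CMS-partition (rather than keeping one counter per key) is exactly the memory saving described above: state size is depth x width, independent of the number of distinct keys.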
[jira] [Commented] (FLINK-5753) CEP timeout handler.
[ https://issues.apache.org/jira/browse/FLINK-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007743#comment-16007743 ] Michał Jurkiewicz commented on FLINK-5753: -- [~kkl0u], My job is operating in event time. Here is my configuration: {code} final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); env.getConfig().setGlobalJobParameters(conf); {code} > CEP timeout handler. > > > Key: FLINK-5753 > URL: https://issues.apache.org/jira/browse/FLINK-5753 > Project: Flink > Issue Type: Bug > Components: CEP >Affects Versions: 1.1.2 >Reporter: Michał Jurkiewicz > > I configured the following flink job in my environment: > {code} > Pattern patternCommandStarted = Pattern. > begin("event-accepted").subtype(Event.class) > .where(e -> {event accepted where > statement}).next("second-event-started").subtype(Event.class) > .where(e -> {event started where statement})) > .within(Time.seconds(30)); > DataStream> events = CEP > .pattern(eventsStream.keyBy(e -> e.getEventProperties().get("deviceCode")), > patternCommandStarted) > .select(eventSelector, eventSelector); > static class EventSelector implements PatternSelectFunction, > PatternTimeoutFunction {} > {code} > The problem that I have is related to timeout handling. I observed that if > the first event appears, the second event does not appear in the stream, > and *no new events appear in the stream*, the timeout handler is not executed. > Expected result: the timeout handler should be executed even if there are no > new events in the stream -- This message was sent by Atlassian JIRA (v6.3.15#6346)
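The observed behavior is consistent with how event time works: timers such as CEP's within() timeout fire only when the watermark passes them, and with the usual extractors the watermark only advances when new elements arrive. A minimal, Flink-free model of that mechanic (all names here are illustrative, not Flink classes) is:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an event-time timer service driven by watermarks. */
public class WatermarkTimerModel {
    private long watermark = Long.MIN_VALUE;
    private final List<Long> pendingTimers = new ArrayList<>();
    private final List<Long> firedTimers = new ArrayList<>();

    /** Register a timeout timer, e.g. a 30s pattern deadline. */
    public void registerTimer(long timestamp) {
        pendingTimers.add(timestamp);
    }

    /** A new element advances the watermark (bounded out-of-orderness). */
    public void onElement(long eventTimestamp, long maxOutOfOrdernessMs) {
        long candidate = eventTimestamp - maxOutOfOrdernessMs;
        if (candidate > watermark) {
            watermark = candidate;
            // Timers fire only here: with no new elements the watermark
            // never moves, so a pending timeout is never triggered.
            pendingTimers.removeIf(t -> {
                if (t <= watermark) {
                    firedTimers.add(t);
                    return true;
                }
                return false;
            });
        }
    }

    public List<Long> fired() {
        return firedTimers;
    }
}
```

Under this model, a stream that simply goes quiet never fires the timeout, matching the report; making the timeout fire on wall-clock silence would require the watermark to advance without input (e.g. an idleness/processing-time mechanism).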
[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522303#comment-15522303 ] 刘喆 commented on FLINK-4632: --- I tried the latest GitHub version; the problem is still there. When the job is running, kill one taskmanager: it goes into CANCELING and then hangs > when yarn nodemanager lost, flink hung > --- > > Key: FLINK-4632 > URL: https://issues.apache.org/jira/browse/FLINK-4632 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, Streaming >Affects Versions: 1.2.0, 1.1.2 > Environment: cdh5.5.1 jdk1.7 flink1.1.2 1.2-snapshot kafka0.8.2 >Reporter: 刘喆 > > When running Flink streaming on YARN, using Kafka as the source, it runs well at > start. But after a long run, for example 8 hours, having processed 60,000,000+ > messages, it hangs: no messages are consumed, one taskmanager is CANCELING, and the > exception shows: > org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: > connection timeout > at > org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.exceptionCaught(PartitionRequestClientHandler.java:152) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > 
io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:79) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:275) > at > io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:253) > at > io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:835) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:87) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: 连接超时 (connection timed out) > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:192) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311) > at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881) > at > io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119) > ... 
6 more > after applying https://issues.apache.org/jira/browse/FLINK-4625 > it shows: > java.lang.Exception: TaskManager was lost/killed: > ResourceID{resourceId='container_1471620986643_744852_01_001400'} @ > 38.slave.adh (dataPort=45349) > at > org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:162) > at > org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:533) > at > org.apache
[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528781#comment-15528781 ] 刘喆 commented on FLINK-4632: --- I can't reproduce it now. I only saved the TaskManager's log from the beginning. If it happens again, I will come back here. Thanks very much. > when yarn nodemanager lost, flink hung > --- > > Key: FLINK-4632 > URL: https://issues.apache.org/jira/browse/FLINK-4632 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, Streaming >Affects Versions: 1.2.0, 1.1.2 > Environment: cdh5.5.1 jdk1.7 flink1.1.2 1.2-snapshot kafka0.8.2 >Reporter: 刘喆 >Priority: Blocker
[jira] [Commented] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528833#comment-15528833 ] 刘喆 commented on FLINK-4632: --- I think it is related to checkpointing. When I use checkpointing with 'exactly_once' mode, a sub task may hang while other sub tasks keep running; after a long time, all tasks hang. At the same time, no more checkpoints are produced. When killing, maybe the checkpoint blocks other threads. I use the JobManager as the checkpoint backend. The checkpoint interval is 30 seconds. Some log lines below: 2016-09-27 09:42:59,463 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- send (369/500) (e60c3755b5461c181e29cd30400cd6b0) switched from DEPLOYING to RUNNING 2016-09-27 09:42:59,552 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- send (274/500) (f983db19c603a51027cf7031e19edb79) switched from DEPLOYING to RUNNING 2016-09-27 09:43:00,256 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- send (497/500) (92bde125c4eb920d32aa11b6514f4cf1) switched from DEPLOYING to RUNNING 2016-09-27 09:43:00,477 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph- send (373/500) (a59fec5a8ce66518d9003e6a480e1854) switched from DEPLOYING to RUNNING 2016-09-27 09:44:05,867 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 1 @ 1474940645865 2016-09-27 09:44:41,782 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed checkpoint 1 (in 35917 ms) 2016-09-27 09:49:05,865 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 2 @ 1474940945865 2016-09-27 09:50:05,866 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 2 expired before completing. 2016-09-27 09:50:07,390 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 2 2016-09-27 09:50:10,572 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 2 2016-09-27 09:50:12,207 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 2
[jira] [Comment Edited] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528833#comment-15528833 ] 刘喆 edited comment on FLINK-4632 at 9/28/16 8:13 AM:
I think it is related to checkpointing. When I use checkpoints in 'exactly_once' mode, one sub task may hang while the other sub tasks keep running; after a long time, all tasks hang. At the same time, no more checkpoints are produced. When the job is killed, maybe the checkpoint blocks the other threads. I use the JobManager as the checkpoint backend, and the checkpoint interval is 30 seconds. Some log output:
2016-09-27 09:42:59,463 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - send (369/500) (e60c3755b5461c181e29cd30400cd6b0) switched from DEPLOYING to RUNNING
2016-09-27 09:42:59,552 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - send (274/500) (f983db19c603a51027cf7031e19edb79) switched from DEPLOYING to RUNNING
2016-09-27 09:43:00,256 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - send (497/500) (92bde125c4eb920d32aa11b6514f4cf1) switched from DEPLOYING to RUNNING
2016-09-27 09:43:00,477 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - send (373/500) (a59fec5a8ce66518d9003e6a480e1854) switched from DEPLOYING to RUNNING
2016-09-27 09:44:05,867 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 1 @ 1474940645865
2016-09-27 09:44:41,782 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed checkpoint 1 (in 35917 ms)
2016-09-27 09:49:05,865 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 2 @ 1474940945865
2016-09-27 09:50:05,866 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 2 expired before completing.
2016-09-27 09:50:07,390 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 2
2016-09-27 09:50:10,572 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 2
2016-09-27 09:50:12,207 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 2
And when I don't use checkpoints at all, it works well.
> when yarn nodemanager lost, flink hung
> ---
>
> Key: FLINK-4632
> URL: https://issues.apache.org/jira/browse/FLINK-4632
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination, Streaming
> Affects Versions: 1.2.0, 1.1.2
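For reference, the checkpointing setup the comment describes (30-second interval, exactly-once mode) corresponds to the standard streaming API configuration. This is a sketch of the reporter's stated configuration, not code taken from the bug report, and it omits the job topology:

```java
// Sketch only: 'exactly_once' checkpoints every 30 seconds, with the
// default (JobManager-backed) state backend, as described in the comment.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(30000, CheckpointingMode.EXACTLY_ONCE);
```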
[jira] [Comment Edited] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528833#comment-15528833 ] 刘喆 edited comment on FLINK-4632 at 9/28/16 8:25 AM:
I think it is related to checkpointing. When I use checkpoints in 'exactly_once' mode, one sub task may hang while the other sub tasks keep running; after a long time, all tasks hang. At the same time, no more checkpoints are produced. When the job is killed, maybe the checkpoint blocks the other threads. I use the JobManager as the checkpoint backend, and the checkpoint interval is 30 seconds. I have 2 screenshots for it:
http://p1.bpimg.com/567571/eb9442e01ede0a24.png
http://p1.bpimg.com/567571/eb9442e01ede0a24.png
Some log output:
2016-09-27 09:42:59,463 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - send (369/500) (e60c3755b5461c181e29cd30400cd6b0) switched from DEPLOYING to RUNNING
2016-09-27 09:42:59,552 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - send (274/500) (f983db19c603a51027cf7031e19edb79) switched from DEPLOYING to RUNNING
2016-09-27 09:43:00,256 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - send (497/500) (92bde125c4eb920d32aa11b6514f4cf1) switched from DEPLOYING to RUNNING
2016-09-27 09:43:00,477 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - send (373/500) (a59fec5a8ce66518d9003e6a480e1854) switched from DEPLOYING to RUNNING
2016-09-27 09:44:05,867 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 1 @ 1474940645865
2016-09-27 09:44:41,782 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed checkpoint 1 (in 35917 ms)
2016-09-27 09:49:05,865 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 2 @ 1474940945865
2016-09-27 09:50:05,866 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 2 expired before completing.
2016-09-27 09:50:07,390 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 2
2016-09-27 09:50:10,572 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 2
2016-09-27 09:50:12,207 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 2
And when I don't use checkpoints at all, it works well.
> when yarn nodemanager lost, flink hung
> ---
>
> Key: FLINK-4632
> URL: https://issues.apache.org/ji
[jira] [Comment Edited] (FLINK-4632) when yarn nodemanager lost, flink hung
[ https://issues.apache.org/jira/browse/FLINK-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528833#comment-15528833 ] 刘喆 edited comment on FLINK-4632 at 9/28/16 11:00 AM:
I think it is related to checkpointing. When I use checkpoints in 'exactly_once' mode, one sub task may hang while the other sub tasks keep running; after a long time, all tasks hang. At the same time, no more checkpoints are produced. When the job is killed, maybe the checkpoint blocks the other threads. I use the JobManager as the checkpoint backend, and the checkpoint interval is 30 seconds. I have 2 screenshots for it:
http://p1.bpimg.com/567571/eb9442e01ede0a24.png
http://p1.bpimg.com/567571/6502b8c89fc68229.png
Some log output:
2016-09-27 09:42:59,463 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - send (369/500) (e60c3755b5461c181e29cd30400cd6b0) switched from DEPLOYING to RUNNING
2016-09-27 09:42:59,552 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - send (274/500) (f983db19c603a51027cf7031e19edb79) switched from DEPLOYING to RUNNING
2016-09-27 09:43:00,256 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - send (497/500) (92bde125c4eb920d32aa11b6514f4cf1) switched from DEPLOYING to RUNNING
2016-09-27 09:43:00,477 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - send (373/500) (a59fec5a8ce66518d9003e6a480e1854) switched from DEPLOYING to RUNNING
2016-09-27 09:44:05,867 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 1 @ 1474940645865
2016-09-27 09:44:41,782 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed checkpoint 1 (in 35917 ms)
2016-09-27 09:49:05,865 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 2 @ 1474940945865
2016-09-27 09:50:05,866 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint 2 expired before completing.
2016-09-27 09:50:07,390 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 2
2016-09-27 09:50:10,572 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 2
2016-09-27 09:50:12,207 WARN org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Received late message for now expired checkpoint attempt 2
And when I don't use checkpoints at all, it works well.
> wh
[jira] [Created] (FLINK-4706) Is there a way to load user classes first?
刘喆 created FLINK-4706:
Summary: Is there a way to load user classes first?
Key: FLINK-4706
URL: https://issues.apache.org/jira/browse/FLINK-4706
Project: Flink
Issue Type: New Feature
Components: Core
Affects Versions: 1.1.2, 1.2.0
Reporter: 刘喆
If some classes in the job jar differ from the ones in the Flink runtime or the YARN environment, the job can't work. So how can we load the classes in the job jar first? Is there a CLI argument or some configuration option?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
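The question is essentially about child-first (parent-last) class loading. A minimal sketch of the idea in plain Java follows; `ChildFirstClassLoader` is a hypothetical name used for illustration, not an existing Flink class or a supported configuration option:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical child-first class loader: resolves classes from the given
// URLs (e.g. the job jar) before delegating to the parent class loader.
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Look in the child (user jar) URLs first ...
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // ... and fall back to the parent for everything else
                    // (JDK classes, framework classes not in the jar).
                    c = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Classes not present in the child URLs (such as JDK classes) still resolve through the parent, so only the classes that actually ship in the job jar shadow the runtime's copies.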
[jira] [Created] (FLINK-4712) Implementing ranking predictions for ALS
Domokos Miklós Kelen created FLINK-4712:
Summary: Implementing ranking predictions for ALS
Key: FLINK-4712
URL: https://issues.apache.org/jira/browse/FLINK-4712
Project: Flink
Issue Type: New Feature
Components: Machine Learning Library
Reporter: Domokos Miklós Kelen
We started working on implementing ranking predictions for recommender systems. Ranking prediction means that besides predicting scores for user-item pairs, the recommender system is able to recommend a top K list for the users.
Details:
In practice, this would mean finding the K items with the highest predicted rating for a particular user. It should also be possible to specify whether to exclude the already seen items from a particular user's toplist. (See for example the 'exclude_known' setting of [Graphlab Create's ranking factorization recommender|https://turi.com/products/create/docs/generated/graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend.html#graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend].)
The output of the topK recommendation function could be in the form of DataSet[(Int,Int,Int)], meaning (user, item, rank), similar to Graphlab Create's output. However, this is arguable: follow-up work includes implementing ranking recommendation evaluation metrics (such as precision@k, recall@k, ndcg@k), similar to [Spark's implementations|https://spark.apache.org/docs/1.5.0/mllib-evaluation-metrics.html#ranking-systems]. It would be beneficial if we were able to design the API such that it could be included in the proposed evaluation framework (see [FLINK-2157|https://issues.apache.org/jira/browse/FLINK-2157]), which makes it necessary to consider the possible output types DataSet[(Int, Array[Int])] or DataSet[(Int, Array[(Int,Double)])], meaning (user, array of items), possibly including the predicted scores as well. See [issue todo] for details.
Another question is whether to provide this function as a member of the ALS class, as a switch-kind of parameter to the ALS implementation (meaning the model is either a rating or a ranking recommender model), or in some other way.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
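Independent of how the final API looks, the top-K step itself is small. The following is a hypothetical, non-Flink illustration (method and type names are assumptions, not the proposed interface) that turns per-user predicted scores into (user, item, rank) triples, optionally excluding already-seen items in the spirit of 'exclude_known':

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class TopKRanking {

    /**
     * For each user, keep the K items with the highest predicted score and
     * emit {user, item, rank} triples (ranks start at 1). If excludeKnown is
     * set, items the user has already seen are filtered out first.
     */
    public static List<int[]> topK(Map<Integer, Map<Integer, Double>> scores,
                                   Map<Integer, Set<Integer>> seen,
                                   int k, boolean excludeKnown) {
        List<int[]> out = new ArrayList<>();
        for (Map.Entry<Integer, Map<Integer, Double>> userEntry : scores.entrySet()) {
            int user = userEntry.getKey();
            Set<Integer> known = seen.getOrDefault(user, Set.of());
            List<Integer> top = userEntry.getValue().entrySet().stream()
                    .filter(e -> !excludeKnown || !known.contains(e.getKey()))
                    .sorted(Map.Entry.<Integer, Double>comparingByValue().reversed())
                    .limit(k)
                    .map(Map.Entry::getKey)
                    .collect(Collectors.toList());
            for (int i = 0; i < top.size(); i++) {
                out.add(new int[] {user, top.get(i), i + 1});
            }
        }
        return out;
    }
}
```

In the actual library, the filtering and sorting would of course run per user group inside a dataset operation rather than over in-memory maps; the sketch only fixes the semantics of the proposed (user, item, rank) output.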
[jira] [Updated] (FLINK-4712) Implementing ranking predictions for ALS
[ https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Domokos Miklós Kelen updated FLINK-4712:
Description:
We started working on implementing ranking predictions for recommender systems. Ranking prediction means that besides predicting scores for user-item pairs, the recommender system is able to recommend a top K list for the users.
Details:
In practice, this would mean finding the K items with the highest predicted rating for a particular user. It should also be possible to specify whether to exclude the already seen items from a particular user's toplist. (See for example the 'exclude_known' setting of [Graphlab Create's ranking factorization recommender|https://turi.com/products/create/docs/generated/graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend.html#graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend].)
The output of the topK recommendation function could be in the form of {{DataSet[(Int,Int,Int)]}}, meaning (user, item, rank), similar to Graphlab Create's output. However, this is arguable: follow-up work includes implementing ranking recommendation evaluation metrics (such as precision@k, recall@k, ndcg@k), similar to [Spark's implementations|https://spark.apache.org/docs/1.5.0/mllib-evaluation-metrics.html#ranking-systems]. It would be beneficial if we were able to design the API such that it could be included in the proposed evaluation framework (see [FLINK-2157|https://issues.apache.org/jira/browse/FLINK-2157]), which makes it necessary to consider the possible output types {{DataSet[(Int, Array[Int])]}} or {{DataSet[(Int, Array[(Int,Double)])]}}, meaning (user, array of items), possibly including the predicted scores as well. See [issue todo] for details.
Another question is whether to provide this function as a member of the ALS class, as a switch-kind of parameter to the ALS implementation (meaning the model is either a rating or a ranking recommender model), or in some other way.
> Implementing ranking predictions for ALS
>
> Key: FLINK-4712
> URL: https://issues.apache.org/jira/browse/FLINK-4712
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Domokos Miklós Kelen
>
> We started working on implementing ranking predictions for recommender
> systems. Ranking prediction means that beside predicting scores for user-item
> pairs, the recommender system is able to recommend a top K list for the users.
> Details:
> In practice, this would mean finding the K items for a particular user with
> the highest predicted rating. It should be possible also to specify whether
> to excl