[jira] [Created] (FLINK-1432) CombineTaskTest.testCancelCombineTaskSorting sometimes fails
Robert Metzger created FLINK-1432: - Summary: CombineTaskTest.testCancelCombineTaskSorting sometimes fails Key: FLINK-1432 URL: https://issues.apache.org/jira/browse/FLINK-1432 Project: Flink Issue Type: Bug Reporter: Robert Metzger We have a bunch of tests which fail only in rare cases on travis. https://s3.amazonaws.com/archive.travis-ci.org/jobs/47783455/log.txt
{code}
Exception in thread "Thread-17" java.lang.AssertionError: Canceling task failed: java.util.ConcurrentModificationException
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
    at java.util.ArrayList$Itr.next(ArrayList.java:831)
    at org.apache.flink.runtime.memorymanager.DefaultMemoryManager.release(DefaultMemoryManager.java:290)
    at org.apache.flink.runtime.operators.GroupReduceCombineDriver.cancel(GroupReduceCombineDriver.java:221)
    at org.apache.flink.runtime.operators.testutils.DriverTestBase.cancel(DriverTestBase.java:272)
    at org.apache.flink.runtime.operators.testutils.TaskCancelThread.run(TaskCancelThread.java:60)
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.flink.runtime.operators.testutils.TaskCancelThread.run(TaskCancelThread.java:68)
java.lang.NullPointerException
    at org.apache.flink.runtime.memorymanager.DefaultMemoryManager.release(DefaultMemoryManager.java:291)
    at org.apache.flink.runtime.operators.GroupReduceCombineDriver.cleanup(GroupReduceCombineDriver.java:213)
    at org.apache.flink.runtime.operators.testutils.DriverTestBase.testDriverInternal(DriverTestBase.java:245)
    at org.apache.flink.runtime.operators.testutils.DriverTestBase.testDriver(DriverTestBase.java:175)
    at org.apache.flink.runtime.operators.CombineTaskTest$1.run(CombineTaskTest.java:143)
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.172 sec <<< FAILURE! - in org.apache.flink.runtime.operators.CombineTaskTest
testCancelCombineTaskSorting[0](org.apache.flink.runtime.operators.CombineTaskTest)  Time elapsed: 1.023 sec  <<< FAILURE!
java.lang.AssertionError: Exception was thrown despite proper canceling.
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.apache.flink.runtime.operators.CombineTaskTest.testCancelCombineTaskSorting(CombineTaskTest.java:162)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
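The trace shows two threads touching the memory manager's segment list at once: the cancel path iterates it in release() while the cleanup path mutates it, which produces the ConcurrentModificationException and the follow-up NullPointerException. Below is a minimal, self-contained sketch of the usual fix pattern; the class and field names are illustrative, not Flink's actual DefaultMemoryManager.
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Minimal sketch of the race seen in the trace: one thread iterates a shared
// segment list in release() while another thread mutates it during cleanup().
// Class and field names are illustrative, not Flink's actual implementation.
public class SegmentPool {

    private final Object lock = new Object();
    private final List<byte[]> allocatedSegments = new ArrayList<>();

    public void allocate(int pages, int pageSize) {
        synchronized (lock) {
            for (int i = 0; i < pages; i++) {
                allocatedSegments.add(new byte[pageSize]);
            }
        }
    }

    // Safe variant: the whole iterate-and-remove step holds the same lock that
    // guards every other mutation, so no ConcurrentModificationException and no
    // NullPointerException from a concurrently cleared entry.
    public void release() {
        synchronized (lock) {
            for (Iterator<byte[]> it = allocatedSegments.iterator(); it.hasNext(); ) {
                it.next();       // here the segment would be returned to the pool
                it.remove();     // remove via the iterator, never via the list
            }
        }
    }

    public int allocatedSegmentsCount() {
        synchronized (lock) {
            return allocatedSegments.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SegmentPool pool = new SegmentPool();
        pool.allocate(1024, 32 * 1024);

        // Simulate cancel() and cleanup() racing on the same pool.
        Thread canceller = new Thread(pool::release, "cancel-thread");
        Thread cleaner = new Thread(pool::release, "cleanup-thread");
        canceller.start();
        cleaner.start();
        canceller.join();
        cleaner.join();
        System.out.println("remaining segments: " + pool.allocatedSegmentsCount());
    }
}
{code}
An alternative with the same effect is to take a snapshot copy of the list under the lock and let the releasing thread iterate the copy.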
[jira] [Created] (FLINK-1433) Add HADOOP_CLASSPATH to start scripts
Robert Metzger created FLINK-1433: - Summary: Add HADOOP_CLASSPATH to start scripts Key: FLINK-1433 URL: https://issues.apache.org/jira/browse/FLINK-1433 Project: Flink Issue Type: Improvement Reporter: Robert Metzger With the Hadoop file system wrapper, it's important to have access to the Hadoop file system classes. The HADOOP_CLASSPATH seems to be a standard environment variable used by Hadoop for such libraries. Deployments like Google Compute Cloud set this variable to contain the "Google Cloud Storage Hadoop Wrapper". So if users want to use the Cloud Storage in a non-YARN environment, we need to address this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager
[ https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287259#comment-14287259 ] ASF GitHub Bot commented on FLINK-1352: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/328#issuecomment-71002640 You are right @hsaputra, because I'm not sure which approach is the best. In the corresponding JIRA issue I have tried to give a summary of what I think are the pros and cons of indefinitely many registration tries vs. a limited number of tries and a constant pause in between tries vs. an increasing pause. Indefinitely many registration tries: Pros: If the JobManager becomes available at some point in time, then the TaskManager will definitely connect to it Cons: If the JobManager dies of some reason, then the TaskManager will linger around for all eternity or until it is stopped manually Limited number of tries: Pros: Will terminate itself after some time Cons: The time interval might be too short for the JobManager to get started Constant pause: Pros: Relatively quick response time Cons: Causing network traffic until the JobManager has been started Increasing pause: Pros: Reduction of network traffic if the JobManager takes a little bit longer to start Cons: Might delay the registration process if one interval was just missed > Buggy registration from TaskManager to JobManager > - > > Key: FLINK-1352 > URL: https://issues.apache.org/jira/browse/FLINK-1352 > Project: Flink > Issue Type: Bug > Components: JobManager, TaskManager >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Till Rohrmann > Fix For: 0.9 > > > The JobManager's InstanceManager may refuse the registration attempt from a > TaskManager, because it has this taskmanager already connected, or,in the > future, because the TaskManager has been blacklisted as unreliable. > Unpon refused registration, the instance ID is null, to signal that refused > registration. TaskManager reacts incorrectly to such methods, assuming > successful registration > Possible solution: JobManager sends back a dedicated "RegistrationRefused" > message, if the instance manager returns null as the registration result. If > the TastManager receives that before being registered, it knows that the > registration response was lost (which should not happen on TCP and it would > indicate a corrupt connection) > Followup question: Does it make sense to have the TaskManager trying > indefinitely to connect to the JobManager. With increasing interval (from > seconds to minutes)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/328#issuecomment-71002640 You are right @hsaputra, because I'm not sure which approach is best. In the corresponding JIRA issue I have tried to give a summary of what I think are the pros and cons of indefinitely many registration tries vs. a limited number of tries, and of a constant pause in between tries vs. an increasing pause.

Indefinitely many registration tries:
Pros: If the JobManager becomes available at some point in time, then the TaskManager will definitely connect to it
Cons: If the JobManager dies for some reason, then the TaskManager will linger around for all eternity or until it is stopped manually

Limited number of tries:
Pros: Will terminate itself after some time
Cons: The time interval might be too short for the JobManager to get started

Constant pause:
Pros: Relatively quick response time
Cons: Causes network traffic until the JobManager has been started

Increasing pause:
Pros: Reduced network traffic if the JobManager takes a little bit longer to start
Cons: Might delay the registration process if one interval was just missed

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
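A minimal sketch of the combined policy discussed in this thread (a few attempts at a constant pause, then an increasing pause capped at a maximum). The class name, default values, and the Registration callback are assumptions for illustration, not Flink's actual registration code.
{code}
import java.util.concurrent.TimeUnit;

// Illustrative retry policy: a fixed number of quick attempts with a constant
// pause, then exponential backoff capped at a maximum interval. Not Flink's
// actual TaskManager registration code; names and defaults are assumptions.
public class RegistrationRetryPolicy {

    private final int constantAttempts;      // e.g. 10 quick attempts first
    private final long constantPauseMillis;  // e.g. 500 ms between them
    private final long maxPauseMillis;       // cap for the backoff phase

    public RegistrationRetryPolicy(int constantAttempts, long constantPauseMillis, long maxPauseMillis) {
        this.constantAttempts = constantAttempts;
        this.constantPauseMillis = constantPauseMillis;
        this.maxPauseMillis = maxPauseMillis;
    }

    /** Pause to wait before the given (1-based) attempt. */
    public long pauseBeforeAttempt(int attempt) {
        if (attempt <= constantAttempts) {
            return constantPauseMillis;
        }
        long backoff = constantPauseMillis << Math.min(attempt - constantAttempts, 20);
        return Math.min(backoff, maxPauseMillis);
    }

    /** Keeps trying until register() succeeds; the pause grows, so a dead
     *  JobManager causes little network traffic, but the TaskManager still
     *  connects eventually if the JobManager comes up late. */
    public void runUntilRegistered(Registration target) throws InterruptedException {
        for (int attempt = 1; ; attempt++) {
            if (target.tryRegister()) {
                return;
            }
            TimeUnit.MILLISECONDS.sleep(pauseBeforeAttempt(attempt));
        }
    }

    /** Hypothetical callback standing in for the actual registration message. */
    public interface Registration {
        boolean tryRegister();
    }

    public static void main(String[] args) {
        RegistrationRetryPolicy policy = new RegistrationRetryPolicy(10, 500, 60_000);
        for (int attempt : new int[] {1, 5, 10, 11, 12, 15, 25}) {
            System.out.println("attempt " + attempt + " -> pause " + policy.pauseBeforeAttempt(attempt) + " ms");
        }
    }
}
{code}
Making the three knobs configurable, as suggested below, would let the current behavior (constant pause) remain the default.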
[GitHub] flink pull request: [FLINK-1372] [runtime] Fixes Akka logging
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/329#issuecomment-71003585 Ok, I'll merge it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1372) TaskManager and JobManager do not log startup settings any more
[ https://issues.apache.org/jira/browse/FLINK-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287269#comment-14287269 ] ASF GitHub Bot commented on FLINK-1372: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/329#issuecomment-71003585 Ok, I'll merge it. > TaskManager and JobManager do not log startup settings any more > --- > > Key: FLINK-1372 > URL: https://issues.apache.org/jira/browse/FLINK-1372 > Project: Flink > Issue Type: Bug > Components: JobManager, TaskManager >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Till Rohrmann > Fix For: 0.9 > > > In prior versions, the jobmanager and taskmanager logged a lot of startup > options: > - Environment > - ports > - memory configuration > - network configuration > Currently, they log very little. We should add the logging back in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/328#issuecomment-71003641 Thanks for the summary, Till :) On 22 Jan 2015, at 11:48, Till Rohrmann wrote: > Indefinitely many registration tries: > Pros: If the JobManager becomes available at some point in time, then the TaskManager will definitely connect to it > Cons: If the JobManager dies of some reason, then the TaskManager will linger around for all eternity or until it is stopped manually I am against this as the lingering around is imo problematic. > Limited number of tries: > Pros: Will terminate itself after some time > Cons: The time interval might be too short for the JobManager to get started > > Constant pause: > Pros: Relatively quick response time > Cons: Causing network traffic until the JobManager has been started > > Increasing pause: > Pros: Reduction of network traffic if the JobManager takes a little bit longer to start > Cons: Might delay the registration process if one interval was just missed Maybe keep the current strategy (n times constant pause c) and then start backing off? Has this been reported as a problem in a setup? Since this is not very complicated, but it's hard to find a heuristic to match all use cases, we might just implement all strategies, keep the current as default and make it configurable.= --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager
[ https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287270#comment-14287270 ] ASF GitHub Bot commented on FLINK-1352: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/328#issuecomment-71003641 Thanks for the summary, Till :) On 22 Jan 2015, at 11:48, Till Rohrmann wrote: > Indefinitely many registration tries: > Pros: If the JobManager becomes available at some point in time, then the TaskManager will definitely connect to it > Cons: If the JobManager dies of some reason, then the TaskManager will linger around for all eternity or until it is stopped manually I am against this as the lingering around is imo problematic. > Limited number of tries: > Pros: Will terminate itself after some time > Cons: The time interval might be too short for the JobManager to get started > > Constant pause: > Pros: Relatively quick response time > Cons: Causing network traffic until the JobManager has been started > > Increasing pause: > Pros: Reduction of network traffic if the JobManager takes a little bit longer to start > Cons: Might delay the registration process if one interval was just missed Maybe keep the current strategy (n times constant pause c) and then start backing off? Has this been reported as a problem in a setup? Since this is not very complicated, but it's hard to find a heuristic to match all use cases, we might just implement all strategies, keep the current as default and make it configurable.= > Buggy registration from TaskManager to JobManager > - > > Key: FLINK-1352 > URL: https://issues.apache.org/jira/browse/FLINK-1352 > Project: Flink > Issue Type: Bug > Components: JobManager, TaskManager >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Till Rohrmann > Fix For: 0.9 > > > The JobManager's InstanceManager may refuse the registration attempt from a > TaskManager, because it has this taskmanager already connected, or,in the > future, because the TaskManager has been blacklisted as unreliable. > Unpon refused registration, the instance ID is null, to signal that refused > registration. TaskManager reacts incorrectly to such methods, assuming > successful registration > Possible solution: JobManager sends back a dedicated "RegistrationRefused" > message, if the instance manager returns null as the registration result. If > the TastManager receives that before being registered, it knows that the > registration response was lost (which should not happen on TCP and it would > indicate a corrupt connection) > Followup question: Does it make sense to have the TaskManager trying > indefinitely to connect to the JobManager. With increasing interval (from > seconds to minutes)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1372] [runtime] Fixes Akka logging
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/329 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1372) TaskManager and JobManager do not log startup settings any more
[ https://issues.apache.org/jira/browse/FLINK-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287287#comment-14287287 ] ASF GitHub Bot commented on FLINK-1372: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/329 > TaskManager and JobManager do not log startup settings any more > --- > > Key: FLINK-1372 > URL: https://issues.apache.org/jira/browse/FLINK-1372 > Project: Flink > Issue Type: Bug > Components: JobManager, TaskManager >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Till Rohrmann > Fix For: 0.9 > > > In prior versions, the jobmanager and taskmanager logged a lot of startup > options: > - Environment > - ports > - memory configuration > - network configuration > Currently, they log very little. We should add the logging back in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1434) Web interface cannot be used to run streaming programs
Gyula Fora created FLINK-1434: - Summary: Web interface cannot be used to run streaming programs Key: FLINK-1434 URL: https://issues.apache.org/jira/browse/FLINK-1434 Project: Flink Issue Type: Bug Components: Streaming, Webfrontend Affects Versions: 0.9 Reporter: Gyula Fora Flink streaming programs currently cannot be submitted through the web client. When you try to run the jar you get a ProgramInvocationException. The reason for this might be that streaming programs completely bypass the use of Plans for job execution and the streaming execution environment directly submits the jobgraph to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1434) Web interface cannot be used to run streaming programs
[ https://issues.apache.org/jira/browse/FLINK-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287289#comment-14287289 ] Gyula Fora commented on FLINK-1434: --- This issue is also linked with https://issues.apache.org/jira/browse/FLINK-1401 > Web interface cannot be used to run streaming programs > -- > > Key: FLINK-1434 > URL: https://issues.apache.org/jira/browse/FLINK-1434 > Project: Flink > Issue Type: Bug > Components: Streaming, Webfrontend >Affects Versions: 0.9 >Reporter: Gyula Fora > > Flink streaming programs currently cannot be submitted through the web > client. When you try run the jar you get a ProgramInvocationException. > The reason for this might be that streaming programs completely bypass the > use of Plans for job execution and the streaming execution environment > directly submits the jobgraph to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-1372) TaskManager and JobManager do not log startup settings any more
[ https://issues.apache.org/jira/browse/FLINK-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-1372. Resolution: Fixed Fixed in 631b6eb804dc4a0236198574a8f8011ab2b6c8c2 > TaskManager and JobManager do not log startup settings any more > --- > > Key: FLINK-1372 > URL: https://issues.apache.org/jira/browse/FLINK-1372 > Project: Flink > Issue Type: Bug > Components: JobManager, TaskManager >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Till Rohrmann > Fix For: 0.9 > > > In prior versions, the jobmanager and taskmanager logged a lot of startup > options: > - Environment > - ports > - memory configuration > - network configuration > Currently, they log very little. We should add the logging back in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1435) TaskManager does not log missing memory error on start up
Malte Schwarzer created FLINK-1435: -- Summary: TaskManager does not log missing memory error on start up Key: FLINK-1435 URL: https://issues.apache.org/jira/browse/FLINK-1435 Project: Flink Issue Type: Bug Components: TaskManager Affects Versions: 0.7.0-incubating Reporter: Malte Schwarzer Priority: Minor When using bin/start-cluster.sh to start TaskManagers and a worker node fails to start because of missing memory, you do not receive any error messages in the log files. The worker node has only 15000M memory available, but it is configured with Maximum heap size: 4 MiBytes. The task manager does not join the cluster and the process seems to be stuck. The last line of the log looks like this: ... - Initializing memory manager with 24447 megabytes of memory. Page size is 32768 bytes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1435) TaskManager does not log missing memory error on start up
[ https://issues.apache.org/jira/browse/FLINK-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Malte Schwarzer updated FLINK-1435: --- Priority: Major (was: Minor) > TaskManager does not log missing memory error on start up > - > > Key: FLINK-1435 > URL: https://issues.apache.org/jira/browse/FLINK-1435 > Project: Flink > Issue Type: Bug > Components: TaskManager >Affects Versions: 0.7.0-incubating >Reporter: Malte Schwarzer > Labels: memorymanager > > When using bin/start-cluster.sh to start TaskManagers and a worker node is > failing to start because of missing memory, you do not receive any error > messages in log files. > Worker node has only 15000M memory available, but it is configured with > Maximum heap size: 4 MiBytes. Task manager does not join the cluster. > Process hangs. > Last lines of log looks like this: > ... > ... - - Starting with 12 incoming and 12 outgoing connection threads. > ... - Setting low water mark to 16384 and high water mark to 32768 bytes. > ... - Instantiated PooledByteBufAllocator with direct arenas: 24, heap > arenas: 0, page size (bytes): 65536, chunk size (bytes): 16777216. > ... - Using 0.7 of the free heap space for managed memory. > ... - Initializing memory manager with 24447 megabytes of memory. Page size > is 32768 bytes. > (END) > Error message about not enough memory is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1435) TaskManager does not log missing memory error on start up
[ https://issues.apache.org/jira/browse/FLINK-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Malte Schwarzer updated FLINK-1435: --- Description: When using bin/start-cluster.sh to start TaskManagers and a worker node is failing to start because of missing memory, you do not receive any error messages in log files. Worker node has only 15000M memory available, but it is configured with Maximum heap size: 4 MiBytes. Task manager does not join the cluster. Process hangs. Last lines of log looks like this: ... ... - - Starting with 12 incoming and 12 outgoing connection threads. ... - Setting low water mark to 16384 and high water mark to 32768 bytes. ... - Instantiated PooledByteBufAllocator with direct arenas: 24, heap arenas: 0, page size (bytes): 65536, chunk size (bytes): 16777216. ... - Using 0.7 of the free heap space for managed memory. ... - Initializing memory manager with 24447 megabytes of memory. Page size is 32768 bytes. (END) Error message about not enough memory is missing. was: When using bin/start-cluster.sh to start TaskManagers and a worker node is failing to start because of missing memory, you do not receive any error messages in log files. Worker node has only 15000M memory available, but it is configured with Maximum heap size: 4 MiBytes. Task manager does not join the cluster. Process seems to stuck. Last line of log looks like this: ... - Initializing memory manager with 24447 megabytes of memory. Page size is 32768 bytes. > TaskManager does not log missing memory error on start up > - > > Key: FLINK-1435 > URL: https://issues.apache.org/jira/browse/FLINK-1435 > Project: Flink > Issue Type: Bug > Components: TaskManager >Affects Versions: 0.7.0-incubating >Reporter: Malte Schwarzer >Priority: Minor > Labels: memorymanager > > When using bin/start-cluster.sh to start TaskManagers and a worker node is > failing to start because of missing memory, you do not receive any error > messages in log files. > Worker node has only 15000M memory available, but it is configured with > Maximum heap size: 4 MiBytes. Task manager does not join the cluster. > Process hangs. > Last lines of log looks like this: > ... > ... - - Starting with 12 incoming and 12 outgoing connection threads. > ... - Setting low water mark to 16384 and high water mark to 32768 bytes. > ... - Instantiated PooledByteBufAllocator with direct arenas: 24, heap > arenas: 0, page size (bytes): 65536, chunk size (bytes): 16777216. > ... - Using 0.7 of the free heap space for managed memory. > ... - Initializing memory manager with 24447 megabytes of memory. Page size > is 32768 bytes. > (END) > Error message about not enough memory is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1435) TaskManager does not log missing memory error on start up
[ https://issues.apache.org/jira/browse/FLINK-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Malte Schwarzer updated FLINK-1435: --- Priority: Minor (was: Major) > TaskManager does not log missing memory error on start up > - > > Key: FLINK-1435 > URL: https://issues.apache.org/jira/browse/FLINK-1435 > Project: Flink > Issue Type: Bug > Components: TaskManager >Affects Versions: 0.7.0-incubating >Reporter: Malte Schwarzer >Priority: Minor > Labels: memorymanager > > When using bin/start-cluster.sh to start TaskManagers and a worker node is > failing to start because of missing memory, you do not receive any error > messages in log files. > Worker node has only 15000M memory available, but it is configured with > Maximum heap size: 4 MiBytes. Task manager does not join the cluster. > Process hangs. > Last lines of log looks like this: > ... > ... - - Starting with 12 incoming and 12 outgoing connection threads. > ... - Setting low water mark to 16384 and high water mark to 32768 bytes. > ... - Instantiated PooledByteBufAllocator with direct arenas: 24, heap > arenas: 0, page size (bytes): 65536, chunk size (bytes): 16777216. > ... - Using 0.7 of the free heap space for managed memory. > ... - Initializing memory manager with 24447 megabytes of memory. Page size > is 32768 bytes. > (END) > Error message about not enough memory is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1435) TaskManager does not log missing memory error on start up
[ https://issues.apache.org/jira/browse/FLINK-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287340#comment-14287340 ] Ufuk Celebi commented on FLINK-1435: Thanks for reporting the issue. I agree that there should be an informative error message. Currently, the memory manager is initialized from the config value and tries to allocate all memory as byte buffers. At some point it just fails, when not enough memory is available. In the GlobalBufferPool, we actually catch and rethrow the OutOfMemoryException and log the missing amount of memory. We should add something similar as well or put the memory manager instantiation into a try catch block. Would you like to provide a patch for this? > TaskManager does not log missing memory error on start up > - > > Key: FLINK-1435 > URL: https://issues.apache.org/jira/browse/FLINK-1435 > Project: Flink > Issue Type: Bug > Components: TaskManager >Affects Versions: 0.7.0-incubating >Reporter: Malte Schwarzer >Priority: Minor > Labels: memorymanager > > When using bin/start-cluster.sh to start TaskManagers and a worker node is > failing to start because of missing memory, you do not receive any error > messages in log files. > Worker node has only 15000M memory available, but it is configured with > Maximum heap size: 4 MiBytes. Task manager does not join the cluster. > Process hangs. > Last lines of log looks like this: > ... > ... - - Starting with 12 incoming and 12 outgoing connection threads. > ... - Setting low water mark to 16384 and high water mark to 32768 bytes. > ... - Instantiated PooledByteBufAllocator with direct arenas: 24, heap > arenas: 0, page size (bytes): 65536, chunk size (bytes): 16777216. > ... - Using 0.7 of the free heap space for managed memory. > ... - Initializing memory manager with 24447 megabytes of memory. Page size > is 32768 bytes. > (END) > Error message about not enough memory is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1434) Web interface cannot be used to run streaming programs
[ https://issues.apache.org/jira/browse/FLINK-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287352#comment-14287352 ] Gyula Fora commented on FLINK-1434: --- This actually seems to be a ClassLoader issue, but I wonder why we don't have the same issue when running from the command line client. [~uce] or [~StephanEwen] any ideas here? > Web interface cannot be used to run streaming programs > -- > > Key: FLINK-1434 > URL: https://issues.apache.org/jira/browse/FLINK-1434 > Project: Flink > Issue Type: Bug > Components: Streaming, Webfrontend >Affects Versions: 0.9 >Reporter: Gyula Fora > > Flink streaming programs currently cannot be submitted through the web > client. When you try run the jar you get a ProgramInvocationException. > The reason for this might be that streaming programs completely bypass the > use of Plans for job execution and the streaming execution environment > directly submits the jobgraph to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-629) Add support for null values to the java api
[ https://issues.apache.org/jira/browse/FLINK-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287392#comment-14287392 ] mustafa elbehery commented on FLINK-629: It's on Flink 0.9, but I have to stick with Flink 0.8.0 for now .. I cannot upgrade yet > Add support for null values to the java api > --- > > Key: FLINK-629 > URL: https://issues.apache.org/jira/browse/FLINK-629 > Project: Flink > Issue Type: Improvement > Components: Java API >Reporter: Stephan Ewen >Assignee: Gyula Fora >Priority: Critical > Labels: github-import > Fix For: pre-apache > > Attachments: Selection_006.png, SimpleTweetInputFormat.java, > Tweet.java, model.tar.gz > > > Currently, many runtime operations fail when encountering a null value. Tuple > serialization should allow null fields. > I suggest to add a method to the tuples called `getFieldNotNull()` which > throws a meaningful exception when the accessed field is null. That way, we > simplify the logic of operators that should not dead with null fields, like > key grouping or aggregations. > Even though SQL allows grouping and aggregating of null values, I suggest to > exclude this from the java api, because the SQL semantics of aggregating null > fields are messy. > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/629 > Created by: [StephanEwen|https://github.com/StephanEwen] > Labels: enhancement, java api, > Milestone: Release 0.5.1 > Created at: Wed Mar 26 00:27:49 CET 2014 > State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-629) Add support for null values to the java api
[ https://issues.apache.org/jira/browse/FLINK-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287403#comment-14287403 ] Robert Metzger commented on FLINK-629: -- I think you can very easily backport the fix to Flink 0.8.0 > Add support for null values to the java api > --- > > Key: FLINK-629 > URL: https://issues.apache.org/jira/browse/FLINK-629 > Project: Flink > Issue Type: Improvement > Components: Java API >Reporter: Stephan Ewen >Assignee: Gyula Fora >Priority: Critical > Labels: github-import > Fix For: pre-apache > > Attachments: Selection_006.png, SimpleTweetInputFormat.java, > Tweet.java, model.tar.gz > > > Currently, many runtime operations fail when encountering a null value. Tuple > serialization should allow null fields. > I suggest to add a method to the tuples called `getFieldNotNull()` which > throws a meaningful exception when the accessed field is null. That way, we > simplify the logic of operators that should not dead with null fields, like > key grouping or aggregations. > Even though SQL allows grouping and aggregating of null values, I suggest to > exclude this from the java api, because the SQL semantics of aggregating null > fields are messy. > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/629 > Created by: [StephanEwen|https://github.com/StephanEwen] > Labels: enhancement, java api, > Milestone: Release 0.5.1 > Created at: Wed Mar 26 00:27:49 CET 2014 > State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-629) Add support for null values to the java api
[ https://issues.apache.org/jira/browse/FLINK-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287404#comment-14287404 ] Robert Metzger commented on FLINK-629: -- Maybe you can even cherry-pick the commit into the release-0.8 branch > Add support for null values to the java api > --- > > Key: FLINK-629 > URL: https://issues.apache.org/jira/browse/FLINK-629 > Project: Flink > Issue Type: Improvement > Components: Java API >Reporter: Stephan Ewen >Assignee: Gyula Fora >Priority: Critical > Labels: github-import > Fix For: pre-apache > > Attachments: Selection_006.png, SimpleTweetInputFormat.java, > Tweet.java, model.tar.gz > > > Currently, many runtime operations fail when encountering a null value. Tuple > serialization should allow null fields. > I suggest to add a method to the tuples called `getFieldNotNull()` which > throws a meaningful exception when the accessed field is null. That way, we > simplify the logic of operators that should not dead with null fields, like > key grouping or aggregations. > Even though SQL allows grouping and aggregating of null values, I suggest to > exclude this from the java api, because the SQL semantics of aggregating null > fields are messy. > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/629 > Created by: [StephanEwen|https://github.com/StephanEwen] > Labels: enhancement, java api, > Milestone: Release 0.5.1 > Created at: Wed Mar 26 00:27:49 CET 2014 > State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1410) Integrate Flink version variables into website layout
[ https://issues.apache.org/jira/browse/FLINK-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287423#comment-14287423 ] Max Michels commented on FLINK-1410: +1 We should make releasing as simple as possible. Perhaps we should encourage people who work on the website to use the Jekyll template system... > Integrate Flink version variables into website layout > - > > Key: FLINK-1410 > URL: https://issues.apache.org/jira/browse/FLINK-1410 > Project: Flink > Issue Type: Bug > Components: Project Website >Reporter: Robert Metzger > Attachments: ulli-howItShouldLook.png, ulli-howItactuallyLooks.png > > > The new website layout doesn't use the variables in the website configuration. > This makes releasing versions extremely hard, because one needs to manually > fix all the links for every version change. > The old layout of the website was respecting all variables which made > releasing a new version of the website a matter of minutes (changing one > file). > I would highly recommend to fix FLINK-1387 first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1436) Command-line interface verbose option (-v)
Max Michels created FLINK-1436: -- Summary: Command-line interface verbose option (-v) Key: FLINK-1436 URL: https://issues.apache.org/jira/browse/FLINK-1436 Project: Flink Issue Type: Improvement Components: Start-Stop Scripts Reporter: Max Michels Priority: Trivial

Let me run just a basic Flink job and add the verbose flag. It's a general option, so let me add it as the first parameter:

> ./flink -v run ../examples/flink-java-examples-0.8.0-WordCount.jar
> hdfs:///input hdfs:///output9

Invalid action!
./flink [GENERAL_OPTIONS] [ARGUMENTS]

general options:
  -h,--help      Show the help for the CLI Frontend.
  -v,--verbose   Print more detailed error messages.

Action "run" compiles and runs a program.
  Syntax: run [OPTIONS]
  "run" action arguments:
    -c,--class          Class with the program entry point ("main" method or "getPlan()" method. Only needed if the JAR file does not specify the class in its manifest.
    -m,--jobmanager     Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration.
    -p,--parallelism    The parallelism with which to run the program. Optional flag to override the default value specified in the configuration.

Action "info" displays information about a program.
  "info" action arguments:
    -c,--class          Class with the program entry point ("main" method or "getPlan()" method. Only needed if the JAR file does not specify the class in its manifest.
    -e,--executionplan  Show optimized execution plan of the program (JSON)
    -m,--jobmanager     Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration.
    -p,--parallelism    The parallelism with which to run the program. Optional flag to override the default value specified in the configuration.

Action "list" lists running and finished programs.
  "list" action arguments:
    -m,--jobmanager     Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration.
    -r,--running        Show running programs and their JobIDs
    -s,--scheduled      Show scheduled prorgrams and their JobIDs

Action "cancel" cancels a running program.
  "cancel" action arguments:
    -i,--jobid          JobID of program to cancel
    -m,--jobmanager     Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration.

What just happened? This results in a lot of output, which is usually generated only if you use the --help option on command-line tools. If your terminal window is large enough, you will see a tiny message: "Please specify an action". I did specify an action. Strange. If you read the help messages carefully you see that the "general options" belong to the action:

> ./flink run -v ../examples/flink-java-examples-0.8.0-WordCount.jar
> hdfs:///input hdfs:///output9

For the sake of mitigating user frustration, let us also accept -v as the first argument. It may seem trivial for the day-to-day Flink user but makes a difference for a novice.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
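A minimal sketch of the proposed leniency: recognized general options are consumed from the front of the argument list before the action is resolved, so both `./flink -v run ...` and `./flink run -v ...` work, and errors stay short instead of printing the full help text. The class and method names are illustrative, not the actual Flink CliFrontend.
{code}
import java.util.Arrays;
import java.util.List;

// Illustrative pre-scan: accept general options (-v/--verbose, -h/--help)
// before the action as well as after it. Not the actual Flink CliFrontend.
public class CliArgs {

    final boolean verbose;
    final boolean help;
    final String action;        // e.g. "run", "info", "list", "cancel"
    final List<String> rest;    // everything after the action, untouched

    private CliArgs(boolean verbose, boolean help, String action, List<String> rest) {
        this.verbose = verbose;
        this.help = help;
        this.action = action;
        this.rest = rest;
    }

    static CliArgs parse(String[] args) {
        boolean verbose = false;
        boolean help = false;
        int i = 0;
        // Consume leading general options instead of rejecting them as an "invalid action".
        while (i < args.length && args[i].startsWith("-")) {
            if (args[i].equals("-v") || args[i].equals("--verbose")) {
                verbose = true;
            } else if (args[i].equals("-h") || args[i].equals("--help")) {
                help = true;
            } else {
                // Short, specific error instead of dumping the full help text.
                throw new IllegalArgumentException("Unknown general option: " + args[i]);
            }
            i++;
        }
        if (i == args.length) {
            throw new IllegalArgumentException("Please specify an action (run, info, list, cancel).");
        }
        String action = args[i++];
        return new CliArgs(verbose, help, action, Arrays.asList(args).subList(i, args.length));
    }

    public static void main(String[] args) {
        CliArgs a = parse(new String[] {"-v", "run", "WordCount.jar", "hdfs:///input", "hdfs:///output9"});
        System.out.println("verbose=" + a.verbose + ", action=" + a.action + ", rest=" + a.rest);
    }
}
{code}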
[jira] [Commented] (FLINK-1436) Command-line interface verbose option (-v)
[ https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287477#comment-14287477 ] Max Michels commented on FLINK-1436: This is also related to the confusion in https://issues.apache.org/jira/browse/FLINK-1424 > Command-line interface verbose option (-v) > -- > > Key: FLINK-1436 > URL: https://issues.apache.org/jira/browse/FLINK-1436 > Project: Flink > Issue Type: Improvement > Components: Start-Stop Scripts >Reporter: Max Michels >Priority: Trivial > Labels: usability > > Let me run just a basic Flink job and add the verbose flag. It's a general > option, so let me add it as a first parameter: > > ./flink -v run ../examples/flink-java-examples-0.8.0-WordCount.jar > > hdfs:///input hdfs:///output9 > Invalid action! > ./flink [GENERAL_OPTIONS] [ARGUMENTS] > general options: > -h,--help Show the help for the CLI Frontend. > -v,--verbose Print more detailed error messages. > Action "run" compiles and runs a program. > Syntax: run [OPTIONS] > "run" action arguments: > -c,--classClass with the program entry point > ("main" > method or "getPlan()" method. Only > needed > if the JAR file does not specify the > class > in its manifest. > -m,--jobmanager Address of the JobManager (master) to > which to connect. Use this flag to > connect > to a different JobManager than the one > specified in the configuration. > -p,--parallelismThe parallelism with which to run the > program. Optional flag to override the > default value specified in the > configuration. > Action "info" displays information about a program. > "info" action arguments: > -c,--classClass with the program entry point > ("main" > method or "getPlan()" method. Only > needed > if the JAR file does not specify the > class > in its manifest. > -e,--executionplan Show optimized execution plan of the > program (JSON) > -m,--jobmanager Address of the JobManager (master) to > which to connect. Use this flag to > connect > to a different JobManager than the one > specified in the configuration. > -p,--parallelismThe parallelism with which to run the > program. Optional flag to override the > default value specified in the > configuration. > Action "list" lists running and finished programs. > "list" action arguments: > -m,--jobmanagerAddress of the JobManager (master) to which >to connect. Use this flag to connect to a >different JobManager than the one specified >in the configuration. > -r,--running Show running programs and their JobIDs > -s,--scheduledShow scheduled prorgrams and their JobIDs > Action "cancel" cancels a running program. > "cancel" action arguments: > -i,--jobid JobID of program to cancel > -m,--jobmanagerAddress of the JobManager (master) to which >to connect. Use this flag to connect to a >different JobManager than the one specified >in the configuration. > What just happened? This results in a lot of output which is usually > generated if you use the --help option on command-line tools. If your > terminal window is large enough, then you will see a tiny message: > "Please specify an action". I did specify an action. Strange. If you read the > help messages carefully you see, that "general options" belong to the action. > > ./flink run -v ../examples/flink-java-examples-0.8.0-WordCount.jar > > hdfs:///input hdfs:///output9 > For the sake of mitigating user frustration, let us also accept -v as the > first argument. It may seem trivial for the day-to-day Flink user but makes a > difference for a novice. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1436) Command-line interface verbose option (-v)
[ https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287509#comment-14287509 ] Robert Metzger commented on FLINK-1436: --- I totally agree with you. Also, it would be nice if the cli tool would say what it got as an action. In your case for example "invalid action: -v" I had a similar issue with the jar file, the only error message I've got as "Invalid jar file" or so. It would have been nicer if the output was "org.apache.flink.Wordcount is not a valid file location". In addition to that, I don't see a reason why there is a "-v" option at all. I'm always using "-v" because I want to see the exceptions which happen. So I vote to a) improve the error messages b) remove the verbose flag and make it always verbose c) do not show the full help text on errors. > Command-line interface verbose option (-v) > -- > > Key: FLINK-1436 > URL: https://issues.apache.org/jira/browse/FLINK-1436 > Project: Flink > Issue Type: Improvement > Components: Start-Stop Scripts >Reporter: Max Michels >Priority: Trivial > Labels: usability > > Let me run just a basic Flink job and add the verbose flag. It's a general > option, so let me add it as a first parameter: > > ./flink -v run ../examples/flink-java-examples-0.8.0-WordCount.jar > > hdfs:///input hdfs:///output9 > Invalid action! > ./flink [GENERAL_OPTIONS] [ARGUMENTS] > general options: > -h,--help Show the help for the CLI Frontend. > -v,--verbose Print more detailed error messages. > Action "run" compiles and runs a program. > Syntax: run [OPTIONS] > "run" action arguments: > -c,--classClass with the program entry point > ("main" > method or "getPlan()" method. Only > needed > if the JAR file does not specify the > class > in its manifest. > -m,--jobmanager Address of the JobManager (master) to > which to connect. Use this flag to > connect > to a different JobManager than the one > specified in the configuration. > -p,--parallelismThe parallelism with which to run the > program. Optional flag to override the > default value specified in the > configuration. > Action "info" displays information about a program. > "info" action arguments: > -c,--classClass with the program entry point > ("main" > method or "getPlan()" method. Only > needed > if the JAR file does not specify the > class > in its manifest. > -e,--executionplan Show optimized execution plan of the > program (JSON) > -m,--jobmanager Address of the JobManager (master) to > which to connect. Use this flag to > connect > to a different JobManager than the one > specified in the configuration. > -p,--parallelismThe parallelism with which to run the > program. Optional flag to override the > default value specified in the > configuration. > Action "list" lists running and finished programs. > "list" action arguments: > -m,--jobmanagerAddress of the JobManager (master) to which >to connect. Use this flag to connect to a >different JobManager than the one specified >in the configuration. > -r,--running Show running programs and their JobIDs > -s,--scheduledShow scheduled prorgrams and their JobIDs > Action "cancel" cancels a running program. > "cancel" action arguments: > -i,--jobid JobID of program to cancel > -m,--jobmanagerAddress of the JobManager (master) to which >to connect. Use this flag to connect to a >different JobManager than the one specified >in the configuration. > What just happened? This results in a lot of output which is usually > generated if you use the --help option on command-line tools. 
If your > terminal window is large enough, then you will see a tiny message: > "Please specify an action
[jira] [Commented] (FLINK-1434) Web interface cannot be used to run streaming programs
[ https://issues.apache.org/jira/browse/FLINK-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287514#comment-14287514 ] Gyula Fora commented on FLINK-1434: --- It seems like I was unaware of the OptimizerPlanEnvironment when I implemented the StreamExecutionEnvironment so it is probably trying to start a minicluster when invoked from the webclient. > Web interface cannot be used to run streaming programs > -- > > Key: FLINK-1434 > URL: https://issues.apache.org/jira/browse/FLINK-1434 > Project: Flink > Issue Type: Bug > Components: Streaming, Webfrontend >Affects Versions: 0.9 >Reporter: Gyula Fora > > Flink streaming programs currently cannot be submitted through the web > client. When you try run the jar you get a ProgramInvocationException. > The reason for this might be that streaming programs completely bypass the > use of Plans for job execution and the streaming execution environment > directly submits the jobgraph to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1424) bin/flink run does not recognize -c parameter anymore
[ https://issues.apache.org/jira/browse/FLINK-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287517#comment-14287517 ] Carsten Brandt commented on FLINK-1424: --- In general I could but currently I'm very busy so I can not say if or when I can do it. > bin/flink run does not recognize -c parameter anymore > - > > Key: FLINK-1424 > URL: https://issues.apache.org/jira/browse/FLINK-1424 > Project: Flink > Issue Type: Bug > Components: TaskManager >Affects Versions: master >Reporter: Carsten Brandt > > bin/flink binary does not recognize `-c` parameter anymore which specifies > the class to run: > {noformat} > $ ./flink run "/path/to/target/impro3-ws14-flink-1.0-SNAPSHOT.jar" -c > de.tu_berlin.impro3.flink.etl.FollowerGraphGenerator /tmp/flink/testgraph.txt > 1 > usage: emma-experiments-impro3-ss14-flink >[-?] > emma-experiments-impro3-ss14-flink: error: unrecognized arguments: '-c' > {noformat} > before this command worked fine and executed the job. > I tracked it down to the following commit using `git bisect`: > {noformat} > 93eadca782ee8c77f89609f6d924d73021dcdda9 is the first bad commit > commit 93eadca782ee8c77f89609f6d924d73021dcdda9 > Author: Alexander Alexandrov > Date: Wed Dec 24 13:49:56 2014 +0200 > [FLINK-1027] [cli] Added support for '--' and '-' prefixed tokens in CLI > program arguments. > > This closes #278 > :04 04 a1358e6f7fe308b4d51a47069f190a29f87fdeda > d6f11bbc9444227d5c6297ec908e44b9644289a9 Mflink-clients > {noformat} > https://github.com/apache/flink/commit/93eadca782ee8c77f89609f6d924d73021dcdda9 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1437) Bug in PojoSerializer's copy() method
Timo Walther created FLINK-1437: --- Summary: Bug in PojoSerializer's copy() method Key: FLINK-1437 URL: https://issues.apache.org/jira/browse/FLINK-1437 Project: Flink Issue Type: Bug Components: Java API Reporter: Timo Walther Assignee: Timo Walther The PojoSerializer's {{copy()}} method does not work properly with {{null}} values. An exception could look like:
{code}
Caused by: java.io.IOException: Thread 'SortMerger spilling thread' terminated due to an exception: null
    at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:792)
Caused by: java.io.EOFException
    at org.apache.flink.runtime.io.disk.RandomAccessInputView.nextSegment(RandomAccessInputView.java:83)
    at org.apache.flink.runtime.memorymanager.AbstractPagedInputView.advance(AbstractPagedInputView.java:159)
    at org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readByte(AbstractPagedInputView.java:270)
    at org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:277)
    at org.apache.flink.types.StringValue.copyString(StringValue.java:839)
    at org.apache.flink.api.common.typeutils.base.StringSerializer.copy(StringSerializer.java:83)
    at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.copy(PojoSerializer.java:261)
    at org.apache.flink.runtime.operators.sort.NormalizedKeySorter.writeToOutput(NormalizedKeySorter.java:449)
    at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1303)
    at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:788)
{code}
I'm working on a fix for that... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1433) Add HADOOP_CLASSPATH to start scripts
[ https://issues.apache.org/jira/browse/FLINK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287776#comment-14287776 ] Stephan Ewen commented on FLINK-1433: - +1 that seems very important > Add HADOOP_CLASSPATH to start scripts > - > > Key: FLINK-1433 > URL: https://issues.apache.org/jira/browse/FLINK-1433 > Project: Flink > Issue Type: Improvement >Reporter: Robert Metzger > > With the Hadoop file system wrapper, its important to have access to the > hadoop filesystem classes. > The HADOOP_CLASSPATH seems to be a standard environment variable used by > Hadoop for such libraries. > Deployments like Google Compute Cloud set this variable containing the > "Google Cloud Storage Hadoop Wrapper". So if users want to use the Cloud > Storage in an non-yarn environment, we need to address this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1435) TaskManager does not log missing memory error on start up
[ https://issues.apache.org/jira/browse/FLINK-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287787#comment-14287787 ] Stephan Ewen commented on FLINK-1435: - We should make sure that all errors in the JobManager and TaskManager setup are logged. I would suggest to put a "try{} catch (Throwable ) {} " block around the constructor call in the main method. BTW: You may find the exception in the {{taskmanager.out}} file, if not in the log file. But it should absolutely go into the log file. > TaskManager does not log missing memory error on start up > - > > Key: FLINK-1435 > URL: https://issues.apache.org/jira/browse/FLINK-1435 > Project: Flink > Issue Type: Bug > Components: TaskManager >Affects Versions: 0.7.0-incubating >Reporter: Malte Schwarzer >Priority: Minor > Labels: memorymanager > > When using bin/start-cluster.sh to start TaskManagers and a worker node is > failing to start because of missing memory, you do not receive any error > messages in log files. > Worker node has only 15000M memory available, but it is configured with > Maximum heap size: 4 MiBytes. Task manager does not join the cluster. > Process hangs. > Last lines of log looks like this: > ... > ... - - Starting with 12 incoming and 12 outgoing connection threads. > ... - Setting low water mark to 16384 and high water mark to 32768 bytes. > ... - Instantiated PooledByteBufAllocator with direct arenas: 24, heap > arenas: 0, page size (bytes): 65536, chunk size (bytes): 16777216. > ... - Using 0.7 of the free heap space for managed memory. > ... - Initializing memory manager with 24447 megabytes of memory. Page size > is 32768 bytes. > (END) > Error message about not enough memory is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
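A minimal sketch of that suggestion, assuming an SLF4J-style logger and a hypothetical launcher class (this is not the actual TaskManager main method): the catch-all around start-up ensures that an OutOfMemoryError raised while pre-allocating managed memory lands in the log file instead of the process hanging silently.
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch of the suggestion above: guard the whole start-up path so that an
// OutOfMemoryError thrown while the memory manager pre-allocates its pages is
// logged before the process dies. Names here are illustrative, not the actual
// Flink TaskManager main method.
public class TaskManagerLauncher {

    private static final Logger LOG = LoggerFactory.getLogger(TaskManagerLauncher.class);

    public static void main(String[] args) {
        try {
            startTaskManager(args);
        } catch (Throwable t) {
            // Catch Throwable, not Exception: OutOfMemoryError is an Error.
            LOG.error("TaskManager start-up failed: " + t.getMessage(), t);
            System.exit(1);
        }
    }

    private static void startTaskManager(String[] args) {
        // Stand-in for the real constructor call: eagerly allocate the managed
        // memory so a misconfigured heap fails here, loudly, instead of hanging.
        long managedMemoryBytes = 24447L * 1024 * 1024;
        int pageSize = 32 * 1024;
        int pages = (int) (managedMemoryBytes / pageSize);
        byte[][] segments = new byte[pages][];
        for (int i = 0; i < pages; i++) {
            segments[i] = new byte[pageSize];   // may throw OutOfMemoryError
        }
        LOG.info("Initialized memory manager with {} pages of {} bytes.", pages, pageSize);
    }
}
{code}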
[jira] [Commented] (FLINK-1437) Bug in PojoSerializer's copy() method
[ https://issues.apache.org/jira/browse/FLINK-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287801#comment-14287801 ] Stephan Ewen commented on FLINK-1437: - Good catch. The POJO serializer can probably handle this, without need to update all individual serializers? > Bug in PojoSerializer's copy() method > - > > Key: FLINK-1437 > URL: https://issues.apache.org/jira/browse/FLINK-1437 > Project: Flink > Issue Type: Bug > Components: Java API >Reporter: Timo Walther >Assignee: Timo Walther > > The PojoSerializer's {{copy()}} method does not work properly with {{null}} > values. An exception could look like: > {code} > Caused by: java.io.IOException: Thread 'SortMerger spilling thread' > terminated due to an exception: null > at > org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:792) > Caused by: java.io.EOFException > at > org.apache.flink.runtime.io.disk.RandomAccessInputView.nextSegment(RandomAccessInputView.java:83) > at > org.apache.flink.runtime.memorymanager.AbstractPagedInputView.advance(AbstractPagedInputView.java:159) > at > org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readByte(AbstractPagedInputView.java:270) > at > org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:277) > at org.apache.flink.types.StringValue.copyString(StringValue.java:839) > at > org.apache.flink.api.common.typeutils.base.StringSerializer.copy(StringSerializer.java:83) > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.copy(PojoSerializer.java:261) > at > org.apache.flink.runtime.operators.sort.NormalizedKeySorter.writeToOutput(NormalizedKeySorter.java:449) > at > org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1303) > at > org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:788) > {code} > I'm working on a fix for that... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
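A minimal sketch of handling null at the composite-serializer level only, assuming the serialized form carries a null-marker byte per field (an assumption about the shape of the fix, not the actual PojoSerializer code): the POJO-level copy() forwards the marker and delegates to the field serializer only for non-null values, so individual serializers such as StringSerializer stay unchanged.
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Illustrative null handling at the composite-serializer level, as suggested
// above: the POJO-level copy writes and checks a null-marker byte itself, so
// per-field serializers are only invoked for fields that are actually present.
// Not the actual Flink PojoSerializer.
public class NullAwareCopy {

    /** Stand-in for a field serializer's stream-to-stream copy. */
    interface FieldCopier {
        void copy(DataInput in, DataOutput out) throws IOException;
    }

    static final byte IS_NULL = 1;
    static final byte NOT_NULL = 0;

    /** Copies one field, forwarding the null marker and delegating only for non-null values. */
    static void copyField(FieldCopier fieldCopier, DataInput in, DataOutput out) throws IOException {
        byte flag = in.readByte();     // marker written by the serialize() side
        out.writeByte(flag);
        if (flag == NOT_NULL) {
            fieldCopier.copy(in, out); // the field serializer never sees a null
        }
        // for IS_NULL nothing else was written, so nothing else is copied:
        // no premature EOFException while advancing the input view
    }
}
{code}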
[GitHub] flink pull request: Rename coGroupDataSet.scala to CoGroupDataSet....
Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/324#issuecomment-71079720 Can I get +1 for this one? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Rename coGroupDataSet.scala to CoGroupDataSet....
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/324#issuecomment-71103567 You have my +1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Rename coGroupDataSet.scala to CoGroupDataSet....
Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/324#issuecomment-71119613 Cool, thanks @StephanEwen, if no one beats me merging I will do this EOD today --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1436) Command-line interface verbose option (-v)
[ https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288533#comment-14288533 ] Stephan Ewen commented on FLINK-1436: - +1 for all three suggestions from robert > Command-line interface verbose option (-v) > -- > > Key: FLINK-1436 > URL: https://issues.apache.org/jira/browse/FLINK-1436 > Project: Flink > Issue Type: Improvement > Components: Start-Stop Scripts >Reporter: Max Michels >Priority: Trivial > Labels: usability > > Let me run just a basic Flink job and add the verbose flag. It's a general > option, so let me add it as a first parameter: > > ./flink -v run ../examples/flink-java-examples-0.8.0-WordCount.jar > > hdfs:///input hdfs:///output9 > Invalid action! > ./flink [GENERAL_OPTIONS] [ARGUMENTS] > general options: > -h,--help Show the help for the CLI Frontend. > -v,--verbose Print more detailed error messages. > Action "run" compiles and runs a program. > Syntax: run [OPTIONS] > "run" action arguments: > -c,--classClass with the program entry point > ("main" > method or "getPlan()" method. Only > needed > if the JAR file does not specify the > class > in its manifest. > -m,--jobmanager Address of the JobManager (master) to > which to connect. Use this flag to > connect > to a different JobManager than the one > specified in the configuration. > -p,--parallelismThe parallelism with which to run the > program. Optional flag to override the > default value specified in the > configuration. > Action "info" displays information about a program. > "info" action arguments: > -c,--classClass with the program entry point > ("main" > method or "getPlan()" method. Only > needed > if the JAR file does not specify the > class > in its manifest. > -e,--executionplan Show optimized execution plan of the > program (JSON) > -m,--jobmanager Address of the JobManager (master) to > which to connect. Use this flag to > connect > to a different JobManager than the one > specified in the configuration. > -p,--parallelismThe parallelism with which to run the > program. Optional flag to override the > default value specified in the > configuration. > Action "list" lists running and finished programs. > "list" action arguments: > -m,--jobmanagerAddress of the JobManager (master) to which >to connect. Use this flag to connect to a >different JobManager than the one specified >in the configuration. > -r,--running Show running programs and their JobIDs > -s,--scheduledShow scheduled prorgrams and their JobIDs > Action "cancel" cancels a running program. > "cancel" action arguments: > -i,--jobid JobID of program to cancel > -m,--jobmanagerAddress of the JobManager (master) to which >to connect. Use this flag to connect to a >different JobManager than the one specified >in the configuration. > What just happened? This results in a lot of output which is usually > generated if you use the --help option on command-line tools. If your > terminal window is large enough, then you will see a tiny message: > "Please specify an action". I did specify an action. Strange. If you read the > help messages carefully you see, that "general options" belong to the action. > > ./flink run -v ../examples/flink-java-examples-0.8.0-WordCount.jar > > hdfs:///input hdfs:///output9 > For the sake of mitigating user frustration, let us also accept -v as the > first argument. It may seem trivial for the day-to-day Flink user but makes a > difference for a novice. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1436) Command-line interface verbose option (-v)
[ https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-1436: Labels: starter usability (was: usability) > Command-line interface verbose option (-v) > -- > > Key: FLINK-1436 > URL: https://issues.apache.org/jira/browse/FLINK-1436 > Project: Flink > Issue Type: Improvement > Components: Start-Stop Scripts >Reporter: Max Michels >Priority: Trivial > Labels: starter, usability > > Let me run just a basic Flink job and add the verbose flag. It's a general > option, so let me add it as a first parameter: > > ./flink -v run ../examples/flink-java-examples-0.8.0-WordCount.jar > > hdfs:///input hdfs:///output9 > Invalid action! > ./flink [GENERAL_OPTIONS] [ARGUMENTS] > general options: > -h,--help Show the help for the CLI Frontend. > -v,--verbose Print more detailed error messages. > Action "run" compiles and runs a program. > Syntax: run [OPTIONS] > "run" action arguments: > -c,--classClass with the program entry point > ("main" > method or "getPlan()" method. Only > needed > if the JAR file does not specify the > class > in its manifest. > -m,--jobmanager Address of the JobManager (master) to > which to connect. Use this flag to > connect > to a different JobManager than the one > specified in the configuration. > -p,--parallelismThe parallelism with which to run the > program. Optional flag to override the > default value specified in the > configuration. > Action "info" displays information about a program. > "info" action arguments: > -c,--classClass with the program entry point > ("main" > method or "getPlan()" method. Only > needed > if the JAR file does not specify the > class > in its manifest. > -e,--executionplan Show optimized execution plan of the > program (JSON) > -m,--jobmanager Address of the JobManager (master) to > which to connect. Use this flag to > connect > to a different JobManager than the one > specified in the configuration. > -p,--parallelismThe parallelism with which to run the > program. Optional flag to override the > default value specified in the > configuration. > Action "list" lists running and finished programs. > "list" action arguments: > -m,--jobmanagerAddress of the JobManager (master) to which >to connect. Use this flag to connect to a >different JobManager than the one specified >in the configuration. > -r,--running Show running programs and their JobIDs > -s,--scheduledShow scheduled prorgrams and their JobIDs > Action "cancel" cancels a running program. > "cancel" action arguments: > -i,--jobid JobID of program to cancel > -m,--jobmanagerAddress of the JobManager (master) to which >to connect. Use this flag to connect to a >different JobManager than the one specified >in the configuration. > What just happened? This results in a lot of output which is usually > generated if you use the --help option on command-line tools. If your > terminal window is large enough, then you will see a tiny message: > "Please specify an action". I did specify an action. Strange. If you read the > help messages carefully you see, that "general options" belong to the action. > > ./flink run -v ../examples/flink-java-examples-0.8.0-WordCount.jar > > hdfs:///input hdfs:///output9 > For the sake of mitigating user frustration, let us also accept -v as the > first argument. It may seem trivial for the day-to-day Flink user but makes a > difference for a novice. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1435) TaskManager does not log missing memory error on start up
[ https://issues.apache.org/jira/browse/FLINK-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-1435: Labels: memorymanager starter (was: memorymanager) > TaskManager does not log missing memory error on start up > - > > Key: FLINK-1435 > URL: https://issues.apache.org/jira/browse/FLINK-1435 > Project: Flink > Issue Type: Bug > Components: TaskManager >Affects Versions: 0.7.0-incubating >Reporter: Malte Schwarzer >Priority: Minor > Labels: memorymanager, starter > > When using bin/start-cluster.sh to start TaskManagers and a worker node is > failing to start because of missing memory, you do not receive any error > messages in log files. > Worker node has only 15000M memory available, but it is configured with > Maximum heap size: 4 MiBytes. Task manager does not join the cluster. > Process hangs. > Last lines of log looks like this: > ... > ... - - Starting with 12 incoming and 12 outgoing connection threads. > ... - Setting low water mark to 16384 and high water mark to 32768 bytes. > ... - Instantiated PooledByteBufAllocator with direct arenas: 24, heap > arenas: 0, page size (bytes): 65536, chunk size (bytes): 16777216. > ... - Using 0.7 of the free heap space for managed memory. > ... - Initializing memory manager with 24447 megabytes of memory. Page size > is 32768 bytes. > (END) > Error message about not enough memory is missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
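A fix along the lines described here could surface the allocation failure instead of letting the process hang silently. The sketch below is illustrative only; the class name, log wording, and allocation loop are not the actual DefaultMemoryManager/TaskManager code.

{code:java}
// Illustrative sketch only: log a clear error and fail fast if the managed
// memory cannot be allocated at TaskManager start-up.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ManagedMemoryStartupCheck {

    private static final Logger LOG = LoggerFactory.getLogger(ManagedMemoryStartupCheck.class);

    public static byte[][] allocatePages(int numPages, int pageSize) {
        try {
            byte[][] pages = new byte[numPages][];
            for (int i = 0; i < numPages; i++) {
                pages[i] = new byte[pageSize];
            }
            return pages;
        } catch (OutOfMemoryError e) {
            // make the failure visible in the TaskManager log instead of hanging
            LOG.error("Could not allocate {} pages of {} bytes for managed memory. "
                    + "Check that the configured heap size fits the memory available on this node.",
                    numPages, pageSize, e);
            throw e; // fail fast
        }
    }
}
{code}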
[jira] [Commented] (FLINK-1424) bin/flink run does not recognize -c parameter anymore
[ https://issues.apache.org/jira/browse/FLINK-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288543#comment-14288543 ] Stephan Ewen commented on FLINK-1424: - [~aalexandrov] was also suggesting to use a different CLI library in [FLINK-1347] Maybe that could fix this as well. > bin/flink run does not recognize -c parameter anymore > - > > Key: FLINK-1424 > URL: https://issues.apache.org/jira/browse/FLINK-1424 > Project: Flink > Issue Type: Bug > Components: TaskManager >Affects Versions: master >Reporter: Carsten Brandt > > bin/flink binary does not recognize `-c` parameter anymore which specifies > the class to run: > {noformat} > $ ./flink run "/path/to/target/impro3-ws14-flink-1.0-SNAPSHOT.jar" -c > de.tu_berlin.impro3.flink.etl.FollowerGraphGenerator /tmp/flink/testgraph.txt > 1 > usage: emma-experiments-impro3-ss14-flink >[-?] > emma-experiments-impro3-ss14-flink: error: unrecognized arguments: '-c' > {noformat} > before this command worked fine and executed the job. > I tracked it down to the following commit using `git bisect`: > {noformat} > 93eadca782ee8c77f89609f6d924d73021dcdda9 is the first bad commit > commit 93eadca782ee8c77f89609f6d924d73021dcdda9 > Author: Alexander Alexandrov > Date: Wed Dec 24 13:49:56 2014 +0200 > [FLINK-1027] [cli] Added support for '--' and '-' prefixed tokens in CLI > program arguments. > > This closes #278 > :04 04 a1358e6f7fe308b4d51a47069f190a29f87fdeda > d6f11bbc9444227d5c6297ec908e44b9644289a9 Mflink-clients > {noformat} > https://github.com/apache/flink/commit/93eadca782ee8c77f89609f6d924d73021dcdda9 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
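For context on what a different parsing setup could do here: Apache Commons CLI can parse with stopAtNonOption=true, so the first non-option token (the user JAR path) ends option parsing and everything after it is handed to the user program instead of being rejected as an unknown option. The sketch below uses a reduced, illustrative option set, not Flink's real option definitions, and takes no position on which library the eventual fix should use.

{code:java}
// Sketch with a reduced option set: parse with stopAtNonOption=true so that
// tokens after the user JAR path are passed through as program arguments.
import java.util.Arrays;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.GnuParser;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

public class RunActionParsing {

    public static void main(String[] args) throws ParseException {
        Options options = new Options();
        options.addOption("c", "class", true, "Program entry point class");
        options.addOption("m", "jobmanager", true, "JobManager address");
        options.addOption("p", "parallelism", true, "Program parallelism");

        // stopAtNonOption = true: the first token that is not a known option
        // (the JAR path) ends option parsing; the rest go to getArgs().
        CommandLine line = new GnuParser().parse(options, args, true);

        String entryClass = line.getOptionValue("class");
        String[] jarAndProgramArgs = line.getArgs();
        System.out.println("class=" + entryClass
                + ", jar+args=" + Arrays.toString(jarAndProgramArgs));
    }
}
{code}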
[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/328#issuecomment-71133987 I am not sure that the infinite number of tries is actually bad. This sort of depends on the situation, I guess: - On YARN, it may make sense, because the node will then go back into the pool of available resources - On standalone, it will be there for Flink anyway, so the TaskManager might as well keep trying to offer itself for work. Think of a network partitioning event - after the partitions re-join, the cluster should work as a whole again. How about the following: We have a config parameter for how long nodes should attempt to register. YARN could set a timeout (say 2-5 minutes), while by default, the timeout is infinite. Concerning the attempt pause: Having attempts with exponential backoff (and a cap) is the common approach (and I think it was the default before). Start with a 50ms pause, double it on each attempt, and cap it at 1 or 2 minutes or so. If you miss early attempts, the pause will not be long. If you missed all attempts within the first second, you are guaranteed not to wait more than twice as long as you have already waited. For the sake of transparency and making sure that the states are actually in sync: How about we have three response messages for the registration attempt: 1. Refused (for whatever reason, the message should have a string that the TM can log) 2. Accepted (with the assigned ID) 3. Already registered (with the assigned ID) - The current logic handles this correctly as well, but this will allow us to log better at the TaskManager and debug problems there much more easily. Since this is a mechanism which may have weird corner-case behavior, it would be good to know as much about what was happening as possible. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager
[ https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288569#comment-14288569 ] ASF GitHub Bot commented on FLINK-1352: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/328#issuecomment-71133987 I am not sure that the infinite number of tries is actually bad. This sort of depends on the situation, I guess: - On YARN, it may make sense, because the node will then go back into the pool of available resources - On standalone, it will be there for Flink anyway, so the TaskManager might as well keep trying to offer itself for work. Think of a network partitioning event - after the partitions re-join, the cluster should work as a whole again. How about the following: We have a config parameter for how long nodes should attempt to register. YARN could set a timeout (say 2-5 minutes), while by default, the timeout is infinite. Concerning the attempt pause: Having attempts with exponential backoff (and a cap) is the common approach (and I think it was the default before). Start with a 50ms pause, double it on each attempt, and cap it at 1 or 2 minutes or so. If you miss early attempts, the pause will not be long. If you missed all attempts within the first second, you are guaranteed not to wait more than twice as long as you have already waited. For the sake of transparency and making sure that the states are actually in sync: How about we have three response messages for the registration attempt: 1. Refused (for whatever reason, the message should have a string that the TM can log) 2. Accepted (with the assigned ID) 3. Already registered (with the assigned ID) - The current logic handles this correctly as well, but this will allow us to log better at the TaskManager and debug problems there much more easily. Since this is a mechanism which may have weird corner-case behavior, it would be good to know as much about what was happening as possible. > Buggy registration from TaskManager to JobManager > - > > Key: FLINK-1352 > URL: https://issues.apache.org/jira/browse/FLINK-1352 > Project: Flink > Issue Type: Bug > Components: JobManager, TaskManager >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Till Rohrmann > Fix For: 0.9 > > > The JobManager's InstanceManager may refuse the registration attempt from a > TaskManager, because it has this taskmanager already connected, or,in the > future, because the TaskManager has been blacklisted as unreliable. > Unpon refused registration, the instance ID is null, to signal that refused > registration. TaskManager reacts incorrectly to such methods, assuming > successful registration > Possible solution: JobManager sends back a dedicated "RegistrationRefused" > message, if the instance manager returns null as the registration result. If > the TastManager receives that before being registered, it knows that the > registration response was lost (which should not happen on TCP and it would > indicate a corrupt connection) > Followup question: Does it make sense to have the TaskManager trying > indefinitely to connect to the JobManager. With increasing interval (from > seconds to minutes)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
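A minimal sketch of the retry pacing proposed above (start at 50 ms, double per attempt, cap the pause, optionally bound the total time). The method and parameter names are made up; this is not the actual TaskManager registration code.

{code:java}
// Minimal sketch of exponential backoff with a cap, not the real registration logic.
import java.util.function.BooleanSupplier;

public class RegistrationBackoff {

    /**
     * Keeps attempting to register until an attempt succeeds or the overall
     * time budget (if any) runs out. The pause doubles per attempt up to a cap.
     */
    public static boolean registerWithBackoff(
            BooleanSupplier attemptRegistration, // returns true once registered
            long initialPauseMillis,             // e.g. 50
            long maxPauseMillis,                 // e.g. 120_000 (2 minutes)
            long maxTotalMillis)                 // Long.MAX_VALUE = try forever
            throws InterruptedException {

        long pause = initialPauseMillis;
        long deadline = maxTotalMillis == Long.MAX_VALUE
                ? Long.MAX_VALUE
                : System.currentTimeMillis() + maxTotalMillis;

        while (System.currentTimeMillis() < deadline) {
            if (attemptRegistration.getAsBoolean()) {
                return true;                     // registered
            }
            Thread.sleep(pause);                 // wait before the next attempt
            pause = Math.min(pause * 2, maxPauseMillis); // double, capped
        }
        return false;                            // time budget exhausted
    }
}
{code}

Passing Long.MAX_VALUE as the total budget gives the "try forever" standalone behavior, while a YARN deployment could pass a 2-5 minute budget, matching the config-parameter idea above.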
[GitHub] flink pull request: Rename coGroupDataSet.scala to CoGroupDataSet....
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/324 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Rename coGroupDataSet.scala to CoGroupDataSet....
Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/324#issuecomment-71134892 As per recommendation from @StephanEwen, will not merge this to 0.8 until we need to cherry-pick fixes related to these files. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-629) Add support for null values to the java api
[ https://issues.apache.org/jira/browse/FLINK-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288646#comment-14288646 ] mustafa elbehery commented on FLINK-629: It's working now, thanks a lot Robert for your support :) :) > Add support for null values to the java api > --- > > Key: FLINK-629 > URL: https://issues.apache.org/jira/browse/FLINK-629 > Project: Flink > Issue Type: Improvement > Components: Java API >Reporter: Stephan Ewen >Assignee: Gyula Fora >Priority: Critical > Labels: github-import > Fix For: pre-apache > > Attachments: Selection_006.png, SimpleTweetInputFormat.java, > Tweet.java, model.tar.gz > > > Currently, many runtime operations fail when encountering a null value. Tuple > serialization should allow null fields. > I suggest to add a method to the tuples called `getFieldNotNull()` which > throws a meaningful exception when the accessed field is null. That way, we > simplify the logic of operators that should not dead with null fields, like > key grouping or aggregations. > Even though SQL allows grouping and aggregating of null values, I suggest to > exclude this from the java api, because the SQL semantics of aggregating null > fields are messy. > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/629 > Created by: [StephanEwen|https://github.com/StephanEwen] > Labels: enhancement, java api, > Milestone: Release 0.5.1 > Created at: Wed Mar 26 00:27:49 CET 2014 > State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
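For reference, the `getFieldNotNull()` idea quoted in the issue text could look roughly like this. The tuple class below is a stand-in; Flink's real Tuple classes and the exception type they throw may differ.

{code:java}
// Stand-in tuple illustrating the getFieldNotNull() suggestion from the issue.
public class NullAwareTuple2<T0, T1> {

    public T0 f0;
    public T1 f1;

    @SuppressWarnings("unchecked")
    public <T> T getField(int pos) {
        switch (pos) {
            case 0: return (T) f0;
            case 1: return (T) f1;
            default: throw new IndexOutOfBoundsException(String.valueOf(pos));
        }
    }

    /** Like getField(), but fails with a meaningful message if the field is null. */
    public <T> T getFieldNotNull(int pos) {
        T value = getField(pos);
        if (value == null) {
            throw new NullPointerException(
                    "Field " + pos + " is null, but a non-null value was expected.");
        }
        return value;
    }
}
{code}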
[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager
[ https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288754#comment-14288754 ] ASF GitHub Bot commented on FLINK-1352: --- Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/328#issuecomment-71148024 Thanks for the explanation, @tillrohrmann. +1 for the exponential backoff approach. We can have max retries and max delay for each try as configurable properties. > Buggy registration from TaskManager to JobManager > - > > Key: FLINK-1352 > URL: https://issues.apache.org/jira/browse/FLINK-1352 > Project: Flink > Issue Type: Bug > Components: JobManager, TaskManager >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Till Rohrmann > Fix For: 0.9 > > > The JobManager's InstanceManager may refuse the registration attempt from a > TaskManager, because it has this taskmanager already connected, or,in the > future, because the TaskManager has been blacklisted as unreliable. > Unpon refused registration, the instance ID is null, to signal that refused > registration. TaskManager reacts incorrectly to such methods, assuming > successful registration > Possible solution: JobManager sends back a dedicated "RegistrationRefused" > message, if the instance manager returns null as the registration result. If > the TastManager receives that before being registered, it knows that the > registration response was lost (which should not happen on TCP and it would > indicate a corrupt connection) > Followup question: Does it make sense to have the TaskManager trying > indefinitely to connect to the JobManager. With increasing interval (from > seconds to minutes)? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...
Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/328#issuecomment-71148024 Thanks for the explanation, @tillrohrmann. +1 for the exponential backoff approach. We can have max retries and max delay for each try as configurable properties. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
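A sketch of what such configurable limits could look like. The configuration keys below are made up for illustration and are not existing Flink options; the actual names would be decided in the pull request.

{code:java}
// Hypothetical configuration keys for the retry limits discussed above.
import org.apache.flink.configuration.Configuration;

public class RegistrationSettings {

    public static final String KEY_MAX_RETRIES = "taskmanager.registration.max-retries";
    public static final String KEY_INITIAL_PAUSE_MS = "taskmanager.registration.initial-pause-ms";
    public static final String KEY_MAX_PAUSE_MS = "taskmanager.registration.max-pause-ms";

    public static void printEffectiveSettings(Configuration config) {
        int maxRetries = config.getInteger(KEY_MAX_RETRIES, -1);      // -1 = unlimited
        long initialPause = config.getLong(KEY_INITIAL_PAUSE_MS, 50L);
        long maxPause = config.getLong(KEY_MAX_PAUSE_MS, 120_000L);
        System.out.println("maxRetries=" + maxRetries
                + ", initialPauseMs=" + initialPause
                + ", maxPauseMs=" + maxPause);
    }
}
{code}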
[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/326#issuecomment-71157010 I took the pull request and filtered the branch to move the files into the `flink-addons/flink-gelly` directory (that worked better for me than the subtree merge). Also, I prefixed all commit messages with the jira issue and component. You can find it here, it perfectly preserves all commit history. https://github.com/StephanEwen/incubator-flink/commits/gelly Let me know if I should merge it like that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1201) Graph API for Flink
[ https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288905#comment-14288905 ] ASF GitHub Bot commented on FLINK-1201: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/326#issuecomment-71157010 I took the pull request and filtered the branch to move the files into the `flink-addons/flink-gelly` directory (that worked better for me than the subtree merge). Also, I prefixed all commit messages with the jira issue and component. You can find it here, it perfectly preserves all commit history. https://github.com/StephanEwen/incubator-flink/commits/gelly Let me know if I should merge it like that. > Graph API for Flink > > > Key: FLINK-1201 > URL: https://issues.apache.org/jira/browse/FLINK-1201 > Project: Flink > Issue Type: New Feature >Reporter: Kostas Tzoumas >Assignee: Vasia Kalavri > > This issue tracks the development of a Graph API/DSL for Flink. > Until the code is pushed to the Flink repository, collaboration is happening > here: https://github.com/project-flink/flink-graph -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1353] Fixes the Execution to use the co...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/325#issuecomment-71157412 Looks good to merge --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1353) Execution always uses DefaultAkkaAskTimeout, rather than the configured value
[ https://issues.apache.org/jira/browse/FLINK-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288912#comment-14288912 ] ASF GitHub Bot commented on FLINK-1353: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/325#issuecomment-71157412 Looks good to merge > Execution always uses DefaultAkkaAskTimeout, rather than the configured value > - > > Key: FLINK-1353 > URL: https://issues.apache.org/jira/browse/FLINK-1353 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Till Rohrmann > Fix For: 0.9 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1391] Add support for using Avro-POJOs ...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/323#discussion_r23435527 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/KryoSerializer.java --- @@ -237,6 +244,25 @@ private void checkKryoInitialized() { // Throwable and all subclasses should be serialized via java serialization kryo.addDefaultSerializer(Throwable.class, new JavaSerializer()); + // If the type we have to serialize as a GenricType is implementing SpecificRecordBase, + // we have to register the avro serializer + // This rule only applies if users explicitly use the GenericTypeInformation for the avro types + // usually, we are able to handle Avro POJOs with the POJO serializer. + if(SpecificRecordBase.class.isAssignableFrom(type)) { + ClassTag tag = scala.reflect.ClassTag$.MODULE$.apply(type); + this.kryo.register(type, com.twitter.chill.avro.AvroSerializer.SpecificRecordSerializer(tag)); + + } + // Avro POJOs contain java.util.List which have GenericData.Array as their runtime type + // because Kryo is not able to serialize them properly, we use this serializer for them + this.kryo.register(GenericData.Array.class, new SpecificInstanceCollectionSerializer(ArrayList.class)); --- End diff -- Should we make this registration conditional so that it only happens when we have encountered an Avro type? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
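Taken literally, the conditional registration asked about in the comment would amount to moving the GenericData.Array registration into the SpecificRecordBase branch of the diff above, roughly as sketched below. This is a fragment of that same method, not standalone code, and whether it covers Avro types that only appear nested inside other generic types is exactly what the review question is probing.

{code:java}
// Sketch: only register the Avro-specific serializers when the serialized
// type itself is an Avro record (class names as in the diff above).
if (SpecificRecordBase.class.isAssignableFrom(type)) {
    ClassTag tag = scala.reflect.ClassTag$.MODULE$.apply(type);
    this.kryo.register(type, com.twitter.chill.avro.AvroSerializer.SpecificRecordSerializer(tag));

    // Avro records carry java.util.List fields whose runtime type is GenericData.Array,
    // so this serializer would only be registered once an Avro type is encountered.
    this.kryo.register(GenericData.Array.class,
            new SpecificInstanceCollectionSerializer(ArrayList.class));
}
{code}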
[jira] [Commented] (FLINK-1391) Kryo fails to properly serialize avro collection types
[ https://issues.apache.org/jira/browse/FLINK-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288922#comment-14288922 ] ASF GitHub Bot commented on FLINK-1391: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/323#discussion_r23435527 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/KryoSerializer.java --- @@ -237,6 +244,25 @@ private void checkKryoInitialized() { // Throwable and all subclasses should be serialized via java serialization kryo.addDefaultSerializer(Throwable.class, new JavaSerializer()); + // If the type we have to serialize as a GenricType is implementing SpecificRecordBase, + // we have to register the avro serializer + // This rule only applies if users explicitly use the GenericTypeInformation for the avro types + // usually, we are able to handle Avro POJOs with the POJO serializer. + if(SpecificRecordBase.class.isAssignableFrom(type)) { + ClassTag tag = scala.reflect.ClassTag$.MODULE$.apply(type); + this.kryo.register(type, com.twitter.chill.avro.AvroSerializer.SpecificRecordSerializer(tag)); + + } + // Avro POJOs contain java.util.List which have GenericData.Array as their runtime type + // because Kryo is not able to serialize them properly, we use this serializer for them + this.kryo.register(GenericData.Array.class, new SpecificInstanceCollectionSerializer(ArrayList.class)); --- End diff -- Should we make this registration conditional so that it only happens when we have encountered an Avro type? > Kryo fails to properly serialize avro collection types > -- > > Key: FLINK-1391 > URL: https://issues.apache.org/jira/browse/FLINK-1391 > Project: Flink > Issue Type: Improvement >Affects Versions: 0.8, 0.9 >Reporter: Robert Metzger >Assignee: Robert Metzger > > Before FLINK-610, Avro was the default generic serializer. > Now, special types coming from Avro are handled by Kryo ..
which seems to > cause errors like: > {code} > Exception in thread "main" > org.apache.flink.runtime.client.JobExecutionException: > java.lang.NullPointerException > at org.apache.avro.generic.GenericData$Array.add(GenericData.java:200) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:116) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761) > at > org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:143) > at > org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:148) > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:244) > at > org.apache.flink.runtime.plugable.DeserializationDelegate.read(DeserializationDelegate.java:56) > at > org.apache.flink.runtime.io.network.serialization.AdaptiveSpanningRecordDeserializer.getNextRecord(AdaptiveSpanningRecordDeserializer.java:71) > at > org.apache.flink.runtime.io.network.channels.InputChannel.readRecord(InputChannel.java:189) > at > org.apache.flink.runtime.io.network.gates.InputGate.readRecord(InputGate.java:176) > at > org.apache.flink.runtime.io.network.api.MutableRecordReader.next(MutableRecordReader.java:51) > at > org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:53) > at > org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:170) > at > org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257) > at java.lang.Thread.run(Thread.java:744) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1391] Add support for using Avro-POJOs ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/323#issuecomment-71157859 Looks like a good fix to me. One comment/question inline, otherwise good to merge. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1391) Kryo fails to properly serialize avro collection types
[ https://issues.apache.org/jira/browse/FLINK-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288923#comment-14288923 ] ASF GitHub Bot commented on FLINK-1391: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/323#issuecomment-71157859 Looks like a good fix to me. One comment/question inline, otherwise good to merge. > Kryo fails to properly serialize avro collection types > -- > > Key: FLINK-1391 > URL: https://issues.apache.org/jira/browse/FLINK-1391 > Project: Flink > Issue Type: Improvement >Affects Versions: 0.8, 0.9 >Reporter: Robert Metzger >Assignee: Robert Metzger > > Before FLINK-610, Avro was the default generic serializer. > Now, special types coming from Avro are handled by Kryo .. which seems to > cause errors like: > {code} > Exception in thread "main" > org.apache.flink.runtime.client.JobExecutionException: > java.lang.NullPointerException > at org.apache.avro.generic.GenericData$Array.add(GenericData.java:200) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:116) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761) > at > org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:143) > at > org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:148) > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:244) > at > org.apache.flink.runtime.plugable.DeserializationDelegate.read(DeserializationDelegate.java:56) > at > org.apache.flink.runtime.io.network.serialization.AdaptiveSpanningRecordDeserializer.getNextRecord(AdaptiveSpanningRecordDeserializer.java:71) > at > org.apache.flink.runtime.io.network.channels.InputChannel.readRecord(InputChannel.java:189) > at > org.apache.flink.runtime.io.network.gates.InputGate.readRecord(InputGate.java:176) > at > org.apache.flink.runtime.io.network.api.MutableRecordReader.next(MutableRecordReader.java:51) > at > org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:53) > at > org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:170) > at > org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257) > at java.lang.Thread.run(Thread.java:744) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1392) Serializing Protobuf - issue 1
[ https://issues.apache.org/jira/browse/FLINK-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288929#comment-14288929 ] ASF GitHub Bot commented on FLINK-1392: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/322#issuecomment-71158115 Per the discussion on the dev mailing list (http://mail-archives.apache.org/mod_mbox/flink-dev/201501.mbox/browser), should this be registered by the code that analyzes the generic type and sets up the serializer for the encountered (nested) types? [FLINK-1417] > Serializing Protobuf - issue 1 > -- > > Key: FLINK-1392 > URL: https://issues.apache.org/jira/browse/FLINK-1392 > Project: Flink > Issue Type: Bug >Reporter: Felix Neutatz >Assignee: Robert Metzger >Priority: Minor > > Hi, I started to experiment with Parquet using Protobuf. > When I use the standard Protobuf class: > com.twitter.data.proto.tutorial.AddressBookProtos > The code which I run, can be found here: > [https://github.com/FelixNeutatz/incubator-flink/blob/ParquetAtFlink/flink-addons/flink-hadoop-compatibility/src/main/java/org/apache/flink/hadoopcompatibility/mapreduce/example/ParquetProtobufOutput.java] > I get the following exception: > {code:xml} > Exception in thread "main" java.lang.Exception: Deserializing the > InputFormat (org.apache.flink.api.java.io.CollectionInputFormat) failed: > Could not read the user code wrapper: Error while deserializing element from > collection > at > org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:60) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:179) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:172) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:172) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:34) > at > org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:27) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > at > org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:27) > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > at > org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:52) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at akka.actor.ActorCell.invoke(ActorCell.scala:487) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at >
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: > org.apache.flink.runtime.operators.util.CorruptConfigurationException: Could > not read the user code wrapper: Error while deserializing element from > collection > at > org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:285) > at > org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:57) > ... 25 more > Caused by: java.io.IOException: Error while deserializing element from > collection > at > org.apache.flink.api.java.io.CollectionInputFormat.readObject(CollectionInputFormat.java:108) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > java.io.ObjectStreamClass.invok
[GitHub] flink pull request: [FLINK-1392] Add Kryo serializer for Protobuf
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/322#issuecomment-71158115 Per the discussion on the dev mailing list (http://mail-archives.apache.org/mod_mbox/flink-dev/201501.mbox/browser), should this be registered by the code that analyzes the generic type and sets up the serializer for the encountered (nested) types? [FLINK-1417] --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
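As a hedged sketch of "registering in the type-analysis code": when the analyzer encounters a class implementing com.google.protobuf.Message, it could add chill-protobuf's serializer for that class. The helper below only illustrates the registration call; the real hook point would be the type-analysis code discussed in FLINK-1417, and the wiring here is an assumption, not the project's decided approach.

{code:java}
// Illustrative wiring only; not the actual Flink type-analysis code.
import com.esotericsoftware.kryo.Kryo;
import com.google.protobuf.Message;
import com.twitter.chill.protobuf.ProtobufSerializer;

public class ProtobufKryoRegistration {

    /** Registers chill-protobuf's serializer for a type if it is a Protobuf message. */
    public static void registerIfProtobuf(Kryo kryo, Class<?> type) {
        if (Message.class.isAssignableFrom(type)) {
            kryo.register(type, new ProtobufSerializer());
        }
    }
}
{code}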
[GitHub] flink pull request: [FLINK-1165] No createCollectionsEnvironment i...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/320#issuecomment-71158365 I vote to call it `createCollectionsEnvironment` in Java as well, so that the APIs are in sync, and we do not have a breaking change in the Scala API. Otherwise this is good to merge. Thanks, Ajay! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
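The naming suggestion boils down to a factory method like the following on the Java ExecutionEnvironment. This is a sketch; the exact placement, Javadoc, and signature are up to the pull request.

{code:java}
// Sketch of the suggested factory method inside org.apache.flink.api.java.ExecutionEnvironment.
public static CollectionEnvironment createCollectionsEnvironment() {
    // CollectionEnvironment executes the program on Java collections in a single JVM,
    // mirroring the existing Scala API factory method of the same name.
    return new CollectionEnvironment();
}

// Usage in a program:
// ExecutionEnvironment env = ExecutionEnvironment.createCollectionsEnvironment();
{code}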
[GitHub] flink pull request: [FLINK-1147][Java API] TypeInference on POJOs
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/315#issuecomment-71158420 +1 to merge from me as well. Go ahead, Timo... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1165) No createCollectionsEnvironment in Java API
[ https://issues.apache.org/jira/browse/FLINK-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288935#comment-14288935 ] ASF GitHub Bot commented on FLINK-1165: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/320#issuecomment-71158365 I vote to call it `createCollectionsEnvironment` in Java as well, so that the APIs are in sync, and we do not have a breaking change in the Scala API. Otherwise this is good to merge. Thanks, Ajay! > No createCollectionsEnvironment in Java API > --- > > Key: FLINK-1165 > URL: https://issues.apache.org/jira/browse/FLINK-1165 > Project: Flink > Issue Type: Improvement >Reporter: Till Rohrmann > > In the Scala API the ExecutionEnvironment has the method > createCollectionEnvironment but not in the Java API. We should stick to one > approach in both APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1147) TypeInference on POJOs
[ https://issues.apache.org/jira/browse/FLINK-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288937#comment-14288937 ] ASF GitHub Bot commented on FLINK-1147: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/315#issuecomment-71158420 +1 to merge from me as well. Go ahead, Timo... > TypeInference on POJOs > -- > > Key: FLINK-1147 > URL: https://issues.apache.org/jira/browse/FLINK-1147 > Project: Flink > Issue Type: Improvement > Components: Java API >Affects Versions: 0.7.0-incubating >Reporter: Stephan Ewen >Assignee: Timo Walther > > On Tuples, we currently use type inference that figures out the types of > output type variables relative to the input type variable. > We need a similar functionality for POJOs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
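A made-up example of the situation this inference has to handle: the function's output type variable can only be resolved by analyzing the fields of the input POJO, analogous to what already works for tuples. The POJO and function below are illustrative, not taken from the pull request.

{code:java}
// Made-up example: the output type T of Unwrap can only be determined once the
// concrete field type of the input POJO (e.g. Wrapper<Long>) has been analyzed.
import org.apache.flink.api.common.functions.MapFunction;

public class PojoInferenceExample {

    // A simple POJO with a generic field (public fields, no-arg constructor).
    public static class Wrapper<T> {
        public T value;
        public Wrapper() {}
        public Wrapper(T value) { this.value = value; }
    }

    // Output type variable T is bound by the input POJO's field type.
    public static class Unwrap<T> implements MapFunction<Wrapper<T>, T> {
        @Override
        public T map(Wrapper<T> wrapper) {
            return wrapper.value;
        }
    }
}
{code}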