Re: [DISCUSS] how choose Scala and Java
I think you're referring to the implementation of some of Flink's modules, right? If that is the case, then the rule of thumb is that we want to use Java for the low-level runtime implementations. For the API implementations it is a case-by-case decision. The Scala API, for example, is of course implemented in Scala. For other APIs we tend to use Scala only if it gives a clear advantage over a Java implementation. If your question is rather which Flink API to use (the Java or the Scala API), then it's completely up to you and your preferences. Cheers, Till On Wed, Sep 7, 2016 at 8:21 AM, 时某人 wrote: > Scala and Java mixed in the module. Some Flink API indeed make someone > confused. > What is rule about the current Scala and Java API at the first implement > time? > > > Thanks
Assign tasks from JIRA
Hello folks! I am a novice in the community, but I want to participate in the community process. How can I assign https://issues.apache.org/jira/browse/FLINK-4506 to myself? Best regards, Kirill Morozov EPAM mailto:kirill_moro...@epam.com
Re: Assign tasks from JIRA
Hi Kirill, welcome to the Flink community! I gave your JIRA account Contributor permissions for the FLINK JIRA. You should now be able to assign tasks to yourself. Best, Fabian 2016-09-07 9:46 GMT+02:00 Kirill Morozov : > Hello folks! > > I am novice in community, but I want to participate in community process. > How I can assign https://issues.apache.org/jira/browse/FLINK-4506 to me? > > Best regards, > Kirill Morozov > EPAM mailto:kirill_moro...@epam.com > >
Re: [DISCUSS] FLIP-11: Table API Stream Aggregations
Hi Fabian, Thanks for sharing your ideas. They all make sense to me. Regarding reassigning timestamps, I do not have a use case. I came up with this because DataStream has a TimestampAssigner :) +1 for this FLIP. - Jark Wu > On 2016-09-07, at 14:59, Fabian Hueske wrote: > > Hi, > > thanks for your comments and questions! > Actually, you are bringing up the points that Timo and I discussed the most > when designing the FLIP ;-) > > - We also thought about the syntactic shortcut for running aggregates like > you proposed (table.groupBy(‘a).select(…)). Our motivation to not allow > this shortcut is to prevent users from accidentally performing a > "dangerous" operation. The problem with unbounded sliding row-windows is > that their state does never expire. If you have an evolving key space, you > will likely run into problems at some point because the operator state > grows too large. IMO, a row-window session is a better approach, because it > defines a timeout after which state can be discarded. groupBy.select is a > very common operation in batch but its semantics in streaming are very > different. In my opinion it makes sense to make users aware of these > differences through the API. > > - Reassigning timestamps and watermarks is a very delicate issue. You are > right, that Calcite exposes this field which is necessary due to the > semantics of SQL. However, also in Calcite you cannot freely choose the > timestamp attribute for streaming queries (it must be a monotone or > quasi-monotone attribute) which is hard to reason about (and guarantee) > after a few operators have been applied. Streaming tables in Flink will > likely have a time attribute which is identical to the initial rowtime. > However, Flink does modify timestamps internally, e.g., for records that > are emitted from time windows, in order to ensure that consecutive windows > perform as expected. 
Modify or reassign timestamps in the middle of a job > can result in unexpected results which are very hard to reason about. Do > you have a concrete use case in mind for reassigning timestamps? > > - The idea to represent rowtime and systime as object is good. Our > motivation to go for reserved Scala symbols was to have a uniform syntax > with windows over streaming and batch tables. On batch tables you can > compute time windows basically over every time attribute (they are treated > similar to grouping attributes with a bit of extra logic to extract the > grouping key for sliding and session windows). If you write window(Tumble > over 10.minutes on 'rowtime) on a streaming table, 'rowtime would indicate > event-time. On a batch table with a 'rowtime attribute, the same operator > would be internally converted into a group by. By going for the object > approach we would lose this compatibility (or would need to introduce an > additional column attribute to specifiy the window attribute for batch > tables). > > As usual some of the design decisions are based on preferences. > Do they make sense to you? Let me know what you think. > > Best, Fabian > > > 2016-09-07 5:12 GMT+02:00 Jark Wu : > >> Hi all, >> >> I'm on vacation for about five days , sorry to have missed this great FLIP. >> >> Yes, the non-windowed aggregates is a special case of row-window. And the >> proposal looks really good. Can we have a simplified form for the special >> case? Such as : >> table.groupBy(‘a).rowWindow(SlideRows.unboundedPreceding).select(…) >> can be simplified to table.groupBy(‘a).select(…). The latter will actually >> call the former. >> >> Another question is about the rowtime. As the FLIP said, DataStream and >> StreamTableSource is responsible to assign timestamps and watermarks, >> furthermore “rowtime” and “systemtime” are not real column. IMO, it is >> different with Calcite’s rowtime, which is a real column in the table. In >> FLIP's way, we will lose some flexibility. 
Because the timestamp column may >> be created after some transformations or join operation, not created at >> beginning. So why do we have to define rowtime at beginning? (because of >> watermark?) Can we have a way to define rowtime after source table like >> TimestampAssinger? >> >> Regarding to “allowLateness” method. I come up a trick that we can make >> ‘rowtime and ‘system to be a Scala object, not a symbol expression. The API >> will looks like this : >> >> window(Tumble over 10.minutes on rowtime allowLateness as ‘w) >> >> The implementation will look like this: >> >> class TumblingWindow(size: Expression) extends Window { >> def on(time: rowtime.type): TumblingEventTimeWindow = >> new TumblingEventTimeWindow(alias, ‘rowtime, size)// has >> allowLateness() method >> >> def on(time: systemtime.type): TumblingProcessingTimeWindow= >> new TumblingProcessingTimeWindow(alias, ‘systemtime, size) >> // hasn’t allowLateness() method >> } >> object rowtime >> object systemtime >> >> What do you think about this? >> >> - Jark Wu >> >>> 在 20
[jira] [Created] (FLINK-4588) Fix Merging of Covering Window in MergingWindowSet
Aljoscha Krettek created FLINK-4588: --- Summary: Fix Merging of Covering Window in MergingWindowSet Key: FLINK-4588 URL: https://issues.apache.org/jira/browse/FLINK-4588 Project: Flink Issue Type: Bug Components: Windowing Operators Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Priority: Blocker Fix For: 1.0.4, 1.2.0, 1.1.3 Right now, when a new window gets merged that covers all of the existing windows, {{MergingWindowSet}} does not correctly set the state window. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4589) Fix Merging of Covering Window in MergingWindowSet
Aljoscha Krettek created FLINK-4589: --- Summary: Fix Merging of Covering Window in MergingWindowSet Key: FLINK-4589 URL: https://issues.apache.org/jira/browse/FLINK-4589 Project: Flink Issue Type: Bug Components: Windowing Operators Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Priority: Blocker Fix For: 1.0.4, 1.2.0, 1.1.3 Right now, when a new window gets merged that covers all of the existing windows, {{MergingWindowSet}} does not correctly set the state window. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4590) Some Table API tests are failing when debug lvl is set to DEBUG
Robert Metzger created FLINK-4590: - Summary: Some Table API tests are failing when debug lvl is set to DEBUG Key: FLINK-4590 URL: https://issues.apache.org/jira/browse/FLINK-4590 Project: Flink Issue Type: Bug Components: Table API & SQL Affects Versions: 1.2.0 Reporter: Robert Metzger For debugging another issue, I've set the log level on travis to DEBUG. After that, the Table API tests started failing {code} Failed tests: SetOperatorsITCase.testMinusAll:156 Internal error: Error occurred while applying rule DataSetScanRule SetOperatorsITCase.testMinusAll:156 Internal error: Error occurred while applying rule DataSetScanRule SetOperatorsITCase.testMinusAll:156 Internal error: Error occurred while applying rule DataSetScanRule SetOperatorsITCase.testMinusAll:156 Internal error: Error occurred while applying rule DataSetScanRule SetOperatorsITCase.testMinusDifferentFieldNames:204 Internal error: Error occurred while applying rule DataSetScanRule SetOperatorsITCase.testMinusDifferentFieldNames:204 Internal error: Error occurred while applying rule DataSetScanRule SetOperatorsITCase.testMinusDifferentFieldNames:204 Internal error: Error occurred while applying rule DataSetScanRule SetOperatorsITCase.testMinusDifferentFieldNames:204 Internal error: Error occurred while applying rule DataSetScanRule SetOperatorsITCase.testMinus:175 Internal error: Error occurred while applying rule DataSetScanRule SetOperatorsITCase.testMinus:175 Internal error: Error occurred while applying rule DataSetScanRule SetOperatorsITCase.testMinus:175 Internal error: Error occurred while applying rule DataSetScanRule SetOperatorsITCase.testMinus:175 Internal error: Error occurred while applying rule DataSetScanRule {code} Probably Calcite is executing additional assertions depending on the debug level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4591) Select star does not work with grouping
Timo Walther created FLINK-4591: --- Summary: Select star does not work with grouping Key: FLINK-4591 URL: https://issues.apache.org/jira/browse/FLINK-4591 Project: Flink Issue Type: Improvement Components: Table API & SQL Reporter: Timo Walther It would be consistent if this also worked: {{table.groupBy('*).select('*)}} Currently, the star only works in a plain select without grouping. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4592) Fix flaky test ScalarFunctionsTest.testCurrentTimePoint
Timo Walther created FLINK-4592: --- Summary: Fix flaky test ScalarFunctionsTest.testCurrentTimePoint Key: FLINK-4592 URL: https://issues.apache.org/jira/browse/FLINK-4592 Project: Flink Issue Type: Bug Components: Table API & SQL Reporter: Timo Walther It seems that the test is still non deterministic. {code} org.apache.flink.api.table.expressions.ScalarFunctionsTest testCurrentTimePoint(org.apache.flink.api.table.expressions.ScalarFunctionsTest) Time elapsed: 0.083 sec <<< FAILURE! org.junit.ComparisonFailure: Wrong result for: AS(>=(CHAR_LENGTH(CAST(CURRENT_TIMESTAMP):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary" NOT NULL), 22), '_c0') expected:<[tru]e> but was:<[fals]e> at org.junit.Assert.assertEquals(Assert.java:115) at org.apache.flink.api.table.expressions.utils.ExpressionTestBase$$anonfun$evaluateExprs$1.apply(ExpressionTestBase.scala:126) at org.apache.flink.api.table.expressions.utils.ExpressionTestBase$$anonfun$evaluateExprs$1.apply(ExpressionTestBase.scala:123) at scala.collection.mutable.LinkedHashSet.foreach(LinkedHashSet.scala:87) at org.apache.flink.api.table.expressions.utils.ExpressionTestBase.evaluateExprs(ExpressionTestBase.scala:123) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4593) Fix PageRank algorithm example
Alexander Pivovarov created FLINK-4593: -- Summary: Fix PageRank algorithm example Key: FLINK-4593 URL: https://issues.apache.org/jira/browse/FLINK-4593 Project: Flink Issue Type: Bug Components: Project Website Reporter: Alexander Pivovarov Priority: Minor This page https://flink.apache.org/features.html shows code which implements the PageRank algorithm (Batch Processing Applications). I noticed a couple of bugs in the code: the Page class has a pageId field while Adjacency has just id, but in the code I see {code}pages.join(adjacency).where("pageId").equalTo("pageId"){code} Also {code}Page(page.id, 0.15 / numPages){code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
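The mismatch can be made concrete with plain Java. Below is a hedged reconstruction of the two POJOs from the website example (the exact class shapes are assumptions based on the bug report); a quick reflection check shows why a join key of "pageId" cannot resolve on the Adjacency side:

```java
import java.util.Arrays;

public class PageRankFieldCheck {
    // Assumed POJO shapes, reconstructed from the bug report.
    static class Page { public long pageId; public double rank; }
    static class Adjacency { public long id; public long[] neighbors; }

    public static void main(String[] args) {
        // The example reads where("pageId").equalTo("pageId"), but Adjacency
        // has no pageId field -- the right-hand key would need to be "id":
        //   pages.join(adjacency).where("pageId").equalTo("id")
        boolean adjacencyHasPageId = Arrays.stream(Adjacency.class.getFields())
                .anyMatch(f -> f.getName().equals("pageId"));
        boolean adjacencyHasId = Arrays.stream(Adjacency.class.getFields())
                .anyMatch(f -> f.getName().equals("id"));
        System.out.println(adjacencyHasPageId); // false -> join key is wrong
        System.out.println(adjacencyHasId);     // true  -> "id" would work
    }
}
```

The same field-name check explains the second snippet: {{Page(page.id, ...)}} references a field named {{id}} although the Page class declares {{pageId}}.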
[jira] [Created] (FLINK-4594) Validate lower bound in MathUtils.checkedDownCast
Greg Hogan created FLINK-4594: - Summary: Validate lower bound in MathUtils.checkedDownCast Key: FLINK-4594 URL: https://issues.apache.org/jira/browse/FLINK-4594 Project: Flink Issue Type: Bug Components: Core Affects Versions: 1.2.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial {{MathUtils.checkedDownCast}} only compares against the upper bound {{Integer.MAX_VALUE}}, which has worked with current usage. Rather than adding a second comparison we can replace {noformat} if (value > Integer.MAX_VALUE) { {noformat} with a cast and check {noformat} if ((int)value != value) { ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
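The proposed cast-and-check idiom covers both bounds with a single comparison: if the value survives a round trip through int, it fits. A minimal standalone sketch (class name and exception message are illustrative, not Flink's actual MathUtils code):

```java
public class DownCastSketch {
    // Hedged sketch of the proposed fix: cast down, cast back, and compare.
    // Rejects values above Integer.MAX_VALUE *and* below Integer.MIN_VALUE.
    static int checkedDownCast(long value) {
        int downCast = (int) value;
        if ((long) downCast != value) {
            throw new IllegalArgumentException(
                "Cannot downcast long value " + value + " to int.");
        }
        return downCast;
    }

    public static void main(String[] args) {
        System.out.println(checkedDownCast(42L)); // fits: prints 42
        try {
            checkedDownCast(-3_000_000_000L);     // below Integer.MIN_VALUE
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: lower bound now validated");
        }
    }
}
```

Note that the old comparison `value > Integer.MAX_VALUE` would have silently accepted -3,000,000,000, which the round-trip check rejects.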
release-1.1.2 does not exist on github
Hi Everyone I noticed that release-1.1.2 does not exist on github https://github.com/apache/flink/releases Regards Alex
[jira] [Created] (FLINK-4595) Close FileOutputStream in ParameterTool
Alexander Pivovarov created FLINK-4595: -- Summary: Close FileOutputStream in ParameterTool Key: FLINK-4595 URL: https://issues.apache.org/jira/browse/FLINK-4595 Project: Flink Issue Type: Bug Components: Java API Affects Versions: 1.1.2 Reporter: Alexander Pivovarov Priority: Trivial ParameterTool and ParameterToolTest do not close FileOutputStream {code} defaultProps.store(new FileOutputStream(file), "Default file created by Flink's ParameterUtil.createPropertiesFile()"); {code} {code} props.store(new FileOutputStream(propertiesFile), "Test properties"); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
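The usual fix for this pattern is try-with-resources, which closes the stream even if {{Properties.store()}} throws. A hedged sketch of the idea (method and class names are illustrative, not Flink's actual code):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

public class PropertiesStoreSketch {
    // Wrap the FileOutputStream in try-with-resources so the file handle is
    // always released, instead of passing an unmanaged stream to store().
    static void storeProperties(Properties props, File file, String comment)
            throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            props.store(out, comment); // stream is closed automatically here
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("flink-params", ".properties");
        file.deleteOnExit();

        Properties defaults = new Properties();
        defaults.setProperty("parallelism", "4");
        storeProperties(defaults, file, "Test properties");

        // Read the file back to confirm the contents were flushed and written.
        Properties loaded = new Properties();
        try (FileInputStream in = new FileInputStream(file)) {
            loaded.load(in);
        }
        System.out.println(loaded.getProperty("parallelism")); // prints 4
    }
}
```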
Re:Re: [DISCUSS] how choose Scala and Java
Hi Till, Thanks for your clear reply. In fact, the Java and Scala APIs are not well coordinated. Users are easily confused by identical class names in the Java/Scala APIs, such as `StreamExecutionEnvironment`. Since Scala can do almost everything Java can, why not use Scala only? Kafka's API looks good: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/javaapi/consumer/SimpleConsumer.scala It is a little hard to maintain the Java and Scala APIs at the same time. Two source packages mean two separate implementations. Can we reconsider this base design? At 2016-09-07 15:26:55, "Till Rohrmann" wrote: >I think you're referring to the implementation of some of Flink's modules, >right? > >If that is the case, then the rule of thumb is that we want to use Java for >the low level runtime implementations. For the API implementations it is a >case to case decision. The Scala API, for example is of course implemented >in Scala. For other APIs we tend to use Scala only if it gives a clear >advantage over a Java implementation. > >If your question is more like which Flink API to use (either Java or Scala >API), then it's completely up to you and your preferences. > >Cheers, >Till > >On Wed, Sep 7, 2016 at 8:21 AM, 时某人 wrote: > >> Scala and Java mixed in the module. Some Flink API indeed make someone >> confused. >> What is rule about the current Scala and Java API at the first implement >> time? >> >> >> Thanks
[jira] [Created] (FLINK-4596) Class not found exception when RESTART_STRATEGY is configured with fully qualified class name in the yaml
Nagarjun Guraja created FLINK-4596: -- Summary: Class not found exception when RESTART_STRATEGY is configured with fully qualified class name in the yaml Key: FLINK-4596 URL: https://issues.apache.org/jira/browse/FLINK-4596 Project: Flink Issue Type: Bug Reporter: Nagarjun Guraja CAUSE: {{createRestartStrategyFactory}} converts the configured strategy name to lowercase and then searches for the class using the lowercased string. FIX: Do not lowercase the strategy config value, or lowercase it only for the switch-case comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
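The second suggested fix can be sketched in a few lines. This is a hedged illustration of the lookup pattern, not Flink's actual code; the built-in strategy names and return values are placeholders:

```java
public class RestartStrategyLookup {
    // Lowercase only for matching the built-in shorthand names; keep the
    // original, case-sensitive string for the Class.forName lookup.
    static String resolve(String configuredValue) throws ClassNotFoundException {
        switch (configuredValue.toLowerCase()) { // case-insensitive built-ins
            case "none":
                return "NoRestartStrategy";
            case "fixeddelay":
                return "FixedDelayRestartStrategy";
            default:
                // Fully qualified class names are case-sensitive, so the
                // as-configured value must be used here, not the lowercased one.
                return Class.forName(configuredValue).getSimpleName();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolve("FixedDelay"));       // built-in, any case
        System.out.println(resolve("java.lang.String")); // stands in for a user class
    }
}
```

Lowercasing the whole value before `Class.forName` is exactly what produces the reported ClassNotFoundException for any class name containing uppercase letters.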
Re: release-1.1.2 does not exist on github
Hi Alex, the problem is that the tag has not been created for the release. I'll add it. On Wed, Sep 7, 2016 at 11:09 PM, Alexander Pivovarov wrote: > Hi Everyone > > I noticed that release-1.1.2 does not exist on github > https://github.com/apache/flink/releases > > Regards > Alex >
Re: Re: [DISCUSS] how choose Scala and Java
The Java and Scala APIs are organized in different Maven modules and the Scala APIs are based on the respective Java API. The benefit of this design is to keep Scala dependencies out of the Java APIs which is requested by many users. The Java and Scala counterparts of the DataSet and DataStream APIs are mostly similar, but differ in some details. For instance, the Scala APIs support Scala Lambda functions and Scala Tuples and case classes which are not available in Java. Last but probably most importantly, the Flink community committed itself to API backward compatibility for all 1.x releases. Any major API changes could only be done in a 2.x release. Best, Fabian 2016-09-08 2:54 GMT+02:00 时某人 : > > > Hi, Till, Thanks for your clear reply. > > > In fact the API of Java and Scala are not coordinating. User are easy > confused by the Same class name of Java/Scala API such as > `StreamExecutionEnvironment`. > Since Scala can do almost what Java can, Why not use Scala only? Kafka's > API looks good. https://github.com/apache/kafka/blob/trunk/core/src/ > main/scala/kafka/javaapi/consumer/SimpleConsumer.scala > There is a little hard to maintain the Java and Scala API at the same > time. Two source packages mean the separated implements each other. > Can we re-consider such base design? > > At 2016-09-07 15:26:55, "Till Rohrmann" wrote: > >I think you're referring to the implementation of some of Flink's modules, > >right? > > > >If that is the case, then the rule of thumb is that we want to use Java > for > >the low level runtime implementations. For the API implementations it is a > >case to case decision. The Scala API, for example is of course implemented > >in Scala. For other APIs we tend to use Scala only if it gives a clear > >advantage over a Java implementation. > > > >If your question is more like which Flink API to use (either Java or Scala > >API), then it's completely up to you and your preferences. 
> > > >Cheers, > >Till > > > >On Wed, Sep 7, 2016 at 8:21 AM, 时某人 wrote: > > > >> Scala and Java mixed in the module. Some Flink API indeed make someone > >> confused. > >> What is rule about the current Scala and Java API at the first > implement > >> time? > >> > >> > >> Thanks >
[jira] [Created] (FLINK-4597) Improve Scalar Function section in Table API documentation
Jark Wu created FLINK-4597: -- Summary: Improve Scalar Function section in Table API documentation Key: FLINK-4597 URL: https://issues.apache.org/jira/browse/FLINK-4597 Project: Flink Issue Type: Improvement Components: Table API & SQL Reporter: Jark Wu Assignee: Jark Wu Priority: Minor The function signatures in the Scalar Function section are a little confusing because it is hard to distinguish keywords from parameters. For example, in {{EXTRACT(TIMEINTERVALUNIT FROM TEMPORAL)}}, users may not know at first glance that TEMPORAL is a parameter. I propose to put {{<>}} around parameters, i.e. {{EXTRACT(<TIMEINTERVALUNIT> FROM <TEMPORAL>)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4598) Support NULLIF in Table API
Jark Wu created FLINK-4598: -- Summary: Support NULLIF in Table API Key: FLINK-4598 URL: https://issues.apache.org/jira/browse/FLINK-4598 Project: Flink Issue Type: New Feature Components: Table API & SQL Reporter: Jark Wu Assignee: Jark Wu This could be a subtask of [FLINK-4549]. As Flink SQL already supports {{NULLIF}} implicitly, we should support it in the Table API as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
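For readers unfamiliar with the function: in standard SQL, {{NULLIF(a, b)}} returns NULL when the two arguments are equal and {{a}} otherwise. A plain-Java illustration of those semantics (not Flink code, just the contract the Table API method would have to satisfy):

```java
public class NullIfSketch {
    // Standard SQL NULLIF semantics: NULL when a equals b, otherwise a.
    // A NULL first argument stays NULL (NULL is never equal to anything).
    static Integer nullIf(Integer a, Integer b) {
        if (a == null) {
            return null;
        }
        return a.equals(b) ? null : a;
    }

    public static void main(String[] args) {
        System.out.println(nullIf(5, 5)); // prints null
        System.out.println(nullIf(5, 3)); // prints 5
    }
}
```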