Re: [DISCUSS] how to choose Scala and Java

2016-09-07 Thread Till Rohrmann
I think you're referring to the implementation of some of Flink's modules,
right?

If that is the case, then the rule of thumb is that we want to use Java for
the low-level runtime implementations. For the API implementations it is a
case-by-case decision. The Scala API, for example, is of course implemented
in Scala. For other APIs we tend to use Scala only if it gives a clear
advantage over a Java implementation.

If your question is rather which Flink API to use (the Java or the Scala
API), then it's completely up to you and your preferences.

Cheers,
Till

On Wed, Sep 7, 2016 at 8:21 AM, 时某人  wrote:

> Scala and Java are mixed within the same modules, and some Flink APIs
> indeed confuse people.
> What was the rule for choosing between Scala and Java when the APIs were
> first implemented?
>
>
> Thanks


Assign tasks from JIRA

2016-09-07 Thread Kirill Morozov
Hello folks!

I am new to the community, but I want to participate in the community process. How 
can I assign https://issues.apache.org/jira/browse/FLINK-4506 to myself?

Best regards,
Kirill Morozov
EPAM mailto:kirill_moro...@epam.com



Re: Assign tasks from JIRA

2016-09-07 Thread Fabian Hueske
Hi Kirill,

welcome to the Flink community!

I gave your JIRA account Contributor permissions for the FLINK JIRA.
You should now be able to assign tasks to yourself.

Best, Fabian

2016-09-07 9:46 GMT+02:00 Kirill Morozov :

> Hello folks!
>
> I am new to the community, but I want to participate in the community process.
> How can I assign https://issues.apache.org/jira/browse/FLINK-4506 to myself?
>
> Best regards,
> Kirill Morozov
> EPAM mailto:kirill_moro...@epam.com
>
>


Re: [DISCUSS] FLIP-11: Table API Stream Aggregations

2016-09-07 Thread Jark Wu
Hi Fabian, 

Thanks for sharing your ideas. 

They all make sense to me. Regarding reassigning timestamps, I do not have a 
use case. I came up with this because DataStream has a TimestampAssigner :)

+1 for this FLIP. 

- Jark Wu 

> On Sep 7, 2016, at 2:59 PM, Fabian Hueske wrote:
> 
> Hi,
> 
> thanks for your comments and questions!
> Actually, you are bringing up the points that Timo and I discussed the most
> when designing the FLIP ;-)
> 
> - We also thought about the syntactic shortcut for running aggregates like
> you proposed (table.groupBy(‘a).select(…)). Our motivation to not allow
> this shortcut is to prevent users from accidentally performing a
> "dangerous" operation. The problem with unbounded sliding row-windows is
> that their state never expires. If you have an evolving key space, you
> will likely run into problems at some point because the operator state
> grows too large. IMO, a row-window session is a better approach, because it
> defines a timeout after which state can be discarded. groupBy.select is a
> very common operation in batch but its semantics in streaming are very
> different. In my opinion it makes sense to make users aware of these
> differences through the API.
> 
> - Reassigning timestamps and watermarks is a very delicate issue. You are
> right that Calcite exposes this field, which is necessary due to the
> semantics of SQL. However, also in Calcite you cannot freely choose the
> timestamp attribute for streaming queries (it must be a monotone or
> quasi-monotone attribute) which is hard to reason about (and guarantee)
> after a few operators have been applied. Streaming tables in Flink will
> likely have a time attribute which is identical to the initial rowtime.
> However, Flink does modify timestamps internally, e.g., for records that
> are emitted from time windows, in order to ensure that consecutive windows
> perform as expected. Modifying or reassigning timestamps in the middle of a job
> can result in unexpected results which are very hard to reason about. Do
> you have a concrete use case in mind for reassigning timestamps?
> 
> - The idea to represent rowtime and systemtime as objects is good. Our
> motivation to go for reserved Scala symbols was to have a uniform syntax
> with windows over streaming and batch tables. On batch tables you can
> compute time windows basically over every time attribute (they are treated
> similar to grouping attributes with a bit of extra logic to extract the
> grouping key for sliding and session windows). If you write window(Tumble
> over 10.minutes on 'rowtime) on a streaming table, 'rowtime would indicate
> event-time. On a batch table with a 'rowtime attribute, the same operator
> would be internally converted into a group by. By going for the object
> approach we would lose this compatibility (or would need to introduce an
> additional column attribute to specify the window attribute for batch
> tables).
> 
> As usual, some of the design decisions are based on preferences.
> Do they make sense to you? Let me know what you think.
> 
> Best, Fabian
> 
> 
> 2016-09-07 5:12 GMT+02:00 Jark Wu :
> 
>> Hi all,
>> 
>> I was on vacation for about five days, sorry to have missed this great FLIP.
>> 
>> Yes, the non-windowed aggregates are a special case of row-windows, and the
>> proposal looks really good. Can we have a simplified form for the special
>> case? For example,
>> table.groupBy(‘a).rowWindow(SlideRows.unboundedPreceding).select(…)
>> could be simplified to table.groupBy(‘a).select(…). The latter would actually
>> call the former.
>> 
>> Another question is about the rowtime. As the FLIP says, the DataStream and
>> StreamTableSource are responsible for assigning timestamps and watermarks;
>> furthermore, “rowtime” and “systemtime” are not real columns. IMO, this is
>> different from Calcite’s rowtime, which is a real column in the table. In
>> the FLIP's way we lose some flexibility, because the timestamp column may
>> be created after some transformations or a join operation, not at the
>> beginning. So why do we have to define the rowtime at the beginning?
>> (Because of watermarks?) Can we have a way to define the rowtime after the
>> source table, like a TimestampAssigner?
>> 
>> Regarding the “allowLateness” method: I came up with a trick where we make
>> ‘rowtime and ‘systemtime Scala objects instead of symbol expressions. The API
>> would look like this:
>> 
>> window(Tumble over 10.minutes on rowtime allowLateness as ‘w)
>> 
>> The implementation will look like this:
>> 
>> class TumblingWindow(size: Expression) extends Window {
>>   def on(time: rowtime.type): TumblingEventTimeWindow =
>>     new TumblingEventTimeWindow(alias, ‘rowtime, size) // has allowLateness() method
>>
>>   def on(time: systemtime.type): TumblingProcessingTimeWindow =
>>     new TumblingProcessingTimeWindow(alias, ‘systemtime, size) // has no allowLateness() method
>> }
>> object rowtime
>> object systemtime
>> 
>> What do you think about this?
>> 
>> - Jark Wu
>> 

[jira] [Created] (FLINK-4588) Fix Merging of Covering Window in MergingWindowSet

2016-09-07 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-4588:
---

 Summary: Fix Merging of Covering Window in MergingWindowSet
 Key: FLINK-4588
 URL: https://issues.apache.org/jira/browse/FLINK-4588
 Project: Flink
  Issue Type: Bug
  Components: Windowing Operators
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Priority: Blocker
 Fix For: 1.0.4, 1.2.0, 1.1.3


Right now, when a new window that covers all of the existing windows gets merged, 
{{MergingWindowSet}} does not correctly set the state window.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4589) Fix Merging of Covering Window in MergingWindowSet

2016-09-07 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-4589:
---

 Summary: Fix Merging of Covering Window in MergingWindowSet
 Key: FLINK-4589
 URL: https://issues.apache.org/jira/browse/FLINK-4589
 Project: Flink
  Issue Type: Bug
  Components: Windowing Operators
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Priority: Blocker
 Fix For: 1.0.4, 1.2.0, 1.1.3


Right now, when a new window that covers all of the existing windows gets merged, 
{{MergingWindowSet}} does not correctly set the state window.





[jira] [Created] (FLINK-4590) Some Table API tests are failing when debug lvl is set to DEBUG

2016-09-07 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-4590:
-

 Summary: Some Table API tests are failing when debug lvl is set to 
DEBUG
 Key: FLINK-4590
 URL: https://issues.apache.org/jira/browse/FLINK-4590
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.2.0
Reporter: Robert Metzger


While debugging another issue, I set the log level on Travis to DEBUG.
After that, the Table API tests started failing:

{code}

Failed tests: 
  SetOperatorsITCase.testMinusAll:156 Internal error: Error occurred while 
applying rule DataSetScanRule
  SetOperatorsITCase.testMinusAll:156 Internal error: Error occurred while 
applying rule DataSetScanRule
  SetOperatorsITCase.testMinusAll:156 Internal error: Error occurred while 
applying rule DataSetScanRule
  SetOperatorsITCase.testMinusAll:156 Internal error: Error occurred while 
applying rule DataSetScanRule
  SetOperatorsITCase.testMinusDifferentFieldNames:204 Internal error: Error 
occurred while applying rule DataSetScanRule
  SetOperatorsITCase.testMinusDifferentFieldNames:204 Internal error: Error 
occurred while applying rule DataSetScanRule
  SetOperatorsITCase.testMinusDifferentFieldNames:204 Internal error: Error 
occurred while applying rule DataSetScanRule
  SetOperatorsITCase.testMinusDifferentFieldNames:204 Internal error: Error 
occurred while applying rule DataSetScanRule
  SetOperatorsITCase.testMinus:175 Internal error: Error occurred while 
applying rule DataSetScanRule
  SetOperatorsITCase.testMinus:175 Internal error: Error occurred while 
applying rule DataSetScanRule
  SetOperatorsITCase.testMinus:175 Internal error: Error occurred while 
applying rule DataSetScanRule
  SetOperatorsITCase.testMinus:175 Internal error: Error occurred while 
applying rule DataSetScanRule
{code}

Probably Calcite is executing additional assertions depending on the debug 
level.





[jira] [Created] (FLINK-4591) Select star does not work with grouping

2016-09-07 Thread Timo Walther (JIRA)
Timo Walther created FLINK-4591:
---

 Summary: Select star does not work with grouping
 Key: FLINK-4591
 URL: https://issues.apache.org/jira/browse/FLINK-4591
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: Timo Walther


It would be consistent if this also worked:

{{table.groupBy('*).select('*)}}

Currently, the star only works in a plain select without grouping.





[jira] [Created] (FLINK-4592) Fix flaky test ScalarFunctionsTest.testCurrentTimePoint

2016-09-07 Thread Timo Walther (JIRA)
Timo Walther created FLINK-4592:
---

 Summary: Fix flaky test ScalarFunctionsTest.testCurrentTimePoint
 Key: FLINK-4592
 URL: https://issues.apache.org/jira/browse/FLINK-4592
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: Timo Walther


It seems that the test is still non-deterministic.

{code}
org.apache.flink.api.table.expressions.ScalarFunctionsTest
testCurrentTimePoint(org.apache.flink.api.table.expressions.ScalarFunctionsTest)
  Time elapsed: 0.083 sec  <<< FAILURE!
org.junit.ComparisonFailure: Wrong result for: 
AS(>=(CHAR_LENGTH(CAST(CURRENT_TIMESTAMP):VARCHAR(1) CHARACTER SET "ISO-8859-1" 
COLLATE "ISO-8859-1$en_US$primary" NOT NULL), 22), '_c0') expected:<[tru]e> but 
was:<[fals]e>
at org.junit.Assert.assertEquals(Assert.java:115)
at 
org.apache.flink.api.table.expressions.utils.ExpressionTestBase$$anonfun$evaluateExprs$1.apply(ExpressionTestBase.scala:126)
at 
org.apache.flink.api.table.expressions.utils.ExpressionTestBase$$anonfun$evaluateExprs$1.apply(ExpressionTestBase.scala:123)
at 
scala.collection.mutable.LinkedHashSet.foreach(LinkedHashSet.scala:87)
at 
org.apache.flink.api.table.expressions.utils.ExpressionTestBase.evaluateExprs(ExpressionTestBase.scala:123)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
{code}





[jira] [Created] (FLINK-4593) Fix PageRank algorithm example

2016-09-07 Thread Alexander Pivovarov (JIRA)
Alexander Pivovarov created FLINK-4593:
--

 Summary: Fix PageRank algorithm example
 Key: FLINK-4593
 URL: https://issues.apache.org/jira/browse/FLINK-4593
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Alexander Pivovarov
Priority: Minor


This page https://flink.apache.org/features.html shows the code which 
implements PageRank algorithm (Batch Processing Applications).

I noticed a couple of bugs in the code:
the Page class has a {{pageId}} field, while Adjacency has just {{id}}.

But in the code I see
{code}pages.join(adjacency).where("pageId").equalTo("pageId"){code}
which joins on {{pageId}} for both inputs.

Also, {code}Page(page.id, 0.15 / numPages){code} refers to {{page.id}}, which does not exist.





[jira] [Created] (FLINK-4594) Validate lower bound in MathUtils.checkedDownCast

2016-09-07 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-4594:
-

 Summary: Validate lower bound in MathUtils.checkedDownCast
 Key: FLINK-4594
 URL: https://issues.apache.org/jira/browse/FLINK-4594
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.2.0
Reporter: Greg Hogan
Assignee: Greg Hogan
Priority: Trivial


{{MathUtils.checkedDownCast}} only compares against the upper bound 
{{Integer.MAX_VALUE}}, which has worked with current usage. 

Rather than adding a second comparison, we can replace

{noformat}
if (value > Integer.MAX_VALUE) {
{noformat}

with a cast and check

{noformat}
if ((int)value != value) { ...
{noformat}
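The cast-and-check form also catches values below {{Integer.MIN_VALUE}}, which the upper-bound-only comparison lets through. A minimal standalone sketch of the idea (the class name and exception choice are illustrative here, not necessarily what Flink's {{MathUtils}} uses):

```java
public class CheckedDownCast {

    // Down-casts a long to an int, failing fast if the value does not fit.
    // A single cast-and-compare catches both overflow (> Integer.MAX_VALUE)
    // and underflow (< Integer.MIN_VALUE).
    public static int checkedDownCast(long value) {
        int downCast = (int) value;
        if (downCast != value) {
            throw new IllegalArgumentException(
                "Cannot downcast long value " + value + " to integer.");
        }
        return downCast;
    }

    public static void main(String[] args) {
        System.out.println(checkedDownCast(42L)); // fits into an int
        try {
            // passes the old upper-bound-only check, but does not fit an int
            checkedDownCast(Long.MIN_VALUE);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the old check, {{Long.MIN_VALUE}} would silently truncate (its low 32 bits are zero); the cast-and-compare form rejects it.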





release-1.1.2 does not exist on github

2016-09-07 Thread Alexander Pivovarov
Hi Everyone

I noticed that release-1.1.2 does not exist on github
https://github.com/apache/flink/releases

Regards
Alex


[jira] [Created] (FLINK-4595) Close FileOutputStream in ParameterTool

2016-09-07 Thread Alexander Pivovarov (JIRA)
Alexander Pivovarov created FLINK-4595:
--

 Summary: Close FileOutputStream in ParameterTool
 Key: FLINK-4595
 URL: https://issues.apache.org/jira/browse/FLINK-4595
 Project: Flink
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.1.2
Reporter: Alexander Pivovarov
Priority: Trivial


ParameterTool and ParameterToolTest do not close FileOutputStream
{code}
defaultProps.store(new FileOutputStream(file), "Default file created by Flink's 
ParameterUtil.createPropertiesFile()");
{code}

{code}
props.store(new FileOutputStream(propertiesFile), "Test properties");
{code}
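A sketch of one possible fix, using try-with-resources so the stream is closed even if {{store()}} throws (the class and method names below are illustrative, not Flink's actual code):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

public class PropertiesFileSketch {

    // Writes the properties to the given file; try-with-resources guarantees
    // the FileOutputStream is closed on both the success and failure paths.
    public static void writeProperties(Properties props, File file, String comment)
            throws IOException {
        try (OutputStream out = new FileOutputStream(file)) {
            props.store(out, comment);
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("parallelism.default", "4");

        File file = File.createTempFile("flink-demo", ".properties");
        file.deleteOnExit();

        writeProperties(props, file, "Test properties");
        System.out.println(file.length() > 0); // file written, stream closed
    }
}
```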





Re: Re: [DISCUSS] how to choose Scala and Java

2016-09-07 Thread 时某人


Hi Till, thanks for your clear reply.


In fact, the Java and Scala APIs are not well coordinated. Users are easily confused 
by identical class names in the Java and Scala APIs, such as `StreamExecutionEnvironment`.
Since Scala can do almost everything Java can, why not use Scala only? Kafka's API 
looks good: 
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/javaapi/consumer/SimpleConsumer.scala
It is a little hard to maintain the Java and Scala APIs at the same time; two 
source packages mean two separate implementations.
Can we reconsider this base design?

At 2016-09-07 15:26:55, "Till Rohrmann"  wrote:
>I think you're referring to the implementation of some of Flink's modules,
>right?
>
>If that is the case, then the rule of thumb is that we want to use Java for
>the low-level runtime implementations. For the API implementations it is a
>case-by-case decision. The Scala API, for example, is of course implemented
>in Scala. For other APIs we tend to use Scala only if it gives a clear
>advantage over a Java implementation.
>
>If your question is rather which Flink API to use (the Java or the Scala
>API), then it's completely up to you and your preferences.
>
>Cheers,
>Till
>
>On Wed, Sep 7, 2016 at 8:21 AM, 时某人  wrote:
>
>> Scala and Java are mixed within the same modules, and some Flink APIs
>> indeed confuse people.
>> What was the rule for choosing between Scala and Java when the APIs were
>> first implemented?
>>
>>
>> Thanks


[jira] [Created] (FLINK-4596) Class not found exception when RESTART_STRATEGY is configured with fully qualified class name in the yaml

2016-09-07 Thread Nagarjun Guraja (JIRA)
Nagarjun Guraja created FLINK-4596:
--

 Summary: Class not found exception when RESTART_STRATEGY is 
configured with fully qualified class name in the yaml
 Key: FLINK-4596
 URL: https://issues.apache.org/jira/browse/FLINK-4596
 Project: Flink
  Issue Type: Bug
Reporter: Nagarjun Guraja


CAUSE: {{createRestartStrategyFactory}} converts the configured strategy name to 
lowercase and then searches for the class using the lowercased string, so a fully 
qualified class name containing uppercase characters is never found.

FIX: Do not lowercase the strategy config value, or lowercase it only for the 
switch-case comparison.
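A sketch of the second suggested fix: lowercase only inside the switch comparison and keep the original string for the class-name fallback. The built-in names and return values below are illustrative, not Flink's actual factory code:

```java
public class RestartStrategyResolver {

    // Resolves a restart-strategy config value. Built-in short names are
    // matched case-insensitively, but the configured string itself is kept
    // untouched so a fully qualified class name preserves its case.
    public static String resolve(String configuredValue) {
        switch (configuredValue.toLowerCase()) { // lowercase only for the comparison
            case "none":
                return "NoRestartStrategy";
            case "fixed-delay":
                return "FixedDelayRestartStrategy";
            case "failure-rate":
                return "FailureRateRestartStrategy";
            default:
                // fall back to interpreting the original, case-preserved value
                // as a fully qualified factory class name
                return configuredValue;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("Fixed-Delay"));           // built-in, any case
        System.out.println(resolve("com.example.MyFactory")); // class name keeps its case
    }
}
```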





Re: release-1.1.2 does not exist on github

2016-09-07 Thread Robert Metzger
Hi Alex,

the problem is that the tag has not been created for the release. I'll add
it.

On Wed, Sep 7, 2016 at 11:09 PM, Alexander Pivovarov 
wrote:

> Hi Everyone
>
> I noticed that release-1.1.2 does not exist on github
> https://github.com/apache/flink/releases
>
> Regards
> Alex
>


Re: Re: [DISCUSS] how to choose Scala and Java

2016-09-07 Thread Fabian Hueske
The Java and Scala APIs are organized in different Maven modules and the
Scala APIs are based on the respective Java API.
The benefit of this design is that it keeps Scala dependencies out of the Java
APIs, which many users have requested.

The Java and Scala counterparts of the DataSet and DataStream APIs are
mostly similar, but differ in some details.
For instance, the Scala APIs support Scala Lambda functions and Scala
Tuples and case classes which are not available in Java.

Last but probably most importantly, the Flink community committed itself to
API backward compatibility for all 1.x releases.
Any major API changes could only be done in a 2.x release.

Best, Fabian

2016-09-08 2:54 GMT+02:00 时某人 :

>
>
> Hi Till, thanks for your clear reply.
>
>
> In fact, the Java and Scala APIs are not well coordinated. Users are easily
> confused by identical class names in the Java and Scala APIs, such as
> `StreamExecutionEnvironment`.
> Since Scala can do almost everything Java can, why not use Scala only? Kafka's
> API looks good:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/javaapi/consumer/SimpleConsumer.scala
> It is a little hard to maintain the Java and Scala APIs at the same time;
> two source packages mean two separate implementations.
> Can we reconsider this base design?
>
> At 2016-09-07 15:26:55, "Till Rohrmann"  wrote:
> >I think you're referring to the implementation of some of Flink's modules,
> >right?
> >
> >If that is the case, then the rule of thumb is that we want to use Java for
> >the low-level runtime implementations. For the API implementations it is a
> >case-by-case decision. The Scala API, for example, is of course implemented
> >in Scala. For other APIs we tend to use Scala only if it gives a clear
> >advantage over a Java implementation.
> >
> >If your question is rather which Flink API to use (the Java or the Scala
> >API), then it's completely up to you and your preferences.
> >
> >Cheers,
> >Till
> >
> >On Wed, Sep 7, 2016 at 8:21 AM, 时某人  wrote:
> >
> >> Scala and Java are mixed within the same modules, and some Flink APIs
> >> indeed confuse people.
> >> What was the rule for choosing between Scala and Java when the APIs were
> >> first implemented?
> >>
> >>
> >> Thanks
>


[jira] [Created] (FLINK-4597) Improve Scalar Function section in Table API documentation

2016-09-07 Thread Jark Wu (JIRA)
Jark Wu created FLINK-4597:
--

 Summary: Improve Scalar Function section in Table API documentation
 Key: FLINK-4597
 URL: https://issues.apache.org/jira/browse/FLINK-4597
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: Jark Wu
Assignee: Jark Wu
Priority: Minor


The function signatures in the Scalar Functions section are a little confusing, 
because it's hard to distinguish keywords from parameters. For example, in 
{{EXTRACT(TIMEINTERVALUNIT FROM TEMPORAL)}}, users may not realize at first 
glance that TEMPORAL is a parameter. I propose to put {{<>}} around parameters, 
i.e. {{EXTRACT(<TIMEINTERVALUNIT> FROM <TEMPORAL>)}}.





[jira] [Created] (FLINK-4598) Support NULLIF in Table API

2016-09-07 Thread Jark Wu (JIRA)
Jark Wu created FLINK-4598:
--

 Summary: Support NULLIF  in Table API
 Key: FLINK-4598
 URL: https://issues.apache.org/jira/browse/FLINK-4598
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Jark Wu
Assignee: Jark Wu


This could be a subtask of [FLINK-4549]. As Flink SQL already supports {{NULLIF}} 
implicitly, we should support it in the Table API as well.


