Re: Taking time off

2017-01-23 Thread Till Rohrmann
I totally agree with the previous writers. Thanks a lot for all your effort
and all the best for your future.

Cheers,
Till

On Sat, Jan 21, 2017 at 5:13 PM, Ufuk Celebi  wrote:

> I totally did not see this email. Sorry! Thanks for all your hard work
> and the great discussions on and offline. I learned a lot from you in
> the past two years! Looking forward to seeing you again on the mailing
> lists in the future.
>
> – Ufuk
>
>
> On Sat, Jan 21, 2017 at 4:49 PM, Paris Carbone  wrote:
> > Thanks for all the cool contributions on Flink and Beam!
> > Keep rockin' in the distr systems space :)
> >
> > Paris
> >
> >> On 21 Jan 2017, at 00:53, Fabian Hueske  wrote:
> >>
> >> Hi Max,
> >>
> >> Thanks for all your efforts!
> >> Hope to see you back soon.
> >>
> >> Take care, Fabian
> >>
> >>
> >> 2017-01-16 11:22 GMT+01:00 Vasiliki Kalavri  >:
> >>
> >>> Hi Max,
> >>>
> >>> thank you for all your work! Enjoy your time off and hope to have you
> back
> >>> with us soon ^^
> >>>
> >>> Cheers,
> >>> -Vasia.
> >>>
> >>> On 14 January 2017 at 09:03, Maximilian Michels 
> wrote:
> >>>
>  Dear Squirrels,
> 
>  Thank you! It's been very exciting to see the Flink community grow and
>  flourish over the past two years.
> 
>  For the beginning of this year, I decided to take some time off, which
>  means I'll be less engaged on the mailing list or on GitHub/JIRA.
> 
>  In the meantime, if you have any questions I might be able to answer,
> >>> feel
>  free to contact me. Looking forward to seeing the squirrels rise further!
> 
>  Best,
>  Max
> 
> >>>
> >
>


[jira] [Created] (FLINK-5609) Add last update time to docs

2017-01-23 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5609:
--

 Summary: Add last update time to docs
 Key: FLINK-5609
 URL: https://issues.apache.org/jira/browse/FLINK-5609
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi


Add a small text to the start page stating when the docs were last updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5610) Rename Installation and Setup to Project Setup

2017-01-23 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5610:
--

 Summary: Rename Installation and Setup to Project Setup
 Key: FLINK-5610
 URL: https://issues.apache.org/jira/browse/FLINK-5610
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Priority: Minor


With the recent refactorings of the documentation, the high-level section 
"Installation and Setup" could more aptly be named "Project Setup", because 
it mostly groups pages about project setup, like IDE setup etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[Discuss] Some problem Unit test

2017-01-23 Thread sjk
Hi, all

Surefire plugin by default executes `default-test` on mvn test. There are two 
executions in Flink’s surefire plugin configuration: default-test and 
integration-tests.

I have three problems with the unit tests:
1. As default-test and integration-tests are mutually exclusive, when I locally 
execute “mvn clean test -f flink-libraries/flink-table/pom.xml -U”, it will not 
execute any of the *ITCase.* tests; for example, 
org.apache.flink.table.api.scala.stream.sql.SqlITCase will not be executed. I 
think it’s a bug.
   Where is the integration-tests execution used?
2. *Test.* will also not be executed; there are lots of *Test.* unit tests. Do 
they need to be tested?
3. *Suite.* is generally used in Scala; flink-ml uses the scalatest-maven-plugin 
instead of the surefire plugin.
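
(A likely answer to question 1, assuming the integration-tests execution is 
bound to Maven's integration-test phase: “mvn test” stops after the test 
phase, so only default-test runs. Running “mvn clean verify -f 
flink-libraries/flink-table/pom.xml” should additionally execute the 
integration-test phase and thus the *ITCase.* tests.)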

I think we should do something about the unit tests:
1. Choose one unit test plugin: surefire or scalatest-maven-plugin.
2. Include the given unit test wildcard classes, such as **/*ITCase.*, 
**/*Test.*, and **/*Suite.*.
   All such unit tests should be executed. Clean up the non-unit-test classes 
whose names end with “Test”.


After I tried to modify the surefire plugin configuration as follows, lots of 
errors occurred:


<execution>
    <id>default-test</id>
    <phase>test</phase>
    <goals>
        <goal>test</goal>
    </goals>
    <configuration>
        <skipTests>${skip.default.test}</skipTests>
        <includes>
            <include>**/*ITCase.*</include>
            <include>**/*Suite.*</include>
            <include>**/*Test.*</include>
        </includes>
    </configuration>
</execution>





 

cc Stephan Ewen


Best regards
-Jinkui Shi




Re: Reply: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing release candidate)

2017-01-23 Thread Robert Metzger
Hi all,

I would like to do a proper voting RC1 early this week.
Of the issues mentioned here, most have pull requests or were
changed to a lower priority.
Once we've merged all outstanding PRs, I'll create the next RC.

Regards,
Robert


On Mon, Jan 16, 2017 at 12:13 PM, Fabian Hueske  wrote:

> A user reported that outer joins on the Table API and SQL compute wrong
> results:
>
> https://issues.apache.org/jira/browse/FLINK-5498
>
> 2017-01-15 20:23 GMT+01:00 Till Rohrmann :
>
> > I found two problematic issues with Mesos HA mode which breaks it:
> >
> > https://issues.apache.org/jira/browse/FLINK-5495
> > https://issues.apache.org/jira/browse/FLINK-5496
> >
> > On Fri, Jan 13, 2017 at 11:29 AM, Fabian Hueske 
> wrote:
> >
> > > I tested the Table API / SQL a bit.
> > >
> > > I implemented a windowed aggregation with the streaming Table API and
> it
> > > produced the same results as a DataStream API implementation.
> > > Joining a stream with a TableFunction also seemed to work well.
> > > Moreover, I checked the results of a bunch of TPC-H queries (batch SQL)
> > > and all produced correct results.
> > >
> > >
> > >
> > > 2017-01-12 17:45 GMT+01:00 Till Rohrmann :
> > >
> > >> I'm wondering whether we should not make the webserver encryption depend
> > >> on the global encryption activation and instead activate it by default.
> > >>
> > >> On Thu, Jan 12, 2017 at 4:54 PM, Chesnay Schepler  >
> > >> wrote:
> > >>
> > >> > FLINK-5470 is a duplicate of FLINK-5298 for which there is also an
> > open
> > >> PR.
> > >> >
> > >> > FLINK-5472 is imo invalid since the webserver does support https,
> you
> > >> just
> > >> > have to enable it as per the security documentation.
> > >> >
> > >> >
> > >> > On 12.01.2017 16:20, Till Rohrmann wrote:
> > >> >
> > >> > I also found an issue:
> > >> >
> > >> > https://issues.apache.org/jira/browse/FLINK-5470
> > >> >
> > >> > I also noticed that Flink's webserver does not support https
> requests.
> > >> It
> > >> > might be worthwhile to add it, though.
> > >> >
> > >> > https://issues.apache.org/jira/browse/FLINK-5472
> > >> >
> > >> > On Thu, Jan 12, 2017 at 11:24 AM, Robert Metzger <
> rmetz...@apache.org
> > >
> > >> > wrote:
> > >> >
> > >> >> I also found a bunch of issues
> > >> >>
> > >> >> https://issues.apache.org/jira/browse/FLINK-5465
> > >> >> https://issues.apache.org/jira/browse/FLINK-5462
> > >> >> https://issues.apache.org/jira/browse/FLINK-5464
> > >> >> https://issues.apache.org/jira/browse/FLINK-5463
> > >> >>
> > >> >>
> > >> >> On Thu, Jan 12, 2017 at 9:56 AM, Fabian Hueske < <
> fhue...@gmail.com>
> > >> >> fhue...@gmail.com> wrote:
> > >> >>
> > >> >> > I have another bugfix for 1.2:
> > >> >> >
> > >> >> > https://issues.apache.org/jira/browse/FLINK-2662 (pending PR)
> > >> >> >
> > >> >> > 2017-01-10 15:16 GMT+01:00 Robert Metzger <  >
> > >> >> rmetz...@apache.org>:
> > >> >> >
> > >> >> > > Hi,
> > >> >> > >
> > >> >> > > this depends a lot on the number of issues we find during the
> > >> testing.
> > >> >> > >
> > >> >> > >
> > >> >> > > These are the issues I found so far:
> > >> >> > >
> > >> >> > > https://issues.apache.org/jira/browse/FLINK-5379 (unresolved)
> > >> >> > > https://issues.apache.org/jira/browse/FLINK-5383 (resolved)
> > >> >> > > https://issues.apache.org/jira/browse/FLINK-5382 (resolved)
> > >> >> > > https://issues.apache.org/jira/browse/FLINK-5381 (resolved)
> > >> >> > > https://issues.apache.org/jira/browse/FLINK-5380 (pending PR)
> > >> >> > >
> > >> >> > >
> > >> >> > >
> > >> >> > >
> > >> >> > > On Tue, Jan 10, 2017 at 11:58 AM, shijinkui <
> > shijin...@huawei.com>
> > >> >> > wrote:
> > >> >> > >
> > >> >> > > > Do we have a probable time for the 1.2 release? This month or
> > >> next month?
> > >> >> > > >
> > >> >> > > > -Original Message-
> > >> >> > > > From: Robert Metzger [mailto: 
> > >> >> rmetz...@apache.org]
> > >> >> > > > Sent: January 3, 2017, 20:44
> > >> >> > > > To: dev@flink.apache.org
> > >> >> > > > Cc: u...@flink.apache.org
> > >> >> > > > Subject: [DISCUSS] Apache Flink 1.2.0 RC0 (Non-voting testing
> > release
> > >> >> > > candidate)
> > >> >> > > >
> > >> >> > > > Hi,
> > >> >> > > >
> > >> >> > > > First of all, I wish everybody a happy new year 2017.
> > >> >> > > >
> > >> >> > > > I've set user@flink in CC so that users who are interested
> in
> > >> >> helping
> > >> >> > > > with the testing get notified. Please respond only to the
> dev@
> > >> >> list to
> > >> >> > > > keep the discussion there!
> > >> >> > > >
> > >> >> > > > According to the 1.2 release discussion thread, I've created
> a
> > >> first
> > >> >> > > > release candidate for Flink 1.2.
> > >> >> > > > The release candidate will not be the final release, because
> > I'm
> > >> >> > certain
> > >> >> > > > that we'll find at least one blocking issue in the candidate
> :)
> > >> >> > > >
> > >> >> > > > Therefore, the RC is meant as a testing only release
> candidate.
> > >> >> > > > Please report every issue we need to fix

Re: Taking time off

2017-01-23 Thread Kostas Kloudas
Hi Max! 

I may be repeating the previous writers, but thanks a lot for the work and 
all the contributions to the projects, both at the technical and at the 
community level!

Cheers,
Kostas

> On Jan 23, 2017, at 9:28 AM, Till Rohrmann  wrote:
> 
> I totally agree with the previous writers. Thanks a lot for all your effort
> and all the best for your future.
> 
> Cheers,
> Till
> 
> On Sat, Jan 21, 2017 at 5:13 PM, Ufuk Celebi  wrote:
> 
>> I totally did not see this email. Sorry! Thanks for all your hard work
>> and the great discussions on and offline. I learned a lot from you in
>> the past two years! Looking forward to seeing you again on the mailing
>> lists in the future.
>> 
>> – Ufuk
>> 
>> 
>> On Sat, Jan 21, 2017 at 4:49 PM, Paris Carbone  wrote:
>>> Thanks for all the cool contributions on Flink and Beam!
>>> Keep rockin' in the distr systems space :)
>>> 
>>> Paris
>>> 
 On 21 Jan 2017, at 00:53, Fabian Hueske  wrote:
 
 Hi Max,
 
 Thanks for all your efforts!
 Hope to see you back soon.
 
 Take care, Fabian
 
 
 2017-01-16 11:22 GMT+01:00 Vasiliki Kalavri >> :
 
> Hi Max,
> 
> thank you for all your work! Enjoy your time off and hope to have you
>> back
> with us soon ^^
> 
> Cheers,
> -Vasia.
> 
> On 14 January 2017 at 09:03, Maximilian Michels 
>> wrote:
> 
>> Dear Squirrels,
>> 
>> Thank you! It's been very exciting to see the Flink community grow and
>> flourish over the past two years.
>> 
>> For the beginning of this year, I decided to take some time off, which
>> means I'll be less engaged on the mailing list or on GitHub/JIRA.
>> 
>> In the meantime, if you have any questions I might be able to answer,
> feel
> >> free to contact me. Looking forward to seeing the squirrels rise further!
>> 
>> Best,
>> Max
>> 
> 
>>> 
>> 



[jira] [Created] (FLINK-5611) Add QueryableStateException type

2017-01-23 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5611:
--

 Summary: Add QueryableStateException type
 Key: FLINK-5611
 URL: https://issues.apache.org/jira/browse/FLINK-5611
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Priority: Minor


We currently have some exceptions, like {{UnknownJobManager}}, that should be 
subtypes of the to-be-introduced {{QueryableStateException}}. Right now, they 
extend checked and unchecked Exceptions.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5612) GlobPathFilter not-serializable exception

2017-01-23 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-5612:
---

 Summary: GlobPathFilter not-serializable exception
 Key: FLINK-5612
 URL: https://issues.apache.org/jira/browse/FLINK-5612
 Project: Flink
  Issue Type: Bug
  Components: Batch Connectors and Input/Output Formats
Affects Versions: 1.2.0, 1.3.0
Reporter: Chesnay Schepler
Priority: Blocker


A user reported on the mailing list a non-serializable exception when using the 
GlobFilePathFilter.

It appears that the PathMatchers are all created as anonymous inner classes and 
thus contain a reference to the encapsulating, non-serializable FileSystem 
class.

We can fix this by moving the Matcher instantiation into filterPath(...).
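
A minimal sketch of that idea (illustrative only; the class below is a 
hypothetical stand-in, not the actual GlobFilePathFilter internals): keep only 
the serializable glob strings as state and build the non-serializable 
PathMatchers lazily, after the filter has been shipped.

{code}
import java.io.Serializable;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: only the glob strings are serialized; the matchers
// are rebuilt on the first call after (de)serialization.
public class LazyGlobFilter implements Serializable {

    private final List<String> includePatterns;
    private transient List<PathMatcher> matchers; // not serialized

    public LazyGlobFilter(List<String> includePatterns) {
        this.includePatterns = new ArrayList<>(includePatterns);
    }

    public boolean matches(String path) {
        if (matchers == null) { // first call after construction or deserialization
            matchers = new ArrayList<>();
            for (String pattern : includePatterns) {
                matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + pattern));
            }
        }
        for (PathMatcher matcher : matchers) {
            if (matcher.matches(Paths.get(path))) {
                return true;
            }
        }
        return false;
    }
}
{code}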

{code}
public static void main(String[] args) throws Exception {

final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
final TextInputFormat format = new TextInputFormat(new Path("/temp"));

format.setFilesFilter(new GlobFilePathFilter(
Collections.singletonList("**"),
Arrays.asList("**/another_file.bin", "**/dataFile1.txt")
));

DataSet<String> result = env.readFile(format, "/tmp");
result.writeAsText("/temp/out");
env.execute("GlobFilePathFilter-Test");

}
{code}
{code}

Exception in thread "main" org.apache.flink.optimizer.CompilerException:
Error translating node 'Data Source "at
readFile(ExecutionEnvironment.java:520)
(org.apache.flink.api.java.io.TextInputFormat)" : NONE [[ GlobalProperties
[partitioning=RANDOM_PARTITIONED] ]] [[ LocalProperties [ordering=null,
grouped=null, unique=null] ]]': Could not write the user code wrapper class
org.apache.flink.api.common.operators.util.UserCodeObjectWrapper :
java.io.NotSerializableException: sun.nio.fs.UnixFileSystem$3
at
org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:381)
at
org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:106)
at
org.apache.flink.optimizer.plan.SourcePlanNode.accept(SourcePlanNode.java:86)
at
org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
at
org.apache.flink.optimizer.plan.OptimizedPlan.accept(OptimizedPlan.java:128)
at
org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:192)
at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:188)
at
org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:91)
at com.apsaltis.EventDetectionJob.main(EventDetectionJob.java:75)
Caused by:
org.apache.flink.runtime.operators.util.CorruptConfigurationException:
Could not write the user code wrapper class
org.apache.flink.api.common.operators.util.UserCodeObjectWrapper :
java.io.NotSerializableException: sun.nio.fs.UnixFileSystem$3
at
org.apache.flink.runtime.operators.util.TaskConfig.setStubWrapper(TaskConfig.java:281)
at
org.apache.flink.optimizer.plantranslate.JobGraphGenerator.createDataSourceVertex(JobGraphGenerator.java:888)
at
org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:281)
... 8 more
Caused by: java.io.NotSerializableException: sun.nio.fs.UnixFileSystem$3
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at java.util.ArrayList.writeObject(ArrayList.java:747)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)

Re: GlobFilePathFilter NotSerializableException

2017-01-23 Thread Chesnay Schepler

Hello,

this appears to be a bug in Flink.

The problem is that the PathMatcher objects in the GlobFilePathFilter all 
contain a reference to the encapsulating class.

The easiest solution would be to build the PathMatchers within the 
filterPath method, after the filter has been shipped. Since this code is in 
flink-core, we unfortunately can't use the ClosureCleaner.


I have created a JIRA for this: 
https://issues.apache.org/jira/browse/FLINK-5612


Regards,
Chesnay

On 23.01.2017 00:07, Andrew Psaltis wrote:

Hi,
I am trying to use the GlobFilePathFilter with Flink 1.2-SNAPSHOT; I have also
tried using the latest 1.3-SNAPSHOT code and get the same error. Basically,
when using the GlobFilePathFilter there is a serialization exception due to
the inner class in sun.nio.fs.UnixFileSystem not being serializable. I have
tried various kryo registrations, but must be missing something. I am happy
to work on fixing this, but may need some direction. The code below (which I
lifted from testReadMultiplePatterns() in the FileInputFormatTest class)
reproduces the error; the exception and stack trace follow. FWIW, I am
testing this on OS X.

public static void main(String[] args) throws Exception {

 final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
 final TextInputFormat format = new TextInputFormat(new Path("/temp"));

 format.setFilesFilter(new GlobFilePathFilter(
 Collections.singletonList("**"),
 Arrays.asList("**/another_file.bin", "**/dataFile1.txt")
 ));

 DataSet<String> result = env.readFile(format, "/tmp");
 result.writeAsText("/temp/out");
 env.execute("GlobFilePathFilter-Test");

}


Exception in thread "main" org.apache.flink.optimizer.CompilerException:
Error translating node 'Data Source "at
readFile(ExecutionEnvironment.java:520)
(org.apache.flink.api.java.io.TextInputFormat)" : NONE [[ GlobalProperties
[partitioning=RANDOM_PARTITIONED] ]] [[ LocalProperties [ordering=null,
grouped=null, unique=null] ]]': Could not write the user code wrapper class
org.apache.flink.api.common.operators.util.UserCodeObjectWrapper :
java.io.NotSerializableException: sun.nio.fs.UnixFileSystem$3
at
org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:381)
at
org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:106)
at
org.apache.flink.optimizer.plan.SourcePlanNode.accept(SourcePlanNode.java:86)
at
org.apache.flink.optimizer.plan.SingleInputPlanNode.accept(SingleInputPlanNode.java:199)
at
org.apache.flink.optimizer.plan.OptimizedPlan.accept(OptimizedPlan.java:128)
at
org.apache.flink.optimizer.plantranslate.JobGraphGenerator.compileJobGraph(JobGraphGenerator.java:192)
at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:188)
at
org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:91)
at com.apsaltis.EventDetectionJob.main(EventDetectionJob.java:75)
Caused by:
org.apache.flink.runtime.operators.util.CorruptConfigurationException:
Could not write the user code wrapper class
org.apache.flink.api.common.operators.util.UserCodeObjectWrapper :
java.io.NotSerializableException: sun.nio.fs.UnixFileSystem$3
at
org.apache.flink.runtime.operators.util.TaskConfig.setStubWrapper(TaskConfig.java:281)
at
org.apache.flink.optimizer.plantranslate.JobGraphGenerator.createDataSourceVertex(JobGraphGenerator.java:888)
at
org.apache.flink.optimizer.plantranslate.JobGraphGenerator.preVisit(JobGraphGenerator.java:281)
... 8 more
Caused by: java.io.NotSerializableException: sun.nio.fs.UnixFileSystem$3
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at java.util.ArrayList.writeObject(ArrayList.java:747)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(Objec

Re: [DISCUSS] proposal for the User Defined AGGregate (UDAGG)

2017-01-23 Thread Fabian Hueske
Thanks for the clarification Shaoxuan.

Cheers, Fabian

2017-01-22 4:08 GMT+01:00 Shaoxuan Wang :

> Hi Fabian,
> Thanks for carefully checking the proposal.
> Yes, code generation is in my plan. As shown in "2.3 UDAGG interface", the
> input and return types of the newly proposed UDAGG functions are dynamically
> given by the users ("[user defined xxx inputs/types]"). All embedded built-in
> functions for this new API have to be generated via codegen. I will update
> the Jira and the doc.
>
> Thanks,
> Shaoxuan
>
>
> On Sat, Jan 21, 2017 at 7:29 AM, Fabian Hueske  wrote:
>
> > Hi Shaoxuan,
> >
> > thanks a lot for this great design doc.
> > I think user defined aggregation functions are a very important feature
> for
> > the Table API and SQL.
> >
> > Have you thought about how the aggregation functions will be embedded in
> > Flink functions?
> > At the moment, we have a generic Flink function which is configured with
> > aggregation functions, i.e., we do not leverage code generation here.
> > Do you plan to embed built-in and user-defined aggregations functions
> that
> > implement the proposed API with code generation?
> >
> > Can you maybe extend the JIRA or design document with this information?
> >
> > Thank you,
> > Fabian
> >
> > 2017-01-18 20:55 GMT+01:00 Shaoxuan Wang :
> >
> > > Hi everyone,
> > > I have drafted the design doc (link is provided below) for UDAGG, and
> > > created the JIRA (FLINK-5564) to track the progress of this design.
> > > Special thanks to Stephan and Fabian for their advice and help.
> > >
> > > Please check the design doc, feel free to share your comments in the
> > google
> > > doc:
> > > https://docs.google.com/document/d/19JXK8jLIi8IqV9yf7hOs_Oz6
> > > 7yXOypY7Uh5gIOK2r-U/edit
> > >
> > > Regards,
> > > Shaoxuan
> > >
> > > On Wed, Jan 11, 2017 at 6:09 AM, Fabian Hueske 
> > wrote:
> > >
> > > > Hi Shaoxuan,
> > > >
> > > > user-defined aggregates would be a great addition to the Table API /
> > SQL.
> > > > I completely agree that the current (internal) interface is not well
> > > suited
> > > > as an external interface and needs to be redesigned if exposed to
> > users.
> > > >
> > We need to carefully think about this new interface and how we can
> > > integrate
> > > > it with the DataStream (and DataSet) API to support all required
> > > > operations, esp. with respect to null aggregates and support for
> > > combining
> > > > / merging.
> > > > I agree that for efficient execution, we should avoid WindowFunctions
> > > > (large state) and FoldFunction (not mergeable). If we need a new
> > > interface
> > > > in the DataStream API, we need to discuss this in more detail.
> > > > I think we need a bit more information about the proposed UDAGG
> > interface
> > > > to discuss how this can be mapped to DataStream operators.
> > > >
> > > > Support for retraction will be required for our future plans with the
> > > > streaming Table API / SQL interface.
> > > >
> > > > Looking forward to your proposal,
> > > > Fabian
> > > >
> > > > 2017-01-10 15:40 GMT+01:00 Shaoxuan Wang :
> > > >
> > > > > Hello everyone,
> > > > >
> > > > > I am writing this email to propose a new User Defined Aggregate
> > > > > interface. We were trying to leverage the existing Aggregate
> > > > > interface, but unfortunately we realized that it is not sufficient
> > > > > to meet all our needs. Here are the obstacles we have observed:
> > > > > 1) The current aggregate interface is not very concise to users. One
> > > > > needs to know the design details of the intermediate Row buffer
> > > > > before implementing an Aggregate. Seven functions are needed even
> > > > > for a simple Count aggregate. We'd better make the UDAGG interface
> > > > > much more concise.
> > > > > 2) The current aggregate function can only be applied to one single
> > > > > column. There are many scenarios which require the aggregate
> > > > > function to take multiple columns as input.
> > > > > 3) “Retraction” is not covered in the current Aggregate.
> > > > >
> > > > > For #1, I am thinking that instead of letting users manipulate the
> > > > > intermediate buffer, we could potentially put the entire Aggregate
> > > > > instance or a subclass instance of Aggregate into the Row buffer,
> > > > > such that the user does not need to know how the Aggregate state is
> > > > > maintained by the framework.
> > > > > But to achieve this goal, we probably need a new DataStream API. The
> > > > > existing reduce API does not work with two different types of inputs
> > > > > (in this proposal, the inputs will be upstream values and the
> > > > > instance of the currently accumulated Aggregate), while the fold API
> > > > > is not able to merge two Aggregate results (which is usually needed
> > > > > for merging two session windows).
> > > > >
> > > > > For #3, besides the aggregate itself, there are a few other things need
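
For illustration of point #1 above, a much more concise user-facing aggregate
could boil down to roughly four methods; the class and method names below are
a hypothetical sketch, not the proposed API:

    // Hypothetical sketch of a concise Count aggregate (names are made up,
    // not part of the actual proposal or any Flink API):
    public class CountAggregate {
        public Long createAccumulator()                { return 0L; }          // initial state
        public Long accumulate(Long acc, Object input) { return acc + 1; }     // one input row
        public Long merge(Long acc1, Long acc2)        { return acc1 + acc2; } // e.g. merging session windows
        public Long getValue(Long acc)                 { return acc; }         // final result
    }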

Re: [Dev] Flink 'InputFormat' Interface execution related problem

2017-01-23 Thread Fabian Hueske
Hi Pawan,

I don't think this works. The InputSplits are generated by the JobManager,
i.e., by a single process, not in parallel.
After the parallel InputFormats have been started on the TaskManagers, they
request InputSplits and open() them. If there are no InputSplits, there is
no work to be done and open() will not be called.
You can tweak the behavior by implementing your own InputSplits and an
InputSplitAssigner that assigns exactly one input split to each task.
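
A minimal sketch of such an assigner (assuming the
org.apache.flink.core.io.InputSplitAssigner interface, whose
getNextInputSplit(String host, int taskId) method is called whenever a
parallel task requests a split):

    import java.util.ArrayDeque;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Queue;
    import java.util.Set;

    import org.apache.flink.core.io.InputSplit;
    import org.apache.flink.core.io.InputSplitAssigner;

    // Sketch: give each parallel task at most one of the pre-created splits;
    // returning null tells the requesting task that there is no more work.
    public class OneSplitPerTaskAssigner implements InputSplitAssigner {

        private final Queue<InputSplit> splits;
        private final Set<Integer> served = new HashSet<>();

        public OneSplitPerTaskAssigner(InputSplit[] splits) {
            this.splits = new ArrayDeque<>(Arrays.asList(splits));
        }

        @Override
        public synchronized InputSplit getNextInputSplit(String host, int taskId) {
            if (!served.add(taskId)) {
                return null; // this task already got its one split
            }
            return splits.poll();
        }
    }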

Fabian

2017-01-23 8:44 GMT+01:00 Pawan Manishka Gunarathna <
pawan.manis...@gmail.com>:

> Hi,
>
> When we are implementing the Flink *InputFormat* interface: if we have the
> *input split creation* part in our data analytics server APIs, can we
> directly go to the second phase of the Flink InputFormat execution?
>
> Basically I need to know whether we can read those InputSplits directly,
> without generating InputSplits inside the InputFormat interface. It would
> be great if you could provide any kind of help.
>
> Thanks,
> Pawan
>
> --
>
> *Pawan Gunaratne*
> *Mob: +94 770373556*
>


[jira] [Created] (FLINK-5613) QueryableState: requesting a non-existing key in RocksDBStateBackend is not consistent with the MemoryStateBackend and FsStateBackend

2017-01-23 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5613:
--

 Summary: QueryableState: requesting a non-existing key in 
RocksDBStateBackend is not consistent with the MemoryStateBackend and 
FsStateBackend
 Key: FLINK-5613
 URL: https://issues.apache.org/jira/browse/FLINK-5613
 Project: Flink
  Issue Type: Bug
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


Querying for a non-existing key of a state that has a default value set 
currently results in the default value being returned in the 
RocksDBStateBackend only. The MemoryStateBackend and FsStateBackend will 
return null, which results in an UnknownKeyOrNamespace exception.

Default values are now deprecated and will be removed eventually, so we should 
not introduce them into the new queryable state API; the RocksDBStateBackend 
should thus be adapted accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-5614) Enable snapshot builds for release branches

2017-01-23 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-5614:
-

 Summary: Enable snapshot builds for release branches
 Key: FLINK-5614
 URL: https://issues.apache.org/jira/browse/FLINK-5614
 Project: Flink
  Issue Type: Bug
  Components: Build System
Reporter: Robert Metzger


As you can see here: 
https://repository.apache.org/content/repositories/snapshots/org/apache/flink/flink-core/1.2-SNAPSHOT/

The 1.2-SNAPSHOT repository hasn't been updated since Dec 20. That's exactly 
the day when I forked off the release-1.2 branch from master.
So it seems that the current infrastructure to build the nightly / snapshot 
versions only builds master, not the release branches.

I think this is the corresponding Jenkins build: 
https://builds.apache.org/view/All/job/flink-snapshot-deployment/

Snapshot deployment was moved from Travis to Jenkins here: FLINK-3383

I added this JIRA mostly to see if others stumbled across this as well, and 
whether there's enough interest in fixing the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Strange segfaults in Flink 1.2 (probably not RocksDB related)

2017-01-23 Thread Gyula Fóra
Hi,
I tried it with and without RocksDB, no difference. Also tried with the
newest JDK version, still fails for that specific topology.

On the other hand after I made some other changes in the topology (somewhere
else) I can't reproduce it anymore. So it seems to be a very specific
thing. To me it looks like a JVM problem, if it happens again I will let
you know.

Gyula

Stephan Ewen  wrote (on Jan 22, 2017, Sun, at
21:16):

> Hey!
>
> I am actually a bit puzzled how these segfaults could come, unless via a
> native library, or a JVM bug.
>
> Can you try how it behaves when not using RocksDB or using a newer JVM
> version?
>
> Stephan
>
>
> On Sun, Jan 22, 2017 at 7:51 PM, Gyula Fóra  wrote:
>
> > Hey All,
> >
> > I am trying to port some of my streaming jobs to Flink 1.2 from 1.1.4
> and I
> > have noticed some very strange segfaults. (I am running in test
> > environments - with the minicluster)
> > It is a fairly complex job so I wouldn't go into details but the
> interesting
> > part is that adding/removing a simple filter in the wrong place in the
> > topology (such as (e -> true)  or anything actually ) seems to cause
> > frequent segfaults during execution.
> >
> > Basically the key part looks something like:
> >
> > ...
> > DataStream stream =
> source.map().setParallelism(1).uid("AssignFieldIds").
> > name("AssignFieldIds").startNewChain();
> > DataStream filtered = input1.filter(t -> true).setParallelism(1)
> > IterativeStream itStream = filtered.iterate(...)
> > ...
> >
> > Some notes before the actual error: replacing the filter with a map or
> > other chained transforms also leads to this problem. If the filter is not
> > chained there is no error (or if I remove the filter).
> >
> > The error I get looks like this:
> > https://gist.github.com/gyfora/da713b99b1e85a681ff1a4182f001242
> >
> > I wonder if anyone has seen something like this before, or have some
> ideas
> > how to debug it. The simple work around is to not chain the filter but
> it's
> > still very strange.
> >
> > Regards,
> > Gyula
> >
>


[jira] [Created] (FLINK-5615) queryable state: execute the QueryableStateITCase for all three state back-ends

2017-01-23 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-5615:
--

 Summary: queryable state: execute the QueryableStateITCase for all 
three state back-ends
 Key: FLINK-5615
 URL: https://issues.apache.org/jira/browse/FLINK-5615
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.2.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The QueryableStateITCase is currently only run with the MemoryStateBackend, 
but as has been seen in the past, some errors or inconsistent behaviour only 
appeared with different state back-ends. It should thus be extended to be 
tested with all three currently existing state back-ends.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Strange segfaults in Flink 1.2 (probably not RocksDB related)

2017-01-23 Thread Robert Metzger
I think I saw similar segfaults when rebuilding Flink while Flink daemons
were running out of the build-target directory. After the flink-dist jar
had been rebuilt and I tried stopping the daemons, they ran into
these segfaults.

On Mon, Jan 23, 2017 at 12:02 PM, Gyula Fóra  wrote:

> Hi,
> I tried it with and without RocksDB, no difference. Also tried with the
> newest JDK version, still fails for that specific topology.
>
> On the other hand after I made some other changes in the topology (somewhere
> else) I can't reproduce it anymore. So it seems to be a very specific
> thing. To me it looks like a JVM problem, if it happens again I will let
> you know.
>
> Gyula
>
> Stephan Ewen  wrote (on Jan 22, 2017, Sun, at
> 21:16):
>
> > Hey!
> >
> > I am actually a bit puzzled how these segfaults could come, unless via a
> > native library, or a JVM bug.
> >
> > Can you try how it behaves when not using RocksDB or using a newer JVM
> > version?
> >
> > Stephan
> >
> >
> > On Sun, Jan 22, 2017 at 7:51 PM, Gyula Fóra  wrote:
> >
> > > Hey All,
> > >
> > > I am trying to port some of my streaming jobs to Flink 1.2 from 1.1.4
> > and I
> > > have noticed some very strange segfaults. (I am running in test
> > > environments - with the minicluster)
> > > It is a fairly complex job so I wouldn't go into details but the
> > interesting
> > > part is that adding/removing a simple filter in the wrong place in the
> > > topology (such as (e -> true)  or anything actually ) seems to cause
> > > frequent segfaults during execution.
> > >
> > > Basically the key part looks something like:
> > >
> > > ...
> > > DataStream stream =
> > source.map().setParallelism(1).uid("AssignFieldIds").
> > > name("AssignFieldIds").startNewChain();
> > > DataStream filtered = input1.filter(t -> true).setParallelism(1)
> > > IterativeStream itStream = filtered.iterate(...)
> > > ...
> > >
> > > Some notes before the actual error: replacing the filter with a map or
> > > other chained transforms also leads to this problem. If the filter is
> not
> > > chained there is no error (or if I remove the filter).
> > >
> > > The error I get looks like this:
> > > https://gist.github.com/gyfora/da713b99b1e85a681ff1a4182f001242
> > >
> > > I wonder if anyone has seen something like this before, or have some
> > ideas
> > > how to debug it. The simple work around is to not chain the filter but
> > it's
> > > still very strange.
> > >
> > > Regards,
> > > Gyula
> > >
> >
>


Re: [Discuss] Some problem Unit test

2017-01-23 Thread Jinkui Shi
[1] Maven Surefire: multiple executions and debugging 
https://blog-rmannibucau.rhcloud.com/#/post/maven-surefire-debugging-execution 

> On Jan 23, 2017, at 17:17, sjk  wrote:
> 
> Hi, all
> 
> Surefire plugin by default executes `default-test` on mvn test. There are two 
> executions in Flink’s surefire plugin configuration: default-test and 
> integration-tests.
> 
> I have three problems with the unit tests:
> 1. As default-test and integration-tests are mutually exclusive, when I locally 
> execute “mvn clean test -f flink-libraries/flink-table/pom.xml -U”, it will 
> not execute any of the *ITCase.* tests; for example, 
> org.apache.flink.table.api.scala.stream.sql.SqlITCase will not be executed. I 
> think it’s a bug.
>   Where is the integration-tests execution used?
> 2. *Test.* will also not be executed; there are lots of *Test.* unit tests. Do 
> they need to be tested?
> 3. *Suite.* is generally used in Scala; flink-ml uses the 
> scalatest-maven-plugin instead of the surefire plugin.
> 
> I think we should do something about the unit tests:
> 1. Choose one unit test plugin: surefire or scalatest-maven-plugin.
> 2. Include the given unit test wildcard classes, such as **/*ITCase.*, 
> **/*Test.*, and **/*Suite.*.
>   All such unit tests should be executed. Clean up the non-unit-test classes 
> whose names end with “Test”.
> 
> 
> After I tried to modify the surefire plugin configuration as follows, lots of 
> errors occurred:
> 
> 
> <execution>
>     <id>default-test</id>
>     <phase>test</phase>
>     <goals>
>         <goal>test</goal>
>     </goals>
>     <configuration>
>         <skipTests>${skip.default.test}</skipTests>
>         <includes>
>             <include>**/*ITCase.*</include>
>             <include>**/*Suite.*</include>
>             <include>**/*Test.*</include>
>         </includes>
>     </configuration>
> </execution>
>   
>   
>   
> 
> 
> 
> 
> cc Stephan Ewen
> 
> 
> Best regards
> -Jinkui Shi
> 
> 



[jira] [Created] (FLINK-5616) YarnPreConfiguredMasterHaServicesTest fails sometimes

2017-01-23 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-5616:
---

 Summary: YarnPreConfiguredMasterHaServicesTest fails sometimes
 Key: FLINK-5616
 URL: https://issues.apache.org/jira/browse/FLINK-5616
 Project: Flink
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.2.0, 1.3.0
Reporter: Aljoscha Krettek


This is the relevant part from the log:
{code}
---
 T E S T S
---
Running 
org.apache.flink.yarn.highavailability.YarnPreConfiguredMasterHaServicesTest
Formatting using clusterid: testClusterID
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.407 sec - in 
org.apache.flink.yarn.highavailability.YarnPreConfiguredMasterHaServicesTest
Running org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest
Formatting using clusterid: testClusterID
Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.479 sec <<< 
FAILURE! - in 
org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest
testClosingReportsToLeader(org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest)
  Time elapsed: 0.836 sec  <<< FAILURE!
org.mockito.exceptions.verification.WantedButNotInvoked: 
Wanted but not invoked:
leaderContender.handleError();
-> at 
org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest.testClosingReportsToLeader(YarnIntraNonHaMasterServicesTest.java:120)
Actually, there were zero interactions with this mock.

at 
org.apache.flink.yarn.highavailability.YarnIntraNonHaMasterServicesTest.testClosingReportsToLeader(YarnIntraNonHaMasterServicesTest.java:120)

Running org.apache.flink.yarn.YarnFlinkResourceManagerTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.82 sec - in 
org.apache.flink.yarn.YarnFlinkResourceManagerTest
Running org.apache.flink.yarn.YarnClusterDescriptorTest
java.lang.RuntimeException: Couldn't deploy Yarn cluster
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:425)
at 
org.apache.flink.yarn.YarnClusterDescriptorTest.testConfigOverwrite(YarnClusterDescriptorTest.java:90)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: org.apache.flink.configuration.IllegalConfigurationException: The 
number of virtual cores per node were configured with 2147483647 but Yarn only 
has 8 virtual cores available. Please note that the number of virtual cores is 
set to the number of task slots by default unless configured in the Flink 
config with 'yarn.containers.vcores.'
at 
org.apache.

[jira] [Created] (FLINK-5617) Check new public APIs in 1.2 release

2017-01-23 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-5617:
-

 Summary: Check new public APIs in 1.2 release
 Key: FLINK-5617
 URL: https://issues.apache.org/jira/browse/FLINK-5617
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.2.0
Reporter: Robert Metzger
Assignee: Robert Metzger
 Fix For: 1.2.0


Before releasing Flink 1.2.0, I would like to quickly review which new public 
methods we are supporting in future releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[DISCUSS] Development of SQL OVER / Table API Row Windows for streaming tables

2017-01-23 Thread Fabian Hueske
Hi everybody,

it seems that currently several contributors are working on new features
for the streaming Table API / SQL around row windows (as defined in FLIP-11
[1]) and SQL OVER-style window (FLINK-4678, FLINK-4679, FLINK-4680,
FLINK-5584).
Since these efforts overlap quite a bit I spent some time thinking about
how we can approach these features and how to avoid overlapping
contributions.

The challenge here is the following. Some of the Table API row windows as
defined by FLIP-11 [1] are basically SQL OVER windows while others cannot be
easily expressed as such (TumbleRows for row-count intervals, SessionRows).
However, since Calcite already supports SQL OVER windows, we can reuse the
optimization logic for some of the Table API row windows. I also thought
about the semantics of the TumbleRows and SessionRows windows as defined in
FLIP-11 and came to the conclusion that these are not well defined in
FLIP-11 and should rather be defined as SlideRows windows with a special
PARTITION BY clause.

I propose to approach SQL OVER windows and Table API row windows as follows:

We start with three simple cases for SQL OVER windows (not Table API yet):

* OVER RANGE for event time
* OVER RANGE for processing time
* OVER ROW for processing time

All cases fulfill the following restrictions:
- All aggregations in SELECT must refer to the same window.
- PARTITION BY may not contain the rowtime attribute.
- ORDER BY must be on rowtime attribute (for event time) or on a marker
function that indicates processing time. Additional sort attributes are not
supported initially.
- only "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "BETWEEN x
PRECEDING AND CURRENT ROW" are supported.
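
For illustration, an event-time OVER RANGE query satisfying these
restrictions could look as follows (table and column names are made up):

    SELECT
      user_id,
      SUM(amount) OVER (
        PARTITION BY user_id
        ORDER BY rowtime
        RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW) AS hourSum
    FROM payments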

OVER ROW for event time cannot be easily supported. With event time, we may
have late records which need to be injected into the order of records. When
a record is injected into the order where a row-count window has already
been computed, this and all following windows will change. We could either
drop the record or send out many retraction records. I think it is best to
not open this can of worms at this point.

The rationale for all of the above restrictions is to have first versions of
OVER windows soon.
Once we have the above cases covered we can extend and remove limitations
as follows:

- Table API SlideRow windows (with the same restrictions as above). This
will be mostly API work since the execution part has been solved before.
- Add support for FOLLOWING (except UNBOUNDED FOLLOWING)
- Add support for different windows in SELECT. All windows must be
partitioned and ordered in the same way.
- Add support for additional ORDER BY attributes (besides time).

As I said before, TumbleRows and SessionRows windows as in FLIP-11 are not
well defined, IMO.
They can be expressed as SlideRows windows with special partitioning
(partitioning on fixed, non-overlapping time ranges for TumbleRows, and
gap-separated, non-overlapping time ranges for SessionRows).
I would not start to work on those yet.

I would like to close all related JIRA issues (FLINK-4678, FLINK-4679,
FLINK-4680, FLINK-5584) and restructure the development of these features
as outlined above with corresponding JIRA issues.

What do others think? (I cc'ed the contributors assigned to the above JIRA
issues)

Best, Fabian

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations


[jira] [Created] (FLINK-5618) Add queryable state documentation

2017-01-23 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5618:
--

 Summary: Add queryable state documentation
 Key: FLINK-5618
 URL: https://issues.apache.org/jira/browse/FLINK-5618
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Ufuk Celebi


Add docs about how to use queryable state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Strange segfaults in Flink 1.2 (probably not RocksDB related)

2017-01-23 Thread Stephan Ewen
@Robert is this about "hot code replace" in the JVM?

On Mon, Jan 23, 2017 at 1:11 PM, Robert Metzger  wrote:

> I think I saw similar segfaults when rebuilding Flink while Flink daemons
> were running out of the build-target directory. After the flink-dist jar
> had been rebuilt and I tried stopping the daemons, they ran into
> these segfaults.
>
> On Mon, Jan 23, 2017 at 12:02 PM, Gyula Fóra  wrote:
>
> > Hi,
> > I tried it with and without RocksDB, no difference. Also tried with the
> > newest JDK version, still fails for that specific topology.
> >
> > On the other hand after I made some other changes in the topology
> (somewhere
> > else) I can't reproduce it anymore. So it seems to be a very specific
> > thing. To me it looks like a JVM problem, if it happens again I will let
> > you know.
> >
> > Gyula
> >
> > Stephan Ewen  wrote (on Jan 22, 2017, Sun, at
> > 21:16):
> >
> > > Hey!
> > >
> > > I am actually a bit puzzled how these segfaults could come, unless via
> a
> > > native library, or a JVM bug.
> > >
> > > Can you try how it behaves when not using RocksDB or using a newer JVM
> > > version?
> > >
> > > Stephan
> > >
> > >
> > > On Sun, Jan 22, 2017 at 7:51 PM, Gyula Fóra  wrote:
> > >
> > > > Hey All,
> > > >
> > > > I am trying to port some of my streaming jobs to Flink 1.2 from 1.1.4
> > > and I
> > > > have noticed some very strange segfaults. (I am running in test
> > > > environments - with the minicluster)
> > > > It is a fairly complex job so I wouldn't go into details but the
> > > interesting
> > > > part is that adding/removing a simple filter in the wrong place in
> the
> > > > topology (such as (e -> true)  or anything actually ) seems to cause
> > > > frequent segfaults during execution.
> > > >
> > > > Basically the key part looks something like:
> > > >
> > > > ...
> > > > DataStream stream =
> > > source.map().setParallelism(1).uid("AssignFieldIds").
> > > > name("AssignFieldIds").startNewChain();
> > > > DataStream filtered = input1.filter(t -> true).setParallelism(1)
> > > > IterativeStream itStream = filtered.iterate(...)
> > > > ...
> > > >
> > > > Some notes before the actual error: replacing the filter with a map
> or
> > > > other chained transforms also leads to this problem. If the filter is
> > not
> > > > chained there is no error (or if I remove the filter).
> > > >
> > > > The error I get looks like this:
> > > > https://gist.github.com/gyfora/da713b99b1e85a681ff1a4182f001242
> > > >
> > > > I wonder if anyone has seen something like this before, or have some
> > > ideas
> > > > how to debug it. The simple work around is to not chain the filter
> but
> > > it's
> > > > still very strange.
> > > >
> > > > Regards,
> > > > Gyula
> > > >
> > >
> >
>


Re: [DISCUSS] Development of SQL OVER / Table API Row Windows for streaming tables

2017-01-23 Thread Haohui Mai
+1

We are also quite interested in these features and would love to
participate and contribute.

~Haohui

On Mon, Jan 23, 2017 at 7:31 AM Fabian Hueske  wrote:

> Hi everybody,
>
> it seems that currently several contributors are working on new features
> for the streaming Table API / SQL around row windows (as defined in FLIP-11
> [1]) and SQL OVER-style window (FLINK-4678, FLINK-4679, FLINK-4680,
> FLINK-5584).
> Since these efforts overlap quite a bit I spent some time thinking about
> how we can approach these features and how to avoid overlapping
> contributions.
>
> The challenge here is the following. Some of the Table API row windows as
> defined by FLIP-11 [1] are basically SQL OVER windows while others cannot be
> easily expressed as such (TumbleRows for row-count intervals, SessionRows).
> However, since Calcite already supports SQL OVER windows, we can reuse the
> optimization logic for some of the Table API row windows. I also thought
> about the semantics of the TumbleRows and SessionRows windows as defined in
> FLIP-11 and came to the conclusion that these are not well defined in
> FLIP-11 and should rather be defined as SlideRows windows with a special
> PARTITION BY clause.
>
> I propose to approach SQL OVER windows and Table API row windows as
> follows:
>
> We start with three simple cases for SQL OVER windows (not Table API yet):
>
> * OVER RANGE for event time
> * OVER RANGE for processing time
> * OVER ROW for processing time
>
> All cases fulfill the following restrictions:
> - All aggregations in SELECT must refer to the same window.
> - PARTITION BY may not contain the rowtime attribute.
> - ORDER BY must be on rowtime attribute (for event time) or on a marker
> function that indicates processing time. Additional sort attributes are not
> supported initially.
> - only "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "BETWEEN x
> PRECEDING AND CURRENT ROW" are supported.
>
> OVER ROW for event time cannot be easily supported. With event time, we may
> have late records which need to be injected into the order of records. When
> a record is injected into the order where a row-count window has already
> been computed, this and all following windows will change. We could either
> drop the record or send out many retraction records. I think it is best to
> not open this can of worms at this point.
>
> The rationale for all of the above restrictions is to have first versions of
> OVER windows soon.
> Once we have the above cases covered we can extend and remove limitations
> as follows:
>
> - Table API SlideRow windows (with the same restrictions as above). This
> will be mostly API work since the execution part has been solved before.
> - Add support for FOLLOWING (except UNBOUNDED FOLLOWING)
> - Add support for different windows in SELECT. All windows must be
> partitioned and ordered in the same way.
> - Add support for additional ORDER BY attributes (besides time).
>
> As I said before, TumbleRows and SessionRows windows as in FLIP-11 are not
> well defined, IMO.
> They can be expressed as SlideRows windows with special partitioning
> (partitioning on fixed, non-overlapping time ranges for TumbleRows, and
> gap-separated, non-overlapping time ranges for SessionRows).
> I would not start to work on those yet.
>
> I would like to close all related JIRA issues (FLINK-4678, FLINK-4679,
> FLINK-4680, FLINK-5584) and restructure the development of these features
> as outlined above with corresponding JIRA issues.
>
> What do others think? (I cc'ed the contributors assigned to the above JIRA
> issues)
>
> Best, Fabian
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations
>


Re: [DISCUSS] Development of SQL OVER / Table API Row Windows for streaming tables

2017-01-23 Thread Haohui Mai
Hi Fabian,

FLINK-4692 has added support for the tumbling window, and we are excited to
try it out and expose it as a SQL construct.

Just curious -- what are your thoughts on the SQL syntax for tumbling windows?

Implementation-wise it might make sense to think of the tumbling window as a
special case of the sliding window.

The problem I see is that the OVER construct might be insufficient to
support all the use cases of tumbling windows. For example, it fails to
express tumbling windows that have fractional time units (as pointed out in
http://calcite.apache.org/docs/stream.html).

It looks to me like Calcite / Azure Stream Analytics have introduced a
new construct (TUMBLE / TUMBLINGWINDOW) to address this issue.
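
For reference, the Calcite streaming proposal expresses such windows with a
TUMBLE group function, roughly along these lines (made-up table and column
names, following the style of the linked Calcite page):

    SELECT STREAM
      TUMBLE_END(rowtime, INTERVAL '10' SECOND) AS windowEnd,
      productId,
      COUNT(*) AS cnt
    FROM Orders
    GROUP BY TUMBLE(rowtime, INTERVAL '10' SECOND), productId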

Do you think it is a good idea to follow the same conventions? Your ideas
are appreciated.

Regards,
Haohui


On Mon, Jan 23, 2017 at 1:02 PM Haohui Mai  wrote:

> +1
>
> We are also quite interested in these features and would love to
> participate and contribute.
>
> ~Haohui
>
> On Mon, Jan 23, 2017 at 7:31 AM Fabian Hueske  wrote:
>
>> Hi everybody,
>>
>> it seems that currently several contributors are working on new features
>> for the streaming Table API / SQL around row windows (as defined in
>> FLIP-11
>> [1]) and SQL OVER-style window (FLINK-4678, FLINK-4679, FLINK-4680,
>> FLINK-5584).
>> Since these efforts overlap quite a bit I spent some time thinking about
>> how we can approach these features and how to avoid overlapping
>> contributions.
>>
>> The challenge here is the following. Some of the Table API row windows as
>> defined by FLIP-11 [1] are basically SQL OVER windows while others cannot
>> be
>> easily expressed as such (TumbleRows for row-count intervals,
>> SessionRows).
>> However, since Calcite already supports SQL OVER windows, we can reuse the
>> optimization logic for some of the Table API row windows. I also thought
>> about the semantics of the TumbleRows and SessionRows windows as defined
>> in
>> FLIP-11 and came to the conclusion that these are not well defined in
>> FLIP-11 and should rather be defined as SlideRows windows with a special
>> PARTITION BY clause.
>>
>> I propose to approach SQL OVER windows and Table API row windows as
>> follows:
>>
>> We start with three simple cases for SQL OVER windows (not Table API yet):
>>
>> * OVER RANGE for event time
>> * OVER RANGE for processing time
>> * OVER ROW for processing time
>>
>> All cases fulfill the following restrictions:
>> - All aggregations in SELECT must refer to the same window.
>> - PARTITION BY may not contain the rowtime attribute.
>> - ORDER BY must be on rowtime attribute (for event time) or on a marker
>> function that indicates processing time. Additional sort attributes are
>> not
>> supported initially.
>> - only "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "BETWEEN x
>> PRECEDING AND CURRENT ROW" are supported.
>>
>> OVER ROW for event time cannot be easily supported. With event time, we
>> may
>> have late records which need to be injected into the order of records.
>> When
>> a record is injected into the order where a row-count window has already
>> been computed, this and all following windows will change. We could either
>> drop the record or send out many retraction records. I think it is best to
>> not open this can of worms at this point.
>>
>> The rationale for all of the above restrictions is to have first versions
>> of
>> OVER windows soon.
>> Once we have the above cases covered we can extend and remove limitations
>> as follows:
>>
>> - Table API SlideRow windows (with the same restrictions as above). This
>> will be mostly API work since the execution part has been solved before.
>> - Add support for FOLLOWING (except UNBOUNDED FOLLOWING)
>> - Add support for different windows in SELECT. All windows must be
>> partitioned and ordered in the same way.
>> - Add support for additional ORDER BY attributes (besides time).
>>
>> As I said before, TumbleRows and SessionRows windows as in FLIP-11 are not
>> well defined, IMO.
>> They can be expressed as SlideRows windows with special partitioning
>> (partitioning on fixed, non-overlapping time ranges for TumbleRows, and
>> gap-separated, non-overlapping time ranges for SessionRows)
>> I would not start to work on those yet.
>>
>> I would like to close all related JIRA issues (FLINK-4678, FLINK-4679,
>> FLINK-4680, FLINK-5584) and restructure the development of these features
>> as outlined above with corresponding JIRA issues.
>>
>> What do others think? (I cc'ed the contributors assigned to the above JIRA
>> issues)
>>
>> Best, Fabian
>>
>> [1]
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations
>>
>


Re: [DISCUSS] Development of SQL OVER / Table API Row Windows for streaming tables

2017-01-23 Thread Fabian Hueske
Hi Haohui,

our plan was in fact to piggy-back on Calcite and use the TUMBLE function
once it is available (CALCITE-1345 [2]).
Unfortunately, this issue does not seem to be very active, so I don't know
what the progress is.

I would suggest moving the discussion about group windows to a separate
thread and keep this one focused on the organization of the SQL OVER
windows.

Best,
Fabian

[1] http://calcite.apache.org/docs/stream.html
[2] https://issues.apache.org/jira/browse/CALCITE-1345

2017-01-23 22:42 GMT+01:00 Haohui Mai :

> Hi Fabian,
>
> FLINK-4692 has added support for tumbling windows and we are excited to
> try it out and expose it as a SQL construct.
>
> Just curious -- what are your thoughts on the SQL syntax for tumbling windows?
>
> Implementation-wise it might make sense to think of the tumbling window as a
> special case of the sliding window.
>
> The problem I see is that the OVER construct might be insufficient to
> support all the use cases of tumbling windows. For example, it fails to
> express tumbling windows that have fractional time units (as pointed out in
> http://calcite.apache.org/docs/stream.html).
>
> It looks to me that the Calcite / Azure Stream Analytics have introduced a
> new construct (TUMBLE / TUMBLINGWINDOW) to address this issue.
>
> Do you think it is a good idea to follow the same conventions? Your ideas
> are appreciated.
>
> Regards,
> Haohui
>
>
> On Mon, Jan 23, 2017 at 1:02 PM Haohui Mai  wrote:
>
> > +1
> >
> > We are also quite interested in these features and would love to
> > participate and contribute.
> >
> > ~Haohui
> >
> > On Mon, Jan 23, 2017 at 7:31 AM Fabian Hueske  wrote:
> >
> >> Hi everybody,
> >>
> >> it seems that currently several contributors are working on new features
> >> for the streaming Table API / SQL around row windows (as defined in
> >> FLIP-11
> >> [1]) and SQL OVER-style window (FLINK-4678, FLINK-4679, FLINK-4680,
> >> FLINK-5584).
> >> Since these efforts overlap quite a bit I spent some time thinking about
> >> how we can approach these features and how to avoid overlapping
> >> contributions.
> >>
> >> The challenge here is the following. Some of the Table API row windows
> as
> >> defined by FLIP-11 [1] are basically SQL OVER windows while others cannot
> >> be
> >> easily expressed as such (TumbleRows for row-count intervals,
> >> SessionRows).
> >> However, since Calcite already supports SQL OVER windows, we can reuse
> the
> >> optimization logic for some of the Table API row windows. I also thought
> >> about the semantics of the TumbleRows and SessionRows windows as defined
> >> in
> >> FLIP-11 and came to the conclusion that these are not well defined in
> >> FLIP-11 and should rather be defined as SlideRows windows with a special
> >> PARTITION BY clause.
> >>
> >> I propose to approach SQL OVER windows and Table API row windows as
> >> follows:
> >>
> >> We start with three simple cases for SQL OVER windows (not Table API
> yet):
> >>
> >> * OVER RANGE for event time
> >> * OVER RANGE for processing time
> >> * OVER ROW for processing time
> >>
> >> All cases fulfill the following restrictions:
> >> - All aggregations in SELECT must refer to the same window.
> >> - PARTITION BY may not contain the rowtime attribute.
> >> - ORDER BY must be on rowtime attribute (for event time) or on a marker
> >> function that indicates processing time. Additional sort attributes are
> >> not
> >> supported initially.
> >> - only "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "BETWEEN x
> >> PRECEDING AND CURRENT ROW" are supported.
> >>
> >> OVER ROW for event time cannot be easily supported. With event time, we
> >> may
> >> have late records which need to be injected into the order of records.
> >> When
> >> a record is injected into the order where a row-count window has
> already
> >> been computed, this and all following windows will change. We could
> either
> >> drop the record or send out many retraction records. I think it is best
> to
> >> not open this can of worms at this point.
> >>
> >> The rationale for all of the above restrictions is to have first versions
> >> of
> >> OVER windows soon.
> >> Once we have the above cases covered we can extend and remove
> limitations
> >> as follows:
> >>
> >> - Table API SlideRow windows (with the same restrictions as above). This
> >> will be mostly API work since the execution part has been solved before.
> >> - Add support for FOLLOWING (except UNBOUNDED FOLLOWING)
> >> - Add support for different windows in SELECT. All windows must be
> >> partitioned and ordered in the same way.
> >> - Add support for additional ORDER BY attributes (besides time).
> >>
> >> As I said before, TumbleRows and SessionRows windows as in FLIP-11 are
> not
> >> well defined, IMO.
> >> They can be expressed as SlideRows windows with special partitioning
> >> (partitioning on fixed, non-overlapping time ranges for TumbleRows, and
> >> gap-separated, non-overlapping time ranges for SessionRows)
> >> I would not start to work on those yet.

Re: [DISCUSS] Development of SQL OVER / Table API Row Windows for streaming tables

2017-01-23 Thread Hongyuhong
Hi,
We are also interested in streaming SQL and very willing to participate and
contribute.

We are making progress on our side and will also contribute to Calcite to push
forward the window and stream-join support.



--
Sender: Fabian Hueske [mailto:fhue...@gmail.com] 
Send Time: 2017-01-24 5:55
Receiver: dev@flink.apache.org
Theme: Re: [DISCUSS] Development of SQL OVER / Table API Row Windows for 
streaming tables

Hi Haohui,

our plan was in fact to piggy-back on Calcite and use the TUMBLE function [1] 
once it is available (CALCITE-1345 [2]).
Unfortunately, this issue does not seem to be very active, so I don't know what 
the progress is.

I would suggest moving the discussion about group windows to a separate thread
and keep this one focused on the organization of the SQL OVER windows.

Best,
Fabian

[1] http://calcite.apache.org/docs/stream.html
[2] https://issues.apache.org/jira/browse/CALCITE-1345

2017-01-23 22:42 GMT+01:00 Haohui Mai :

> Hi Fabian,
>
> FLINK-4692 has added support for tumbling windows and we are
> excited to try it out and expose it as a SQL construct.
>
> Just curious -- what are your thoughts on the SQL syntax for tumbling windows?
>
> Implementation-wise it might make sense to think of the tumbling window as
> a special case of the sliding window.
>
> The problem I see is that the OVER construct might be insufficient to 
> support all the use cases of tumbling windows. For example, it fails 
> to express tumbling windows that have fractional time units (as 
> pointed out in http://calcite.apache.org/docs/stream.html).
>
> It looks to me that the Calcite / Azure Stream Analytics have 
> introduced a new construct (TUMBLE / TUMBLINGWINDOW) to address this issue.
>
> Do you think it is a good idea to follow the same conventions? Your 
> ideas are appreciated.
>
> Regards,
> Haohui
>
>
> On Mon, Jan 23, 2017 at 1:02 PM Haohui Mai  wrote:
>
> > +1
> >
> > We are also quite interested in these features and would love to 
> > participate and contribute.
> >
> > ~Haohui
> >
> > On Mon, Jan 23, 2017 at 7:31 AM Fabian Hueske  wrote:
> >
> >> Hi everybody,
> >>
> >> it seems that currently several contributors are working on new 
> >> features for the streaming Table API / SQL around row windows (as 
> >> defined in
> >> FLIP-11
> >> [1]) and SQL OVER-style window (FLINK-4678, FLINK-4679, FLINK-4680, 
> >> FLINK-5584).
> >> Since these efforts overlap quite a bit I spent some time thinking 
> >> about how we can approach these features and how to avoid 
> >> overlapping contributions.
> >>
> >> The challenge here is the following. Some of the Table API row 
> >> windows
> as
> >> defined by FLIP-11 [1] are basically SQL OVER windows while others
> >> cannot be easily expressed as such (TumbleRows for row-count 
> >> intervals, SessionRows).
> >> However, since Calcite already supports SQL OVER windows, we can 
> >> reuse
> the
> >> optimization logic for some of the Table API row windows. I also 
> >> thought about the semantics of the TumbleRows and SessionRows 
> >> windows as defined in
> >> FLIP-11 and came to the conclusion that these are not well defined 
> >> in
> >> FLIP-11 and should rather be defined as SlideRows windows with a 
> >> special PARTITION BY clause.
> >>
> >> I propose to approach SQL OVER windows and Table API row windows as
> >> follows:
> >>
> >> We start with three simple cases for SQL OVER windows (not Table 
> >> API
> yet):
> >>
> >> * OVER RANGE for event time
> >> * OVER RANGE for processing time
> >> * OVER ROW for processing time
> >>
> >> All cases fulfill the following restrictions:
> >> - All aggregations in SELECT must refer to the same window.
> >> - PARTITION BY may not contain the rowtime attribute.
> >> - ORDER BY must be on rowtime attribute (for event time) or on a 
> >> marker function that indicates processing time. Additional sort 
> >> attributes are not supported initially.
> >> - only "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "BETWEEN x 
> >> PRECEDING AND CURRENT ROW" are supported.
> >>
> >> OVER ROW for event time cannot be easily supported. With event 
> >> time, we may have late records which need to be injected into the 
> >> order of records.
> >> When
> >> a record is injected into the order where a row-count window has
> already
> >> been computed, this and all following windows will change. We could
> either
> >> drop the record or send out many retraction records. I think it is
> >> best
> to
> >> not open this can of worms at this point.
> >>
> >> The rationale for all of the above restrictions is to have first
> >> versions of OVER windows soon.
> >> Once we have the above cases covered we can extend and remove
> limitations
> >> as follows:
> >>
> >> - Table API SlideRow windows (with the same restrictions as above). 
> >> This will be mostly API work since the execution part has been solved 
> >> before.
> >> - Add support for FOLLOWING (except UNBOUNDED FOLLOWING)
> >> - Add support for different windows in SELECT. All windows must be
> >> partitioned and ordered in the same way.

Re: [Dev] Flink 'InputFormat' Interface execution related problem

2017-01-23 Thread Pawan Manishka Gunarathna
Hi,
Thanks for your help. Since our data source has a database-table
architecture, I am thinking of following the 'JDBCInputFormat' approach in
Flink. It would be great if you could provide some information on how the
JDBCInputFormat execution works.
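
For context, the lifecycle is the generic InputFormat one: the JobManager
calls createInputSplits(), then each parallel task requests a split, calls
open(split), loops on nextRecord() until reachedEnd(), and finally close().
Below is a minimal, hypothetical setup, assuming the Flink 1.2 flink-jdbc
module (the driver class, URL and query are placeholders):

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.typeutils.RowTypeInfo;

public class JdbcSourceSketch {
    // Hypothetical wiring; the resulting format produces
    // org.apache.flink.types.Row records, one field per selected column.
    public static JDBCInputFormat booksFormat() {
        return JDBCInputFormat.buildJDBCInputFormat()
            .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
            .setDBUrl("jdbc:derby:memory:mydb")
            .setQuery("SELECT id, title FROM books")
            .setRowTypeInfo(new RowTypeInfo(
                BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO))
            .finish();
    }
}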

Thanks,
Pawan

On Mon, Jan 23, 2017 at 4:18 PM, Fabian Hueske  wrote:

> Hi Pawan,
>
> I don't think this works. The InputSplits are generated by the JobManager,
> i.e., not in parallel by a single process.
> After the parallel InputFormats have been started on the TaskManagers, they
> request InputSplits and open() them. If there are no InputSplits there is
> no work to be done and open will not be called.
> You can tweak the behavior by implementing your own InputSplits and
> InputSplitAssigner which assigns exactly one input split to each task.
>
> Fabian
>
> 2017-01-23 8:44 GMT+01:00 Pawan Manishka Gunarathna <
> pawan.manis...@gmail.com>:
>
> > Hi,
> >
> > When we are implementing the Flink *InputFormat* interface, if we have the
> > *input split creation* part in our data analytics server APIs, can we
> > directly go to the second phase of the Flink InputFormat interface
> > execution?
> >
> > Basically I need to know whether we can read those InputSplits directly,
> > without generating InputSplits inside the InputFormat interface. It would
> > be great if you could provide any kind of help.
> >
> > Thanks,
> > Pawan
> >
> > --
> >
> > *Pawan Gunaratne*
> > *Mob: +94 770373556*
> >
>



-- 

*Pawan Gunaratne*
*Mob: +94 770373556*


Re: [DISCUSS] Development of SQL OVER / Table API Row Windows for streaming tables

2017-01-23 Thread Jark Wu
Hi Fabian, 

Thanks for bringing up this discussion and the nice approach to avoid 
overlapping contributions.

All of these make sense to me. But I have some questions.

Q1: If I understand correctly, we will not support TumbleRows and SessionRows
at the beginning, but may support them as syntax sugar (in the Table API) once
SlideRows is supported in the future. Right?

Q2: How do we support SessionRows based on SlideRows? I don't see how to
partition on "gap-separated" time ranges.

Q3: Should we break the approach down into smaller tasks for streaming tables
and batch tables?

Q4: The implementation of SlideRows still needs a custom operator that collects
records in a priority queue ordered by the "rowtime", similar to the design we
discussed in FLINK-4697, right?
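
For what it's worth, a minimal sketch of that buffering idea, assuming Flink
1.2's ProcessFunction applied on a keyed stream (the key playing the role of
PARTITION BY); checkpointed state and the actual aggregation are omitted, and
all names are invented:

import java.util.PriorityQueue;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class RowtimeSortSketch
        extends ProcessFunction<Tuple2<Long, String>, Tuple2<Long, String>> {

    // ordered by rowtime (field f0); a real operator would keep this in
    // managed state so it survives failures
    private transient PriorityQueue<Tuple2<Long, String>> buffer;

    @Override
    public void processElement(Tuple2<Long, String> value, Context ctx,
            Collector<Tuple2<Long, String>> out) throws Exception {
        if (buffer == null) {
            buffer = new PriorityQueue<>((a, b) -> Long.compare(a.f0, b.f0));
        }
        buffer.add(value);
        // ask for a callback once the watermark passes this record's rowtime
        ctx.timerService().registerEventTimeTimer(value.f0);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx,
            Collector<Tuple2<Long, String>> out) throws Exception {
        // emit (and aggregate over) all records whose rowtime is now complete,
        // so late records have been slotted into the right position
        while (buffer != null && !buffer.isEmpty() && buffer.peek().f0 <= timestamp) {
            out.collect(buffer.poll());
        }
    }
}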

+1 for not supporting OVER ROW for event time at this point.

Regards, Jark


> On 24 Jan 2017, at 10:28 AM, Hongyuhong wrote:
> 
> Hi,
> We are also interested in streaming SQL and very willing to participate and
> contribute.
>
> We are making progress on our side and will also contribute to Calcite to
> push forward the window and stream-join support.
> 
> 
> 
> --
> Sender: Fabian Hueske [mailto:fhue...@gmail.com] 
> Send Time: 2017-01-24 5:55
> Receiver: dev@flink.apache.org
> Theme: Re: [DISCUSS] Development of SQL OVER / Table API Row Windows for 
> streaming tables
> 
> Hi Haohui,
> 
> our plan was in fact to piggy-back on Calcite and use the TUMBLE function [1] 
> once it is available (CALCITE-1345 [2]).
> Unfortunately, this issue does not seem to be very active, so I don't know 
> what the progress is.
> 
> I would suggest moving the discussion about group windows to a separate
> thread and keep this one focused on the organization of the SQL OVER windows.
> 
> Best,
> Fabian
> 
> [1] http://calcite.apache.org/docs/stream.html
> [2] https://issues.apache.org/jira/browse/CALCITE-1345
> 
> 2017-01-23 22:42 GMT+01:00 Haohui Mai :
> 
>> Hi Fabian,
>> 
>> FLINK-4692 has added support for tumbling windows and we are
>> excited to try it out and expose it as a SQL construct.
>> 
>> Just curious -- what are your thoughts on the SQL syntax for tumbling windows?
>> 
>> Implementation-wise it might make sense to think of the tumbling window
>> as a special case of the sliding window.
>> 
>> The problem I see is that the OVER construct might be insufficient to 
>> support all the use cases of tumbling windows. For example, it fails 
>> to express tumbling windows that have fractional time units (as 
>> pointed out in http://calcite.apache.org/docs/stream.html).
>> 
>> It looks to me that the Calcite / Azure Stream Analytics have 
>> introduced a new construct (TUMBLE / TUMBLINGWINDOW) to address this issue.
>> 
>> Do you think it is a good idea to follow the same conventions? Your 
>> ideas are appreciated.
>> 
>> Regards,
>> Haohui
>> 
>> 
>> On Mon, Jan 23, 2017 at 1:02 PM Haohui Mai  wrote:
>> 
>>> +1
>>> 
>>> We are also quite interested in these features and would love to 
>>> participate and contribute.
>>> 
>>> ~Haohui
>>> 
>>> On Mon, Jan 23, 2017 at 7:31 AM Fabian Hueske  wrote:
>>> 
 Hi everybody,
 
 it seems that currently several contributors are working on new 
 features for the streaming Table API / SQL around row windows (as 
 defined in
 FLIP-11
 [1]) and SQL OVER-style window (FLINK-4678, FLINK-4679, FLINK-4680, 
 FLINK-5584).
 Since these efforts overlap quite a bit I spent some time thinking 
 about how we can approach these features and how to avoid 
 overlapping contributions.
 
 The challenge here is the following. Some of the Table API row 
 windows
>> as
 defined by FLIP-11 [1] are basically SQL OVER windows while others 
 cannot be easily expressed as such (TumbleRows for row-count 
 intervals, SessionRows).
 However, since Calcite already supports SQL OVER windows, we can 
 reuse
>> the
 optimization logic for some of the Table API row windows. I also 
 thought about the semantics of the TumbleRows and SessionRows 
 windows as defined in
 FLIP-11 and came to the conclusion that these are not well defined 
 in
 FLIP-11 and should rather be defined as SlideRows windows with a 
 special PARTITION BY clause.
 
 I propose to approach SQL OVER windows and Table API row windows as
 follows:
 
 We start with three simple cases for SQL OVER windows (not Table 
 API
>> yet):
 
 * OVER RANGE for event time
 * OVER RANGE for processing time
 * OVER ROW for processing time
 
 All cases fulfill the following restrictions:
 - All aggregations in SELECT must refer to the same window.
 - PARTITION BY may not contain the rowtime attribute.
 - ORDER BY must be on rowtime attribute (for event time) or on a 
 marker function that indicates processing time. Additional sort 
 attributes are not supported initially.
 - only "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"

[Dev] Issue related to importing org.apache.flink.types.Row

2017-01-23 Thread Pawan Manishka Gunarathna
Hi,

I'm currently working on a Flink InputFormat-related implementation, and I
have written a class as follows.

public class MyInputFormat extends RichInputFormat<Row, InputSplit>
        implements ResultTypeQueryable<Row> {

But here I'm getting an error with the *Row* parameter: I can't import
*org.apache.flink.types.Row*.
I have added the needed dependencies to the Maven pom.xml as well. How can I
solve this issue?
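
One likely cause: org.apache.flink.types.Row lives in the flink-core module
only as of Flink 1.2 (earlier releases kept Row in the flink-table module
under a different package), so the import only resolves against a 1.2+
flink-core dependency. For reference, a sketch of the imports the declaration
above needs; note the generic parameters in that declaration are a best-guess
restoration, since angle brackets are often eaten in transit, and InputSplit
as the split type is an assumption:

import org.apache.flink.api.common.io.RichInputFormat;
import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
import org.apache.flink.core.io.InputSplit;
import org.apache.flink.types.Row;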

Thanks,
Pawan
-- 

*Pawan Gunaratne*
*Mob: +94 770373556*


Re: [DISCUSS] Development of SQL OVER / Table API Row Windows for streaming tables

2017-01-23 Thread jincheng sun
Hello Fabian,
Your plan looks good; I totally agree with your points.
While working on FLINK-4680, I had similar concerns about the
semantics of TumbleRows and SessionRows. It is much clearer if we define
these windows as SlideRows with a PARTITION BY clause.
Regarding the implementation plan of Table API row windows, I would also
like to share some ideas/thoughts on OVER windows gathered while
developing FLINK-4680:

- Table API SlideRow windows (with the same restrictions as above). This
will be mostly API work since the execution part has been solved before.
Though the sliding window works for bounded preceding, it is not
sufficient to support unbounded preceding. For instance, we may potentially
use SlidingProcessingTimeWindows and ProcessingTimeTrigger to implement
"OVER RANGE for processing time", but we would still need to provide a
fixed window size, which is not appropriate for the unbounded case. The same
problem exists for "OVER RANGE for event time" and "OVER ROW for
processing time". Therefore, we need a new window assigner and trigger for
unbounded preceding, say SlideRowGlobalWindows and
SlideRowGlobalWindowXXXTrigger. What do you think?
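
A rough illustration of the direction: for unbounded preceding, the existing
GlobalWindows assigner combined with a per-record trigger already avoids the
fixed size, at least in the DataStream API (a sketch against Flink 1.2; all
names are invented):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger;

public class UnboundedPrecedingSketch {
    // Sketch: emulate "SUM(x) OVER (PARTITION BY key ... UNBOUNDED PRECEDING)"
    // with a global window that fires on every element, so each incoming row
    // re-emits the running aggregate of everything seen so far for its key.
    public static DataStream<Tuple2<String, Long>> runningSum(
            DataStream<Tuple2<String, Long>> input) {
        return input
            .keyBy(0)                        // the PARTITION BY key
            .window(GlobalWindows.create())  // no window end: unbounded preceding
            .trigger(CountTrigger.of(1))     // fire for every row
            .sum(1);
    }
}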

- Add support for FOLLOWING (except UNBOUNDED FOLLOWING)
If I understand you correctly, you want to implement the SlideRow windows
first without support for FOLLOWING (I guess you want to leverage the
existing SlidingProcessing(Event)TimeWindows and
Processing(Event)TimeTrigger?). IMO, when we implement SlideRow windows, we
could just provide a new WindowAssigner and trigger that support both
bounded preceding and following semantics (CURRENT ROW is just a special
case of FOLLOWING where the following offset is equal to 0). What do you think?

- Add support for additional ORDER BY attributes (besides time).
This is an important and necessary part of OVER. But to achieve this, we
probably need a sorted state backend, maybe a sorted MapState? Is it also
included in your plan?
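
Until such a primitive exists, one workaround I can imagine is keeping a
TreeMap inside a ValueState to get sorted per-key access; a hedged sketch
against the Flink 1.2 state API (all names are invented, and the full map is
(de)serialized on every access, so this is far from optimal):

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.util.Collector;

public class SortedStateSketch {
    // Sketch: emulate sorted per-key state by storing a TreeMap (sort key ->
    // rows) in a ValueState. 'rowsByKey' would be obtained in open() via
    // getRuntimeContext().getState(...). This is an invented helper, not a
    // real Flink API.
    static void insertAndEmitUpTo(
            ValueState<TreeMap<Long, List<String>>> rowsByKey,
            long sortKey, String row, long watermark,
            Collector<String> out) throws Exception {
        TreeMap<Long, List<String>> sorted = rowsByKey.value();
        if (sorted == null) {
            sorted = new TreeMap<>();
        }
        sorted.computeIfAbsent(sortKey, k -> new ArrayList<>()).add(row);
        // emit all rows whose sort key is covered by the watermark, in order
        for (List<String> bucket : sorted.headMap(watermark, true).values()) {
            for (String r : bucket) {
                out.collect(r); // placeholder for the actual aggregation
            }
        }
        sorted.headMap(watermark, true).clear(); // drop what was emitted
        rowsByKey.update(sorted);
    }
}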

Best,
SunJincheng

2017-01-23 23:30 GMT+08:00 Fabian Hueske :

> Hi everybody,
>
> it seems that currently several contributors are working on new features
> for the streaming Table API / SQL around row windows (as defined in FLIP-11
> [1]) and SQL OVER-style window (FLINK-4678, FLINK-4679, FLINK-4680,
> FLINK-5584).
> Since these efforts overlap quite a bit I spent some time thinking about
> how we can approach these features and how to avoid overlapping
> contributions.
>
> The challenge here is the following. Some of the Table API row windows as
> defined by FLIP-11 [1] are basically SQL OVER windows while others cannot be
> easily expressed as such (TumbleRows for row-count intervals, SessionRows).
> However, since Calcite already supports SQL OVER windows, we can reuse the
> optimization logic for some of the Table API row windows. I also thought
> about the semantics of the TumbleRows and SessionRows windows as defined in
> FLIP-11 and came to the conclusion that these are not well defined in
> FLIP-11 and should rather be defined as SlideRows windows with a special
> PARTITION BY clause.
>
> I propose to approach SQL OVER windows and Table API row windows as
> follows:
>
> We start with three simple cases for SQL OVER windows (not Table API yet):
>
> * OVER RANGE for event time
> * OVER RANGE for processing time
> * OVER ROW for processing time
>
> All cases fulfill the following restrictions:
> - All aggregations in SELECT must refer to the same window.
> - PARTITION BY may not contain the rowtime attribute.
> - ORDER BY must be on rowtime attribute (for event time) or on a marker
> function that indicates processing time. Additional sort attributes are not
> supported initially.
> - only "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "BETWEEN x
> PRECEDING AND CURRENT ROW" are supported.
>
> OVER ROW for event time cannot be easily supported. With event time, we
> may have late records which need to be injected into the order of records.
> When a record is injected into the order where a row-count window has
> already been computed, this and all following windows will change. We could
> either drop the record or send out many retraction records. I think it is
> best to not open this can of worms at this point.
>
> The rationale for all of the above restrictions is to have first versions
> of OVER windows soon.
> Once we have the above cases covered we can extend and remove limitations
> as follows:
>
> - Table API SlideRow windows (with the same restrictions as above). This
> will be mostly API work since the execution part has been solved before.
> - Add support for FOLLOWING (except UNBOUNDED FOLLOWING)
> - Add support for different windows in SELECT. All windows must be
> partitioned and ordered in the same way.
> - Add support for additional ORDER BY attributes (besides time).
>
> As I said before, TumbleRows and SessionRows windows as in FLIP-11 are not
> well defined, IMO.
> They can be expressed as SlideRows windows with special partitioning
> (pa