[jira] [Created] (FLINK-3638) Invalid default ports in documentation

2016-03-21 Thread Maxim Dobryakov (JIRA)
Maxim Dobryakov created FLINK-3638:

 Summary: Invalid default ports in documentation
 Key: FLINK-3638
 URL: https://issues.apache.org/jira/browse/FLINK-3638
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.0.0
Reporter: Maxim Dobryakov


[Documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html] lists incorrect default values for several ports.

For example, look at the `taskmanager.data.port` option. The documentation gives a default port of 6121, but [in code|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/ConfigConstants.java#L615] the default is set to 0.

Please review all ports in the documentation and set valid default values.
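
For reference, a configured port of 0 conventionally means "let the OS pick a free ephemeral port", which is why documenting a fixed 6121 is misleading. A minimal Java sketch of that convention (plain JDK, not Flink code):

    import java.io.IOException;
    import java.net.ServerSocket;

    public class EphemeralPortDemo {
        public static void main(String[] args) throws IOException {
            // Port 0 asks the OS to assign any free port, matching the
            // semantics of the default value 0 in ConfigConstants.
            try (ServerSocket socket = new ServerSocket(0)) {
                System.out.println("Bound to ephemeral port: " + socket.getLocalPort());
            }
        }
    }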






[jira] [Created] (FLINK-3639) Add methods and utilities to register DataSets and Tables in the TableEnvironment

2016-03-21 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-3639:


 Summary: Add methods and utilities to register DataSets and Tables 
in the TableEnvironment
 Key: FLINK-3639
 URL: https://issues.apache.org/jira/browse/FLINK-3639
 Project: Flink
  Issue Type: New Feature
  Components: Table API
Affects Versions: 1.1.0
Reporter: Vasia Kalavri


In order to make tables queryable from SQL, we need to register them under a unique name in the TableEnvironment.
[This design document|https://docs.google.com/document/d/1sITIShmJMGegzAjGqFuwiN_iw1urwykKsLiacokxSw0/edit] describes the proposed API.
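
A rough sketch of what such registration could look like; the {{registerDataSet}} and {{registerTable}} methods are assumptions based on the design document's direction, not final API:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.table.TableEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class RegistrationSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            TableEnvironment tEnv = new TableEnvironment();

            DataSet<Tuple2<Long, String>> orders =
                env.fromElements(Tuple2.of(1L, "books"), Tuple2.of(2L, "games"));

            // Proposed additions (hypothetical): register inputs under unique
            // names so that SQL queries can later refer to them.
            tEnv.registerDataSet("Orders", orders);
            tEnv.registerTable("OrdersTable", tEnv.fromDataSet(orders));
        }
    }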





[jira] [Created] (FLINK-3640) Add support for SQL queries in DataSet programs

2016-03-21 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-3640:


 Summary: Add support for SQL queries in DataSet programs
 Key: FLINK-3640
 URL: https://issues.apache.org/jira/browse/FLINK-3640
 Project: Flink
  Issue Type: New Feature
  Components: Table API
Affects Versions: 1.1.0
Reporter: Vasia Kalavri


This issue covers the task of supporting SQL queries embedded in DataSet programs. In this mode, the input and output of a SQL query are Tables. For this issue, we need to make the following additions to the Table API:
- add a {{tEnv.sql(query: String): Table}} method for converting a query result 
into a Table
- integrate Calcite's SQL parser into the batch Table API translation process.
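
Assuming the registration methods proposed in FLINK-3639, usage could look roughly like this; {{registerDataSet}} and {{sql}} are the proposed additions and hypothetical at this point:

    // Continues the sketch from FLINK-3639: env, tEnv and the Orders DataSet
    // are set up as there.
    tEnv.registerDataSet("Orders", orders);

    // Proposed: parse and translate a SQL query over a registered input and
    // return the result as a Table.
    Table result = tEnv.sql("SELECT f1, COUNT(*) FROM Orders GROUP BY f1");

    // Convert back to a DataSet using the existing Table API conversion.
    DataSet<Row> rows = tEnv.toDataSet(result, Row.class);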





[jira] [Created] (FLINK-3641) Document registerCachedFile API call

2016-03-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3641:


 Summary: Document registerCachedFile API call
 Key: FLINK-3641
 URL: https://issues.apache.org/jira/browse/FLINK-3641
 Project: Flink
  Issue Type: Improvement
  Components: Java API, Scala API
Affects Versions: 1.1.0
Reporter: Till Rohrmann
Priority: Minor


Flink's stable API supports the {{registerCachedFile}} API call on the {{ExecutionEnvironment}}. However, it is not mentioned anywhere in the online documentation. Furthermore, the {{DistributedCache}} is also not explained.
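
Until it is documented, a short sketch of how the call is used today; the path and cache name below are placeholders:

    import java.io.File;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;

    public class CachedFileSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Register a file under a name; the runtime ships it to every
            // TaskManager that runs a task of this job.
            env.registerCachedFile("hdfs:///path/to/dictionary.txt", "dictionary");

            env.fromElements("a", "b")
                .map(new RichMapFunction<String, String>() {
                    private File dict;

                    @Override
                    public void open(Configuration parameters) {
                        // Fetch the local copy via the DistributedCache.
                        dict = getRuntimeContext()
                            .getDistributedCache().getFile("dictionary");
                    }

                    @Override
                    public String map(String value) {
                        return value + " (dict at " + dict.getName() + ")";
                    }
                })
                .print();
        }
    }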





[jira] [Created] (FLINK-3642) Disentangle ExecutionConfig

2016-03-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3642:


 Summary: Disentangle ExecutionConfig
 Key: FLINK-3642
 URL: https://issues.apache.org/jira/browse/FLINK-3642
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Till Rohrmann


Initially, the {{ExecutionConfig}} started out as a configuration for the behaviour of the system with respect to the associated job. As such it stored information about the restart strategy, registered types and the parallelism of the job. Over time, however, the {{ExecutionConfig}} has become more of an easy entry point to pass information into the system. As such, the user can now set arbitrary information as part of the {{GlobalJobParameters}} in the {{ExecutionConfig}}, which is piped to all kinds of different locations in the system, e.g. the serializers, JM, ExecutionGraph, TM, etc.

This mixture of user code classes with system parameters makes it really cumbersome to send system information around, because you always need a user code class loader to deserialize it. Furthermore, there are different means by which the {{ExecutionConfig}} is passed to the system: one is giving it to the {{Serializers}} created in the JavaAPIPostPass, another is giving it directly to the {{JobGraph}}, for example. The problem is that the {{ExecutionConfig}} contains information which is required at different stages of a program's execution.

I think it would be beneficial to disentangle the {{ExecutionConfig}} a little bit along the lines of the different concerns for which it is currently used.
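
To illustrate the entry point in question, this is roughly how arbitrary user classes end up inside the {{ExecutionConfig}} today (a minimal sketch; the parameter class is made up):

    import java.util.Collections;
    import java.util.Map;
    import org.apache.flink.api.common.ExecutionConfig;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class GlobalParamsSketch {
        // A user-defined parameter holder: a user code class that the system
        // can only deserialize with the user code class loader once it travels
        // to serializers, the JobManager, TaskManagers, etc.
        public static class MyParams extends ExecutionConfig.GlobalJobParameters {
            private final String endpoint;

            public MyParams(String endpoint) { this.endpoint = endpoint; }

            @Override
            public Map<String, String> toMap() {
                return Collections.singletonMap("endpoint", endpoint);
            }
        }

        public static void main(String[] args) {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            env.getConfig().setGlobalJobParameters(new MyParams("http://example.org"));
        }
    }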





[jira] [Created] (FLINK-3643) Improve Window Triggers

2016-03-21 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-3643:

 Summary: Improve Window Triggers
 Key: FLINK-3643
 URL: https://issues.apache.org/jira/browse/FLINK-3643
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.0.0
Reporter: Aljoscha Krettek


I think there are several shortcomings in the current window trigger system and 
I started a document to keep track of them: 
https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing

The document is a work in progress and I encourage everyone to read it and make suggestions.

We'll use this issue to keep track of any sub-issues that we open for the parts we want to improve.





[DISCUSS] Improving Trigger/Window API and Semantics

2016-03-21 Thread Aljoscha Krettek
Hi,
I’m also sending this to @user because the Trigger API concerns users directly.

There are some things in the Trigger API that I think require some 
improvements. The issues are trigger testability, fire semantics and composite 
triggers and lateness. I started a document to keep track of things 
(https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing).
 Please read it if you are interested and want to get involved in this. We’ll 
evolve the document together and come up with Jira issues for the subtasks. 

Cheers,
Aljoscha

Next steps: SQL / StreamSQL support

2016-03-21 Thread Fabian Hueske
Hi everybody,

on Friday we merged the working branch to put the Table API on top of
Calcite back to master.
This was the first step towards adding SQL support to Flink as outlined in
the design document [1] (the document was updated to reflect design
decisions done while implementing task 1).

According to the design doc, the next step is to add support for SQL
queries on DataSets and Table API Tables.  We created two JIRA issues to
track this effort:
- FLINK-3639: Add methods to register DataSets and Tables in
TableEnvironment
- FLINK-3640: Add support for SQL queries on registered DataSets and Tables

Subsequent efforts will be to add support for SQL queries on external
tables (CSV or Parquet files, DBMSs, etc.), extending the coverage of the SQL
standard (sort, outer joins, etc.), and defining table sinks to emit the
results.

The following document shows the syntax to register tables (DataSets,
DataStreams, Tables, external sources), query them, and to define table
sinks to write a Table to an external storage system [2].

At the same time, we are working on extending the Table API for streaming
tables (FLINK-3547).

As usual, feedback, comments, and contributions are highly welcome :-)

Best, Fabian

[1]
https://docs.google.com/document/d/1TLayJNOTBle_-m1rQfgA6Ouj1oYsfqRjPcp1h2TVqdI
[2]
https://docs.google.com/document/d/1sITIShmJMGegzAjGqFuwiN_iw1urwykKsLiacokxSw0


Re: Next steps: SQL / StreamSQL support

2016-03-21 Thread Vasiliki Kalavri
Thanks for the nice summary and for updating the design documents Fabian!

As we proceed with the upcoming tasks, we should also go through existing
JIRAs and update them, too.
There are some old issues referring to SQL and adding external data
sources, but these were created before the decision to use Calcite. It
would be nice to clean up the Table API JIRAs a bit by removing the invalid
issues and updating the ones that are still relevant.

Cheers,
-Vasia.



Re: [DISCUSS] Improving Trigger/Window API and Semantics

2016-03-21 Thread Aljoscha Krettek
Hi,
my previous message might be a bit hard to parse for people that are not very 
deep into the Trigger implementation. So I’ll try to give a bit more 
explanation right in the mail.

The basic idea is that we have observed some recurring problems that keep coming up on the mailing lists, and I want to try to address them.

The first problem is the Trigger semantics and the confusion between triggers that purge the window contents and those that don't. (For example, using a ContinuousEventTimeTrigger with the EventTimeWindows assigner is a bad idea because state will be kept indefinitely.) While working on this we should also tackle the issue of providing composite triggers such as Repeatedly (fires a child trigger repeatedly), Any (fires when any child trigger fires) and All (fires when all child triggers fire).
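
To make the purge confusion concrete, here is a toy count trigger showing the FIRE vs. FIRE_AND_PURGE distinction. This is an illustrative sketch only: a real trigger must keep its count in partitioned state, while a plain field is used here to keep it short.

    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
    import org.apache.flink.streaming.api.windowing.windows.Window;

    public class EveryN<W extends Window> extends Trigger<Object, W> {
        private final long n;
        private long seen; // simplification: should live in partitioned state

        public EveryN(long n) { this.n = n; }

        @Override
        public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) {
            if (++seen >= n) {
                seen = 0;
                // FIRE emits the window but keeps its contents buffered;
                // FIRE_AND_PURGE emits them and drops the state. Mixing up
                // this pair is how window state ends up kept indefinitely.
                return TriggerResult.FIRE_AND_PURGE;
            }
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) {
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
            return TriggerResult.CONTINUE;
        }
    }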

The second issue is lateness. Right now, it is possible to write custom triggers that can deal with late elements and can even behave differently based on the amount of lateness. There is, however, no API for dealing with lateness. We should address this.

The third issue is Trigger testability. We should introduce a testing harness 
for triggers and move the processing time triggers to use a clock provider 
instead of directly using System.currentTimeMillis(). This will allow testing 
them deterministically.
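
The clock-provider part could look roughly like this; the interface and class names are hypothetical, not existing Flink classes:

    // A pluggable time source: triggers ask the Clock instead of calling
    // System.currentTimeMillis() directly.
    public interface Clock {
        long currentTimeMillis();
    }

    // Production implementation delegating to the system clock.
    public class SystemClock implements Clock {
        public long currentTimeMillis() { return System.currentTimeMillis(); }
    }

    // Test implementation that a harness can advance explicitly, making
    // processing-time trigger tests deterministic.
    public class ManualClock implements Clock {
        private long now;
        public void advance(long millis) { now += millis; }
        public long currentTimeMillis() { return now; }
    }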

All of these are expanded upon in the document I linked to before:
https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing
I think all of this is very important for people working on event-time based pipelines.

Feedback is very welcome and I hope that we can expand the document together 
and come up with good solutions.

Cheers,
Aljoscha



[jira] [Created] (FLINK-3644) WebRuntimeMonitor: setting java.io.tmpdir does not change the upload dir

2016-03-21 Thread astralidea (JIRA)
astralidea created FLINK-3644:

 Summary: WebRuntimeMonitor: setting java.io.tmpdir does not change the upload dir
 Key: FLINK-3644
 URL: https://issues.apache.org/jira/browse/FLINK-3644
 Project: Flink
  Issue Type: Bug
  Components: Webfrontend
Affects Versions: 1.0.0
 Environment: flink-conf.yaml -> java.io.tmpdir: .
java -server -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
-XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:+UseCompressedOops 
-XX:+UseFastEmptyMethods -XX:+UseFastAccessorMethods -XX:+AlwaysPreTouch 
-Xmx1707m -Dlog4j.configuration=file:log4j-mesos.properties -Djava.io.tmpdir=. 
-cp 
flink-dist_2.10-1.0.0.jar:log4j-1.2.17.jar:slf4j-log4j12-1.7.7.jar:flink-python_2.10-1.0.0.jar
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
CentOS release 6.4 (Final)
Reporter: astralidea


Setting java.io.tmpdir in flink-conf.yaml and passing -Djava.io.tmpdir=. both have no effect for me, and I don't know why. Looking at the code, System.getProperty("java.io.tmpdir") should pick the value up, but it does not. In the JobManager configuration shown in the web UI, however, I can see that java.io.tmpdir is set.
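
A minimal sketch for debugging this, independent of Flink: it prints what the JVM actually resolves for the property. Note that a relative value such as "." is resolved against the working directory of the process, which may differ from what the deployment expects.

    public class TmpDirCheck {
        public static void main(String[] args) {
            String tmp = System.getProperty("java.io.tmpdir");
            // Print both the raw value and its absolute resolution.
            System.out.println(tmp + " -> " + new java.io.File(tmp).getAbsolutePath());
        }
    }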


