[jira] [Created] (FLINK-3638) Invalid default ports in documentation
Maxim Dobryakov created FLINK-3638:

Summary: Invalid default ports in documentation
Key: FLINK-3638
URL: https://issues.apache.org/jira/browse/FLINK-3638
Project: Flink
Issue Type: Bug
Components: Documentation
Affects Versions: 1.0.0
Reporter: Maxim Dobryakov

The [documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html] lists incorrect default ports. For example, the `taskmanager.data.port` option is documented with a default of 6121, but [in code|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/configuration/ConfigConstants.java#L615] the default is set to 0. Please review all ports in the documentation and set the correct default values.
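For illustration: the convention behind the code default is that port 0 asks the OS for a free ephemeral port, so a fixed documented value like 6121 cannot match it. A minimal Scala sketch of that convention (plain JDK code, not Flink code):

    import java.net.ServerSocket

    object PortDefaultSketch {
      def main(args: Array[String]): Unit = {
        // Port 0 means "let the OS pick any free port", mirroring the
        // code default of 0 cited above.
        val socket = new ServerSocket(0)
        println(s"actually bound to port ${socket.getLocalPort}") // e.g. 54231
        socket.close()
      }
    }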
[jira] [Created] (FLINK-3639) Add methods and utilities to register DataSets and Tables in the TableEnvironment
Vasia Kalavri created FLINK-3639:

Summary: Add methods and utilities to register DataSets and Tables in the TableEnvironment
Key: FLINK-3639
URL: https://issues.apache.org/jira/browse/FLINK-3639
Project: Flink
Issue Type: New Feature
Components: Table API
Affects Versions: 1.1.0
Reporter: Vasia Kalavri

In order to make tables queryable from SQL, we need to register them under a unique name in the TableEnvironment. [This design document|https://docs.google.com/document/d/1sITIShmJMGegzAjGqFuwiN_iw1urwykKsLiacokxSw0/edit] describes the proposed API.
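To make the proposal concrete, a rough Scala sketch of what registration could look like; the factory and method names are assumptions taken from the design document, not a merged Flink API:

    import org.apache.flink.api.scala._

    case class WordCount(word: String, frequency: Long)

    object RegisterTablesSketch {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        // Hypothetical factory; the TableEnvironment class and its package
        // were still under design when this issue was filed.
        val tEnv = TableEnvironment.getTableEnvironment(env)

        val words = env.fromElements(WordCount("hello", 1L), WordCount("flink", 3L))

        // Register the DataSet under a unique name so SQL queries can refer to it.
        tEnv.registerDataSet("WordCounts", words)
      }
    }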
[jira] [Created] (FLINK-3640) Add support for SQL queries in DataSet programs
Vasia Kalavri created FLINK-3640:

Summary: Add support for SQL queries in DataSet programs
Key: FLINK-3640
URL: https://issues.apache.org/jira/browse/FLINK-3640
Project: Flink
Issue Type: New Feature
Components: Table API
Affects Versions: 1.1.0
Reporter: Vasia Kalavri

This issue covers the task of supporting SQL queries embedded in DataSet programs. In this mode, the inputs and output of a SQL query are Tables. For this issue, we need to make the following additions to the Table API:
- add a {{tEnv.sql(query: String): Table}} method for converting a query result into a Table
- integrate Calcite's SQL parser into the batch Table API translation process.
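Given such a registration API, the proposed method would be used roughly as follows; again a sketch of the proposed API, continuing the hypothetical setup from the FLINK-3639 sketch above:

    // Register an input so the query can refer to it by name.
    tEnv.registerDataSet("WordCounts", words)

    // The result is again a Table, so it can flow into further Table API
    // transformations or be converted back to a DataSet.
    val frequent = tEnv.sql(
      "SELECT word, frequency FROM WordCounts WHERE frequency > 10")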
[jira] [Created] (FLINK-3641) Document registerCachedFile API call
Till Rohrmann created FLINK-3641:

Summary: Document registerCachedFile API call
Key: FLINK-3641
URL: https://issues.apache.org/jira/browse/FLINK-3641
Project: Flink
Issue Type: Improvement
Components: Java API, Scala API
Affects Versions: 1.1.0
Reporter: Till Rohrmann
Priority: Minor

Flink's stable API supports the {{registerCachedFile}} call on the {{ExecutionEnvironment}}. However, it is mentioned nowhere in the online documentation. Furthermore, the {{DistributedCache}} is also not explained.
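As a starting point for the documentation, a minimal Scala sketch of the existing API; the HDFS path and the dictionary logic are placeholders:

    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.api.scala._
    import org.apache.flink.configuration.Configuration

    object CachedFileSketch {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment

        // Ship a file to every TaskManager under a registered name.
        env.registerCachedFile("hdfs:///path/to/dictionary.txt", "dictionary")

        env.fromElements("flink", "hadoop")
          .map(new RichMapFunction[String, String] {
            override def open(parameters: Configuration): Unit = {
              // The DistributedCache hands out a local copy of the file.
              val file = getRuntimeContext.getDistributedCache.getFile("dictionary")
              // ... load lookup data from 'file' ...
            }
            override def map(value: String): String = value
          })
          .print()
      }
    }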
[jira] [Created] (FLINK-3642) Disentangle ExecutionConfig
Till Rohrmann created FLINK-3642:

Summary: Disentangle ExecutionConfig
Key: FLINK-3642
URL: https://issues.apache.org/jira/browse/FLINK-3642
Project: Flink
Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Till Rohrmann

Initially, the {{ExecutionConfig}} started out as a configuration for the behaviour of the system with respect to the associated job. As such, it stored information about the restart strategy, registered types, and the parallelism of the job.

Over time, however, the {{ExecutionConfig}} has become more of an easy entry point to pass information into the system. The user can now set arbitrary information as part of the {{GlobalJobParameters}} in the {{ExecutionConfig}}, which is piped to all kinds of different locations in the system, e.g. the serializers, the JobManager, the ExecutionGraph, and the TaskManagers. This mixture of user code classes with system parameters makes it really cumbersome to send system information around, because you always need a user code class loader to deserialize it.

Furthermore, there are different ways in which the {{ExecutionConfig}} is passed to the system: one is giving it to the {{Serializers}} created in the JavaAPIPostPass, another is giving it directly to the {{JobGraph}}, for example. The underlying problem is that the {{ExecutionConfig}} contains information which is required at different stages of a program's execution.

I think it would be beneficial to disentangle the {{ExecutionConfig}} a little bit, along the lines of the different concerns for which it is currently used.
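For readers unfamiliar with the mechanism, a small Scala sketch of the pattern the issue describes; the parameter class is illustrative:

    import org.apache.flink.api.common.ExecutionConfig
    import org.apache.flink.api.scala._

    // An arbitrary user class that travels along with the ExecutionConfig.
    class MyJobParameters extends ExecutionConfig.GlobalJobParameters {
      val lookupServiceUrl = "http://example.com/lookup" // user-defined payload
    }

    object GlobalParametersSketch {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        // From here on, deserializing the ExecutionConfig anywhere in the
        // system requires the user code class loader for MyJobParameters.
        env.getConfig.setGlobalJobParameters(new MyJobParameters)
      }
    }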
[jira] [Created] (FLINK-3643) Improve Window Triggers
Aljoscha Krettek created FLINK-3643:

Summary: Improve Window Triggers
Key: FLINK-3643
URL: https://issues.apache.org/jira/browse/FLINK-3643
Project: Flink
Issue Type: Improvement
Components: Streaming
Affects Versions: 1.0.0
Reporter: Aljoscha Krettek

I think there are several shortcomings in the current window trigger system, and I started a document to keep track of them: https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing

The document is a work in progress and I encourage everyone to read it and make suggestions. We'll use this issue to keep track of any sub-issues that we open for the parts that we want to improve.
[DISCUSS] Improving Trigger/Window API and Semantics
Hi,

I’m also sending this to @user because the Trigger API concerns users directly.

There are some things in the Trigger API that I think require improvement. The issues are trigger testability, fire semantics, composite triggers, and lateness. I started a document to keep track of things (https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing). Please read it if you are interested and want to get involved. We’ll evolve the document together and come up with Jira issues for the subtasks.

Cheers,
Aljoscha
Next steps: SQL / StreamSQL support
Hi everybody,

on Friday we merged the working branch that puts the Table API on top of Calcite back to master. This was the first step towards adding SQL support to Flink as outlined in the design document [1] (the document was updated to reflect design decisions made while implementing task 1).

According to the design doc, the next step is to add support for SQL queries on DataSets and Table API Tables. We created two JIRA issues to track this effort:
- FLINK-3639: Add methods to register DataSets and Tables in the TableEnvironment
- FLINK-3640: Add support for SQL queries on registered DataSets and Tables

Subsequent efforts will be to add support for SQL queries on external tables (CSV and Parquet files, DBMSs, etc.), to extend the coverage of the SQL standard (sort, outer joins, etc.), and to define table sinks to emit the results.

The following document shows the syntax to register tables (DataSets, DataStreams, Tables, external sources), to query them, and to define table sinks that write a Table to an external storage system [2].

At the same time, we are working on extending the Table API for streaming tables (FLINK-3547).

As usual, feedback, comments, and contributions are highly welcome :-)

Best, Fabian

[1] https://docs.google.com/document/d/1TLayJNOTBle_-m1rQfgA6Ouj1oYsfqRjPcp1h2TVqdI
[2] https://docs.google.com/document/d/1sITIShmJMGegzAjGqFuwiN_iw1urwykKsLiacokxSw0
Re: Next steps: SQL / StreamSQL support
Thanks for the nice summary and for updating the design documents, Fabian!

As we proceed with the upcoming tasks, we should also go through the existing JIRAs and update them, too. There are some old issues referring to SQL and to adding external data sources, but these were created before the decision to use Calcite. It would be nice to clean up the Table API JIRAs a bit by removing the invalid issues and updating the ones that are still relevant.

Cheers,
-Vasia.
Re: [DISCUSS] Improving Trigger/Window API and Semantics
Hi,

my previous message might be a bit hard to parse for people who are not very deep into the Trigger implementation, so I'll try to give a bit more explanation right in the mail. The basic idea is that we observed some recurring problems that keep coming up for people on the mailing lists, and I want to try to address them.

The first problem is the Trigger semantics and the confusion between triggers that purge the window contents and those that don't. (For example, using a ContinuousEventTimeTrigger with an EventTimeWindows assigner is a bad idea because state will be kept indefinitely.) While working on this, we should also tackle the issue of providing composite triggers, such as Repeatedly (fires a child trigger repeatedly), Any (fires when any child trigger fires), and All (fires when all child triggers fire).

The second issue is lateness. Right now, it is possible to write custom triggers that can deal with late elements and can even behave differently based on the amount of lateness. There is, however, no API for dealing with lateness. We should address this.

The third issue is Trigger testability. We should introduce a testing harness for triggers and move the processing-time triggers to use a clock provider instead of directly using System.currentTimeMillis(). This will allow testing them deterministically.

All of these are expanded upon in the document I linked to before: https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing

I think all of this is very important for people working on event-time based pipelines. Feedback is very welcome, and I hope that we can expand the document together and come up with good solutions.

Cheers,
Aljoscha
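To make the testability point concrete, here is a small Scala sketch of the clock-provider idea; all names are illustrative, not Flink APIs:

    // A clock abstraction that triggers would query instead of calling
    // System.currentTimeMillis() directly.
    trait Clock {
      def currentTimeMillis(): Long
    }

    object SystemClock extends Clock {
      override def currentTimeMillis(): Long = System.currentTimeMillis()
    }

    // A manually advanced clock for deterministic trigger tests.
    class ManualClock(private var now: Long = 0L) extends Clock {
      override def currentTimeMillis(): Long = now
      def advance(millis: Long): Unit = { now += millis }
    }

    // A processing-time firing condition written against the abstraction.
    class ProcessingTimeoutCondition(clock: Clock, timeoutMillis: Long) {
      private var start = clock.currentTimeMillis()
      def shouldFire(): Boolean = clock.currentTimeMillis() - start >= timeoutMillis
      def reset(): Unit = { start = clock.currentTimeMillis() }
    }

A test can then construct the condition with a ManualClock, call advance(), and assert shouldFire() without ever sleeping.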
[jira] [Created] (FLINK-3644) WebRuntimeMonitor: setting java.io.tmpdir does not work for changing the upload dir
astralidea created FLINK-3644:

Summary: WebRuntimeMonitor: setting java.io.tmpdir does not work for changing the upload dir
Key: FLINK-3644
URL: https://issues.apache.org/jira/browse/FLINK-3644
Project: Flink
Issue Type: Bug
Components: Webfrontend
Affects Versions: 1.0.0
Environment: flink-conf.yaml -> java.io.tmpdir: .
java -server -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:+UseCompressedOops -XX:+UseFastEmptyMethods -XX:+UseFastAccessorMethods -XX:+AlwaysPreTouch -Xmx1707m -Dlog4j.configuration=file:log4j-mesos.properties -Djava.io.tmpdir=. -cp flink-dist_2.10-1.0.0.jar:log4j-1.2.17.jar:slf4j-log4j12-1.7.7.jar:flink-python_2.10-1.0.0.jar
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
CentOS release 6.4 (Final)
Reporter: astralidea

Setting java.io.tmpdir in flink-conf.yaml and via -Djava.io.tmpdir=. does not work for me, and I don't know why. Looking at the code, System.getProperty("java.io.tmpdir") should pick the value up, but it does not. However, the JobManager configuration shown in the web UI does show that java.io.tmpdir is set.
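A trivial check, assuming the JVM really receives -Djava.io.tmpdir=. : print what the property resolves to from inside the process, which is the value the upload-directory logic would see via System.getProperty:

    object TmpDirCheck {
      def main(args: Array[String]): Unit = {
        // Prints the effective tmpdir as seen by code running in this JVM.
        println(s"java.io.tmpdir = ${System.getProperty("java.io.tmpdir")}")
      }
    }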