Re: [PROPOSAL] Improving Flink’s timer management for large state

2018-05-29 Thread Till Rohrmann
Thanks for the great design document Stefan. Unifying how Flink handles
state such that all checkpointable state is maintained by the StateBackend
makes a lot of sense to me. Also making the timer service scalable and
adding support for asynchronous checkpoints is really important for many
Flink users. Consequently +1 for this improvement.

Cheers,
Till

On Mon, May 28, 2018 at 9:45 AM, Stefan Richter  wrote:

> Thanks for the positive feedback so far!
>
> @Sihua: I totally agree with your comments about improving the performance
> of the existing RocksDB timer code. In fact, that is why I phrased it as an
> "implementation that is loosely based on some ideas", to point out that a
> solution can look roughly like the existing code but that it probably should
> *not* be a simple copy-paste. As for the comment about supporting heap
> timers with RocksDB state, I think nothing speaks fundamentally against it,
> and I think the design can be done in a way that supports it. It just makes
> the configuration more complex, and we need a slight special case in
> incremental checkpoints. I was already wondering whether heap timers with
> RocksDB state could become a byproduct of a stepwise implementation, i.e. if
> the first step of the plan is pushing timer state into the backends while a
> RocksDB timer state does not yet exist.
>
> > On 27.05.2018 at 10:03, sihua zhou wrote:
> >
> >
> >
> > I also +1 for this very good proposal!
> > In general, the design is good, especially the part related to timers on
> the heap. Regarding the timers on RocksDB, I think there is still some
> room for improvement; I have left comments on the doc.
> >
> >
> > Best, Sihua
> >
> >
> >
> >
> > On 05/27/2018 15:31, Bowen Li wrote:
> > +1 LGTM. The RocksDB timer service is one of the most highly anticipated
> > features among Flink users, and it's finally coming, officially. I would
> > also love to see timers brought closer to the state backends, for the
> > sake of easier development and maintenance of the code.
> >
> > On Fri, May 25, 2018 at 7:13 AM, Stefan Richter <
> s.rich...@data-artisans.com
> > wrote:
> >
> > Hi,
> >
> > I am currently planning how to improve Flink’s timer management for large
> > state. In particular, I would like to introduce timer state that is
> managed
> > in RocksDB and also to improve the capabilities of the heap-based timer
> > service, e.g. support for asynchronous checkpoints. You can find a short
> > outline of my planned approach in this document:
> >
> > https://docs.google.com/document/d/1XbhJRbig5c5Ftd77d0mKND1bePyTC26Pz04EvxdA7Jc/edit?usp=sharing
> >
> > As always, your questions, feedback, and comments are highly appreciated.
> >
> > Best,
> > Stefan
>
>


[jira] [Created] (FLINK-9461) Disentangle flink-connector-kafka from flink-table and flink-json

2018-05-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-9461:


 Summary: Disentangle flink-connector-kafka from flink-table and 
flink-json
 Key: FLINK-9461
 URL: https://issues.apache.org/jira/browse/FLINK-9461
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 1.5.0
Reporter: Till Rohrmann
 Fix For: 1.6.0


Currently, the {{flink-connector-kafka}} module has a dependency on 
{{flink-table}} and {{flink-json}}. The reason seems to be that the module 
contains the {{KafkaJsonTableSource}} and {{KafkaJsonTableSink}}. Even though 
the {{flink-table}} and {{flink-json}} dependencies are marked as optional, 
{{flink-connector-kafka}} will still contain the table sources and sinks. I 
don't think this is a clean design.

I would propose to move the table sources and sinks into a dedicated module 
which depends on {{flink-connector-kafka}}. That way we would better separate 
dependencies and could remove {{flink-table}} and {{flink-json}} from 
{{flink-connector-kafka}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9462) Disentangle flink-json and flink-table

2018-05-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-9462:


 Summary: Disentangle flink-json and flink-table
 Key: FLINK-9462
 URL: https://issues.apache.org/jira/browse/FLINK-9462
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Table API & SQL
Affects Versions: 1.5.0
Reporter: Till Rohrmann
 Fix For: 1.6.0


The {{flink-json}} module defines JSON serialization and deserialization 
schemas. Additionally, it defines a JSON table descriptor. Due to this, it has a 
dependency on {{flink-table}}. We should either rename this module to 
{{flink-json-table}} or move the Table API specific classes into a different 
module. That way we could remove the dependency on {{flink-table}}, decoupling 
the JSON serialization and deserialization schemas from the Table API, on which 
they should not depend.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9463) Setting taskmanager.network.netty.transport to epoll

2018-05-29 Thread Piotr Nowojski (JIRA)
Piotr Nowojski created FLINK-9463:
-

 Summary: Setting taskmanager.network.netty.transport to epoll 
 Key: FLINK-9463
 URL: https://issues.apache.org/jira/browse/FLINK-9463
 Project: Flink
  Issue Type: Bug
  Components: Network
Affects Versions: 1.4.2, 1.4.1, 1.5.0, 1.4.0
Reporter: Piotr Nowojski
Assignee: Piotr Nowojski
 Fix For: 1.6.0


https://github.com/apache/flink-shaded/issues/30 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9464) Clean up pom files

2018-05-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-9464:


 Summary: Clean up pom files
 Key: FLINK-9464
 URL: https://issues.apache.org/jira/browse/FLINK-9464
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 1.5.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.6.0


Some Flink modules' {{pom.xml}} files contain unnecessary or redundant 
information. For example, the {{flink-clients}} {{pom.xml}} specifies the 
{{maven-jar-plugin}} twice in the build section. Other modules explicitly specify 
the version and scope of the {{flink-test-utils-junit}} module, which is already 
managed by the parent's dependency management section. I propose to clean these 
things up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9465) Separate timeout for savepoint and checkpoint

2018-05-29 Thread Truong Duc Kien (JIRA)
Truong Duc Kien created FLINK-9465:
--

 Summary: Separate timeout for savepoint and checkpoint
 Key: FLINK-9465
 URL: https://issues.apache.org/jira/browse/FLINK-9465
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.5.0
Reporter: Truong Duc Kien


Savepoints can take much longer to complete than checkpoints, especially with 
incremental checkpointing enabled. This leads to a couple of problems:
 * For our job, we currently have to set the checkpoint timeout much larger than 
necessary, otherwise we would be unable to take savepoints.
 * During rush hour, our cluster encounters a high rate of checkpoint 
timeouts due to backpressure; however, we cannot migrate to a larger 
configuration because savepoints time out as well.

In my opinion, the timeout for savepoints should be configurable separately, 
both in the config file and as a parameter to the savepoint command.
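
For illustration, a minimal sketch of today's workaround and of what a separate 
knob could look like. The savepoint-specific setter below is an assumption for 
illustration, not an existing API:

{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TimeoutExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000L);

        // Today: a single timeout that must be sized for the slowest savepoint,
        // so regular checkpoints inherit the overly large value as well.
        env.getCheckpointConfig().setCheckpointTimeout(30L * 60L * 1000L);

        // Proposed (hypothetical setter, name assumed): a savepoint-only timeout
        // that can be larger than the checkpoint timeout.
        // env.getCheckpointConfig().setSavepointTimeout(60L * 60L * 1000L);
    }
}
{code}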



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [PROPOSAL] Improving Flink’s timer management for large state

2018-05-29 Thread Aljoscha Krettek
+1

> On 29. May 2018, at 09:34, Till Rohrmann  wrote:
> 
> Thanks for the great design document Stefan. Unifying how Flink handles
> state such that all checkpointable state is maintained by the StateBackend
> makes a lot of sense to me. Also making the timer service scalable and
> adding support for asynchronous checkpoints is really important for many
> Flink users. Consequently +1 for this improvement.
> 
> Cheers,
> Till
> 
> On Mon, May 28, 2018 at 9:45 AM, Stefan Richter > wrote:
> 
>> Thanks for the positive feedback so far!
>> 
>> @Sihua: I totally agree with your comments about improving the performance
>> of the existing RocksDB timer code. In fact, that is why I phrased it as an
>> "implementation that is loosely based on some ideas", to point out that a
>> solution can look roughly like the existing code but that it probably should
>> *not* be a simple copy-paste. As for the comment about supporting heap
>> timers with RocksDB state, I think nothing speaks fundamentally against it,
>> and I think the design can be done in a way that supports it. It just makes
>> the configuration more complex, and we need a slight special case in
>> incremental checkpoints. I was already wondering whether heap timers with
>> RocksDB state could become a byproduct of a stepwise implementation, i.e. if
>> the first step of the plan is pushing timer state into the backends while a
>> RocksDB timer state does not yet exist.
>> 
>>> On 27.05.2018 at 10:03, sihua zhou wrote:
>>> 
>>> 
>>> 
>>> I also +1 for this very good proposal!
>>> In general, the design is good, especially the part related to timers on
>> the heap. Regarding the timers on RocksDB, I think there is still some
>> room for improvement; I have left comments on the doc.
>>> 
>>> 
>>> Best, Sihua
>>> 
>>> 
>>> 
>>> 
>>> On 05/27/2018 15:31, Bowen Li wrote:
>>> +1 LGTM. The RocksDB timer service is one of the most highly anticipated
>>> features among Flink users, and it's finally coming, officially. I would
>>> also love to see timers brought closer to the state backends, for the
>>> sake of easier development and maintenance of the code.
>>> 
>>> On Fri, May 25, 2018 at 7:13 AM, Stefan Richter <
>> s.rich...@data-artisans.com
>>> wrote:
>>> 
>>> Hi,
>>> 
>>> I am currently planning how to improve Flink’s timer management for large
>>> state. In particular, I would like to introduce timer state that is
>> managed
>>> in RocksDB and also to improve the capabilities of the heap-based timer
>>> service, e.g. support for asynchronous checkpoints. You can find a short
>>> outline of my planned approach in this document:
>>> 
>>> https://docs.google.com/document/d/1XbhJRbig5c5Ftd77d0mKND1bePyTC26Pz04EvxdA7Jc/edit?usp=sharing
>>> 
>>> As always, your questions, feedback, and comments are highly appreciated.
>>> 
>>> Best,
>>> Stefan
>> 
>> 



Re: [DISCUSS] Adding new interfaces in [Stream]ExecutionEnvironment

2018-05-29 Thread Till Rohrmann
I see Shuyi's point that it would be nice to allow programmatically adding jar
files which should become part of the user code classloader. Actually, we
expose this functionality in the `RemoteEnvironment`, where you can specify in
the constructor additional jars which shall be shipped to the cluster. I
assume that is exactly the functionality you are looking for. In that
sense, it might be an API inconsistency that we allow it in some cases and
not in others.

But I could also see the whole functionality of dynamically loading
jars at runtime living entirely in the `UdfSqlOperator`. This, of
course, would entail that one has to take care of cleaning up the
downloaded resources. But it should be possible to first download the
resources and create a custom URLClassLoader at startup and then use this
class loader when calling into the UDF.
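
A rough sketch of that approach (the jar location and UDF class name below are
placeholders for illustration, not part of the proposal):

import java.net.URL;
import java.net.URLClassLoader;

// Sketch: build a dedicated class loader over the downloaded jar at startup
// and load the UDF through it; the downloaded resources still need to be
// cleaned up when the operator is disposed.
public class UdfLoaderSketch {
    public static Object instantiateUdf() throws Exception {
        URL[] jars = { new URL("file:///tmp/my-udf-1.0.1-SNAPSHOT.jar") };
        URLClassLoader udfLoader =
            new URLClassLoader(jars, Thread.currentThread().getContextClassLoader());
        Class<?> udfClass = Class.forName("com.example.udf.HelloWorld", true, udfLoader);
        return udfClass.getDeclaredConstructor().newInstance();
    }
}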

Cheers,
Till

On Wed, May 16, 2018 at 9:28 PM, Shuyi Chen  wrote:

> Hi Aljoscha, Fabian, Rong, Ted and Timo,
>
> Thanks a lot for the feedback. Let me clarify the usage scenario in a bit
> more detail. The context is that we want to add support in Flink SQL for a
> SQL DDL that loads UDFs from external JARs located either in the local
> filesystem, in HDFS, or at an HTTP endpoint. The local FS option is mainly
> for debugging, letting the user submit the job jar locally, while the latter
> two are for production use. Below is an example user application with the
> *CREATE FUNCTION* DDL (Note: grammar and interface not finalized yet).
>
> 
> ------------------------------------------------------------------------
>
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> val tEnv = TableEnvironment.getTableEnvironment(env)
> // setup the DataStream
> // ...
>
> // register the DataStream under the name "OrderA"
> tEnv.registerDataStream("OrderA", orderA, 'user, 'product, 'amount)
>
> tEnv.sqlDDL(
>   "create function helloFunc as 'com.example.udf.HelloWorld' using jars ('hdfs:///users/david/libraries/my-udf-1.0.1-SNAPSHOT.jar')")
>
> val result = tEnv.sqlQuery(
>   "SELECT user, helloFunc(product), amount FROM OrderA WHERE amount > 2")
>
> result.toAppendStream[Order].print()
> env.execute()
>
> ------------------------------------------------------------------------
>
> The example application above does the following:
> 1) it registers a DataStream as a Calcite table
> (*org.apache.calcite.schema.Table*) under the name "OrderA", so SQL can
> reference the DataStream as table "OrderA".
> 2) it uses the SQL *CREATE FUNCTION* DDL (grammar and interface not
> finalized yet) to create a SQL UDF called *helloFunc* from a JAR located at
> a remote HDFS path.
> 3) it issues a SQL query that uses the *helloFunc* UDF defined above and
> generates a Flink table (*org.apache.flink.table.api.Table*).
> 4) it converts the Flink table back to a DataStream and prints it.
>
> Steps 1), 3), and 4) are already implemented. To implement 2), we need to do
> the following in the *tEnv.sqlDDL()* function:
>
> a) parse the DDL into a SqlNode to extract the UDF classpath *udfClasspath*,
> the UDF remote paths *udfUrls[]* and the UDF SQL name *udfName*.
> b) use a URLClassLoader to load the JARs specified in *udfUrls[]*, and
> register the SQL UDF via the {Batch/Stream/}TableEnvironment
> registerFunction methods using *udfClasspath* under the name *udfName*.
> c) register the JARs in *udfUrls[]* through the {Stream}ExecutionEnvironment,
> so that the JARs can be distributed to all the TaskManagers at runtime.
>
>
> Since the CREATE FUNCTION DDL is executed within the user application, I
> don't think we have access to the ClusterClient at the point when
> *tEnv.sqlDDL()* is executed. Also, the JARs can be in a remote filesystem
> (which is the main usage scenario), so the user can't really prepare the
> jar statically in advance.
>
> For a normal user application, I think {Stream}ExecutionEnvironment is the
> right place for this functionality, since it provides methods to control the
> job execution and to interact with the outside world, and it actually
> already does something similar through the *registerCachedFile*
> interface.
>
> However, in that case, the SQL FUNCTION DDL and the SQL client would use 2
> different routes to register UDF jars, one through
> *JobGraph.jobConfiguration* and the other through *JobGraph.userJars*. So
> maybe we can, as Fabian suggests, add *registerUserJarFile()* /
> *getUserJarFiles()* interfaces to {Stream}ExecutionEnvironment, which store
> the jars internally in a List; when generating the JobGraph, the jars would
> be copied into it using *{Stream}ExecutionEnvironment.getUserJarFiles()* and
> *JobGraph.addJar()* (note: streaming and batch implementations might vary).
> That way, both the SQL FUNCTION DDL and the SQL client would use
> *JobGraph.userJars* to ship the UDF jars.
>
> Hope that clarifies better. What do you guys think? Thanks a lot.
>
> Cheers!
> Shuyi
>
> On Wed, May 16, 2018 at 9:45 AM, Rong Rong  

[jira] [Created] (FLINK-9466) LocalRecoveryRocksDBFullITCase failed on Travis

2018-05-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-9466:


 Summary: LocalRecoveryRocksDBFullITCase failed on Travis
 Key: FLINK-9466
 URL: https://issues.apache.org/jira/browse/FLINK-9466
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing, Tests
Affects Versions: 1.6.0
Reporter: Till Rohrmann
 Fix For: 1.6.0


The {{LocalRecoveryRocksDBFullITCase}} failed on Travis: 
https://api.travis-ci.org/v3/job/385097117/log.txt.

It is not clear what caused the failure; the window computed a wrong result.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9467) No Watermark display on Web UI

2018-05-29 Thread Truong Duc Kien (JIRA)
Truong Duc Kien created FLINK-9467:
--

 Summary: No Watermark display on Web UI
 Key: FLINK-9467
 URL: https://issues.apache.org/jira/browse/FLINK-9467
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.5.0
Reporter: Truong Duc Kien


Watermarks are currently not shown on the web interface because it still 
queries for the watermark using the old metric name `currentLowWatermark` instead 
of the new names `currentInputWatermark` and `currentOutputWatermark`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Pull request failing Travis IO test for unknown reason

2018-05-29 Thread jpcarterara
I am new to committing code to a git repo, so I apologize if I ask anything 
really obvious. 

So I have submitted a pull request for Flink and the TravisIO tests all pass 
except ones where TEST=“misc”. Looking through the output of the tests I can’t 
figure out what is wrong. There were tests that failed before because of 
improper formatting and the like, but I fixed those and the relevant tests 
passed. So I am at a bit of a loss with what I am missing now. 

Here is the TravisIO link: https://travis-ci.org/Jicaar/flink/builds/374488723 .

Thanks for any help in advance!


Re: Pull request failing Travis IO test for unknown reason

2018-05-29 Thread Dawid Wysakowicz
Hi,

The interesting line is:

17:43:48.333 [ERROR] Failed to execute goal 
org.apache.rat:apache-rat-plugin:0.12:check (default) on project flink-parent: 
Too many files with unapproved license: 1 See RAT report in: 
/home/travis/build/Jicaar/flink/target/rat.txt -> [Help 1]

It basically means that you've added a new file without the ASF license header.
If you run this command locally:

mvn -DskipTests verify 

you can check in the RAT report file which file violates that rule.
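
For reference, the header the RAT check expects at the top of every source file
is the standard ASF license header; the safest way is to copy it verbatim from
any existing Flink source file. For Java/Scala files it looks like this:

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */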

Best,
Dawid

On 29/05/18 14:18, jpcarter...@gmail.com wrote:
> I am new to committing code to a git repo, so I apologize if I ask anything 
> really obvious. 
>
> So I have submitted a pull request for Flink and the TravisIO tests all pass 
> except ones where TEST=“misc”. Looking through the output of the tests I 
> can’t figure out what is wrong. There were tests that failed before because 
> of improper formatting and the like, but I fixed those and the relevant tests 
> passed. So I am at a bit of a loss with what I am missing now. 
>
> Here is the TravisIO link: 
> https://travis-ci.org/Jicaar/flink/builds/374488723 .
>
> Thanks for any help in advance!






Re: Pull request failing Travis IO test for unknown reason

2018-05-29 Thread Chesnay Schepler
There are some tests that are unstable. Committers will evaluate whether 
the failure is related to the PR.

Please do not push commits for the sake of getting that perfect green build.

In your particular case you've added files that do not have an Apache 
license header:


17:43:48.333 [ERROR] Failed to execute goal 
org.apache.rat:apache-rat-plugin:0.12:check (default) on project 
flink-parent: Too many files with unapproved license: 1 See RAT report 
in: /home/travis/build/Jicaar/flink/target/rat.txt -> [Help 1]


You can run mvn validate locally which will look for missing license 
headers.


On 29.05.2018 14:18, jpcarter...@gmail.com wrote:

I am new to committing code to a git repo, so I apologize if I ask anything 
really obvious.

So I have submitted a pull request for Flink and the TravisIO tests all pass 
except ones where TEST=“misc”. Looking through the output of the tests I can’t 
figure out what is wrong. There were tests that failed before because of 
improper formatting and the like, but I fixed those and the relevant tests 
passed. So I am at a bit of a loss with what I am missing now.

Here is the TravisIO link: https://travis-ci.org/Jicaar/flink/builds/374488723 .

Thanks for any help in advance!





[jira] [Created] (FLINK-9468) get outputLimit of LimitedConnectionsFileSystem incorrectly

2018-05-29 Thread Sihua Zhou (JIRA)
Sihua Zhou created FLINK-9468:
-

 Summary: get outputLimit of LimitedConnectionsFileSystem 
incorrectly
 Key: FLINK-9468
 URL: https://issues.apache.org/jira/browse/FLINK-9468
 Project: Flink
  Issue Type: Bug
  Components: FileSystem
Affects Versions: 1.5.0
Reporter: Sihua Zhou
Assignee: Sihua Zhou
 Fix For: 1.6.0, 1.5.1


In {{LimitedConnectionsFileSystem#createStream}}, the outputLimit is computed 
incorrectly:
{code:java}
private <T extends StreamWithTimeout> T createStream(
        final SupplierWithException<T, IOException> streamOpener,
        final HashSet<T> openStreams,
        final boolean output) throws IOException {

    final int outputLimit = output && maxNumOpenInputStreams > 0 ?
            maxNumOpenOutputStreams : Integer.MAX_VALUE;
    /**/
}
{code}

should be 
{code:java}
private <T extends StreamWithTimeout> T createStream(
        final SupplierWithException<T, IOException> streamOpener,
        final HashSet<T> openStreams,
        final boolean output) throws IOException {

    final int outputLimit = output && maxNumOpenOutputStreams > 0 ?
            maxNumOpenOutputStreams : Integer.MAX_VALUE;
    /**/
}
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9469) Add tests that cover PatternStream#flatSelect

2018-05-29 Thread Dawid Wysakowicz (JIRA)
Dawid Wysakowicz created FLINK-9469:
---

 Summary: Add tests that cover PatternStream#flatSelect
 Key: FLINK-9469
 URL: https://issues.apache.org/jira/browse/FLINK-9469
 Project: Flink
  Issue Type: Improvement
  Components: CEP
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9470) Allow querying the key in KeyedProcessFunction

2018-05-29 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-9470:
---

 Summary: Allow querying the key in KeyedProcessFunction
 Key: FLINK-9470
 URL: https://issues.apache.org/jira/browse/FLINK-9470
 Project: Flink
  Issue Type: Improvement
  Components: DataStream API
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
 Fix For: 1.6.0


{{KeyedProcessFunction.OnTimerContext}} allows querying the key of the firing 
timer while {{KeyedProcessFunction.Context}} does not allow querying the key of 
the event we're currently processing.
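
For illustration, a sketch of how user code could read the key during element 
processing, assuming a {{getCurrentKey()}} accessor (name assumed here) is added 
to {{Context}}, analogous to the one on {{OnTimerContext}}:

{code:java}
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class KeyLoggingFunction extends KeyedProcessFunction<String, Long, String> {

    @Override
    public void processElement(Long value, Context ctx, Collector<String> out) {
        // Hypothetical accessor proposed by this ticket; today only
        // OnTimerContext exposes the key (of the firing timer).
        String key = ctx.getCurrentKey();
        out.collect(key + " -> " + value);
    }
}
{code}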



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [jira] [Commented] (FLINK-8353) Add support for timezones

2018-05-29 Thread Kaibo Zhou
Time zone support is a very useful feature. I think there are three levels of
time zone settings (priority from high to low):
1. connector level: for example, the time zone of the time field in the Kafka
data
2. job level: specifies which time zone the current job uses, perhaps
specified via TableConfig or StreamQueryConfig (see the sketch below)
3. system default: effective for all jobs, could be specified in
flink-conf.yaml
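
A rough illustration of the job-level option; a setTimeZone-style setter on
TableConfig is used here purely for illustration of the API shape, not as a
finalized interface:

import java.util.TimeZone;
import org.apache.flink.table.api.TableConfig;

public class TimezoneConfigExample {
    public static void main(String[] args) {
        TableConfig config = new TableConfig();
        // Job-level time zone: all temporal processing of this job (timestamp
        // conversions, watermark-related handling, output of processing time)
        // would be interpreted in this zone.
        config.setTimeZone(TimeZone.getTimeZone("Asia/Shanghai"));
    }
}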

2018-05-29 22:52 GMT+08:00 Weike Dong (JIRA) :

>
> [ https://issues.apache.org/jira/browse/FLINK-8353?page=
> com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=16493639#comment-16493639 ]
>
> Weike Dong commented on FLINK-8353:
> ---
>
> I strongly support these features. Preferably, there would be a way to set
> a specific time zone for a particular job, so that all subsequent temporal
> processing could be based on it. As users' input data are often collected
> from other systems that do not follow the rule assumed by Flink (UTC+0),
> some temporal UDFs are currently needed to perform such transformations,
> which adds complexity to the whole system, especially in the case of
> watermark generation or when writing processing time to an external
> database, etc.
>
> > Add support for timezones
> > -
> >
> > Key: FLINK-8353
> > URL: https://issues.apache.org/jira/browse/FLINK-8353
> > Project: Flink
> >  Issue Type: New Feature
> >  Components: Table API & SQL
> >Reporter: Timo Walther
> >Priority: Major
> >
> > This is an umbrella issue for adding support for timezones in the Table
> & SQL API.
> > Usually companies work with different timezones simultaneously. We could
> add support for the new time classes introduced with Java 8 and enable our
> scalar functions to also work with those (or some custom time class
> implementations like those from Calcite). We need a good design for this to
> address most of the problems users face related to timestamp and timezones.
> > It is up for discussion how to ship date, time, timestamp instances
> through the cluster.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>


[jira] [Created] (FLINK-9471) Job ending exceptions being logged at Info level

2018-05-29 Thread SUBRAMANYA SURESH (JIRA)
SUBRAMANYA SURESH created FLINK-9471:


 Summary: Job ending exceptions being logged at Info level
 Key: FLINK-9471
 URL: https://issues.apache.org/jira/browse/FLINK-9471
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.4.2
Reporter: SUBRAMANYA SURESH


We are using Flink SQL. I see job-ending exceptions that are logged at INFO 
level, which makes it very hard to tune out the INFO messages in the logging 
configuration. Note: if I do end up using INFO, the ExecutionGraph log messages 
contain the entire query for every INFO statement, and this easily fills up the 
logs if we have, say, 100-200 queries.

Note: each "-" below stands for an entire line of the execution graph for this 
query (redacted for privacy).



2018-03-30 03:32:09,943 INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Discarding 
checkpoint 1 because: java.lang.Exception: Could not materialize checkpoint 1 
for operator Source: Custom Source -> (Map -> where: (AND(=- 

- 

- 

- 

-.}

at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:948)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator 
Source: Custom Source -> (Map -> where: (AND(=(Environment, 
_UTF-16LE'SFDC-IT'), =(RuleMatch, _UTF-16LE'SFA'), =(LogType, 
_UTF-16LE'SAML-AUTH'), =(Outcome, _UTF-16LE'DENY'))), select: (proctime, 
CAST(_UTF-16LE'SFDC-IT') AS Environment, CollectedTimestamp, EventTimestamp, 
_raw, Aggregator), Map -> where: (AND(=(Environment, _UTF-16LE'SFDC-IT'), 
=(RuleMatch, _UTF-16LE'SFA'), =(LogType, _UTF-16LE'SAML-AUTH'), =(Outcome, 
_UTF-16LE'DENY'))), select: (proctime, CAST(_UTF-16LE'SFDC-IT') AS Environment, 
CollectedTimestamp, EventTimestamp, _raw, Aggregator)) (353/725).

... 6 more

Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Could 
not flush and close the file system output stream to 
hdfs://security-temp/savedSearches/checkpoint/561eb649376bef2f2d8daa1e3a0fa6db/chk-1/31b94717-9e6d-49b8-b64d-2a1a8ba04425
 in order to obtain the stream state handle

at java.util.concurrent.FutureTask.report(FutureTask.java:122)

at java.util.concurrent.FutureTask.get(FutureTask.java:192)

at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43)

at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:892)

... 5 more

Suppressed: java.lang.Exception: Could not properly cancel managed operator 
state future.

at 
org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:99)

at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:976)

at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939)

... 5 more



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9472) Maven archetype code has typo in documentation

2018-05-29 Thread Alexey Tsitkin (JIRA)
Alexey Tsitkin created FLINK-9472:
-

 Summary: Maven archetype code has typo in documentation
 Key: FLINK-9472
 URL: https://issues.apache.org/jira/browse/FLINK-9472
 Project: Flink
  Issue Type: Bug
Reporter: Alexey Tsitkin


The word "application" is misspelled (as `appliation`) in the Java/Scala code 
documentation in the Maven archetype.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Pull request failing Travis IO test for unknown reason

2018-05-29 Thread Rong Rong
+1. You can also run `mvn clean verify` locally before opening a PR, or `mvn
verify -o` against the module you modified (this will speed up the build).

On Tue, May 29, 2018 at 5:34 AM, Chesnay Schepler 
wrote:

> There are some tests that are unstable. Committers will evaluate whether
> the failure is related to the PR.
> Please do not push commits for the sake of getting that perfect green
> build.
>
> In your particular case you've added files that do not have an apache
> license header:
>
> 17:43:48.333 [ERROR] Failed to execute goal 
> org.apache.rat:apache-rat-plugin:0.12:check
> (default) on project flink-parent: Too many files with unapproved license:
> 1 See RAT report in: /home/travis/build/Jicaar/flink/target/rat.txt ->
> [Help 1]
>
> You can run mvn validate locally which will look for missing license
> headers.
>
>
> On 29.05.2018 14:18, jpcarter...@gmail.com wrote:
>
>> I am new to committing code to a git repo, so I apologize if I ask
>> anything really obvious.
>>
>> So I have submitted a pull request for Flink and the TravisIO tests all
>> pass except ones where TEST=“misc”. Looking through the output of the tests
>> I can’t figure out what is wrong. There were tests that failed before
>> because of improper formatting and the like, but I fixed those and the
>> relevant tests passed. So I am at a bit of a loss with what I am missing
>> now.
>>
>> Here is the TravisIO link: https://travis-ci.org/Jicaar/f
>> link/builds/374488723 .
>>
>> Thanks for any help in advance!
>>
>>
>


[jira] [Created] (FLINK-9473) Compilation fails after upgrade to Calcite 1.17

2018-05-29 Thread Sergey Nuyanzin (JIRA)
Sergey Nuyanzin created FLINK-9473:
--

 Summary: Compilation fails after upgrade to Calcite 1.17
 Key: FLINK-9473
 URL: https://issues.apache.org/jira/browse/FLINK-9473
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Reporter: Sergey Nuyanzin
Assignee: Sergey Nuyanzin


{noformat}
/apacheFlink/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/catalog/ExternalCatalogSchema.scala:40:
 error: class ExternalCatalogSchema needs to be abstract, since:
[ERROR] it has 2 unimplemented members.
[ERROR] /** As seen from class ExternalCatalogSchema, the missing signatures 
are as follows.
[ERROR]  *  For convenience, these are usable as stub implementations.
[ERROR]  */
[ERROR]   def getType(x$1: String): 
org.apache.calcite.rel.type.RelProtoDataType = ???
[ERROR]   def getTypeNames(): java.util.Set[String] = ???
[ERROR] 
[ERROR] class ExternalCatalogSchema(
[ERROR]   ^
[WARNING] two warnings found
[ERROR] one error found

{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9474) Introduce an approximate version of "count distinct"

2018-05-29 Thread Sihua Zhou (JIRA)
Sihua Zhou created FLINK-9474:
-

 Summary: Introduce an approximate version of "count distinct"
 Key: FLINK-9474
 URL: https://issues.apache.org/jira/browse/FLINK-9474
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Affects Versions: 1.5.0
Reporter: Sihua Zhou
Assignee: Sihua Zhou


We can implement an approximate version of "count distinct" based on the 
"Elastic Bloom Filter". It could be very fast because we don't need to query 
the state anymore, and its accuracy could be configurable, e.g. 95% or 98%.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-05-29 Thread sihua zhou
Hi,


I did a survey of the variants of the Bloom filter and of the Cuckoo filter these 
days. In the end, I found 3 of them that may be suitable for our purpose:


1. standard Bloom filter (we have implemented our current solution based on this 
and used it in production with good experience); a minimal sketch follows below
2. Cuckoo filter, also a very good, space-efficient data structure that supports 
fast queries (even faster than a BF, although inserts may be a little slower); 
additionally, it supports a delete() operation.
3. counting Bloom filter, a variant of the BF that supports a delete() operation 
but costs 4-5x the memory of a standard Bloom filter (so I'm not sure whether it 
is practical).
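
To make option 1 concrete, here is a minimal, simplified sketch of a standard
Bloom filter as a storage unit (hashing and sizing are deliberately naive; this
is not our production implementation):

import java.util.Arrays;
import java.util.BitSet;

// Minimal standard Bloom filter: numHashes index positions per key, derived
// from one hash value split into two halves (Kirsch-Mitzenmacher style).
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    public void add(byte[] key) {
        int h = Arrays.hashCode(key);
        int h1 = h & 0xFFFF;
        int h2 = (h >>> 16) | 1;   // odd step so the probe positions spread
        for (int i = 0; i < numHashes; i++) {
            bits.set((h1 + i * h2) % numBits);
        }
    }

    // May return false positives, never false negatives.
    public boolean mightContain(byte[] key) {
        int h = Arrays.hashCode(key);
        int h1 = h & 0xFFFF;
        int h2 = (h >>> 16) | 1;
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get((h1 + i * h2) % numBits)) {
                return false;
            }
        }
        return true;
    }
}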


Anyway, these filters are just the smallest storage unit in this "Elastic Bloom 
Filter"; we can define a general interface and provide different "storage unit" 
implementations based on different filters if we want. Maybe I should change the 
PROPOSAL name to "Introduce Elastic Filter For Flink". The idea of the approach 
I outlined in the doc is very similar to the paper "Optimization and Applications 
of Dynamic Bloom Filters" 
(http://ijarcs.info/index.php/Ijarcs/article/viewFile/826/814); compared to the 
paper, the approach I outlined could have better query performance and also 
supports the RELAXED TTL. Maybe that helps to understand the design doc. 
Looking forward to any feedback!


Best, Sihua
On 05/24/2018 10:36, sihua zhou wrote:
Hi,
Thanks for your suggestions @Elias! I had a brief look at the "Cuckoo Filter" and 
the "Golomb Compressed Sequence". My first impression is that the "Golomb 
Compressed Sequence" may not be a good choice, because it seems to require 
non-constant lookup time, but the Cuckoo Filter may be a good choice; I should 
definitely have a deeper look at it.


Besides, to me all of these filters seem to be "variants" of the Bloom 
filter (which is the smallest unit to store data in the current design). The 
main challenge for introducing a BF into Flink is the data-skew problem (a 
common phenomenon in production). Could you maybe also have a look at the 
solution I posted in the Google doc 
https://docs.google.com/document/d/17UY5RZ1mq--hPzFx-LfBjCAw_kkoIrI9KHovXWkxNYY/edit?usp=sharing
for this problem? It would be nice if you could give us some advice on that.


Best, Sihua


On 05/24/2018 07:21, Elias Levy wrote:
I would suggest you consider alternative data structures: a Cuckoo
Filter or a Golomb Compressed Sequence.

The GCS data structure was introduced in "Cache-, Hash- and Space-Efficient
Bloom Filters" by F. Putze, P. Sanders, and J. Singler. See section 4.



We should discuss which exact implementation of Bloom filters is the best
fit.
@Fabian: There are also implementations of Bloom filters that use counting
and therefore support deletes, but obviously this comes at the cost of a
potentially higher space consumption.

On 23.05.2018 at 11:29, Fabian Hueske wrote:
IMO, such a feature would be very interesting. However, my concern with Bloom
filters is that they are insert-only data structures, i.e., it is not possible
to remove keys once they have been added. This might render the filter useless
over time.
In a different thread (see the discussion in FLINK-8918 [1]), you mentioned
that the Bloom filters would keep growing.
If we keep them in memory, how can we prevent them from exceeding memory
boundaries over time?




[jira] [Created] (FLINK-9475) introduce an approximate version of "select distinct"

2018-05-29 Thread Sihua Zhou (JIRA)
Sihua Zhou created FLINK-9475:
-

 Summary: introduce an approximate version of "select distinct"
 Key: FLINK-9475
 URL: https://issues.apache.org/jira/browse/FLINK-9475
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Affects Versions: 1.5.0
Reporter: Sihua Zhou
Assignee: Sihua Zhou


Based on the "Elastic Bloom Filter", it is easy to implement an approximate 
version of "select distinct" with excellent performance. Its accuracy should be 
configurable, e.g. 95% or 98%.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9476) Lost sideOutPut Late Elements in CEP Operator

2018-05-29 Thread aitozi (JIRA)
aitozi created FLINK-9476:
-

 Summary: Lost sideOutPut Late Elements in CEP Operator
 Key: FLINK-9476
 URL: https://issues.apache.org/jira/browse/FLINK-9476
 Project: Flink
  Issue Type: Improvement
  Components: CEP
Affects Versions: 1.4.2
Reporter: aitozi
Assignee: aitozi






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9477) Support SQL 2016 JSON functions in Flink SQL

2018-05-29 Thread Shuyi Chen (JIRA)
Shuyi Chen created FLINK-9477:
-

 Summary: Support SQL 2016 JSON functions in Flink SQL
 Key: FLINK-9477
 URL: https://issues.apache.org/jira/browse/FLINK-9477
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Shuyi Chen
Assignee: Shuyi Chen






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)