Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-06-12 Thread sihua zhou
Hi,


there has been no more feedback these days... I guess it's because you are all
busy. Since I didn't receive any negative feedback and there is already some
positive feedback, I want to implement this *Elastic Bloom Filter* based on
the current design doc (because I have time to do it now). Even though the
design can definitely still be improved, we may be able to discuss the
improvements more concretely based on code, and I believe most of the code
could be cherry-picked into the "final implementation". Does anyone object to
this?


Best, Sihua




On 06/06/2018 22:02, sihua zhou wrote:
Hi,


Sorry to ping again, but I'm still looking for more feedback on this proposal...
Even negative feedback is highly appreciated!


Best, Sihua






On 05/30/2018 13:19, sihua zhou wrote:
Hi,


I surveyed the variants of the Bloom filter and the Cuckoo filter over the
last few days. In the end, I found three of them that may be suitable for our
purpose:


1. Standard Bloom filter (we have already implemented this and used it in
production with good results).
2. Cuckoo filter: also a very good filter; it is a space-efficient data
structure and supports fast queries (even faster than a BF, though inserts
may be a little slower); in addition, it supports a delete() operation.
3. Counting Bloom filter: a variant of the BF that supports a delete()
operation, but costs 4-5x the memory of a standard Bloom filter (so I'm not
sure it is practical).


Anyway, these filters are just the smallest storage unit in this "Elastic
Bloom Filter": we can define a general interface and provide different
implementations of the "storage unit" based on different filters if we want.
Maybe I should rename the PROPOSAL to "Introduce Elastic Filter For Flink".
The idea of the approach I outlined in the doc is very similar to the paper
"Optimization and Applications of Dynamic Bloom Filters"
(http://ijarcs.info/index.php/Ijarcs/article/viewFile/826/814); compared to
the paper, the approach I outlined could have better query performance and
also supports the RELAXED TTL. Maybe the paper can help in understanding the
design doc. Looking forward to any feedback!
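A minimal sketch of what such a "storage unit" interface could look like (the
interface and method names here are illustrative assumptions, not taken from
the design doc):

{code:java}
// Sketch only: the smallest storage unit that the Elastic Filter composes.
// Implementations could wrap a standard Bloom filter, a counting Bloom
// filter, or a cuckoo filter.
public interface FilterStorageUnit {

    /** Inserts a key into the filter. */
    void add(byte[] key);

    /** May return false positives, but never false negatives. */
    boolean mightContain(byte[] key);

    /** Optional: only counting Bloom filters and cuckoo filters support this. */
    boolean delete(byte[] key);

    /** Signals that a new head node must be allocated for further inserts. */
    boolean isFull();
}
{code}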


Best, Sihua
On 05/24/2018 10:36, sihua zhou wrote:
Hi,
Thanks for your suggestions @Elias! I had a brief look at the "Cuckoo Filter"
and the "Golomb Compressed Sequence". My first impression is that the Golomb
Compressed Sequence may not be a good choice, because it seems to require
non-constant lookup time, but the Cuckoo Filter may be a good choice; I should
definitely take a deeper look at it.


Besides, to me all of these filters seem to be "variants" of the Bloom filter
(which is the smallest unit for storing data in the current design). The main
challenge in introducing a BF into Flink is the data-skew problem (a common
phenomenon in production). Could you maybe also have a look at the solution
for this problem that I posted in the Google doc
https://docs.google.com/document/d/17UY5RZ1mq--hPzFx-LfBjCAw_kkoIrI9KHovXWkxNYY/edit?usp=sharing
? It would be nice if you could give us some advice on that.


Best, Sihua


On 05/24/2018 07:21, Elias Levy wrote:
I would suggest you consider some alternative data structures: a Cuckoo
Filter or a Golomb Compressed Sequence.

The GCS data structure was introduced in "Cache-, Hash- and Space-Efficient
Bloom Filters" by F. Putze, P. Sanders, and J. Singler. See section 4.



We should discuss which exact implementation of Bloom filters is the best
fit.
@Fabian: There are also implementations of Bloom filters that use counting
and therefore support deletes, but obviously this comes at the cost of
potentially higher space consumption.

On 23.05.2018 at 11:29, Fabian Hueske wrote:
IMO, such a feature would be very interesting. However, my concern with Bloom
filters is that they are insert-only data structures, i.e., it is not
possible to remove keys once they have been added. This might render the
filter useless over time.
In a different thread (see the discussion in FLINK-8918 [1]), you mentioned
that the Bloom filters would be growing.
If we keep them in memory, how can we prevent them from exceeding memory
boundaries over time?




Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-06-12 Thread Fabian Hueske
Hi Sihua,

Sorry for not replying earlier.
I have one question left. If I understood the design of the linked Bloom
filter nodes correctly, users would need to configure a TTL to be able to
remove a node.
When nodes are removed, we would need to insert every key into the current
node, which would not be required if we don't remove nodes, right?

From the small summary of approximate filters, cuckoo filters seem to be the
most appropriate, as they also support deletes.
Are you aware of any downsides compared to Bloom filters (besides potentially
slower inserts)?

Best, Fabian





Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-06-12 Thread sihua zhou
Hi Fabian,


Thanks a lot for your reply. You are right that users would need to configure
a TTL for the Elastic Filter so that its memory can be recycled.


In every list of linked BloomFilter nodes, only the head node is writable;
the other nodes are all full and therefore immutable (read-only; we implement
the relaxed TTL based on this property). Even if we didn't need to remove
nodes, we would still always insert data into the current node (the head
node), because nodes are allocated lazily (to handle data skew): each node
can only store a part of the data, and once the current node is full, we
allocate a new head node.


Concerning cuckoo filters, I also agree that they seem to be the most
appropriate choice in theory. But there are some reasons why I prefer to base
the first iteration on the BF:


- I didn't find an open source library that provides a "stable" cuckoo
filter, so we might need to implement it ourselves, which is not trivial
work.


- The most attractive property of the cuckoo filter is that it supports
deletion, but since the cuckoo filter is a dense data structure, we can't
store a timestamp with each record in it. We would need to depend on
something extra (e.g. a timer) to make use of its deletion, and the
performance overhead may not be cheap.


- Whether it's a cuckoo filter or a Bloom filter, both act as the "smallest
storage unit" in the "Elastic Filter"; once we provide an implementation
based on the Bloom filter, it is easy to extend to the cuckoo filter.


How about providing the Elastic Filter based on the BF as a first iteration
and the cuckoo-filter-based version as a second iteration? What do you think?
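As a minimal sketch of the linked-node layout described above (class and
field names are illustrative assumptions, reusing the FilterStorageUnit
interface sketched earlier in this thread): only the head is written, and
each frozen node keeps a creation timestamp so that expired nodes can be
dropped wholesale, which is the relaxed TTL:

{code:java}
// Sketch only: one node in the per-key-group linked list of filters.
public class LinkedFilterNode {
    final FilterStorageUnit filter; // e.g. a Bloom filter of fixed capacity
    final long creationTime;        // basis for the relaxed TTL
    LinkedFilterNode next;          // older nodes are full and immutable

    LinkedFilterNode(FilterStorageUnit filter, long creationTime) {
        this.filter = filter;
        this.creationTime = creationTime;
    }
}
{code}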


Best, Sihua

[jira] [Created] (FLINK-9569) Confusing construction of AvroSerializers for generic records

2018-06-12 Thread Tzu-Li (Gordon) Tai (JIRA)
Tzu-Li (Gordon) Tai created FLINK-9569:
--

 Summary: Confusing construction of AvroSerializers for generic 
records
 Key: FLINK-9569
 URL: https://issues.apache.org/jira/browse/FLINK-9569
 Project: Flink
  Issue Type: Improvement
  Components: Type Serialization System
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai


The {{AvroSerializer}} currently has a public {{AvroSerializer(Class type,
Schema schema)}} constructor that is used for generic records.

This is a bit confusing, because when using the {{AvroSerializer}} this way,
the type to be serialized should always be a {{GenericData.Record}} type.

We should either:

- have a separate subclass of {{AvroSerializer}}, say
{{GenericRecordAvroSerializer}}, that is an {{AvroSerializer}} specialized
for {{GenericData.Record}}, or
- follow an approach similar to the instantiation methods in the
{{AvroDeserializationSchema}}.
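For the second option, a hypothetical sketch of factory methods in the style
of {{AvroDeserializationSchema}} (the method names are assumptions for
illustration, not existing Flink API):

{code:java}
// Hypothetical factory methods, to live inside AvroSerializer, mirroring the
// AvroDeserializationSchema style; forGeneric() hides the confusing
// (Class, Schema) constructor behind an explicit entry point.
public static AvroSerializer<GenericData.Record> forGeneric(Schema schema) {
    return new AvroSerializer<>(GenericData.Record.class, schema);
}

public static <T extends SpecificRecord> AvroSerializer<T> forSpecific(Class<T> type) {
    return new AvroSerializer<>(type);
}
{code}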



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9570) SQL Client merging environments uses AbstractMap

2018-06-12 Thread JIRA
Dominik Wosiński created FLINK-9570:
---

 Summary: SQL Client merging environments uses AbstractMap
 Key: FLINK-9570
 URL: https://issues.apache.org/jira/browse/FLINK-9570
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.5.0
Reporter: Dominik Wosiński


Currently the _Environment.merge()_ function looks like this:

 
{code:java}
final Environment mergedEnv = new Environment();

// merge tables
final Map tables = new HashMap<>(env1.getTables());
mergedEnv.getTables().putAll(env2.getTables());
mergedEnv.tables = tables;

{code}
and the no-arg constructor of _Environment_ defaults tables to
_Collections.emptyMap()_.
This basically results in calling _putAll_ on an _EmptyMap_, which falls back
to the _AbstractMap_ implementation and always throws
_UnsupportedOperationException_.
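A minimal sketch of a possible fix (keeping the raw {{Map}} type as it
appears in the snippet above, since the generic parameters did not survive
formatting): merge both table maps into the fresh {{HashMap}} and never
mutate the unmodifiable default map:

{code:java}
final Environment mergedEnv = new Environment();

// merge tables into the mutable copy instead of the default emptyMap
final Map tables = new HashMap<>(env1.getTables());
tables.putAll(env2.getTables());
mergedEnv.tables = tables;
{code}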

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Clarity on Flink 1.5 Rescale mechanism

2018-06-12 Thread Sampath Bhat
Hello
In flink 1.5 release notes -
https://flink.apache.org/news/2018/05/25/release-1.5.0.html#release-notes

Various Other Features and Improvements:
Applications can be rescaled without manually triggering a savepoint. Under
the hood, Flink will still take a savepoint, stop the application, and
rescale it to the new parallelism.

What exactly does this statement mean?
How do I rescale the application?
If I'm running my flink job application with a parallelism of 5 and then I
execute the following command
flink run  -p 10, will the application rescale to a parallelism of 10?

Adding on: what will be the behavior in flink 1.5 if I increase the number
of task managers in a flink cluster?

So if I have a flink job running on a flink cluster with 1 task manager and
I increase the number of task managers to 2, will flink rescale the flink
job too, or will the flink job be unaffected?


[jira] [Created] (FLINK-9571) Switch to internal states in StateBinder

2018-06-12 Thread Andrey Zagrebin (JIRA)
Andrey Zagrebin created FLINK-9571:
--

 Summary: Switch to internal states in StateBinder
 Key: FLINK-9571
 URL: https://issues.apache.org/jira/browse/FLINK-9571
 Project: Flink
  Issue Type: Sub-task
  Components: State Backends, Checkpointing
Affects Versions: 1.6.0
Reporter: Andrey Zagrebin
Assignee: Andrey Zagrebin
 Fix For: 1.6.0


The StateBinder factory for state objects is not part of the public API, and
in fact it produces only internal states.

It can be changed to produce internal state interfaces instead of the public
API ones.

This would help expose the internal state API to internal components that use
the factory, e.g. the state TTL wrappers.
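A hypothetical sketch of the direction (the exact signature is an assumption
for illustration, not the actual Flink code):

{code:java}
// Hypothetical: the factory returns the internal state interface directly,
// so internal callers such as the TTL wrappers don't need to cast.
public interface StateBinder {
    <T> InternalValueState<?, ?, T> createValueState(ValueStateDescriptor<T> stateDesc)
        throws Exception;
}
{code}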



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9572) Extend InternalAppendingState with internal stored state access

2018-06-12 Thread Andrey Zagrebin (JIRA)
Andrey Zagrebin created FLINK-9572:
--

 Summary: Extend InternalAppendingState with internal stored state 
access
 Key: FLINK-9572
 URL: https://issues.apache.org/jira/browse/FLINK-9572
 Project: Flink
  Issue Type: Sub-task
  Components: State Backends, Checkpointing
Affects Versions: 1.6.0
Reporter: Andrey Zagrebin
Assignee: Andrey Zagrebin
 Fix For: 1.6.0


 
{code:java}
public interface InternalAppendingState ... {
    SV getInternal();
    void updateInternal(SV value);
}
{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9573) Check for leadership with leader session id

2018-06-12 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-9573:


 Summary: Check for leadership with leader session id
 Key: FLINK-9573
 URL: https://issues.apache.org/jira/browse/FLINK-9573
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Coordination
Affects Versions: 1.5.0, 1.6.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann


In order to check whether a {{LeaderContender}} is still the leader, it is
not sufficient to simply provide a {{LeaderElectionService#hasLeadership()}}
method. Instead, we should extend this method to also take the leader session
id as a parameter, to distinguish between calls from the same leader
contender with different leader session ids.
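A minimal sketch of the extended method (assuming the leader session id is
represented as a {{java.util.UUID}}, as elsewhere in the leader election
code):

{code:java}
public interface LeaderElectionService {
    // Leadership is confirmed only for a concrete leader session id, so calls
    // carrying a stale session id from the same contender return false.
    boolean hasLeadership(UUID leaderSessionId);
}
{code}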



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [PROPOSAL] Introduce Elastic Bloom Filter For Flink

2018-06-12 Thread sihua zhou
Hi,


I'd like to add some more information concerning the Linked Filter Nodes on
each key group. The reason we need to maintain Linked Filter Nodes is to
handle data skew, which is also the most challenging problem we need to
overcome. Because we don't know in advance how many records will fall into
each key group, we can't allocate a final filter node up front; instead, we
allocate the filter nodes lazily. Each time we only allocate a small filter
node for the incoming records; once it fills up, we freeze it and allocate a
new node for future incoming records. So we end up with a linked list of
filter nodes on each key group, where only the head node is writable and the
rest are immutable.
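A sketch of the insert and query paths under this scheme (names are
illustrative assumptions, reusing the FilterStorageUnit and LinkedFilterNode
sketches from earlier in this thread):

{code:java}
// Sketch only: the per-key-group list; only the head node is ever written.
public class LinkedFilterNodeList {
    private LinkedFilterNode head;

    public void add(byte[] key, long now) {
        if (head == null || head.filter.isFull()) {
            // Lazy allocation: the old head freezes (becomes immutable)
            // and a new small node takes over all writes.
            LinkedFilterNode newHead = new LinkedFilterNode(newSmallFilter(), now);
            newHead.next = head;
            head = newHead;
        }
        head.filter.add(key);
    }

    public boolean mightContain(byte[] key) {
        // A key may live in any node, so all nodes are probed.
        for (LinkedFilterNode n = head; n != null; n = n.next) {
            if (n.filter.mightContain(key)) {
                return true;
            }
        }
        return false;
    }

    // Assumed helper: allocates a small, fixed-capacity Bloom filter unit.
    private FilterStorageUnit newSmallFilter() {
        throw new UnsupportedOperationException("allocate a small filter here");
    }
}
{code}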


Best, Sihua

Re: Clarity on Flink 1.5 Rescale mechanism

2018-06-12 Thread Stefan Richter
Hi,

it means that you can now modify the parallelism of a running job with the
new "modify" command in the CLI (roughly: flink modify <jobId> -p
<newParallelism>), see [1]. Adding task managers will add their offered slots
to the pool of available slots; it will not automatically change the
parallelism.

Best,
Stefan

[1] https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html 




Static code analysis for Flink project

2018-06-12 Thread simeon.arkhipov
Hello Flink community.

I am new to the Flink project and probably don't understand a lot of it yet.
Could you please clarify one question for me?

I downloaded the Flink sources and built them from scratch. I found the
checkstyle guidelines that every Flink developer should follow, which is very
useful. However, I didn't find anything about static analysis tools like
SonarQube. I have looked through the mailing list archive, but without
success. That seemed very strange to me.

I have set up SonarQube and run an analysis of the whole Flink project. After
a while I got 442 bugs, 511 vulnerabilities and more than 13K code smell
issues. You can see them all here:
https://sonarcloud.io/dashboard?id=org.apache.flink%3Aflink-parent

I looked through some of the bugs and vulnerabilities and there are many
important ones (in my opinion), like these (see the constructed example after
the list for one of them):
- 'other' is dereferenced. A "NullPointerException" could be thrown; "other"
is nullable here.
- Either re-interrupt this method or rethrow the "InterruptedException".
- Move this call to "wait()" into a synchronized block to be sure the monitor
on "Object" is held.
- Refactor this code so that the Iterator supports multiple traversal.
- Use try-with-resources or close this "JsonGenerator" in a "finally" clause.
- Cast one of the operands of this subtraction operation to a "long".
- Make "ZERO_CALENDAR" an instance variable.
- Add a "NoSuchElementException" for iteration beyond the end of the
collection.
- Replace the call to "Thread.sleep(...)" with a call to "wait(...)".
- Call "Optional#isPresent()" before accessing the value.
- Change this condition so that it does not always evaluate to "false".
- This class overrides "equals()" and should therefore also override
"hashCode()".
- "equals(Object obj)" should test argument type.
- Not enough arguments in LOG.debug function.
- Remove this return statement from this finally block.
- "notify" may not wake up the appropriate thread.
- Remove the boxing to "Double".
- Classes should not be compared by name.
- "buffers" is a method parameter, and should not be used for
synchronization.

Are there any plans to work on static analysis support for the Flink project,
or was it intentionally agreed not to use static analysis because it is
time-consuming and not worth the effort?

Thank you in advance for your replies.

Best Regards,
---
Alex Arkhipov



Re: Static code analysis for Flink project

2018-06-12 Thread Ted Yu
I took a look at some of the blocker defects, e.g.
https://sonarcloud.io/project/issues?id=org.apache.flink%3Aflink-parent&open=AWPxETxA3e-qcckj1Sl1&resolved=false&severities=BLOCKER&types=BUG

For
./flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/PredefinedOptions.java,
the suggestion to close DBOptions using try-with-resources is categorized as
a blocker by the analysis.

I don't think that categorization is proper.

We can locate the high-priority defects, agree on them by consensus, and fix
those.

Cheers



how to build the connectors and examples from the source

2018-06-12 Thread Chris Kellogg
How can one build a connectors jar from the source?

Also, is there a quick way to build the examples from the source without
having to do a full mvn clean package -DskipTests?


Thanks.
Chris


Re: how to build the connectors and examples from the source

2018-06-12 Thread Ted Yu
Which connector from the following list are you trying to build?

https://flink.apache.org/ecosystem.html#connectors

The available connectors in 1.5.0 are quite recent. Is there any
functionality missing from the 1.5.0 release?

Thanks

On Tue, Jun 12, 2018 at 5:17 PM, Chris Kellogg  wrote:

> How can one build a connectors jar from the source?
>
> Also, is there a quick way to build the examples from the source without
> having to do a mvn clean package -DskipTests?
>
>
> Thanks.
> Chris
>


Re: WELCOME to dev@flink.apache.org

2018-06-12 Thread Sandish Kumar HN
Can someone add me as a contributor?
Mail: sanysand...@gmail.com
Full name: Sandish Kumar HN

