Re: Kinesis Connector - NoClassDefFoundError

2018-11-20 Thread Dominik Wosiński
Hey,

Have you updated the versions both in the environment and in the job's
dependencies?
From my personal experience, 95% of such issues are due to a mismatch
between the Flink version on the cluster you are using and the one in your job.

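If it helps, one way to double-check which Flink version your job jar was
built against (a minimal sketch, not from this thread; EnvironmentInformation
is an internal Flink utility class):

import org.apache.flink.runtime.util.EnvironmentInformation;

public class PrintFlinkVersion {
    // Prints the Flink version found on the job's classpath so it can be
    // compared with the version reported by the cluster's web UI and logs.
    public static void main(String[] args) {
        System.out.println("Flink version on the classpath: "
                + EnvironmentInformation.getVersion());
    }
}
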
Best Regards,
Dom.

On Tue, Nov 20, 2018 at 07:41, Steve Bistline
wrote:

> Hey all... upgraded from Flink 1.5.0 to 1.6.2 and for some reason I cannot
> figure out what I missed in setting up the new environment. I am getting
> this error:
>
>
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.flink.kinesis.shaded.com.amazonaws.partitions.PartitionsLoader
>   at 
> org.apache.flink.kinesis.shaded.com.amazonaws.regions.RegionMetadataFactory.create(RegionMetadataFactory.java:30)
>   at 
> org.apache.flink.kinesis.shaded.com.amazonaws.regions.RegionUtils.initialize(RegionUtils.java:64)
>   at 
> org.apache.flink.kinesis.shaded.com.amazonaws.regions.RegionUtils.getRegionMetadata(RegionUtils.java:52)
>   at 
> org.apache.flink.kinesis.shaded.com.amazonaws.regions.RegionUtils.getRegion(RegionUtils.java:105)
>   at 
> org.apache.flink.kinesis.shaded.com.amazonaws.client.builder.AwsClientBuilder.withRegion(AwsClientBuilder.java:239)
>   at 
> org.apache.flink.kinesis.shaded.com.amazonaws.client.builder.AwsClientBuilder.withRegion(AwsClientBuilder.java:226)
>   at 
> org.apache.flink.streaming.connectors.kinesis.util.AWSUtil.createKinesisClient(AWSUtil.java:93)
>   at 
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.createKinesisClient(KinesisProxy.java:203)
>   at 
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.<init>(KinesisProxy.java:138)
>   at 
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.create(KinesisProxy.java:213)
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.KinesisDataFetcher.<init>(KinesisDataFetcher.java:242)
>   at 
> org.apache.flink.streaming.connectors.kinesis.internals.KinesisDataFetcher.<init>(KinesisDataFetcher.java:207)
>   at 
> org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer.createFetcher(FlinkKinesisConsumer.java:417)
>   at 
> org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer.run(FlinkKinesisConsumer.java:233)
>   at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:94)
>   at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:58)
>   at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>   at java.lang.Thread.run(Thread.java:748)
>
>


Re: Group by with null keys

2018-11-20 Thread Fabian Hueske
Hi Flavio,

Whether groupBy with null values works or not depends on the type of the
key, or more specifically on the TypeComparator and TypeSerializer that are
used to serialize, compare, and hash the key type.
The processing engine supports null values if the comparator and serializer
can handle null input values.

Flink SQL wraps keys in the Row type and the corresponding serializer /
comparator can handle null fields.
If you use Row in DataSet / DataStream programs, null values are supported
as well.

I think it would be good to discuss the handling of null keys on the
documentation about data types [1] and link to that from operators that
require keys.
Would you mind creating a Jira issue for that?

Thank you,
Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/types_serialization.html

On Mon, Nov 19, 2018 at 12:31, Flavio Pompermaier <
pomperma...@okkam.it> wrote:

> Hi to all,
> we wanted to do a group by on elements that can contain null values and
> we discovered that the Table API supports this while the DataSet API does not.
> Is this documented somewhere on the Flink site?
>
> Best,
> Flavio
>
> ---
>
> PS: you can test this with the following main:
>
> public static void main(String[] args) throws Exception {
> final ExecutionEnvironment env =
> ExecutionEnvironment.getExecutionEnvironment();
> final BatchTableEnvironment btEnv =
> TableEnvironment.getTableEnvironment(env);
> final DataSet<String> testDs = env
> .fromElements("test", "test", "test2", "null", "null", "test3")
> .map(x -> "null".equals(x) ? null : x);
>
> boolean testDatasetApi = true;
> if (testDatasetApi) {
>   testDs.groupBy(x -> x).reduceGroup(new GroupReduceFunction<String, Integer>() {
>
> @Override
> public void reduce(Iterable<String> values, Collector<Integer>
> out) throws Exception {
>   int cnt = 0;
>   for (String value : values) {
> cnt++;
>   }
>   out.collect(cnt);
> }
>   }).print();
> }
>
> btEnv.registerDataSet("TEST", testDs, "field1");
> Table res = btEnv.sqlQuery("SELECT field1, count(*) as cnt FROM TEST
> GROUP BY field1");
> DataSet<Row> result = btEnv.toDataSet(res,
> new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO,
> BasicTypeInfo.LONG_TYPE_INFO));
> result.print();
>   }
>


Re: Group by with null keys

2018-11-20 Thread Flavio Pompermaier
Sure! The problem is that the DataSet API does an implicit conversion to Tuples
during chaining and I didn't find any documentation about this (actually I
was pleasantly surprised by the fact that the Table API supports
aggregates on null values..).

Here it is: https://issues.apache.org/jira/browse/FLINK-10947

Thanks for the reply,
Flavio

On Tue, Nov 20, 2018 at 11:33 AM Fabian Hueske  wrote:

> Hi Flavio,
>
> Whether groupBy with null values works or not depends on the type of the
> key, or more specifically on the TypeComparator and TypeSerializer that are
> used to serialize, compare, and hash the key type.
> The processing engine supports null values if the comparator and
> serializer can handle null input values.
>
> Flink SQL wraps keys in the Row type and the corresponding serializer /
> comparator can handle null fields.
> If you use Row in DataSet / DataStream programs, null values are supported
> as well.
>
> I think it would be good to discuss the handling of null keys on the
> documentation about data types [1] and link to that from operators that
> require keys.
> Would you mind creating a Jira issue for that?
>
> Thank you,
> Fabian
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/types_serialization.html
>
> On Mon, Nov 19, 2018 at 12:31, Flavio Pompermaier <
> pomperma...@okkam.it> wrote:
>
>> Hi to all,
>> we wanted to do a group by on elements that can contain null values and
>> we discovered that the Table API supports this while the DataSet API does not.
>> Is this documented somewhere on the Flink site?
>>
>> Best,
>> Flavio
>>
>> ---
>>
>> PS: you can test this with the following main:
>>
>> public static void main(String[] args) throws Exception {
>> final ExecutionEnvironment env =
>> ExecutionEnvironment.getExecutionEnvironment();
>> final BatchTableEnvironment btEnv =
>> TableEnvironment.getTableEnvironment(env);
>> final DataSet<String> testDs = env
>> .fromElements("test", "test", "test2", "null", "null", "test3")
>> .map(x -> "null".equals(x) ? null : x);
>>
>> boolean testDatasetApi = true;
>> if (testDatasetApi) {
>>   testDs.groupBy(x -> x).reduceGroup(new GroupReduceFunction<String, Integer>() {
>>
>> @Override
>> public void reduce(Iterable<String> values, Collector<Integer>
>> out) throws Exception {
>>   int cnt = 0;
>>   for (String value : values) {
>> cnt++;
>>   }
>>   out.collect(cnt);
>> }
>>   }).print();
>> }
>>
>> btEnv.registerDataSet("TEST", testDs, "field1");
>> Table res = btEnv.sqlQuery("SELECT field1, count(*) as cnt FROM TEST
>> GROUP BY field1");
>> DataSet<Row> result = btEnv.toDataSet(res,
>> new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO,
>> BasicTypeInfo.LONG_TYPE_INFO));
>> result.print();
>>   }
>>
>


Re: Group by with null keys

2018-11-20 Thread Timo Walther
I assigned the issue to myself, because I have wanted to do that for a very
long time. I already did some prerequisite work for the documentation in
`org.apache.flink.api.common.typeinfo.Types`.


Thanks,
Timo

On 20.11.18 at 11:44, Flavio Pompermaier wrote:
Sure! The problem is that the DataSet API does an implicit conversion to
Tuples during chaining and I didn't find any documentation about this
(actually I was pleasantly surprised by the fact that the Table API
supports aggregates on null values..).


Here it is: https://issues.apache.org/jira/browse/FLINK-10947

Thanks for the reply,
Flavio

On Tue, Nov 20, 2018 at 11:33 AM Fabian Hueske wrote:


Hi Flavio,

Whether groupBy with null values works or not depends on the type
of the key, or more specifically on the TypeComparator and
TypeSerializer that are used to serialize, compare, and hash the
key type.
The processing engine supports null values if the comparator and
serializer can handle null input values.

Flink SQL wraps keys in the Row type and the corresponding
serializer / comparator can handle null fields.
If you use Row in DataSet / DataStream programs, null values are
supported as well.

I think it would be good to discuss the handling of null keys on
the documentation about data types [1] and link to that from
operators that require keys.
Would you mind creating a Jira issue for that?

Thank you,
Fabian

[1]

https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/types_serialization.html

On Mon, Nov 19, 2018 at 12:31, Flavio Pompermaier
<pomperma...@okkam.it> wrote:

Hi to all,
we wanted to do a group by on elements that can contain null
values and we discovered that the Table API supports this while
the DataSet API does not.
Is this documented somewhere on the Flink site?

Best,
Flavio

---

PS: you can test this with the following main:

public static void main(String[] args) throws Exception {
    final ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
    final BatchTableEnvironment btEnv =
TableEnvironment.getTableEnvironment(env);
    final DataSet<String> testDs = env
        .fromElements("test", "test", "test2", "null", "null",
"test3")
        .map(x -> "null".equals(x) ? null : x);

    boolean testDatasetApi = true;
    if (testDatasetApi) {
      testDs.groupBy(x -> x).reduceGroup(new
GroupReduceFunction<String, Integer>() {

        @Override
        public void reduce(Iterable<String> values,
Collector<Integer> out) throws Exception {
          int cnt = 0;
          for (String value : values) {
            cnt++;
          }
          out.collect(cnt);
        }
      }).print();
    }

    btEnv.registerDataSet("TEST", testDs, "field1");
    Table res = btEnv.sqlQuery("SELECT field1, count(*) as cnt
FROM TEST GROUP BY field1");
    DataSet<Row> result = btEnv.toDataSet(res,
        new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO,
BasicTypeInfo.LONG_TYPE_INFO));
    result.print();
  }







Tentative release date for 1.6.3

2018-11-20 Thread Shailesh Jain
Hi,

Is the tentative release date for 1.6.3 decided?

Thanks,
Shailesh


how to override s3 key config in flink job

2018-11-20 Thread Tony Wei
Hi,

Is there any way to provide s3.access-key and s3.secret-key in the Flink
application, instead of setting them in flink-conf.yaml?

In our use case, my team provides a Flink standalone cluster to users.
However, we don't want to let each user use the same S3 bucket as the
filesystem to store checkpoints. So, we want to know if it is feasible to
let users provide their checkpoint path and the corresponding AWS key to
access their own S3 bucket?

If not, could you show me why it doesn't work currently? And could this
become a new feature?

Thanks in advance for your help.

Best,
Tony Wei


About the issue caused by flink's dependency jar package submission method

2018-11-20 Thread clay4444
hi all:

I know that when submitting Flink jobs, Flink's official recommendation is
to put all the dependencies and business logic into a fat jar, but now our
requirement is to separate the business logic from its dependencies and
submit them dynamically. So I found one way: use the -yt and -C parameters
to submit the task and execute it on YARN. The task can be submitted, but
the following error is always reported when it runs.

org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot
instantiate user function.
at
org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:239)
at
org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:104)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:267)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: cannot assign instance of
org.apache.commons.collections.map.LinkedMap to field
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.pendingOffsetsToCommit
of type org.apache.commons.collections.map.LinkedMap in instance of
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011
at
java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
at 
java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2251)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at
org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:502)
at
org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:489)
at
org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:477)
at
org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:438)
at
org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:224)
... 4 more

Does anyone know about this?






ClassNotFoundException: org.apache.kafka.common.metrics.stats.Rate$1

2018-11-20 Thread Avi Levi
Looking at the log file of the taskexecutor I see this exception:

at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.metrics.stats.Rate$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader.loadClass(FlinkUserCodeClassLoaders.java:129)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 22 more

Does anyone know what I should do to avoid it?


Re: Tentative release date for 1.6.3

2018-11-20 Thread Stefan Richter
Hi,

there is no release date for 1.6.3, yet.

Best,
Stefan

> On 20. Nov 2018, at 12:18, Shailesh Jain  wrote:
> 
> Hi,
> 
> Is the tentative release date for 1.6.3 decided?
> 
> Thanks,
> Shailesh



[ANNOUNCE] Weekly community update #47

2018-11-20 Thread Till Rohrmann
Dear community,

this is the weekly community update thread #47. Please post any news and
updates you want to share with the community to this thread.

# Updates on sharing state between subtasks

Jamie opened a PR that adds a first version of sharing state between
tasks. It works by using the JobMaster as the point of synchronization [1].

# Refactor source interface

There is a lively discussion about refactoring Flink's source interface to
make it future proof [2]. Join the discussion if you want to learn more.

# Task speculative execution for batch jobs

Tao Yangyu started a discussion about executing batch tasks speculatively
in order to mitigate the straggler problem [3]. If you have good ideas for
this problem, then please chime in.

# 2nd release candidate for Flink 1.7.0

The community just published the second release candidate for Flink 1.7.0
[4]. Please help the community by testing the latest release candidate and
report any encountered problems. Thanks a lot!

# Embracing Table API in Flink ML

Weihua kicked off a discussion about how to build Flink's next Machine
Learning pipelines [5]. He drafted a design document with his proposal.
Join the discussion to learn more and share your opinion.

# Support for interactive programming in Flink Table API

Jiangjie started a discussion about interactive programming with Flink's
Table API [6]. The idea is to make results of previous jobs accessible to
successive Flink jobs to better support a REPL like job execution.

[1]
https://lists.apache.org/thread.html/b6eca694eaf7ee19386f4fb407098ae4b58df788b539e0666e28c37c@%3Cdev.flink.apache.org%3E
[2]
https://lists.apache.org/thread.html/70484d6aa4b8e7121181ed8d5857a94bfb7d5a76334b9c8fcc59700c@%3Cdev.flink.apache.org%3E
[3]
https://lists.apache.org/thread.html/54514523379d3768159312a6f25071547e7e63f3b9bf9e19eb4f3937@%3Cdev.flink.apache.org%3E
[4]
https://lists.apache.org/thread.html/10913c242452d018840ba541d29323314732e4498777f77b002e30ad@%3Cdev.flink.apache.org%3E
[5]
https://lists.apache.org/thread.html/cf83aea5bb5ff7a719fe4dc082325469969e5cfc49786646ffc0c8f2@%3Cdev.flink.apache.org%3E
[6]
https://lists.apache.org/thread.html/8a93d331f69ed9aa2c30dbc7793a3e8803155aa08fdaec71681aa92a@%3Cdev.flink.apache.org%3E

Cheers,
Till


Could not extract key Exception only on runtime not in dev environment

2018-11-20 Thread Avi Levi
I am running Flink locally on my machine, and I am getting the exception below
when reading from a Kafka topic. When running from the IDE (IntelliJ) it runs
perfectly; however, when I deploy my jar to the Flink runtime (locally)
using

*/bin/flink run ~MyApp-1.0-SNAPSHOT.jar*
my class looks like this
case class Foo(id: String, value: String, timestamp: Long, counter: Int)
I am getting this exception

*java.lang.RuntimeException: Could not extract key from
Foo("some-uuid","text",1540348398,1)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:689)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:667)
at 
org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:40)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:689)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:667)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
at 
org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
at 
org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.emitRecord(Kafka010Fetcher.java:89)
at 
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:154)
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:738)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Could not extract key from
Foo("some-uuid","text",1540348398,1)
at 
org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.selectChannels(KeyGroupStreamPartitioner.java:61)
at 
org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.selectChannels(KeyGroupStreamPartitioner.java:32)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:106)
at 
org.apache.flink.streaming.runtime.io.StreamRecordWriter.emit(StreamRecordWriter.java:81)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
... 22 more
Caused by: java.lang.NullPointerException
at com.bluevoyant.StreamingJob$$anonfun$3.apply(StreamingJob.scala:41)
at com.bluevoyant.StreamingJob$$anonfun$3.apply(StreamingJob.scala:40)
at 
org.apache.flink.streaming.api.scala.DataStream$$anon$2.getKey(DataStream.scala:411)
at 
org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.selectChannels(KeyGroupStreamPartitioner.java:59)
... 26 more*


my key partition is simple (partitionFactor = some number)

*.keyBy{ r =>
val h = fastHash(r.id) % partitionFactor
math.abs(h)
}*

again, this happens only at runtime, not when I run it from IntelliJ

this is so frustrating, any advice?


Re: Could not extract key Exception only on runtime not in dev environment

2018-11-20 Thread miki haiat
What is the r.id value?
Are you sure it is not null?

Miki.


On Tue, 20 Nov 2018, 17:26 Avi Levi  I am running flink locally on my machine , I am getting the exception
> below when reading from kafka topic. when running from the ide (intellij)
> it is running perfectly. however when I deploy my jar to flink runtime
> (locally) using
>
> */bin/flink run ~MyApp-1.0-SNAPSHOT.jar*
> my class looks like this
> case class Foo(id: String, value: String, timestamp: Long, counter: Int)
> I am getting this exception
>
> *java.lang.RuntimeException: Could not extract key from 
> Foo("some-uuid","text",1540348398,1)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:689)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:667)
>   at 
> org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:40)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:689)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:667)
>   at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
>   at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
>   at 
> org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
>   at 
> org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.emitRecord(Kafka010Fetcher.java:89)
>   at 
> org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:154)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:738)
>   at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
>   at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
>   at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Could not extract key from 
> Foo("some-uuid","text",1540348398,1)
>   at 
> org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.selectChannels(KeyGroupStreamPartitioner.java:61)
>   at 
> org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.selectChannels(KeyGroupStreamPartitioner.java:32)
>   at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:106)
>   at 
> org.apache.flink.streaming.runtime.io.StreamRecordWriter.emit(StreamRecordWriter.java:81)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
>   ... 22 more
> Caused by: java.lang.NullPointerException
>   at com.bluevoyant.StreamingJob$$anonfun$3.apply(StreamingJob.scala:41)
>   at com.bluevoyant.StreamingJob$$anonfun$3.apply(StreamingJob.scala:40)
>   at 
> org.apache.flink.streaming.api.scala.DataStream$$anon$2.getKey(DataStream.scala:411)
>   at 
> org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.selectChannels(KeyGroupStreamPartitioner.java:59)
>   ... 26 more*
>
>
> my key partition is simple (partitionFactor = some number)
>
> *.keyBy{ r =>
> val h = fastHash(r.id ) % partitionFactor
> math.abs(h)
> }*
>
> again, this happens only on runtime not when I run it from intellij
>
> this so frustrating, any advice ?
>
>
>


Re: Could not extract key Exception only on runtime not in dev environment

2018-11-20 Thread Avi Levi
Yes, you can also see it in the message (and also it should have crashed
in the IDE as well). Furthermore, to be sure, I added a filter that looks like
*env*
*.addSource(kafka_source)*
*.filter(_.id != null)*
*.keyBy{ r =>*
*val h = fastHash(r.id ) % partitionFactor*
*math.abs(h)*
*}*
*.map(...)*

and still the same

On Tue, Nov 20, 2018 at 5:31 PM miki haiat  wrote:

> What r.id  Value ?
> Are you sure that is not null ?
>
> Miki.
>
>
> On Tue, 20 Nov 2018, 17:26 Avi Levi wrote:
>> I am running flink locally on my machine , I am getting the exception
>> below when reading from kafka topic. when running from the ide (intellij)
>> it is running perfectly. however when I deploy my jar to flink runtime
>> (locally) using
>>
>> */bin/flink run ~MyApp-1.0-SNAPSHOT.jar*
>> my class looks like this
>> case class Foo(id: String, value: String, timestamp: Long, counter: Int)
>> I am getting this exception
>>
>> *java.lang.RuntimeException: Could not extract key from 
>> Foo("some-uuid","text",1540348398,1)
>>  at 
>> org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
>>  at 
>> org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
>>  at 
>> org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
>>  at 
>> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:689)
>>  at 
>> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:667)
>>  at 
>> org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:40)
>>  at 
>> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
>>  at 
>> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
>>  at 
>> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
>>  at 
>> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:689)
>>  at 
>> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:667)
>>  at 
>> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
>>  at 
>> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
>>  at 
>> org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
>>  at 
>> org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.emitRecord(Kafka010Fetcher.java:89)
>>  at 
>> org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:154)
>>  at 
>> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:738)
>>  at 
>> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
>>  at 
>> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
>>  at 
>> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
>>  at 
>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
>>  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>>  at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.RuntimeException: Could not extract key from 
>> Foo("some-uuid","text",1540348398,1)
>>  at 
>> org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.selectChannels(KeyGroupStreamPartitioner.java:61)
>>  at 
>> org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.selectChannels(KeyGroupStreamPartitioner.java:32)
>>  at 
>> org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:106)
>>  at 
>> org.apache.flink.streaming.runtime.io.StreamRecordWriter.emit(StreamRecordWriter.java:81)
>>  at 
>> org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
>>  ... 22 more
>> Caused by: java.lang.NullPointerException
>>  at com.bluevoyant.StreamingJob$$anonfun$3.apply(StreamingJob.scala:41)
>>  at com.bluevoyant.StreamingJob$$anonfun$3.apply(StreamingJob.scala:40)
>>  at 
>> org.apache.flink.streaming.api.scala.DataStream$$anon$2.getKey(DataStream.scala:411)
>>  at 
>> org.apache.flink.streaming.runtime.partitioner.KeyGroupStreamPartitioner.selectChannels(KeyGroupStreamPartitioner.java:59)
>>  ... 26 more*
>>
>>
>> my key partition is simple (partitionFactor = some number)
>>
>> *.keyBy{ r =>
>> val 

Flink Table Duplicate Evaluation

2018-11-20 Thread Niklas Teichmann

Hi everybody,

I have a question concerning the Flink Table API, more precisely the
way the results of table statements are evaluated. In the following
code example, the statement defining the table t1 is evaluated twice,
an effect that leads to some performance and logic issues in the
program I am trying to write.


List<Long> longList = Arrays.asList(1L, 2L, 3L, 4L, 5L);
DataSet<Long> longDataSet =
getExecutionEnvironment().fromCollection(longList);


tenv.registerDataSet("longs", longDataSet, "l");
tenv.registerFunction("time", new Time()); //an example UDF that  
evaluates the current time


Table t1 = tenv.scan("longs");
t1 = t1.select("l, time() as t");

Table t2 = t1.as("l1, id1");
Table t3 = t1.as("l2, id2");

Table t4 = t2.join(t3).where("l1 == l2");

t4.writeToSink(new PrintTableSink() ); //a sink that prints the  
content of the table


I realize that this behaviour is defined in the documentation ("A  
registered Table is treated similarly to a VIEW ...") and probably  
stems from the DataStream API. But is there a preferred way to avoid  
this?


Currently I'm using a workaround that defines a TableSink which in  
turn registers its output as a new table. That seems extremely hacky  
though.


Sorry if I missed something obvious!

All the best,
Niklas


--





Re: Flink Table Duplicate Evaluation

2018-11-20 Thread Fabian Hueske
Hi Niklas,

The workaround that you described should work fine.
However, you don't need a custom sink.
Converting the Table into a DataSet and registering the DataSet again as a
Table is currently the way to solve this issue.
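
A minimal sketch of that approach, reusing the names from the example above
(the new table and field names are just illustrative):

// Materialize t1 once by converting it to a DataSet and registering that
// DataSet as a new table; the self-join then reads the materialized result.
DataSet<Row> materialized = tenv.toDataSet(t1, Row.class);
tenv.registerDataSet("longsWithTime", materialized, "l, t");

Table m = tenv.scan("longsWithTime");
Table t2 = m.as("l1, id1");
Table t3 = m.as("l2, id2");
Table t4 = t2.join(t3).where("l1 == l2");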

Best, Fabian

On Tue, Nov 20, 2018 at 17:13, Niklas Teichmann <
mai11...@studserv.uni-leipzig.de> wrote:

> Hi everybody,
>
> I have a question concerning the Flink Table API, more precisely the
> way the results of tables statements are evaluated. In the following
> code example, the statement defining the table t1 is evaluated twice,
> an effect that leads to some issues of performance and logic in the
> program I am trying to write.
>
> List<Long> longList = Arrays.asList(1L, 2L, 3L, 4L, 5L);
> DataSet<Long> longDataSet =
> getExecutionEnvironment().fromCollection(longList);
>
> tenv.registerDataSet("longs", longDataSet, "l");
> tenv.registerFunction("time", new Time()); //an example UDF that
> evaluates the current time
>
> Table t1 = tenv.scan("longs");
> t1 = t1.select("l, time() as t");
>
> Table t2 = t1.as("l1, id1");
> Table t3 = t1.as("l2, id2");
>
> Table t4 = t2.join(t3).where("l1 == l2");
>
> t4.writeToSink(new PrintTableSink() ); //a sink that prints the
> content of the table
>
> I realize that this behaviour is defined in the documentation ("A
> registered Table is treated similarly to a VIEW ...") and probably
> stems from the DataStream API. But is there a preferred way to avoid
> this?
>
> Currently I'm using a workaround that defines a TableSink which in
> turn registers its output as a new table. That seems extremely hacky
> though.
>
> Sorry if I missed something obvious!
>
> All the best,
> Niklas
>
>
> --
>
>
>
>


Re: ClassNotFoundException: org.apache.kafka.common.metrics.stats.Rate$1

2018-11-20 Thread Stefan Richter
Hi,

It should be as easy as making sure that there is a jar with the missing class 
in the class path of your user-code class loader.
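
If it helps, a quick way to check from inside the job whether the class is
visible to the user-code classloader (just a diagnostic sketch, not from this
thread):

// Throws ClassNotFoundException if the kafka-clients classes are not visible,
// e.g. because the jar is missing from the fat jar or the lib directory.
Class.forName("org.apache.kafka.common.metrics.stats.Rate",
        false, Thread.currentThread().getContextClassLoader());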

Best,
Stefan

> On 20. Nov 2018, at 14:32, Avi Levi  wrote:
> 
> looking at the log file of the taskexecutor I see this exception
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.kafka.common.metrics.stats.Rate$1
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader.loadClass(FlinkUserCodeClassLoaders.java:129)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 22 more
> anyone know what should I do to avoid it ?



[ANNOUNCE] Flink Forward San Francisco Call for Presentations closes soon

2018-11-20 Thread Fabian Hueske
Hi Everyone,

Flink Forward San Francisco will *take place on April 1st and 2nd 2019*.
Flink Forward is a community conference organized by data Artisans and
gathers many members of the Flink community, including users, contributors,
and committers. It is the perfect event to get in touch and connect with
other stream processing enthusiasts and Flink users to exchange experiences
and ideas.

The Call for Presentations is still open but will *close soon on November
30th (next week Friday)*.

Please submit a talk proposal, if you have an interesting Flink use case or
experience running Flink applications in production that you would like to
share.

 You can submit a talk proposal here
--> https://flink-forward.org/sf-2019/call-for-presentations-submit-talk/

Best regards,
Fabian


Flink JSON (string) to Pojo (and vice versa) example

2018-11-20 Thread Flavio Pompermaier
Hi everybody,
since here at Okkam we didn't find any "native" Flink map functions that
permit converting JSON strings to POJOs (and vice versa), we decided to
share a simple implementation of these two tasks with the Flink community:
 - JSON (string) to POJO [2]
 - POJO to JSON (string) [3].
A Flink main class that uses those two functions can be found at [1].

Any feedback is welcome!

Best,
Flavio

[1]
https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/flink/Json2PojoExample.java
[2]
https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/flink/json/JsonStringToPojo.java
[3]
https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/flink/json/Pojo2JsonString.java
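
For reference, the JSON-to-POJO direction can be as small as the following
sketch (this is not the code from the linked repository; it assumes Jackson's
ObjectMapper is available on the classpath):

import org.apache.flink.api.common.functions.MapFunction;

import com.fasterxml.jackson.databind.ObjectMapper;

// Maps a JSON string to a POJO of type T using Jackson.
public class JsonToPojo<T> implements MapFunction<String, T> {

    private final Class<T> pojoClass;
    private transient ObjectMapper mapper;

    public JsonToPojo(Class<T> pojoClass) {
        this.pojoClass = pojoClass;
    }

    @Override
    public T map(String json) throws Exception {
        if (mapper == null) {
            mapper = new ObjectMapper(); // created lazily on the worker
        }
        return mapper.readValue(json, pojoClass);
    }
}

Because of type erasure, a type hint such as .returns(MyPojo.class) may be
needed when applying the function to a DataSet or DataStream.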


Reset kafka offets to latest on restart

2018-11-20 Thread Vishal Santoshi
Is it possible to have checkpointing but reset the Kafka offsets to the latest
on restart after a failure?


Exception occurred while processing valve output watermark & NullPointerException

2018-11-20 Thread Steve Bistline
Any guidance would be most appreciated.

Thx

Steve
===

java.lang.RuntimeException: Exception occurred while processing valve
output watermark:
at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:265)
at 
org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:189)
at 
org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.inputWatermark(StatusWatermarkValve.java:111)
at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:184)
at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException:
org.apache.flink.util.FlinkRuntimeException: Failure happened in
filter function.
at 
org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.lambda$onEventTime$0(AbstractKeyedCEPPatternOperator.java:284)
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at 
java.util.stream.ReferencePipeline$Head.forEachOrdered(ReferencePipeline.java:590)
at 
org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.onEventTime(AbstractKeyedCEPPatternOperator.java:279)
at 
org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.advanceWatermark(InternalTimerServiceImpl.java:251)
at 
org.apache.flink.streaming.api.operators.InternalTimeServiceManager.advanceWatermark(InternalTimeServiceManager.java:128)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:769)
at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:262)
... 7 more
Caused by: org.apache.flink.util.FlinkRuntimeException: Failure
happened in filter function.
at org.apache.flink.cep.nfa.NFA.createDecisionGraph(NFA.java:731)
at org.apache.flink.cep.nfa.NFA.computeNextStates(NFA.java:541)
at org.apache.flink.cep.nfa.NFA.doProcess(NFA.java:284)
at org.apache.flink.cep.nfa.NFA.process(NFA.java:220)
at 
org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.processEvent(AbstractKeyedCEPPatternOperator.java:379)
at 
org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.lambda$onEventTime$0(AbstractKeyedCEPPatternOperator.java:282)
... 14 more
Caused by: java.lang.NullPointerException
at java.lang.String.contains(String.java:2133)
at 
com.amazonaws.prediction.FlinkCEPAudio$4.filter(FlinkCEPAudio.java:119)
at 
com.amazonaws.prediction.FlinkCEPAudio$4.filter(FlinkCEPAudio.java:114)
at 
org.apache.flink.cep.pattern.conditions.AndCondition.filter(AndCondition.java:43)
at org.apache.flink.cep.nfa.NFA.checkFilterCondition(NFA.java:742)
at org.apache.flink.cep.nfa.NFA.createDecisionGraph(NFA.java:716)
... 19 more



==


The code


  // Consume the data streams from AWS Kinesis stream
DataStream dataStream = env.addSource(new FlinkKinesisConsumer<>(
pt.getRequired("stream"),
new EventSchema(),
kinesisConsumerConfig))
.name("Kinesis Stream Consumer");

   //dataStream.print();

DataStream kinesisStream = dataStream
.assignTimestampsAndWatermarks(new TimeLagWatermarkGenerator())
.map(event -> (IoTEvent) event);

// Prints the mapped records from the Kinesis stream

//kinesisStream.print();


Pattern pattern = Pattern
        .begin("first event").subtype(IoTEvent.class)
        .where(new IterativeCondition<IoTEvent>() {
            //private static final long serialVersionUID = -6301755149429716724L;

            @Override
            public boolean filter(IoTEvent value, Context<IoTEvent> ctx) throws Exception {
                return PatternConstants.AUDIO_YELL.toLowerCase()
                        .contains(value.getAudio_FFT1());
            }
        })
        .next("second")
        .subtype(IoTEvent.class)
        .where(new IterativeCondition<IoTEvent>() {
            //private static final long serialVersionUID = 2392863109523984059L;

            @Override
            public boolean filter(IoTEvent value, Context<IoTEvent> ctx) throws Exception {
                return PatternConstants.AUDIO_YELL.toLowerCase()
                        .contains(value.getAudio_FFT1());
  

Re: Per job cluster doesn't shut down after the job is canceled

2018-11-20 Thread Gary Yao
Hi Paul,

Sorry for the late reply. I had a look at the attached log. I think
FLINK-10482 affects the shut down of the "per-job cluster" after all. Here
is
the respective stacktrace:

2018-11-06 10:45:17,405 ERROR
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor  - Caught
exception while executing runnable in main thread.
java.lang.IllegalArgumentException: Negative number of in progress
checkpoints
at
org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:139)
at
org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.<init>(CheckpointStatsCounts.java:72)
at
org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.createSnapshot(CheckpointStatsCounts.java:177)
at
org.apache.flink.runtime.checkpoint.CheckpointStatsTracker.createSnapshot(CheckpointStatsTracker.java:166)
at
org.apache.flink.runtime.executiongraph.ExecutionGraph.getCheckpointStatsSnapshot(ExecutionGraph.java:553)
at
org.apache.flink.runtime.executiongraph.ArchivedExecutionGraph.createFrom(ArchivedExecutionGraph.java:340)
at
org.apache.flink.runtime.jobmaster.JobMaster.jobStatusChanged(JobMaster.java:1247)
at
org.apache.flink.runtime.jobmaster.JobMaster.access$1600(JobMaster.java:147)
at
org.apache.flink.runtime.jobmaster.JobMaster$JobManagerJobStatusListener.lambda$jobStatusChanges$0(JobMaster.java:1590)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
at
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
at
akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

We try to create an ArchivedExecutionGraph, which fails because we cannot
snapshot the checkpoint statistics. The subsequent code that should
ultimately
shut down the cluster is not executed [1]. If you can tell us how you run
into
the "Negative number of in progress checkpoints" problem, we might be able
to
come up with a mitigation until FLINK-10482 is fixed.

Best,
Gary

[1]
https://github.com/apache/flink/blob/614f2162e42345da7501f8f6ea724a7e0ce65e3c/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L1247-L1248

On Wed, Nov 14, 2018 at 9:46 AM Paul Lam  wrote:

> Hi Gary,
>
> Thanks for your reply and sorry for the delay. The attachment is the
> jobmanager logs after invoking the cancel command.
>
> I think it might be related to the custom source, because the jobmanager
> keeps trying to trigger a checkpoint for it,
> but in fact it’s already canceled. The source implementation is using a
> running flag to denote it’s running, and the
> cancel method is simply setting the flag to false, which I think is a
> common way of implementing a custom source.
> In addition, the cluster finally shut down because I killed it with yarn
> commands.
>
> And also thank you for the pointer, I’ll keep tracking this problem.
>
> Best,
> Paul Lam
>
>
> On Nov 10, 2018, at 02:10, Gary Yao wrote:
>
> Hi Paul,
>
> Can you share the complete logs, or at least the logs after invoking the
> cancel command?
>
> If you want to debug it yourself, check if
> MiniDispatcher#jobReachedGloballyTerminalState [1] is invoked, and see how
> the
> jobTerminationFuture is used.
>
> Best,
> Gary
>
> [1]
> https://github.com/apache/flink/blob/091cff3299aed4bb143619324f6ec8165348d3ae/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/MiniDispatcher.java#L141
>
>
> On Wed, Nov 7, 2018 at 3:27 AM Paul Lam  wrote:
>
>> Hi,
>>
>> I’m using Flink 1.5.3, and I’ve seen several times that the detached YARN
>> cluster doesn’t shut down after the job is canceled successfully. The only
>> errors I found in jobmanager’s log are as below (the second one appears
>> multiple times):
>>
>> ```
>>
>> 2018-11-07 09:48:38,663 WARN  
>> org.apache.flink.runtime.executiongraph.ExecutionGraph- Error while 
>> notifying JobStatusListener
>> java.lang.IllegalStateException: Incremented the completed number of 
>> che

Re: About the issue caused by flink's dependency jar package submission method

2018-11-20 Thread Ken Krugler
My only guess would be that you have two versions of the Apache Commons jar on 
your class path, or the version you compiled against doesn’t match what you’re 
running against, and that’s why you get:

Caused by: java.lang.ClassCastException: cannot assign instance of 
org.apache.commons.collections.map.LinkedMap to field 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.pendingOffsetsToCommit
 of type 
org.apache.commons.collections.map.LinkedMap

If that’s the case, often the cause is that your Yarn environment has a 
different version of a jar than what’s in your fat jar.

Though I would expect proper shading would prevent that from happening.

— Ken

> On Nov 20, 2018, at 5:05 AM, clay wrote:
> 
> hi all:
> 
> I know that when submitting flink jobs, flink's official recommendation is
> to put all the dependencies and business logic into a fat jar, but now our
> requirement is to separate the business logic and rely on dynamic commits,
> so I found one. One way, use the -yt and -C parameters to submit the task,
> execute it in the yarn, so that the task can be submitted, but the following
> error is always reported when running.
> 
> org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot
> instantiate user function.
>   at
> org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:239)
>   at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:104)
>   at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:267)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassCastException: cannot assign instance of
> org.apache.commons.collections.map.LinkedMap to field
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.pendingOffsetsToCommit
> of type org.apache.commons.collections.map.LinkedMap in instance of
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011
>   at
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
>   at 
> java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2251)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
>   at
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:502)
>   at
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:489)
>   at
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:477)
>   at
> org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:438)
>   at
> org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:224)
>   ... 4 more
> 
> is there anyone know about this?
> 
> 
> 
> 
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ 
> 

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com 
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra






Re: Tentative release date for 1.6.3

2018-11-20 Thread vino yang
Hi Shailesh,

Flink 1.7 is about to be released, and many of the problems encountered
since Flink 1.6.2 have been fixed in version 1.7. So, I suggest you look
forward and upgrade to Flink 1.7.

Thanks, vino.

Stefan Richter wrote on Tue, Nov 20, 2018 at 10:15 PM:

> Hi,
>
> there is no release date for 1.6.3, yet.
>
> Best,
> Stefan
>
> > On 20. Nov 2018, at 12:18, Shailesh Jain 
> wrote:
> >
> > Hi,
> >
> > Is the tentative release date for 1.6.3 decided?
> >
> > Thanks,
> > Shailesh
>
>


Store Predicate or any lambda in MapState

2018-11-20 Thread Jayant Ameta
Hi,
I want to store a custom POJO in the MapState. One of the fields in the
object is a java.util.function.Predicate type.
Flink throws a ClassNotFoundException for the lambda. How do I store
this object in the MapState?

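A sketch of the situation (the POJO below is illustrative, not the actual class):

// A POJO kept in MapState whose field is a java.util.function.Predicate.
// If the Predicate is a lambda, it is usually serialized as a JVM-synthetic
// class of the job that created it, which may not resolve when the state is
// deserialized with a different classloader -> ClassNotFoundException.
public class Rule implements java.io.Serializable {
    public String name;
    public java.util.function.Predicate<String> condition; // set by another library
}
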
Marking the predicate field as transient is an option. But in my use case,
the predicate field is set using another library, and I don't want to call
it every time.


Jayant Ameta


Re: Exception occurred while processing valve output watermark & NullPointerException

2018-11-20 Thread vino yang
Hi Steve,

It seems the NPE is caused by a property of the IoTEvent instance. Can you
make sure the property is not null?

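For illustration, a null-safe variant of the condition could look like this
(just a sketch, assuming getAudio_FFT1() may return null):

.where(new IterativeCondition<IoTEvent>() {
    @Override
    public boolean filter(IoTEvent value, Context<IoTEvent> ctx) {
        String fft = value.getAudio_FFT1();
        // Guard against null before calling String#contains to avoid the NPE.
        return fft != null
                && PatternConstants.AUDIO_YELL.toLowerCase().contains(fft);
    }
})
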
Thanks, vino.

Steve Bistline  于2018年11月21日周三 上午2:09写道:

> Any guidance would be most appreciated.
>
> Thx
>
> Steve
> ===
>
> java.lang.RuntimeException: Exception occurred while processing valve output 
> watermark:
>   at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:265)
>   at 
> org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:189)
>   at 
> org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.inputWatermark(StatusWatermarkValve.java:111)
>   at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:184)
>   at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.util.FlinkRuntimeException: Failure happened in filter 
> function.
>   at 
> org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.lambda$onEventTime$0(AbstractKeyedCEPPatternOperator.java:284)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at 
> java.util.stream.ReferencePipeline$Head.forEachOrdered(ReferencePipeline.java:590)
>   at 
> org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.onEventTime(AbstractKeyedCEPPatternOperator.java:279)
>   at 
> org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.advanceWatermark(InternalTimerServiceImpl.java:251)
>   at 
> org.apache.flink.streaming.api.operators.InternalTimeServiceManager.advanceWatermark(InternalTimeServiceManager.java:128)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:769)
>   at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:262)
>   ... 7 more
> Caused by: org.apache.flink.util.FlinkRuntimeException: Failure happened in 
> filter function.
>   at org.apache.flink.cep.nfa.NFA.createDecisionGraph(NFA.java:731)
>   at org.apache.flink.cep.nfa.NFA.computeNextStates(NFA.java:541)
>   at org.apache.flink.cep.nfa.NFA.doProcess(NFA.java:284)
>   at org.apache.flink.cep.nfa.NFA.process(NFA.java:220)
>   at 
> org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.processEvent(AbstractKeyedCEPPatternOperator.java:379)
>   at 
> org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.lambda$onEventTime$0(AbstractKeyedCEPPatternOperator.java:282)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at java.lang.String.contains(String.java:2133)
>   at 
> com.amazonaws.prediction.FlinkCEPAudio$4.filter(FlinkCEPAudio.java:119)
>   at 
> com.amazonaws.prediction.FlinkCEPAudio$4.filter(FlinkCEPAudio.java:114)
>   at 
> org.apache.flink.cep.pattern.conditions.AndCondition.filter(AndCondition.java:43)
>   at org.apache.flink.cep.nfa.NFA.checkFilterCondition(NFA.java:742)
>   at org.apache.flink.cep.nfa.NFA.createDecisionGraph(NFA.java:716)
>   ... 19 more
>
>
>
> ==
>
>
> The code
>
>
>   // Consume the data streams from AWS Kinesis stream
> DataStream dataStream = env.addSource(new 
> FlinkKinesisConsumer<>(
> pt.getRequired("stream"),
> new EventSchema(),
> kinesisConsumerConfig))
> .name("Kinesis Stream Consumer");
>
>//dataStream.print();
>
> DataStream kinesisStream = dataStream
> .assignTimestampsAndWatermarks(new 
> TimeLagWatermarkGenerator())
> .map(event -> (IoTEvent) event);
>
> // Prints the mapped records from the Kinesis stream
>
> //kinesisStream.print();
>
>
> Pattern pattern = Pattern
> . begin("first event").subtype(IoTEvent.class)
> .where(new IterativeCondition()
> {
> //private static final long serialVersionUID = 
> -6301755149429716724L;
>
> @Override
> public boolean filter(IoTEvent value, Context 
> ctx) throws Exception {
> return 
> PatternConstants.AUDIO_YELL.toLowerCase().contains(value.getAudio_FFT1() );
> }
> })
> .next("second")
> .subtype(IoTEvent.class)
> .where(new IterativeCond

Re: About the issue caused by flink's dependency jar package submission method

2018-11-20 Thread clay4444
hi

I have checked all the dependencies and didn't find any jar with a different
version, so I suspect the way I submit the jar has some issue. My command is
like this:

/data/flink1.6/bin/flink  run -m yarn-cluster -ytm 8032 -yn 1 -ys 1 -yqu
 -yt /data/flink1.6//lib -c com.xxx.xxx.xxx.Launch -C
http://xx.xxx.xxx.xxx:80/lib/slf4j-api-1.7.6.jar -C
http://xx.xxx.xxx.xxx:80/lib/lucene-spatial3d-7.3.1.jar -C
http://xx.xxx.xxx.xxx:80/lib/lucene-backward-codecs-7.3.1.jar -C
http://xx.xxx.xxx.xxx:80/lib/log4j-to-slf4j-2.9.1.jar -C
http://xx.xxx.xxx.xxx:80/lib/jackson-dataformat-smile-2.8.10.jar -C
http://xx.xxx.xxx.xxx:80/lib/mysql-connector-java-5.1.39.jar -C
http://xx.xxx.xxx.xxx:80/lib/commons-lang-2.6.jar -C
http://xx.xxx.xxx.xxx:80/lib/apacheds-kerberos-codec-2.0.0-M15.jar -C
http://xx.xxx.xxx.xxx:80/lib/slf4j-log4j12-1.7.5.jar -C
http://xx.xxx.xxx.xxx:80/lib/guava-12.0.1.jar -C
http://xx.xxx.xxx.xxx:80/lib/netty-all-4.0.23.Final.jar -C
http://xx.xxx.xxx.xxx:80/lib/scala-parser-combinators_2.11-1.0.4.jar -C
http://xx.xxx.xxx.xxx:80/lib/lucene-spatial-7.3.1.jar -C
http://xx.xxx.xxx.xxx:80/lib/commons-compress-1.4.1.jar  x.jar

and I put all the dependencies in an nginx dir.

Does this approach have a problem?
Because I also encounter problems like this:

org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
at
org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:765)
at
org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:633)
at
org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:615)
at
org.apache.flink.streaming.connectors.kafka.internal.Kafka09PartitionDiscoverer.initializeConnections(Kafka09PartitionDiscoverer.java:58)
at
org.apache.flink.streaming.connectors.kafka.internals.AbstractPartitionDiscoverer.open(AbstractPartitionDiscoverer.java:94)
at
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.open(FlinkKafkaConsumerBase.java:469)
at
org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:424)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:290)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.KafkaException:
org.apache.kafka.common.serialization.ByteArrayDeserializer is not an
instance of org.apache.kafka.common.serialization.Deserializer
at
org.apache.kafka.common.config.AbstractConfig.getConfiguredInstance(AbstractConfig.java:248)
at
org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:673)
... 11 more

But with a fat jar, all of these work fine.





IntervalJoin is stucked in rocksdb'seek for too long time in flink-1.6.2

2018-11-20 Thread liujiangang
I am using IntervalJoin function to join two streams within 10 minutes. As
below:

labelStream.intervalJoin(adLogStream)
   .between(Time.milliseconds(0), Time.milliseconds(60))
   .process(new processFunction())
   .sink(kafkaProducer)
labelStream and adLogStream are protobuf classes that are keyed by a Long id.

Our two input streams are huge. After running for about 30 minutes, the output to
Kafka slowly goes down, like this:

[image attachment]

When the data output begins going down, I use jstack and pstack several times
and get these:

[image attachments]

It seems the program is stuck in RocksDB's seek, and I find that some of
RocksDB's SST files are accessed slowly by the iterator.

[image attachment]

I have tried several ways:

1)Reduce the input amount to half. This works well.
2)Replace labelStream and adLogStream with simple Strings. This way, data
amount will not change. This works well.
3)Use PredefinedOptions like SPINNING_DISK_OPTIMIZED and
SPINNING_DISK_OPTIMIZED_HIGH_MEM. This still fails.
4)Use new versions of rocksdbjni. This still fails.
Can anyone give me some suggestions? Thank you very much.





Re: TaskManager & task slots

2018-11-20 Thread yinhua.dai
Hi Fabian,

Does the description below still remain the same in Flink 1.6?

Slots do not guard CPU time, IO, or JVM memory. At the moment they only
isolate managed memory which is only used for batch processing. For
streaming applications their only purpose is to limit the number of parallel
threads that can be executed by a TaskManager.





Re: About the issue caused by flink's dependency jar package submission method

2018-11-20 Thread yinhua.dai
As far as I know, -yt works for both job manager and task manager, -C works
for flink cli.

Did you consider putting all your jars in /flink/lib?


