Re: Apache Flink - Evictor interface clarification

2017-11-21 Thread M Singh
Thanks Vishu for the pointer. On Saturday, November 18, 2017 6:27 PM, Vishnu Viswanath wrote: Hi Mans, Have a look at this: http://apache-flink- mailing-list-archive.1008284. n3.nabble.com/DISCUSS-Enhance- Window-Evictor-in-Flink- tp12406p12442.html Thanks,Vishnu On Sat, Nov 18, 2017 a

Re: Docker and AWS taskmanager configuration

2017-11-21 Thread Colin Williams
Hi Patrick "unique-dns-address" is an alias for private IP. If XXX-XX-XX-XXX is the private IP, then ip-XXX-XX-XX-XXX.aws-region-X is the "unique-dns-address". We were using auto scaling groups. What worked out with the givens above was setting docker --net=host , and then we saw "unique-dns-addr

Re: Getting java.lang.ClassNotFoundException: for protobuf generated class

2017-11-21 Thread Gordon Weakliem
Isn't one cause for ClassNotFoundException that the class can't load due to failed dependencies or a failure in a static constructor? If jar -tf target/program.jar | grep MeasurementTable shows the class is present, are there other dependencies missing? You may need to add runtime dependencies int

Re: Job Manager Configuration

2017-11-21 Thread Joshua Griffith
We run on a dedicated cluster managed by Kubernetes. The task managers run as a DaemonSet and the job manager runs as a Deployment. We had to increase the Akka frame size and client timeout on the service that submits jobs but we haven’t altered any Akka settings in the cluster. Here’s the conta

FlinkKafkaConsumer010 does not start from the next record on startup from offsets in Kafka

2017-11-21 Thread r. r.
Hello according to https://issues.apache.org/jira/browse/FLINK-4618 "FlinkKafkaConsumer09 should start from the next record on startup from offsets in Kafka". Is the same behavior expected of FlinkKafkaConsumer010? A document in Kafka is failing my job and I want on restart of the job (via the r

Re: Missing checkpoint when restarting failed job

2017-11-21 Thread Stefan Richter
Ok, thanks for trying to reproduce this. If possible, could you also activate trace-level logging for class org.apache.flink.runtime.state.SharedStateRegistry? In case the problem occurs, this would greatly help to understand what was going on. > Am 21.11.2017 um 15:16 schrieb gerardg : > >> w

Re: Missing checkpoint when restarting failed job

2017-11-21 Thread gerardg
> where exactly did you read many times that incremental checkpoints cannot reference files from previous > checkpoints, because we would have to correct that information. In fact, > this is how incremental checkpoints work. My fault, I read it in some other posts in the mailing list but now tha

Re: Docker and AWS taskmanager configuration

2017-11-21 Thread Patrick Lucas
Hi Colin, Is each instance's "unique-dns-address" equal to the hostname of the instance or is the hostname something else? If it's different from the hostname, you're correct in assuming you need to configure each node to advertise its unique-dns-address intead. Are the unique-dns-addresses alias

Impersonate user for hdfs

2017-11-21 Thread Vishal Santoshi
Hello folks, I need to write to hdfs as a proxy user ( I am running TM /JM as root but hive on our setup is under this proxy user and thus the permissions issue ) . I think that this is possible in Hadoop MR and I am sure this is a common request. Thanks Vishal

Re: Missing checkpoint when restarting failed job

2017-11-21 Thread Stefan Richter
Hi, where exactly did you read many times that incremental checkpoints cannot reference files from previous checkpoints, because we would have to correct that information. In fact, this is how incremental checkpoints work. Now for this case, I would consider it extremely unlikely that a checkpo

Re: How graceful shutdown or resource clean up happens in Flink at task level ?

2017-11-21 Thread sohimankotia
Thanks Stefan . -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: HTTP post request example for async IO

2017-11-21 Thread shashank agarwal
Hi, Yes, I got this point. Actually, i was looking for HTTP client suggestion. Like currently I am using http4s with the normal Map function. But I have seen users are facing issues in AsyncHttpClient. If anybody used that before. Thanks Shashank ‌ On Tue, Nov 21, 2017 at 7:17 PM, Stefan Rich

Re: HTTP post request example for async IO

2017-11-21 Thread Stefan Richter
Hi, did you see the example code in the documentation here: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/asyncio.html ? This code does async calls agains a database, but I think it s

How to Create Sample Data from HDFS File using Flink ?

2017-11-21 Thread sohimankotia
Hi, I have directory in HDFS containing 20 files with 150 Million records . I just want random 20 million records from that directory . (Sampled Data ). I see that there are few implementations are there in flink https://github.com/eBay/Flink/tree/master/flink-java/src/main/java/org/apache/flin

Re: How graceful shutdown or resource clean up happens in Flink at task level ?

2017-11-21 Thread Stefan Richter
Hi, the user function’s close() method is called in AbstractStreamOperator::close() and ::dispose(). The invocation of the user function’s close() in AbstractStreamOperator::dispose() only has an effect if there was no previous invocation of the method through AbstractStreamOperator::close().

Missing checkpoint when restarting failed job

2017-11-21 Thread gerardg
Hello, We have a task that fails to restart from a checkpoint with the following error: java.lang.IllegalStateException: Could not initialize keyed state backend. at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321) at

Re: Flink session on Yarn - ClassNotFoundException

2017-11-21 Thread nishutayal
Hi Albert, Your thread is really helpful for flink setup on HDInsight cluster. I followed the same and now I hit another error: *Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.reloadExistingConfigurations()V at org.apache.hadoop.fs.adl.AdlConf

Error while setting up flink on HDInsight cluster

2017-11-21 Thread Nishu
Hi, I am trying to run flink on top on HDInsight cluster. So far I have added all libraries in classpath and set YARN_CONF_DIR, HADOOP_CONF_DIR and HADOOP_CLASSPATH. It's running on hadoop 2.7.3. When I run yarn-session.sh, It throws following error. *2017-11-21 13:07:01,946 INFO org.apache.ha

Re: Apache Atlas Bridge for Flink

2017-11-21 Thread Robert Metzger
Hi Stefan, I know this is an old email thread. Did you ever look more closely into an Atlas bridge for Flink? If so, I'd be happy to take a look at any prototypes. Regards, Robert On Thu, Feb 2, 2017 at 3:37 PM, Till Rohrmann wrote: > Hi Stefan, > > as far as I know, there is no Apache Atlas B

HTTP post request example for async IO

2017-11-21 Thread shashank agarwal
Hello, Is anybody has an example of HTTP post request using Async io? Shashank ‌

Re: Getting java.lang.ClassNotFoundException: for protobuf generated class

2017-11-21 Thread Nico Kruber
Hi Shankara, sorry for the late response, but honestly, I cannot think of a reason that some of your program's classes (using only a single jar file) are found some others are not, except for the class not being in the jar. Or there's some class loader issue in the Flink Beam runner (which I fin

Re: Problem with SQL-API and nested objects in case class

2017-11-21 Thread Timo Walther
Actually, your use case should be doable with Flink's Table & SQL API with some additional UDFs. The API can handle JSON objects if they are valid composite types and you can access arrays as well. The splitting might be a bit tricky in SQL, you could model it simply as a where() clause or mayb

Re: Hive integration in table API and SQL

2017-11-21 Thread Timo Walther
Hi, no, combining batch and streaming environments is not possible at the moment. However, most operations in batch can be done in streaming fashion as well. I would recommend to use the DataStream API as it provides the most flexibility in your use case. Regards, Timo Am 11/21/17 um 4:41

Re: Correlation between data streams/operators and threads

2017-11-21 Thread Shailesh Jain
Understood. Thanks a lot! I'll try out the keyBy approach first. Shailesh On Tue, Nov 21, 2017 at 1:53 PM, Piotr Nowojski wrote: > So as long as the parallelism of my kafka source and sink operators is 1, > all the subsequent operators (multiple filters to create multiple streams, > and then

Re: Correlation between data streams/operators and threads

2017-11-21 Thread Piotr Nowojski
> So as long as the parallelism of my kafka source and sink operators is 1, all > the subsequent operators (multiple filters to create multiple streams, and > then individual CEP and Process operators per stream) will be executed in the > same task slot? Yes, unless you specify different resou