Error when accessing secure HDFS with standalone Flink

2016-03-10 Thread Stefano Baghino
Hello everybody, my colleagues and I have been running some tests on Flink 1.0.0 in a secure environment (Kerberos). Yesterday we ran several tests on the standalone Flink deployment but couldn't get it to access HDFS. Judging from the error it looks like Flink is not trying to authenticate itsel

Re: DataSet -> DataStream

2016-03-10 Thread Ashutosh Kumar
As the data is already collected, why do you want to add one more layer of Kafka? Instead, you can start processing your data. Thanks, Ashutosh On Mar 11, 2016 4:19 AM, "Prez Cannady" wrote: > > I’d like to pour some data I’ve collected into a DataSet via JDBC into a > Kafka topic, but I think I need to t

Re: 404 error for Flink examples

2016-03-10 Thread janardhan shetty
Thanks Balaji. This needs to be updated in the Job.java file of the quickstart application. I am using version 1.0. On Thu, Mar 10, 2016 at 9:23 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > You could try this link. > > > https://ci.apache.org/projects/flink/flink-docs-master/apis

Re: 404 error for Flink examples

2016-03-10 Thread Balaji Rajagopalan
You could try this link. https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/examples.html On Fri, Mar 11, 2016 at 9:56 AM, janardhan shetty wrote: > Hi, > > I was looking at the examples for Flink applications and the comment in > quickstart/job results in 404 for the web page. >

Re: DataSet -> DataStream

2016-03-10 Thread Balaji Rajagopalan
You could, I suppose, write the DataSet to a file sink and then read the file into a DataStream. On Fri, Mar 11, 2016 at 4:18 AM, Prez Cannady wrote: > > I’d like to pour some data I’ve collected into a DataSet via JDBC into a > Kafka topic, but I think I need to transform my DataSet into a DataS
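
The hand-off suggested in this reply (batch job writes a file, streaming job reads it back) can be sketched without any Flink or Kafka dependency. This is only an illustration under stated assumptions: the function names are hypothetical, the "topic" is a plain Python list standing in for a Kafka producer, and records are assumed to be single-line strings.

```python
import os
import tempfile


def write_batch_to_file(records, path):
    """Batch side: materialise the whole data set, one record per line."""
    with open(path, "w", encoding="utf-8") as sink:
        for record in records:
            sink.write(record + "\n")


def stream_from_file(path):
    """Streaming side: yield records one at a time instead of loading them all."""
    with open(path, "r", encoding="utf-8") as source:
        for line in source:
            yield line.rstrip("\n")


def publish(stream, topic):
    """Hypothetical stand-in for a Kafka producer: append each record to a list."""
    for record in stream:
        topic.append(record)
    return topic


def hand_off(records):
    """Run both halves back to back through a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "batch.txt")
        write_batch_to_file(records, path)
        return publish(stream_from_file(path), [])
```

In a real job the two halves would be separate programs, and the file would live on storage both can reach.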

404 error for Flink examples

2016-03-10 Thread janardhan shetty
Hi, I was looking at the examples for Flink applications and the comment in quickstart/job results in 404 for the web page. http://flink.apache.org/docs/latest/examples.html This needs to be updated

DataSet -> DataStream

2016-03-10 Thread Prez Cannady
I’d like to pour some data I’ve collected into a DataSet via JDBC into a Kafka topic, but I think I need to transform my DataSet into a DataStream first. If anyone has a clue how to proceed, I’d appreciate it; or let me know if I’m completely off track.

Re: asm IllegalArgumentException with 1.0.0

2016-03-10 Thread Zach Cox
After some poking around I noticed that flink-connector-elasticsearch_2.10-1.0.0.jar contains shaded asm classes. If I remove that dependency from my project then I do not get the IllegalArgumentException. On Thu, Mar 10, 2016 at 11:51 AM Zach Cox wrote: > Here are the jars on the classpath whe

Re: Checkpoint

2016-03-10 Thread Vijay Srinivasaraghavan
Thanks Ufuk and Stephan. I have added an identity mapper and disabled chaining. With that, I am able to see the backpressure alert on the identity mapper task. I have noticed that when I introduce a delay (sleep) in the subsequent task, checkpointing sometimes does not work. I could see check

Re: asm IllegalArgumentException with 1.0.0

2016-03-10 Thread Zach Cox
Here are the jars on the classpath when I try to run our Flink job in a local environment (via `sbt run`): https://gist.githubusercontent.com/zcox/0992aba1c517b51dc879/raw/7136ec034c2beef04bd65de9f125ce3796db511f/gistfile1.txt There are many transitive dependencies pulled in from internal library

Re: Flink loading an S3 File out of order

2016-03-10 Thread Fabian Hueske
Hi Benjamin, Flink usually reads data in parallel. This is done by splitting the input (e.g., a file) into several input splits. Each input split is processed independently. Since splits are usually processed concurrently by more than one task, Flink does not care about the order by default. You
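
The effect described in this reply can be reproduced in a few lines of plain Python. The sketch below is not Flink code: the function names, the thread pool, and the split size are all assumptions made for illustration. It cuts an input into splits, reads them concurrently, and shows that file order can only be recovered afterwards if each record carries its split index and offset.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_splits(lines, split_size):
    """Cut the input into (split_index, lines) chunks, analogous to input splits."""
    return [
        (index, lines[start:start + split_size])
        for index, start in enumerate(range(0, len(lines), split_size))
    ]


def read_split(split):
    """Read one split independently, tagging each record with its position."""
    split_index, split_lines = split
    return [(split_index, offset, line) for offset, line in enumerate(split_lines)]


def read_parallel(lines, split_size=2, workers=4):
    """Read splits concurrently; records arrive in completion order, not file order."""
    records = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(read_split, s) for s in make_splits(lines, split_size)]
        for future in as_completed(futures):
            records.extend(future.result())
    return records


def restore_order(records):
    """Recover file order by sorting on (split_index, offset)."""
    return [line for _, _, line in sorted(records, key=lambda r: (r[0], r[1]))]
```

Calling `restore_order(read_parallel(lines))` returns the lines in their original order, whereas the raw result of `read_parallel` may or may not be ordered from run to run.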

Re: Stack overflow from self referencing Avro schema

2016-03-10 Thread Stephan Ewen
The following issue should track that. https://issues.apache.org/jira/browse/FLINK-3602 @Niels: Thanks for looking into this. At this point, I think it may actually be a Flink issue, since it concerns the interaction of Avro and Flink's TypeInformation. On Thu, Mar 10, 2016 at 6:00 PM, Stephan Ew

Re: Stack overflow from self referencing Avro schema

2016-03-10 Thread Stephan Ewen
Hi! I think that is a TypeExtractor bug. It may actually be a bug for all recursive types. Let's check this and come up with a fix... Greetings, Stephan On Thu, Mar 10, 2016 at 4:11 PM, David Kim wrote: > Hello! > > Just wanted to check up on this again. Has anyone else seen this before or >

Re: Stack overflow from self referencing Avro schema

2016-03-10 Thread Niels Basjes
Hi, Please try to reproduce the problem in a simple (commandline) Java application (i.e. without Flink and such, just Avro). If you can reproduce it with Avro 1.8.0 then please file a bug report (preferably with the simplest reproduction path you can come up with) via https://issues.apache.org/ji

Fwd: Flink loading an S3 File out of order

2016-03-10 Thread Benjamin Kadish
I am trying to read a file from S3 in the correct order. It seems that Flink is downloading the file out of order, or at least it's constructing the DataSet out of order. I tried using Hadoop to download the file and it seemed to download it in order. I am able to reproduce the problem with th

Re: TaskManager unable to register with JobManager

2016-03-10 Thread Ufuk Celebi
Hey Ravinder, check out the following config keys: blob.server.port, taskmanager.rpc.port, and taskmanager.data.port. – Ufuk On Wed, Feb 10, 2016 at 4:06 PM, Ravinder Kaur wrote: > Hello Fabian, > > Thank you very much for the resource. I had already gone through this and > have found port '6123' a
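
As a rough illustration, the three keys named in this reply could be pinned to fixed ports in the Flink configuration file so that firewall rules can be opened for them. The key names come from the message; the port numbers below are hypothetical placeholders, not recommended values.

```yaml
# conf/flink-conf.yaml (fragment)
# Port values are assumptions chosen for illustration only.
blob.server.port: 50100
taskmanager.rpc.port: 50101
taskmanager.data.port: 50102
```

The same values would need to be set on every machine in the cluster, and the chosen ports must be reachable between the JobManager and the TaskManagers.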

Re: Stack overflow from self referencing Avro schema

2016-03-10 Thread David Kim
Hello! Just wanted to check up on this again. Has anyone else seen this before or have any suggestions? Thanks! David On Tue, Mar 8, 2016 at 12:12 PM, David Kim wrote: > Hello all, > > I'm running into a StackOverflowError using flink 1.0.0. I have an Avro > schema that has a self reference. F

RE: operators

2016-03-10 Thread Radu Tudoran
Hi, it would not actually be feasible to use Kafka queues or the DFS. Could you point me to the API level at which I could access the CoLocationConstraint? Is it accessible from the DataSourceStream or from the operator directly? I have also dug through the documentation and API and I was curious

Re: ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-10 Thread Zach Cox
I see the new Event Time docs page, thanks for fixing that! I like the additional explanation of event time and watermarks. I also updated our TimestampExtractors to AssignerWithPeriodicWatermarks as described in [1]. I like the separation between periodic and punctuated watermark assigners in the

Re: asm IllegalArgumentException with 1.0.0

2016-03-10 Thread Stephan Ewen
Dependency shading changed a bit between RC4 and RC5 - maybe a different minor ASM version is now included in the "test" scope. Can you share the dependencies of the problematic project? On Thu, Mar 10, 2016 at 12:26 AM, Zach Cox wrote: > I also noticed when I try to run this application in a l

Re: Checkpoint

2016-03-10 Thread Stephan Ewen
Just to be sure: Is the task whose backpressure you want to monitor the Kafka Source? There is an open issue that backpressure monitoring does not work for the Kafka Source: https://issues.apache.org/jira/browse/FLINK-3456 To circumvent that, add an "IdentityMapper" after the Kafka source, make s

Re: Checkpoint

2016-03-10 Thread Robert Metzger
Hi Vijay, regarding your other questions: 1) On the TaskManagers, the FlinkKafkaConsumers will write the partitions they are going to read in the log. There is currently no way of seeing the state of a checkpoint in Flink (which is the offsets). However, once a checkpoint is completed, the Kafka

Re: Checkpoint

2016-03-10 Thread Ufuk Celebi
How many vertices does the web interface show and what parallelism are you running? If the sleeping operator is chained you will not see anything. If your goal is to just see some back pressure warning, you can call env.disableOperatorChaining() and re-run the program. Does this work? – Ufuk On

Re: ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-10 Thread Ufuk Celebi
Just removed the page. Triggering a new docs build... On Thu, Mar 10, 2016 at 10:22 AM, Aljoscha Krettek wrote: > Then Stephan should have removed the old doc when adding the new one… :-) >> On 10 Mar 2016, at 10:20, Ufuk Celebi wrote: >> >> Just talked with Stephan: the document you are referri

Re: streaming job reading from kafka stuck while cancelling

2016-03-10 Thread Ufuk Celebi
Hey Maciek! I'm working on the other proposed fix by closing the buffer pool early. I expect the fix to make it into the next bugfix release 1.0.1 (or 1.0.2 if 1.0.1 comes very soon). – Ufuk

Re: ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-10 Thread Aljoscha Krettek
Then Stephan should have removed the old doc when adding the new one… :-) > On 10 Mar 2016, at 10:20, Ufuk Celebi wrote: > > Just talked with Stephan: the document you are referring to is stale. > Can you check out this one here: > https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/

Re: ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-10 Thread Ufuk Celebi
Just talked with Stephan: the document you are referring to is stale. Can you check out this one here: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/event_time.html – Ufuk On Thu, Mar 10, 2016 at 10:17 AM, Ufuk Celebi wrote: > I've added this to the migration guide

Re: ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-10 Thread Ufuk Celebi
I've added this to the migration guide here: https://cwiki.apache.org/confluence/display/FLINK/Migration+Guide%3A+0.10.x+to+1.0.x Feel free to add any other API changes that are missing there. – Ufuk On Thu, Mar 10, 2016 at 10:13 AM, Aljoscha Krettek wrote: > Hi, > you’re right, this should be

Re: ExecutionConfig.enableTimestamps() in 1.0.0

2016-03-10 Thread Aljoscha Krettek
Hi, you’re right, this should be changed to “setStreamTimeCharacteristic(EventTime)” in the doc. Cheers, Aljoscha > On 09 Mar 2016, at 23:21, Zach Cox wrote: > > Hi - I'm upgrading our app to 1.0.0 and noticed ExecutionConfig no longer has > an enableTimestamps() method. Do we just not need t

Running Flink 1.0.0 on YARN

2016-03-10 Thread Ashutosh Kumar
I have a YARN setup with 1 master and 2 slaves. When I run a YARN session with bin/yarn-session.sh -n 2 -jm 1024 -tm 1024 and submit a job with bin/flink run examples/batch/WordCount.jar, the job succeeds. It shows status on the YARN UI http://x.x.x.x:8088/cluster. However it does not show anythin