Logical plan optimization with Calcite

2016-07-19 Thread gallenvara
Hello, everyone. I'm new to Calcite and have some problems with it. Flink uses Calcite to parse the SQL and construct the AST and logical plan. Would the plan be optimized by Calcite? For example, multiple filter conditions: val ds = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c) v

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-19 Thread milind parikh
It likely does not make sense to publish a file ("batch data") into Kafka unless the file is very small. An improvised pub-sub mechanism for Kafka could be to (a) write the file into a persistent store outside of Kafka and (b) publish a message into Kafka about that write so as to enable proce
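
As an illustration of the pointer-message pattern milind describes (editor's sketch, not code from the thread): the topic name, file path, and broker address are made up, and the plain kafka-clients producer API is assumed.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class FileNotificationProducer {
        public static void main(String[] args) {
            // (a) the batch file is assumed to already sit in a persistent store
            //     outside of Kafka, e.g. HDFS or S3
            String filePath = "hdfs:///data/batch-2016-07-19.csv"; // illustrative path

            // (b) publish a small message into Kafka that merely points at the file,
            //     so downstream consumers can trigger processing of it
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            producer.send(new ProducerRecord<String, String>("batch-file-notifications", filePath));
            producer.close();
        }
    }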

Using Kafka and Flink for batch processing of a batch data source

2016-07-19 Thread Leith Mudge
I am currently working on an architecture for a big data streaming and batch processing platform. I am planning on using Apache Kafka as a distributed messaging system to handle data from streaming data sources and then pass it on to Apache Flink for stream processing. I would also like to use Fli

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Biplob Biswas
Thanks a ton, Till. That worked. Thank you so much. -Biplob

Re: Aggregate events in time window

2016-07-19 Thread Till Rohrmann
Hi Dominique, your problem sounds like a good use case for session windows [1, 2]. If you know that there is only a maximum gap between your request and response message, then you could create a session window via: input .keyBy("ReqRespID") .window(EventTimeSessionWindows.withGap(Time.mi
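
A self-contained sketch of the session-window approach (editor's illustration, not code from the thread): the Tuple3 stand-ins, the ingestion-time shortcut, and the latency calculation are assumptions; a real job would assign event timestamps and watermarks from the request/response messages.

    import org.apache.flink.api.java.tuple.Tuple;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
    import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    public class SessionLatencySketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // ingestion time keeps the sketch self-contained; a real job would assign
            // event timestamps and watermarks extracted from the messages
            env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);

            // (reqRespId, type, eventTimeMillis) -- stand-ins for the real messages
            DataStream<Tuple3<String, String, Long>> events = env.fromElements(
                Tuple3.of("42", "REQUEST", 1000L),
                Tuple3.of("42", "RESPONSE", 1800L));

            events
                .keyBy(0) // key by ReqRespID
                .window(EventTimeSessionWindows.withGap(Time.minutes(5)))
                .apply(new WindowFunction<Tuple3<String, String, Long>, Tuple2<String, Long>, Tuple, TimeWindow>() {
                    @Override
                    public void apply(Tuple key, TimeWindow window,
                                      Iterable<Tuple3<String, String, Long>> session,
                                      Collector<Tuple2<String, Long>> out) {
                        long min = Long.MAX_VALUE;
                        long max = Long.MIN_VALUE;
                        for (Tuple3<String, String, Long> e : session) {
                            min = Math.min(min, e.f2);
                            max = Math.max(max, e.f2);
                        }
                        // latency = response time - request time within one session
                        out.collect(Tuple2.of((String) key.getField(0), max - min));
                    }
                })
                .print();

            env.execute("request/response session latency");
        }
    }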

Re: DataStreamUtils not working properly

2016-07-19 Thread subash basnet
Hello Till, Yup, I can see the log output in my console, but there is no information there regarding whether there is any error in the conversion. Just normal WARN and INFO messages as below: 22:09:16,676 WARN org.apache.flink.streaming.runtime.tasks.StreamTask - No state backend has been specified, using def

Re: Aggregate events in time window

2016-07-19 Thread Sameer W
How about using event-time windows with watermark assignment and bounded delays? That way you allow more than 5 minutes (the bounded delay) for your requests and responses to arrive. Do you have a way to assign timestamps to the responses based on the request timestamp (does the response contain the reque
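
A sketch of a bounded-delay watermark assigner along those lines (editor's illustration; the 5-minute bound, the Tuple3 element type, and the timestamp field are assumptions):

    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
    import org.apache.flink.streaming.api.watermark.Watermark;

    // Emits watermarks that lag behind the highest timestamp seen so far by a fixed
    // bound, so late responses within that bound still fall into their window.
    // Attach it with stream.assignTimestampsAndWatermarks(new BoundedDelayAssigner())
    // before the keyBy/window calls.
    public class BoundedDelayAssigner
            implements AssignerWithPeriodicWatermarks<Tuple3<String, String, Long>> {

        private static final long MAX_DELAY_MS = 5 * 60 * 1000; // allowed lateness bound

        // offset the initial value so the first watermark does not underflow
        private long maxSeenTimestamp = Long.MIN_VALUE + MAX_DELAY_MS;

        @Override
        public long extractTimestamp(Tuple3<String, String, Long> event, long previousTimestamp) {
            long ts = event.f2; // the event carries its own timestamp in field f2
            maxSeenTimestamp = Math.max(maxSeenTimestamp, ts);
            return ts;
        }

        @Override
        public Watermark getCurrentWatermark() {
            return new Watermark(maxSeenTimestamp - MAX_DELAY_MS);
        }
    }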

Aggregate events in time window

2016-07-19 Thread Dominique Rondé
Hi all, once again I need a "kick" in the right direction. I have a data stream with requests and responses identified by a ReqResp-ID. I would like to calculate the (avg, 95%, 99%) time between request and response and would also like to count them. I thought of ".keyBy("ReqRespID").timeWindowAll(Tim

Re: Unable to get the value of datatype in datastream

2016-07-19 Thread subash basnet
Hello Aljoscha Krettek, Thank you. As you suggested, I changed my code as below: *snippet 1:* DataStream centroids = newCentroidDataStream.map(new TupleCentroidConverter()); ConnectedIterativeStreams loop = points.iterate().withFeedbackType(Centroid.class); DataStream newCentroids = loop.flatMap(n

Re: DataStreamUtils not working properly

2016-07-19 Thread Till Rohrmann
It depends on whether you have a log4j.properties file specified in your classpath. If you see log output on the console, then it should also print errors there. Cheers, Till On Tue, Jul 19, 2016 at 3:08 PM, subash basnet wrote: > Hello Till, > > Shouldn't it write something in the eclipse console if t

Re: Parallelizing openCV libraries in Flink

2016-07-19 Thread Debaditya Roy
Hello, I do not have access to the web interface from the nodes I am using. However, I will check the logs for anything suspicious and get back. Thanks :-) Regards, Debaditya On Tue, Jul 19, 2016 at 4:46 PM, Till Rohrmann wrote: > Hi Debaditya, > > you can see in the web interface how much d

Re: Parallelizing openCV libraries in Flink

2016-07-19 Thread Till Rohrmann
Hi Debaditya, you can see in the web interface how much data each source has sent to the downstream tasks and how much data was consumed by the sinks. This should tell you whether your sources have actually read some data. You can also check the log files to see whether you find anything suspicious there

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Till Rohrmann
Hi Biplob, if you want to start the web interface from within your IDE, then you have to create a local execution environment as Ufuk told you: Configuration config = new Configuration(); config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true); StreamExecutionEnvironment env = StreamExecu
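
A complete local-environment sketch along the lines of the snippet above (editor's illustration; it assumes the createLocalEnvironment(parallelism, Configuration) factory is available and that the embedded mini cluster honours LOCAL_START_WEBSERVER, which may vary by Flink version):

    import org.apache.flink.configuration.ConfigConstants;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LocalWebUIJob {
        public static void main(String[] args) throws Exception {
            Configuration config = new Configuration();
            config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);

            // local environment that also brings up the dashboard on port 8081
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(1, config);

            env.fromElements(1, 2, 3).print();

            env.execute("local job with web UI");
        }
    }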

Re: Intermediate Data Caching

2016-07-19 Thread Saliya Ekanayake
Thank you, Ufuk! On Tue, Jul 19, 2016 at 5:51 AM, Ufuk Celebi wrote: > PS: I forgot to mention that constant iteration input is also cached. > > On Mon, Jul 18, 2016 at 11:27 AM, Ufuk Celebi wrote: > > Hey Saliya, > > > > the result of each iteration (super step) that is fed back to the > > ite

Re: DataStreamUtils not working properly

2016-07-19 Thread subash basnet
Hello Till, Shouldn't it write something in the Eclipse console if there is any error or warning? But nothing about errors is printed on the console. And I checked the Flink project folders (flink-core, flink-streaming and so on) but couldn't find where the log is written when run via Eclipse. Best Re

Re: DataStreamUtils not working properly

2016-07-19 Thread Till Rohrmann
Have you checked whether your logs contain any problems? In general it is not recommended to collect the streaming result back to your client. It might also be a problem with `DataStreamUtils.collect`. Cheers, Till On Tue, Jul 19, 2016 at 2:42 PM, subash basnet wrote: > Hello all, > > I t

Re: DataStreamUtils not working properly

2016-07-19 Thread subash basnet
Hello all, I tried to check whether it works for a tuple, but it is the same problem: the collection still shows a blank result. I took the id of the centroid tuple and printed it, but the collection displays as empty. DataStream centroids = newCentroidDataStream.map(new TupleCentroidConverter()); DataStream> centroidId =

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Sameer W
Yes, you have to provide the path of your jar. The reason is: 1. When you start in pseudo-cluster mode, the tasks are started in their own JVMs with their own class loaders. 2. Your client program has access to your custom operator classes, but the remote JVMs don't. Hence you need to ship the JAR
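
A sketch of the remote-environment variant being discussed (editor's illustration; the JobManager host/port and the jar path are made up):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RemoteJob {
        public static void main(String[] args) throws Exception {
            // the jar is shipped so the remote task manager JVMs can load the
            // user classes; host, port and path are illustrative
            StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
                "localhost", 6123, "target/my-flink-job.jar");

            env.fromElements("a", "b", "c").print();

            env.execute("remote job");
        }
    }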

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Biplob Biswas
Thanks, Ufuk, for the input. I tried what you suggested as well (as follows): Configuration config = new Configuration(); config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true); StreamExecutionEnvironment env = StreamExecutionEnvironment.crea

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Ufuk Celebi
You can explicitly create a LocalEnvironment and provide a Configuration: Configuration config = new Configuration(); config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true); ExecutionEnvironment env = new LocalEnvironment(config); ... On Tue, Jul 19, 2016 at 1:28 PM, Sameer W wrote: >
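
The batch counterpart as a compact sketch (editor's illustration; it uses the ExecutionEnvironment.createLocalEnvironment(Configuration) factory rather than the constructor shown in the message, since constructor visibility may differ between versions):

    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.ConfigConstants;
    import org.apache.flink.configuration.Configuration;

    public class LocalBatchWebUIJob {
        public static void main(String[] args) throws Exception {
            Configuration config = new Configuration();
            config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);

            // local batch environment that also starts the dashboard
            ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(config);

            // DataSet#print() triggers execution and prints on the client,
            // so no separate env.execute() call is needed here
            env.fromElements(1, 2, 3).print();
        }
    }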

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Biplob Biswas
Hi Sameer, Thanks for that quick reply. I was using Flink streaming, so the program keeps on running until I close it. But anyway, I am ready to try this getRemoteExecutionEnvironment(); I checked, but it asks me for the jar file, which is weird because I am running the program directly. Does it mea

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Sameer W
From Eclipse it creates a local environment and runs in the IDE. When the program finishes, so does the Flink execution instance. I have never tried accessing the console while the program is running, but once the program is finished there is nothing to connect to. If you need to access the dashboard

Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Biplob Biswas
Hi, I am running my Flink program using Eclipse and I can't access the dashboard at http://localhost:8081. Can someone help me with this? I read that I need to check my flink-conf.yaml, but it's a Maven project and I don't have a flink-conf. Any help would be really appreciated. Thanks a lot Bip

Re: Data point goes missing within iteration

2016-07-19 Thread Biplob Biswas
Hi Ufuk, Thanks for the update. Is there any known way to fix this issue? Any workaround that you know of which I can try?

Re: Unable to get the value of datatype in datastream

2016-07-19 Thread Aljoscha Krettek
Hi, you have to make sure to filter the data that you send back on the feedback edge, i.e. the loop.closeWith(newCentroids.broadcast()); statement needs to take a stream that contains only the centroids that you want to send back. And you need to make sure to emit centroids with a good timestamp if you wan
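
A simplified, self-contained illustration of filtering the feedback edge (editor's sketch; it uses a plain IterativeStream over Longs instead of the ConnectedIterativeStreams/Centroid setup from the thread):

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.IterativeStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class FilteredFeedback {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<Long> input = env.generateSequence(1, 100);

            // iteration head; 5000 ms without feedback terminates the loop
            IterativeStream<Long> loop = input.iterate(5000);

            // hypothetical step function standing in for the centroid update
            DataStream<Long> stepResult = loop.map(new MapFunction<Long, Long>() {
                @Override
                public Long map(Long value) {
                    return value - 1;
                }
            });

            // only the elements that should go around again are fed back
            DataStream<Long> feedback = stepResult.filter(new FilterFunction<Long>() {
                @Override
                public boolean filter(Long value) {
                    return value > 0;
                }
            });
            loop.closeWith(feedback);

            // everything else leaves the iteration
            stepResult.filter(new FilterFunction<Long>() {
                @Override
                public boolean filter(Long value) {
                    return value <= 0;
                }
            }).print();

            env.execute("filtered feedback sketch");
        }
    }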

Re: Intermediate Data Caching

2016-07-19 Thread Ufuk Celebi
PS: I forgot to mention that constant iteration input is also cached. On Mon, Jul 18, 2016 at 11:27 AM, Ufuk Celebi wrote: > Hey Saliya, > > the result of each iteration (super step) that is fed back to the > iteration is cached. For the iterate operator that is the last partial > solution and fo

Re: Data point goes missing within iteration

2016-07-19 Thread Ufuk Celebi
Unfortunately, no. It's expected for streaming iterations to lose data (a known shortcoming), but I don't see why they never see the initial input. Maybe Gyula or Paris (they worked on this previously) can chime in. – Ufuk On Tue, Jul 19, 2016 at 10:15 AM, Biplob Biswas wrote: > Hi Ufuk, > > Did

Re: Issue with running Flink Python jobs on cluster

2016-07-19 Thread Maximilian Michels
Hi! HDFS is mentioned in the docs but not explicitly listed as a requirement: https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/python.html#project-setup I suppose the Python API could also distribute its libraries through Flink's BlobServer. Cheers, Max On Tue, Jul 19, 2016 at

Re: Data point goes missing within iteration

2016-07-19 Thread Biplob Biswas
Hi Ufuk, Did you get time to go through my issue? I just wanted to follow up to see whether I can get a solution or not.

DataStreamUtils not working properly

2016-07-19 Thread subash basnet
Hello all, I am trying to convert a DataStream to a collection, but it shows a blank result. There is a stream of data which can be viewed on the console via print(), but the collection of the same stream is empty after the conversion. Below is the code: DataStream centroids = newCentroidDataStream.map
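
For reference, a small self-contained use of DataStreamUtils.collect (editor's sketch; the import shown is the flink-streaming-contrib location used in the 1.x line and may differ in other versions):

    import java.util.Iterator;

    import org.apache.flink.contrib.streaming.DataStreamUtils;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CollectExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<Long> stream = env.generateSequence(1, 10);

            // collect() starts the job itself and streams the results back to the
            // client; do not call env.execute() again for the same job
            Iterator<Long> it = DataStreamUtils.collect(stream);
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }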

Parallelizing openCV libraries in Flink

2016-07-19 Thread Debaditya Roy
Hello users, I am currently doing a project in image processing with the OpenCV library. Has anyone here faced any issues with parallelizing the library in Flink? I have written code which runs fine in a local environment; however, when I try to run it in a distributed environment it writes (it wa

Re: Error using S3a State Backend: Window Operators sending directory instead of fully qualified file?

2016-07-19 Thread Ufuk Celebi
Feel free to do the contribution at any time you like. We can also always make it part of a bugfix release if it does not make it into the upcoming 1.1 RC (probably end of this week or beginning of next). Feel free to ping me if you need any feedback or pointers. – Ufuk On Mon, Jul 18, 2016 at

Re: Issue with running Flink Python jobs on cluster

2016-07-19 Thread Chesnay Schepler
Glad to hear it! The HDFS requirement should most definitely be documented; I assumed it already was, actually... On 19.07.2016 03:42, Geoffrey Mon wrote: Hello Chesnay, Thank you very much! With your help I've managed to set up a Flink cluster that can run Python jobs successfully. I solved m