Re: Merging N parallel/partitioned WindowedStreams together, one-to-one, into a global window stream

2016-10-06 Thread AJ Heller
Thank you Fabian, I think that solves it. I'll need to rig up some tests to verify, but it looks good. I used a RichMapFunction to assign ids incrementally to windows (mapping STREAM_OBJECT to Tuple2 using a private long value in the mapper that increments on every map call). It works, but by any
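
A minimal sketch of the id-assigning mapper described above, assuming the local window transform emits exactly one element per window (the generic T stands in for the poster's STREAM_OBJECT type):

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;

    // Each parallel instance counts its own windows, so window k on every
    // partition receives the same id k.
    public class WindowIdAssigner<T> extends RichMapFunction<T, Tuple2<Long, T>> {
        private long windowId = 0;

        @Override
        public Tuple2<Long, T> map(T value) {
            return new Tuple2<>(windowId++, value);
        }
    }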

Re: Csv data stream

2016-10-06 Thread drystan mazur
Hi Fabian, I am running in an IDE and the code runs OK, just no output from the datastream. Thanks -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Re-Csv-data-stream-tp9377p9384.html

Re: DataStream csv reading

2016-10-06 Thread drystan mazur
Hi Fabian, The program runs with no exceptions in an IDE but I can't see the datastream print messages. Thanks -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DataStream-csv-reading-tp9376p9382.html

Re: Merging N parallel/partitioned WindowedStreams together, one-to-one, into a global window stream

2016-10-06 Thread Fabian Hueske
Maybe this can be done by assigning the same window id to each of the N local windows, and then doing a .keyBy(windowId).countWindow(N). This should create a new global window for each window id and collect all N windows. Best, Fabian 2016-10-06 22:39 GMT+02:00 AJ Heller : > The goal is: > * to split
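
A rough sketch of this suggestion, assuming each local window result is a Tuple2 whose first field is the per-partition window id and N equals the parallelism (names and the reduce step are illustrative):

    import org.apache.flink.api.common.functions.ReduceFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class MergeParallelWindows {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            int n = env.getParallelism(); // N parallel window streams

            // Stand-in for the N per-partition window results; f0 is the window id.
            DataStream<Tuple2<Long, String>> localResults = env.fromElements(
                    Tuple2.of(0L, "partition-a"), Tuple2.of(0L, "partition-b"));

            localResults
                .keyBy(0)       // group the per-partition results by window id
                .countWindow(n) // a "global" window is complete once all N results arrive
                .reduce(new ReduceFunction<Tuple2<Long, String>>() {
                    @Override
                    public Tuple2<Long, String> reduce(Tuple2<Long, String> a,
                                                       Tuple2<Long, String> b) {
                        return Tuple2.of(a.f0, a.f1 + "|" + b.f1); // merge step
                    }
                })
                .print();

            env.execute("merge-n-parallel-windows");
        }
    }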

Re: DataStream csv reading

2016-10-06 Thread drystan mazur
Hi Greg, The program runs with no exceptions in an IDE but I can't see the datastream print messages. Thanks -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DataStream-csv-reading-tp9376p9381.html

Re: DataStream csv reading

2016-10-06 Thread Fabian Hueske
Hi Greg, print is only eagerly executed for DataSet programs. In the DataStream API, print() just appends a print sink and execute() is required to trigger an execution. 2016-10-06 22:40 GMT+02:00 Greg Hogan : > The program executes when you call print (same for collect), which is why > you are
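
In other words (a minimal sketch):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class PrintNeedsExecute {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromElements("a", "b", "c")
               .print();   // only appends a print sink; nothing runs yet

            env.execute(); // this call actually triggers the streaming job
        }
    }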

Re: DataStream csv reading

2016-10-06 Thread Greg Hogan
The program executes when you call print (same for collect), which is why you are seeing an error when calling execute (since there is no new job to execute). As Fabian noted, you'll need to look in the TaskManager log files for the printed output if running on a cluster. On Thu, Oct 6, 2016 at 4:

Merging N parallel/partitioned WindowedStreams together, one-to-one, into a global window stream

2016-10-06 Thread AJ Heller
The goal is:
* to split data, random-uniformly, across N nodes,
* window the data identically on each node,
* transform the windows locally on each node, and
* merge the N parallel windows into a global window stream, such that one window from each parallel process is merged into a "global wind

Re: Csv data stream

2016-10-06 Thread Fabian Hueske
Hi, how are you executing your code? From an IDE or on a running Flink instance? If you execute it on a running Flink instance, you have to look into the .out files of the task managers (located in ./log/). Best, Fabian 2016-10-06 22:08 GMT+02:00 drystan mazur : > Hello I am reading a csv file

DataStream csv reading

2016-10-06 Thread drystan mazur
Hello, I am reading a csv file with Flink 1.1.2. The file loads and runs but printing shows nothing. The code runs OK; I just wanted to view the datastream. What am I doing wrong? Thanks -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DataStream-c

Csv data stream

2016-10-06 Thread drystan mazur
Hello, I am reading a csv file with flink 1.1.2; the file loads and runs but printing shows nothing? TupleCsvInputFormat oilDataIn; TupleTypeInfo oildataTypes; BasicTypeInfo[] types = {BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO,

Re: readCsvFile

2016-10-06 Thread Fabian Hueske
Hi Alberto, if you want to read a single column you have to wrap it in a Tuple1: val text4 = env.readCsvFile[Tuple1[String]]("file:data.csv", includedFields = Array(1)) Best, Fabian 2016-10-06 20:59 GMT+02:00 Alberto Ramón : > I'm learning readCsvFile > (I discover if the file ends on "/n", yo

readCsvFile

2016-10-06 Thread Alberto Ramón
I'm learning readCsvFile (I discovered that if the file ends with "\n", you get a null exception) *if I try to read only 1 column * val text4 = env.readCsvFile[String]("file:data.csv", includedFields = Array(1)) The error is: The type String has to be a tuple or pojo type. [null] *If I put >

Re: Flink Kafka Consumer Behaviour

2016-10-06 Thread Stephan Ewen
Hi! There was an issue in the Kafka 0.9 consumer in Flink concerning checkpoints. It was relevant mostly for lower-throughput topics / partitions. It is fixed in the 1.1.3 release. Can you try out the release candidate and see if that solves your problem? See here for details on the release candi

Listening to timed-out patterns in Flink CEP

2016-10-06 Thread David Koch
Hello, With Flink CEP, is there a way to actively listen to pattern matches that time out? I am under the impression that this is not possible. In my case I partition a stream containing user web navigation by "userId" to look for sequences of Event A, followed by B within 4 seconds for each user
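
For reference, the sequence itself can be expressed roughly as below (a hedged sketch: "Event" is a hypothetical POJO with a userId field, and whether timed-out partial matches can be observed depends on the CEP version in use):

    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternStream;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class AThenBWithinFourSeconds {
        // Event is a hypothetical stand-in for the poster's event type.
        public static PatternStream<Event> apply(DataStream<Event> events) {
            Pattern<Event, ?> pattern = Pattern.<Event>begin("A")
                    .followedBy("B")
                    .within(Time.seconds(4)); // partial matches older than this time out

            return CEP.pattern(events.keyBy("userId"), pattern);
        }
    }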

Re: Exception while running Flink jobs (1.0.0)

2016-10-06 Thread Ufuk Celebi
I guess that this is caused by a bug in the checksum calculation. Let me check that. On Thu, Oct 6, 2016 at 1:24 PM, Flavio Pompermaier wrote: > I've run the job once more (always using the checksum branch) and this time > I got: > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1953786112

Re: Exception while running Flink jobs (1.0.0)

2016-10-06 Thread Flavio Pompermaier
I've run the job once more (always using the checksum branch) and this time I got: Caused by: java.lang.ArrayIndexOutOfBoundsException: 1953786112 at org.apache.flink.api.common.typeutils.base.EnumSerializer.deserialize(EnumSerializer.java:83) at org.apache.flink.api.common.typeutils.base.EnumSeri

Re: AWS S3 and Reshift

2016-10-06 Thread Maximilian Michels
Flink has built-in CSV support. Please check the documentation. There is no Redshift connector in Flink. You will have to use JDBC or create your own connector. -Max On Thu, Oct 6, 2016 at 11:14 AM, sunny patel wrote: > I have checked the forum; I couldn't find what I am looking for
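
One possible shape for the JDBC route, sketched as a RichSinkFunction (URL, credentials, table, and record type are placeholders; the flink-jdbc module's JDBCOutputFormat is an alternative):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    public class JdbcSink extends RichSinkFunction<Tuple2<String, Double>> {
        private transient Connection conn;
        private transient PreparedStatement stmt;

        @Override
        public void open(Configuration parameters) throws Exception {
            conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "password");
            stmt = conn.prepareStatement(
                    "INSERT INTO readings (id, value) VALUES (?, ?)");
        }

        @Override
        public void invoke(Tuple2<String, Double> record) throws Exception {
            stmt.setString(1, record.f0);
            stmt.setDouble(2, record.f1);
            stmt.executeUpdate(); // one row per record; batching left out for brevity
        }

        @Override
        public void close() throws Exception {
            if (stmt != null) stmt.close();
            if (conn != null) conn.close();
        }
    }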

Re: Sensor's Data Aggregation Out of order with EventTime.

2016-10-06 Thread Maximilian Michels
This is a common problem in event time, which is referred to as late data. You can a) change the watermark generation code or b) allow elements to be late and re-trigger a window execution. For b) see https://ci.apache.org/projects/flink/flink-docs-master/dev/windows.html#dealing-with-late-data -Max
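
A hedged sketch of option b), assuming a Flink version that provides WindowedStream#allowedLateness (record fields and values are illustrative; in the real job the source is Kafka):

    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class LateSensorData {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

            // (sensorId, eventTimeMillis, value) records.
            env.fromElements(
                    Tuple3.of("sensor-1", 1000L, 1.0),
                    Tuple3.of("sensor-1", 2000L, 2.0))
               .assignTimestampsAndWatermarks(
                    new AscendingTimestampExtractor<Tuple3<String, Long, Double>>() {
                        @Override
                        public long extractAscendingTimestamp(Tuple3<String, Long, Double> t) {
                            return t.f1; // event time is embedded in the record
                        }
                    })
               .keyBy(0)
               .timeWindow(Time.minutes(30))
               .allowedLateness(Time.minutes(10)) // late elements re-trigger the window
               .sum(2)
               .print();

            env.execute();
        }
    }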

Re: Is it a good practice to create lots of jobs in a flink cluster ?

2016-10-06 Thread Maximilian Michels
It is safe to run multiple jobs concurrently in a Flink cluster. If you care about resource isolation, you are probably better off with Flink on YARN. That said, we're moving away from this model in FLIP-6 towards a cluster-per-job model: https://cwiki.apache.org/confluence/pages/viewpage.action?pa

Re: Blobstorage Locally and on HDFS

2016-10-06 Thread Maximilian Michels
Hi Konstantin, This looks fine. Generally it is fine to delete Blobs in /tmp once the Job is running or has finished. When the job is running, the Flink classloader has already opened these files. Thus, the file system will still have these available through the file descriptor and defer deletion

Re: AWS S3 and Reshift

2016-10-06 Thread sunny patel
I have checked the forum; I couldn't find what I have been looking for for a long time: how to write data from CSV to a database in Flink Scala. I have tried the Rich sink function but it appears not to work. We are looking for example code on how to sink the data into a database, or any suppor

Re: How to stop job through java API

2016-10-06 Thread Maximilian Michels
Just to add to your solution, you can actually use the ClusterClient (StandaloneClusterClient or YarnClusterClient) to achieve that. For example: ClusterClient client = new StandaloneClusterClient(config); JobID jobId = JobID.fromHexString(jobIdString); client.cancelJob(jobId); Cheers, Max On Mon, Oct

Re: Exception while running Flink jobs (1.0.0)

2016-10-06 Thread Ufuk Celebi
Yes, if that's the case you should go with option (2) and run with the checksums, I think. On Thu, Oct 6, 2016 at 10:32 AM, Flavio Pompermaier wrote: > The problem is that the data is very large and the job usually cannot run on a single > machine :( > > On Thu, Oct 6, 2016 at 10:11 AM, Ufuk Celebi wrote: >>

Sensor's Data Aggregation Out of order with EventTime.

2016-10-06 Thread Steve
Hi all, We have some sensors that send data into Kafka. Each Kafka partition has a set of different sensors writing data into it. We consume the data from Flink. We want to SUM up the values in half-hour intervals in event time (extracted from the data). The result is a stream keyed by sensor_id with ti

Re: Processing events through web socket

2016-10-06 Thread Maximilian Michels
Like Fabian said, there is no built-in solution. If you want to use encryption over a socket, you will have to implement your own socket source using an encryption library. On Wed, Oct 5, 2016 at 10:03 AM, Fabian Hueske wrote: > Hi, > > the TextSocketSink is rather meant for demo purposes than to
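
A minimal sketch of such a source using the JDK's TLS sockets (host and port are placeholders; this is not a built-in Flink connector):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import javax.net.ssl.SSLSocket;
    import javax.net.ssl.SSLSocketFactory;
    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    public class TlsTextSocketSource implements SourceFunction<String> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
            try (SSLSocket socket = (SSLSocket) factory.createSocket("localhost", 9999);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()))) {
                String line;
                while (running && (line = reader.readLine()) != null) {
                    ctx.collect(line); // emit each decrypted line downstream
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }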

Re: AWS S3 and Reshift

2016-10-06 Thread Maximilian Michels
I think there have been multiple threads like this one. Please see the thread "Flink scala or Java - Dataload from CSV to database" On Wed, Oct 5, 2016 at 8:15 AM, sunny patel wrote: > Hi Guys, > > We are in the process of creating Proof of concept., > I am looking for the sample project - Flink

Re: Side Inputs vs. Connected Streams

2016-10-06 Thread Maximilian Michels
> Will the above ensure that dsSide is always available before ds3 elements > arrive on the connected stream. Am I correct in assuming that ds2 changes > will continue to be broadcast to ds3 (with no ordering guarantees between ds3 > and dsSide, ofcourse). Broadcasting is just a partition strat

Is it a good practice to create lots of jobs in a flink cluster ?

2016-10-06 Thread Christophe Julien
Hi, I wonder if creating and deleting lots of jobs in a cluster can be a problem? From my point of view the impact will be limited, because the TaskManager won't be impacted during the process; it is only a matter of a task slot being available or not. But is it really "free", or a good practice to do that? Especia

Re: Exception while running Flink jobs (1.0.0)

2016-10-06 Thread Flavio Pompermaier
The problem is that the data is very large and the job usually cannot run on a single machine :( On Thu, Oct 6, 2016 at 10:11 AM, Ufuk Celebi wrote: > On Wed, Oct 5, 2016 at 7:08 PM, Tarandeep Singh > wrote: > > @Stephan my flink cluster setup- 5 nodes, each running 1 TaskManager. > Slots > > per task mana

Re: Exception while running Flink jobs (1.0.0)

2016-10-06 Thread Ufuk Celebi
On Wed, Oct 5, 2016 at 7:08 PM, Tarandeep Singh wrote: > @Stephan my flink cluster setup- 5 nodes, each running 1 TaskManager. Slots > per task manager: 2-4 (I tried varying this to see if this has any impact). > Network buffers: 5k - 20k (tried different values for it). Could you run the job fir