extract fields from nested map

2016-07-22 Thread Pauline Yeung (yeungp)
I have a file in which each line is one JSON record. I run the following: val env = ExecutionEnvironment.getExecutionEnvironment val data = env.readTextFile("file:///somefile") .map(line => JSON.parseFull(line)) and get the following for one JSON record. For simplicity, the ke

Re: counting words (not frequency)

2016-07-22 Thread Sameer Wadkar
It is complicated: 1. If you have a file, you should consider using the DataSet API. It is more complicated to use DataStream with files, as you have to simulate a stream from a file. 2. You need a tokenizer for a map operator unless you have a word per line. 3. The sum operator is fine; it will count

Re: counting words (not frequency)

2016-07-22 Thread Roshan Naik
Seems a bit convoluted for such a simple problem. I am thinking a custom streaming count() operator will simplify. Wasn't able to find examples for custom Streaming operators. -roshan On 7/21/16, 8:00 PM, "hrajaram" wrote: >Can't you use a KeyedStream, I mean keyBy with the sameKey? something

flink batch data processing

2016-07-22 Thread Paul Joireman
I'm evaluating Flink for processing batches of data. As a simple example, say I have 2000 points which I would like to pass through an FIR filter using functionality provided by the Python scipy library. The scipy filter is a simple function which accepts a set of coefficients and the data to

FlinkShell with standalone HA cluster

2016-07-22 Thread Scott Clasen
Hi All- I am having trouble using the FlinkShell against a standalone HA cluster (recovery.mode: zookeeper) If I remove the zookeeper conf from flink-conf.yaml and restart the cluster, I can execute stuff from the shell just fine. (One master is running) Adding back the config, and restarti

Re: Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Stephan Ewen
Initializing in "open(Configuration)" means that the ObjectMapper is created only in the cluster once the MapFunction is started. Otherwise it is created before (on the client) and Serialization-copied into the cluster, together with the MapFunction. If the second approach works well (i.e., the O

Re: Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Dong iL, Kim
Declare objectMapper outside the map class. final ObjectMapper objectMapper = new ObjectMapper(); source.map(str -> objectMapper.readValue(str, Request.class)); On Sat, Jul 23, 2016 at 12:28 AM, Yassin Marzouki wrote: > Thank you Stephan and Kim, that solved the problem. > Just to make sure, is u

Re: Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Yassin Marzouki
Thank you Stephan and Kim, that solved the problem. Just to make sure, is using a MapFunction as in the following code any different? i.e. does it initialize the objectMapper for every element in the stream? .map(new MapFunction() { private ObjectMapper objectMapper = new ObjectMapper();

Re: Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Dong iL, Kim
oops. stephan already answered. sorry. T^T On Sat, Jul 23, 2016 at 12:16 AM, Dong iL, Kim wrote: > is open method signature right? or typo? > > void open(Configuration parameters) throws Exception; > > On Sat, Jul 23, 2016 at 12:09 AM, Stephan Ewen wrote: > >> I think you overrode the open meth

Re: Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Dong iL, Kim
Is the open method signature right, or is it a typo? void open(Configuration parameters) throws Exception; On Sat, Jul 23, 2016 at 12:09 AM, Stephan Ewen wrote: > I think you overrode the open method with the wrong signature. The right > signature would be "open(Configuration cfg) {...}". You probably over

Re: Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Stephan Ewen
I think you overrode the open method with the wrong signature. The right signature would be "open(Configuration cfg) {...}". You probably overlooked this because you missed the "@Override" annotation. On Fri, Jul 22, 2016 at 4:49 PM, Yassin Marzouki wrote: > Hi everyone, > > I want to convert a
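
The pitfall described above can be reproduced without Flink. The following is a minimal sketch, not Flink code: Configuration, RichMapLike, and run are hypothetical stand-ins invented for illustration, and StringBuilder stands in for the ObjectMapper. It shows that a parameterless open() is only an overload that the runner never calls, so the field stays null, while open(Configuration) marked with @Override is invoked before map().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OpenSignatureSketch {

    // Hypothetical stand-in for a configuration object; NOT Flink's class.
    static class Configuration {}

    // Hypothetical stand-in for a rich function with a lifecycle hook.
    static abstract class RichMapLike<I, O> {
        // The runner calls exactly this method once before any map() call.
        public void open(Configuration parameters) throws Exception {}

        public abstract O map(I value) throws Exception;
    }

    // Wrong: open() has no parameter, so it overloads instead of overriding.
    // Adding @Override here would be a compile error, which exposes the mistake.
    static class Broken extends RichMapLike<String, Integer> {
        private StringBuilder helper;

        public void open() {
            helper = new StringBuilder();
        }

        @Override
        public Integer map(String value) {
            return helper.append(value).length(); // helper is still null here
        }
    }

    // Right: the signature matches the base class and @Override verifies it.
    static class Fixed extends RichMapLike<String, Integer> {
        private StringBuilder helper;

        @Override
        public void open(Configuration parameters) {
            helper = new StringBuilder();
        }

        @Override
        public Integer map(String value) {
            helper.setLength(0); // reuse the helper for each element
            return helper.append(value).length();
        }
    }

    // Hypothetical runner: calls open(Configuration) once, then map() per element.
    static <I, O> List<O> run(RichMapLike<I, O> fn, List<I> input) throws Exception {
        fn.open(new Configuration());
        List<O> out = new ArrayList<>();
        for (I value : input) {
            out.add(fn.map(value));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(new Fixed(), Arrays.asList("a", "bcd")));
        try {
            run(new Broken(), Arrays.asList("a"));
        } catch (NullPointerException e) {
            System.out.println("Broken: helper was never initialized");
        }
    }
}
```

Putting @Override on every lifecycle method turns this kind of silent mismatch into a compile-time error.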

Variable not initialized in the open() method of RichMapFunction

2016-07-22 Thread Yassin Marzouki
Hi everyone, I want to convert a stream of json strings to POJOs using Jackson, so I did the following: .map(new RichMapFunction() { private ObjectMapper objectMapper; public void open() { objectMapper = new ObjectMapper(); } @Override publi

Re: Getting the NumberOfParallelSubtask

2016-07-22 Thread Paschek, Robert
Hi Chesnay, hi Robert, Thank you for your explanations :-) (And sorry for the late reply). Regards, Robert From: Robert Metzger [mailto:rmetz...@apache.org] Sent: Tuesday, 21 June 2016 12:12 To: user@flink.apache.org Subject: Re: Getting the NumberOfParallelSubtask Hi Robert, the number

Re: State in external db (dynamodb)

2016-07-22 Thread Josh
Hi all, >(1) Only write to the DB upon a checkpoint, at which point it is known that no replay of that data will occur any more. Values from partially successful writes will be overwritten >with correct value. I assume that is what you thought of when referring to the State Backend, because in so

Re: Processing windows in event time order

2016-07-22 Thread Aljoscha Krettek
@Sameer, yes, if one source stops emitting watermarks then downstream operations will buffer data until the source starts updating the watermark again. If you can live with some data being late you could change the watermark logic in the source to start advancing the watermark if no new data is arr