Re: Couldn't figure out - How to do this in Flink? - Pls assist with suggestions

2019-02-10 Thread Chesnay Schepler
You should be able to use a KeyedProcessFunction for that. Find matching elements via keyBy() on the first field. Aggregate into ValueState, send alert if necessary. Upon encoun
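A minimal sketch of this suggestion, assuming a hypothetical Event type with a String key and a numeric amount, and a placeholder alert threshold:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    // keyBy() on the first field, aggregate per key in ValueState, emit an alert when needed
    public class AlertingFunction extends KeyedProcessFunction<String, Event, String> {

        private transient ValueState<Double> total;

        @Override
        public void open(Configuration parameters) {
            total = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("total", Double.class));
        }

        @Override
        public void processElement(Event event, Context ctx, Collector<String> out) throws Exception {
            Double current = total.value();
            double updated = (current == null ? 0.0 : current) + event.getAmount();
            total.update(updated);
            if (updated > 1000.0) {   // placeholder alert condition
                out.collect("ALERT for key " + ctx.getCurrentKey());
            }
        }
    }

    // usage: events.keyBy(Event::getKey).process(new AlertingFunction())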

Re: Reduce one event under multiple keys

2019-02-10 Thread Chesnay Schepler
This sounds reasonable to me. I'm a bit confused by this question: "Additionally, I am (naively) hoping that if a window has no events for a particular key, the memory/storage costs are zero for that key." Are you asking whether a key that was received in window X (as part of an event) is
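For context, the fan-out pattern under discussion roughly looks like this (the Event type, its getKeys() accessor, and the merge function are made-up placeholders):

    // emit one copy of the event per key it belongs to, then key and window the copies
    DataStream<Tuple2<String, Event>> fannedOut = events
            .flatMap((Event e, Collector<Tuple2<String, Event>> out) -> {
                for (String key : e.getKeys()) {
                    out.collect(Tuple2.of(key, e));
                }
            })
            .returns(Types.TUPLE(Types.STRING, TypeInformation.of(Event.class)));

    fannedOut
            .keyBy(t -> t.f0)
            .window(TumblingEventTimeWindows.of(Time.minutes(15)))
            .reduce((a, b) -> Tuple2.of(a.f0, Event.merge(a.f1, b.f1)));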

Re: Flink Standalone cluster - logging problem

2019-02-10 Thread Chesnay Schepler
What exactly are you expecting to happen? On 08.02.2019 15:06, simpleusr wrote: We are using a standalone cluster and submitting jobs through the command line client. As stated in https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/logging.html, we are editing log4j-cli.properties but t

Re: Flink Standalone cluster - dumps

2019-02-10 Thread Chesnay Schepler
1) Setting the slot size to 1 can be used as a work-around. I'm not aware of another solution for standalone clusters. In the YARN/Kubernetes world we support the notion of a "job cluster", which is started and run only for a single job, but this isn't supported in standalone mode. 2) None tha
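For reference, the work-around mentioned in 1) is a single setting in flink-conf.yaml (assuming "slot size" here means the number of task slots per TaskManager):

    # flink-conf.yaml
    # With one slot, a TaskManager JVM only ever runs tasks of a single job,
    # so its dumps and metrics can be attributed to that job.
    taskmanager.numberOfTaskSlots: 1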

Re: Running single Flink job in a job cluster, problem starting JobManager

2019-02-10 Thread Chesnay Schepler
I'm afraid we haven't had much experience with Spring Boot Flink applications. It is indeed strange that the job ends up using a StreamPlanEnvironment. As a debugging step I would look into all calls to ExecutionEnvironment#initializeContextEnvironment(). This is how specific execution environme

Re: Help with a stream processing use case

2019-02-10 Thread Chesnay Schepler
I'll need someone else to chime in here for a definitive answer (cc'd Gordon), so I'm really just guessing here. For the partitioning it looks like you can use a custom partitioner, see FlinkKinesisProducer#setCustomPartitioner. Have you looked at the KinesisSerializationSchema described in the
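A sketch of the KinesisSerializationSchema idea, assuming a hypothetical MyRecord type with a toJson() method and a tenant id used for routing:

    KinesisSerializationSchema<MyRecord> schema = new KinesisSerializationSchema<MyRecord>() {
        @Override
        public ByteBuffer serialize(MyRecord element) {
            return ByteBuffer.wrap(element.toJson().getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public String getTargetStream(MyRecord element) {
            // route each record to a per-tenant Kinesis stream
            return "stream-" + element.getTenantId();
        }
    };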

Re: Can an Aggregate the key from a WindowedStream.aggregate()

2019-02-10 Thread Chesnay Schepler
There are also versions of WindowedStream#aggregate that accept an additional WindowFunction/ProcessWindowFunction, which do have access to the key via apply()/process() respectively. These functions are called post aggregation. On 08.02.2019 18:24, stephen.alan.conno...@gmail.com wrote: If I
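A sketch of that overload, with a placeholder AggregateFunction producing a Long and the ProcessWindowFunction attaching the key to the result:

    stream
        .keyBy(Event::getKey)
        .window(TumblingEventTimeWindows.of(Time.minutes(15)))
        .aggregate(
            new MyAggregateFunction(),   // hypothetical AggregateFunction<Event, Acc, Long>
            new ProcessWindowFunction<Long, Tuple2<String, Long>, String, TimeWindow>() {
                @Override
                public void process(String key, Context ctx, Iterable<Long> aggregates,
                                    Collector<Tuple2<String, Long>> out) {
                    // called after the incremental aggregation, with access to the key
                    out.collect(Tuple2.of(key, aggregates.iterator().next()));
                }
            });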

Re: stream of large objects

2019-02-10 Thread Chesnay Schepler
The Broadcast State may be interesting to you. On 08.02.2019 15:57, Aggarwal, Ajay wrote: Yes, another KeyBy will be used. The “small size” messages will be strings of l
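A sketch of the broadcast state pattern being suggested (the LargeObject/SmallMessage types and the way they are combined are assumptions):

    MapStateDescriptor<String, LargeObject> descriptor =
            new MapStateDescriptor<>("large-objects", String.class, LargeObject.class);

    BroadcastStream<LargeObject> broadcast = largeObjectStream.broadcast(descriptor);

    smallMessageStream
            .keyBy(SmallMessage::getKey)
            .connect(broadcast)
            .process(new KeyedBroadcastProcessFunction<String, SmallMessage, LargeObject, Output>() {
                @Override
                public void processElement(SmallMessage msg, ReadOnlyContext ctx, Collector<Output> out)
                        throws Exception {
                    LargeObject large = ctx.getBroadcastState(descriptor).get(msg.getKey());
                    if (large != null) {
                        out.collect(Output.of(msg, large));   // hypothetical combine
                    }
                }

                @Override
                public void processBroadcastElement(LargeObject large, Context ctx, Collector<Output> out)
                        throws Exception {
                    ctx.getBroadcastState(descriptor).put(large.getKey(), large);
                }
            });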

Flink 1.7.1 and RollingFileSink

2019-02-10 Thread Vishal Santoshi
Can StreamingFileSink be used instead of https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/filesystem_sink.html? It looks like it could. This code, for example: StreamingFileSink .forRowFormat(new Path(PATH), new Simp
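For context, a row-format StreamingFileSink roughly along the lines of the quoted snippet might look like this (PATH and the rolling thresholds are placeholders):

    StreamingFileSink<String> sink = StreamingFileSink
            .forRowFormat(new Path(PATH), new SimpleStringEncoder<String>("UTF-8"))
            .withRollingPolicy(
                    DefaultRollingPolicy.create()
                            .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))
                            .withInactivityInterval(TimeUnit.MINUTES.toMillis(5))
                            .withMaxPartSize(128 * 1024 * 1024)
                            .build())
            .build();

    stream.addSink(sink);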

Re: Flink 1.7.1 and RollingFileSink

2019-02-10 Thread Timothy Victor
I think the only rolling policy that can be used is CheckpointRollingPolicy to ensure exactly once. Tim On Sun, Feb 10, 2019, 9:13 AM Vishal Santoshi Can StreamingFileSink be used instead of > https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/filesystem_sink.html, > ev

Re: Flink 1.7.1 and RollingFileSink

2019-02-10 Thread Vishal Santoshi
Thanks for the quick reply. I am confused. If this were a more full-featured BucketingSink, I would imagine that, based on shouldRollOnEvent and shouldRollOnProcessingTime, an in-progress file could go into the pending phase, and on checkpoint the pending part file would be finalized. For exactly once any files

Re: Flink 1.7.1 and RollingFileSink

2019-02-10 Thread Vishal Santoshi
That said, in the DefaultRollingPolicy it seems the check is on the file size (it mimics the shouldRollOnEvent() check). I guess the question is: is the call to shouldRollOnCheckpoint done by the checkpointing thread? Are the calls to the other 2 methods, shouldRollOnEvent and shouldRollOnProcessingTime,
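For reference, those three callbacks live on the RollingPolicy interface; a custom policy would be shaped roughly like this (the thresholds are placeholders, and the threading/timing semantics are exactly what is being asked about above):

    import java.io.IOException;
    import org.apache.flink.streaming.api.functions.sink.filesystem.PartFileInfo;
    import org.apache.flink.streaming.api.functions.sink.filesystem.RollingPolicy;

    public class SizeAndCheckpointRollingPolicy<IN, BucketID> implements RollingPolicy<IN, BucketID> {

        private static final long MAX_PART_SIZE = 128L * 1024 * 1024;    // placeholder
        private static final long INACTIVITY_INTERVAL = 5L * 60 * 1000;  // placeholder

        @Override
        public boolean shouldRollOnCheckpoint(PartFileInfo<BucketID> partFileState) throws IOException {
            // finalize the in-progress part file on every checkpoint
            return true;
        }

        @Override
        public boolean shouldRollOnEvent(PartFileInfo<BucketID> partFileState, IN element) throws IOException {
            return partFileState.getSize() >= MAX_PART_SIZE;
        }

        @Override
        public boolean shouldRollOnProcessingTime(PartFileInfo<BucketID> partFileState, long currentTime)
                throws IOException {
            return currentTime - partFileState.getLastUpdateTime() >= INACTIVITY_INTERVAL;
        }
    }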

Re: Flink 1.7.1 and RollingFileSink

2019-02-10 Thread Timothy Victor
My apologies for not seeing your use case properly. The constraint on rolling policy is only applicable for bulk formats such as Parquet as highlighted in the docs. As for your questions, I'll have to defer to others more familiar with it. I mostly just use bulk formats such as avro and parquet
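For the bulk-format case mentioned here, the builder applies the checkpoint-based rolling itself; a short sketch with Parquet (the Avro record class and PATH are placeholders):

    StreamingFileSink<MyAvroRecord> sink = StreamingFileSink
            .forBulkFormat(new Path(PATH), ParquetAvroWriters.forSpecificRecord(MyAvroRecord.class))
            .build();   // bulk formats roll on every checkpoint (OnCheckpointRollingPolicy)

    stream.addSink(sink);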

Re: Flink 1.7.1 and RollingFileSink

2019-02-10 Thread Vishal Santoshi
You don't have to. Thank you for the input. On Sun, Feb 10, 2019 at 1:56 PM Timothy Victor wrote: > My apologies for not seeing your use case properly. The constraint on > rolling policy is only applicable for bulk formats such as Parquet as > highlighted in the docs. > > As for your questions

Re: Can an Aggregate the key from a WindowedStream.aggregate()

2019-02-10 Thread Stephen Connolly
On Sun, 10 Feb 2019 at 09:48, Chesnay Schepler wrote: > There are also versions of WindowedStream#aggregate that accept an > additional WindowFunction/ProcessWindowFunction, which do have access to > the key via apply()/process() respectively. These functions are called > post aggregation. > Coo

Re: Reduce one event under multiple keys

2019-02-10 Thread Stephen Connolly
On Sun, 10 Feb 2019 at 09:09, Chesnay Schepler wrote: > This sounds reasonable to me. > > I'm a bit confused by this question: "*Additionally, I am (naively) > hoping that if a window has no events for a particular key, the > memory/storage costs are zero for that key.*" > > Are you asking wheth

Is there a windowing strategy that allows a different offset per key?

2019-02-10 Thread Stephen Connolly
I would like to process a stream of data from different customers, producing output, say, once every 15 minutes. The results will then be loaded into another system for storage and querying. I have been using TumblingEventTimeWindows in my prototype, but I am concerned that all the windows will star

Re: Is there a windowing strategy that allows a different offset per key?

2019-02-10 Thread Stephen Connolly
Looking into the code in TumblingEventTimeWindows:

    @Override
    public Collection assignWindows(Object element, long timestamp, WindowAssignerContext context) {
        if (timestamp > Long.MIN_VALUE) {
            // Long.MIN_VALUE is currently assigned when no timestamp is present
            long start = TimeWindow.getWindowStar
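One way to get a per-key alignment is a custom WindowAssigner that derives the offset from the element itself; an untested sketch (the CustomerEvent type and the hashing scheme are assumptions):

    import java.util.Collection;
    import java.util.Collections;
    import org.apache.flink.api.common.ExecutionConfig;
    import org.apache.flink.api.common.typeutils.TypeSerializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.WindowAssigner;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger;
    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

    public class PerCustomerTumblingWindows extends WindowAssigner<Object, TimeWindow> {

        private final long size = Time.minutes(15).toMilliseconds();

        @Override
        public Collection<TimeWindow> assignWindows(Object element, long timestamp,
                                                    WindowAssignerContext context) {
            // derive a stable per-customer offset so window boundaries are spread out over time
            long offset = (((CustomerEvent) element).getCustomerId().hashCode() & Integer.MAX_VALUE) % size;
            long start = TimeWindow.getWindowStartWithOffset(timestamp, offset, size);
            return Collections.singletonList(new TimeWindow(start, start + size));
        }

        @Override
        public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
            return EventTimeTrigger.create();
        }

        @Override
        public TypeSerializer<TimeWindow> getWindowSerializer(ExecutionConfig executionConfig) {
            return new TimeWindow.Serializer();
        }

        @Override
        public boolean isEventTime() {
            return true;
        }
    }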

Re: Flink 1.7.1 and RollingFileSink

2019-02-10 Thread Vishal Santoshi
Anyone? On Sun, Feb 10, 2019 at 2:07 PM Vishal Santoshi wrote: > You don't have to. Thank you for the input. > > On Sun, Feb 10, 2019 at 1:56 PM Timothy Victor wrote: > >> My apologies for not seeing your use case properly. The constraint on >> rolling policy is only applicable for bulk for

Re: Help with a stream processing use case

2019-02-10 Thread Tzu-Li (Gordon) Tai
Hi, If Firehose already supports sinking records from a Kinesis stream to an S3 bucket, then yes, Chesnay's suggestion would work. You route each record to a specific Kinesis stream depending on some value in the record using the KinesisSerializationSchema, and sink each Kinesis stream to their
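For reference, the producer side of that suggestion could be wired up roughly like this (the region, stream names, and MyRecord routing attribute are placeholders; "schema" is a KinesisSerializationSchema like the one discussed earlier in the thread):

    Properties producerConfig = new Properties();
    producerConfig.put(AWSConfigConstants.AWS_REGION, "us-east-1");

    FlinkKinesisProducer<MyRecord> producer = new FlinkKinesisProducer<>(schema, producerConfig);
    producer.setDefaultStream("default-stream");     // used when the schema does not choose a stream
    producer.setDefaultPartition("0");
    producer.setCustomPartitioner(new KinesisPartitioner<MyRecord>() {
        @Override
        public String getPartitionId(MyRecord element) {
            return element.getTenantId();            // hypothetical partitioning attribute
        }
    });

    stream.addSink(producer);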

Flink 1.7 Notebook Environment

2019-02-10 Thread Faizan Ahmed
Hi all, I have been searching around quite a bit and doing my own experiments to make the latest Flink release 1.7.1 work with Apache Zeppelin; however, Apache Zeppelin's Flink interpreter is quite outdated (Flink 1.3). AFAIK it's not possible to use Flink running on YARN via Zeppelin as it only wo

Re: Flink 1.7 Notebook Environment

2019-02-10 Thread Jeff Zhang
Hi Faizan, I have implemented a Flink interpreter for Blink, which was recently donated by Alibaba to the Flink community. Maybe you have noticed this news. Here are some tutorials you may be interested in: https://flink-china.org/doc/blink/ops/zeppelin.html https://flink-china.org/doc/blink/qui

Flink Standalone cluster - production settings

2019-02-10 Thread simpleusr
I know this seems a silly question, but I am trying to figure out the optimal setup for our Flink jobs. We are using a standalone cluster with 5 jobs. Each job has 3 async operators with Executors with thread counts of 20, 20, and 100. The source is Kafka, and Cassandra and REST sinks exist. Currently we are usin
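There is no one-size-fits-all answer, but for comparison these are the flink-conf.yaml knobs that usually get tuned first in a standalone setup (the values below are purely illustrative, not recommendations):

    # flink-conf.yaml (illustrative values only)
    taskmanager.numberOfTaskSlots: 4    # slots per TaskManager; operators with large thread pools may warrant fewer
    taskmanager.heap.size: 4096m        # TaskManager JVM heap
    jobmanager.heap.size: 1024m
    parallelism.default: 4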

Re: Flink Standalone cluster - dumps

2019-02-10 Thread simpleusr
Hi Chesnay, Many thanks. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Couldn't figure out - How to do this in Flink? - Pls assist with suggestions

2019-02-10 Thread Titus Rakkesh
Thanks Chesnay. I will try that and let you know. Thanks. On Sun, Feb 10, 2019 at 2:31 PM Chesnay Schepler wrote: > You should be able to use a KeyedProcessFunction > for > tha

Re: Flink Standalone cluster - logging problem

2019-02-10 Thread simpleusr
Hi Chesnay, below is the content of my log4j-cli.properties file. I expect my job logs (packaged under com.mycompany.xyz) to be written to the file2 appender. However, no file is generated with the prefix XYZ. I restarted the cluster and canceled and resubmitted several times, but none of that helped. log4j.root
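For comparison, routing a package to a dedicated appender in log4j 1.x syntax usually looks like the sketch below (names are illustrative). Note also that, per the linked documentation, log4j-cli.properties only affects the command-line client JVM; code running on the cluster is governed by the log4j.properties used by the job/task managers.

    # log4j 1.x sketch: send com.mycompany.xyz to its own file appender
    log4j.logger.com.mycompany.xyz=INFO, file2
    log4j.additivity.com.mycompany.xyz=false

    log4j.appender.file2=org.apache.log4j.FileAppender
    log4j.appender.file2.file=${log.file}.xyz
    log4j.appender.file2.append=false
    log4j.appender.file2.layout=org.apache.log4j.PatternLayout
    log4j.appender.file2.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n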