Re: Trouble with large state

2020-06-18 Thread Timothy Victor
I had a similar problem. I ended up solving by not relying on checkpoints for recovery and instead re-read my input sources (in my case a kafka topic) from the earliest offset and rebuilding only the state I need. I only need to care about the past 1 to 2 days of state so can afford to drop anyt

RocksDB

2020-03-10 Thread Timothy Victor
Can the RocksDB state backend used by Flink be queries from outside, e.g. via SQL? Or maybe a better question, is there a RocksDB SinkFunction that exists? Thanks Tim

Re: AW: How Flink Kafka Consumer works when it restarts

2020-02-13 Thread Timothy Victor
What are the pros and cons of Kafka offset keeping vs Flink offset keeping? Is one more reliable than the other? Personally I prefer having flink manage it due to it being intrinsically tied to its checkpointing mechanism. But interested to learn from others experiences. Thanks Tim On Thu, F

Flink TaskManager Memory

2019-12-26 Thread Timothy Victor
For Streaming Jobs that use RocksDB my understanding is that state is allocated off-year via RocksDB. If this is true then does it still make sense to leave 70% (default taskmanager.memory.fraction) of the heap for Flink Manged memory given that it is likely not being used for state?Or am I mi

Re: Processing post to sink?

2019-12-14 Thread Timothy Victor
Why not implement your own SinkFunction, or maybe inherit from the one you are using now? Tim On Sat, Dec 14, 2019, 8:52 AM Theo Diefenthal < theo.diefent...@scoop-software.de> wrote: > Hi there, > > In my pipeline, I write data into a partitioned parquet directory via > StreamingFileSink and a

Re: Flink Filters have state?

2019-11-08 Thread Timothy Victor
jects/flink/flink-docs-stable/dev/stream/state/state.html > > Cheers, > Till > > On Thu, Nov 7, 2019 at 2:11 PM Timothy Victor wrote: > >> I have a FilterFunction implementation which accepts an argument in its >> constructor which it stores as an instance memb

Flink Filters have state?

2019-11-07 Thread Timothy Victor
I have a FilterFunction implementation which accepts an argument in its constructor which it stores as an instance member.For example: class ThresholdFilter implements FilterFunction { private final MyThreshold threshold; private int numElementsSeen; public ThresholdFilter(MyThreshol

Re: Submitting jobs via REST

2019-10-22 Thread Timothy Victor
n, Oct 21, 2019, 10:37 AM Pritam Sadhukhan wrote: > Can you please share your dockerfile? > Please upload your jar at /opt/flink/product-libs/flink-web-upload/. > > Regards, > Pritam. > > On Mon, 21 Oct 2019 at 19:58, Timothy Victor wrote: > >> Thanks Pritam. >>

Re: Submitting jobs via REST

2019-10-21 Thread Timothy Victor
ost:8081/jars/28f05eb0-9aab-4a18-ae66-f1e10970c11f_soar-ueba-training-service.jar/run> > > with the request parameters if any. > > > Please let me know if this helps. > > > Regards, > > Pritam. > > On Sat, 19 Oct 2019 at 20:06, Timothy Victor wrote: > >> I

Submitting jobs via REST

2019-10-19 Thread Timothy Victor
I have a flink docker image with my job's JAR already contained within. I would like to run a job with this jar via the REST api. Is that possible? I know I can run a job via REST using JarID (ID assigned by flink when a jar is uploaded). However I don't have such an ID since this jar is alrea

Re: Batch Job in a Flink 1.9 Standalone Cluster

2019-10-14 Thread Timothy Victor
https://bugs.openjdk.java.net/browse/JDK-8146436 > > Roman Grebennikov | g...@dfdx.me > > > On Sat, Oct 12, 2019, at 08:38, Timothy Victor wrote: > > This part about the GC not cleaning up after the job finishes makes > sense. However, I o served that even after I run

Re: Batch Job in a Flink 1.9 Standalone Cluster

2019-10-12 Thread Timothy Victor
ask manager that keeps the buffer to be used for the next > batch job. When the new batch job is running, the task executor allocates > new buffers, which will use the memory of the previous buffer that jvm > haven't released. > > Thank you~ > > Xintong Song > > >

Re: Batch Job in a Flink 1.9 Standalone Cluster

2019-10-11 Thread Timothy Victor
times flink/jvm do not release memory after > jobs/tasks finished, so that it can be reused directly by other jobs/tasks, > for the purpose of reducing allocate/deallocated overheads and optimizing > performance. > > > Thank you~ > > Xintong Song > > > > On Thu

Batch Job in a Flink 1.9 Standalone Cluster

2019-10-10 Thread Timothy Victor
After a batch job finishes in a flink standalone cluster, I notice that the memory isn't freed up. I understand Flink uses it's own memory manager and just allocates a large tenured byte array that is not GC'ed. But does the memory used in this byte array get released when the batch job is done

Re: Warnings connecting to Akka

2019-10-09 Thread Timothy Victor
We see a very similar (if not the same) error running version 1.9 on Kubernetes. So far what we have discovered is that a taskmanager gets killed and a new one is created, but JM still thinks it needs to connect to the old (now dead TM). I was even able to see the a taskmanager on the same host

Re: Apache flink 1.7.2 security issues

2019-08-13 Thread Timothy Victor
The flink job manager UI isn't meant to be accessed from outside a firewall I think. Plus I dont think it was designed with security in mind and honestly it doesn't need to in my opinion. If you need security then address your network setup. And if it is still a problem the just turn off the U

Flink SinkFunction for WebSockets

2019-07-18 Thread Timothy Victor
Hi I'm looking to write a sink function for writing to websockets, in particular ones that speak the WAMP protocol ( https://wamp-proto.org/index.html). Before going down that path, I wanted to ask if a) anyone has done something like that already so I dont reinvent stuff b) any caveats or warn

Re: Flink Application JAR

2019-07-03 Thread Timothy Victor
we need to deploy and > test the whole project code. And we want to work in a more modular way. > As you said, I also think the size of the jar is less important than what > I wrote above. > > Thanks. > > *From:* Timothy Victor > *Sent:* Wednesday, July 3, 2019 2:31 PM >

Re: Flink Application JAR

2019-07-03 Thread Timothy Victor
I think any jars in $FLINK_HOME/lib gets picked up in the class path. As for dynamic update, wouldn't you want to run it through your tests/CI before deploying? FWIW we also use a fat jar. Its large but also since we just build a docker container anyway it doesn't really matter. Tim On Wed, J

Why did JM fail on K8s (see original thread below)

2019-06-29 Thread Timothy Victor
:04 AM Vishal Santoshi > wrote: > >> I have not tried on bare metal. We have no option but k8s. >> >> And this is a job cluster. >> >> On Sat, Jun 29, 2019 at 9:01 AM Timothy Victor wrote: >> >>> Hi Vishal, can this be reproduced on a bare m

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-29 Thread Timothy Victor
Hi Vishal, can this be reproduced on a bare metal instance as well? Also is this a job or a session cluster? Thanks Tim On Sat, Jun 29, 2019, 7:50 AM Vishal Santoshi wrote: > OK this happened again and it is bizarre ( and is definitely not what I > think should happen ) > > > > > The job fai

Re: Flink state: complex value state pojos vs explicitly managed fields

2019-06-17 Thread Timothy Victor
I would choose encapsulation if it the fields are indeed related and makes sense for your model. In general, I feel it is not a good thing to let Flink (or any other frameworks) internal mechanics dictate your data model. Tim On Mon, Jun 17, 2019, 4:59 AM Frank Wilson wrote: > Hi, > > Is it be

Re: Latency Monitoring in Flink application

2019-06-13 Thread Timothy Victor
Thanks for the insight. I was also interested in this topic. One thought occurred to me is what about the queuing delay when sending to your message bus (e.g. kafka). I am guessing the probe will be before the message is added to the send queue? Thanks again Tim On Thu, Jun 13, 2019, 6:08 AM

Re: Flink 1.8

2019-06-04 Thread Timothy Victor
It's hard to tell without more info. >From the method that threw the exception it looked like it was trying to deserialize the accumulator. By any chance did you change your accumulator class but forgot to update the serialVersionUID? Just wondering if it is trying to deserialize to a different

Re: Limitations in StreamingFileSink BulkFormat

2019-05-31 Thread Timothy Victor
Not an expert, but I would think this will not be trivial since the reason for using checkpointing to trigger is to guarantee exactly once semantics in the event of a failure which in turn is tightly integrated into the CP mechanism. The precursor the StreamingFileSink was BucketingFileSink which

Re: [External] Flink StreamingFileSink not ingesting to S3 when checkpointing is disabled

2019-05-27 Thread Timothy Victor
You must have checkpointing enabled to use the StreamingFileSink. The feature relies on CP for achieving exactly once semantics. >> This is integrated with the checkpointing mechanism to provide exactly once semantics. https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/f

Re: Job crashing cluster

2019-05-25 Thread Timothy Victor
ld you share error stack trace? > > Thanks & Regards, > Sushant Sawant > > > On Fri, 24 May 2019, 19:18 Timothy Victor, wrote: > >> If a flink job crashes during startup (throws exception) the entire >> cluster goes down. This is even on a simple bare met

Job crashing cluster

2019-05-24 Thread Timothy Victor
If a flink job crashes during startup (throws exception) the entire cluster goes down. This is even on a simple bare metal host. I have tried catching the exception, but even that didnt prevent the JM and cluster from crashing. Has anyone run into this problem? I'm on Flink 1.7.1 Thanks Tim

Re: Flink vs KStreams

2019-05-20 Thread Timothy Victor
This is probably a very subjective question, but nevertheless here are my reasons for choosing Flink over KStreams or even Spark. a) KStreams couples you tightly to Kafka, and I personally don't want my stream processing engine to be married to my message bus. There are other (even better altern

Re: Adding metadata to the jar

2019-04-08 Thread Timothy Victor
One approach I use is to write the git commit sha to the jars manifest while compiling it (I don't use semantic versioning but rather use commit sha). Then at runtime I read the implementationVersion (class.getPackage().getImplementationVersion()), and print that in the job name. Tim On Mon, Apr

Re: Flink 1.7.2 UI : Jobs removed from Completed Jobs section

2019-04-08 Thread Timothy Victor
I face the same issue in Flink 1.7.1. Would be good to know a solution. Tim On Mon, Apr 8, 2019, 12:45 PM Jins George wrote: > Hi, > > > > I am facing a weird problem in which jobs from ‘Completed Jobs’ section in > Flink 1.7.2 UI disappear. Looking at the job manager logs, I see the job > wa

Re: Flink - Type Erasure Exception trying to use Tuple6 instead of Tuple

2019-04-07 Thread Timothy Victor
t;(){}); > >> kinesisStream.keyBy(new KeySelector() {...}, info); >> //specify typeInfo through >> > > TIA, > Vijay > > On Tue, Apr 2, 2019 at 6:06 PM Timothy Victor wrote: > >> Flink needs type information for serializing and deserializing objects

Re: Flink - Type Erasure Exception trying to use Tuple6 instead of Tuple

2019-04-02 Thread Timothy Victor
Flink needs type information for serializing and deserializing objects, and that is lost due to Java type erasure. The only way to workaround this is to specify the return type of the function called in the lambda. Fabian's answer here explains it well. https://stackoverflow.com/questions/50945

Flink 1.7.1 KafkaProducer error using Exactly Once semantic

2019-03-08 Thread Timothy Victor
Yesterday I came across a weird problem when attempting to run 2 nearly identical jobs on a cluster. I was able to solve it (or rather workaround it), but am sharing here so we can consider a potential fix in Flink's KafkaProducer code. My scenario is as follows. I have a Flink program that read

Re: cast from bigdecimal to bigint throws ArithmeticException

2019-02-25 Thread Timothy Victor
This is more a Java question than Flink per se. But I believe you need to specify the rounding mode because it is calling longValueExact. If it just called longValue it would have worked without throwing an exceptionbut you risk overflowing 64 bits and getting a totally erroneous answer. Ar

Re: fllink 1.7.1 and RollingFileSink

2019-02-10 Thread Timothy Victor
gt;> checkpoint. >> >> >> Am I better off using BucketingSink ? When to use BucketingSink and when >> to use RollingSink is not clear at all, even though at the surface it sure >> looks RollingSink is a better version of .BucketingSink ( or not ) >> >> Re

Re: fllink 1.7.1 and RollingFileSink

2019-02-10 Thread Timothy Victor
I think the only rolling policy that can be used is CheckpointRollingPolicy to ensure exactly once. Tim On Sun, Feb 10, 2019, 9:13 AM Vishal Santoshi Can StreamingFileSink be used instead of > https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/filesystem_sink.html, > ev

Issue setting up Flink in Kubernetes

2019-01-28 Thread Timothy Victor
Hi - Has there been any update on the below issue? I am also facing the same problem. http://mail-archives.apache.org/mod_mbox/flink-user/201812.mbox/%3ccac2r2948lqsyu8nab5p7ydnhhmuox5i4jmyis9g7og6ic-1...@mail.gmail.com%3E There is a similar issue ( https://stackoverflow.com/questions/50806228