Re: A question about restoring state with an additional variable with kryo

2022-09-20 Thread Vishal Santoshi
] > https://www.docs.immerok.cloud/docs/cookbook/migrating-state-away-from-kryo/ > > > On Fri, Sep 16, 2022 at 8:32 AM Vishal Santoshi > wrote: > >> Thank you for the clarification. I thought so to, >> >> Unfortunately my state are generics based and those are

Re: A question about restoring state with an additional variable with kryo

2022-09-16 Thread Vishal Santoshi
e-used-for-schema-evolution > > [2] > https://nightlies.apache.org/flink/flink-docs-master/docs/libs/state_processor_api/ > > [3] > https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html > > > > > > > > *From:* Vishal Santoshi >

Re: A question about restoring state with an additional variable with kryo

2022-09-15 Thread Vishal Santoshi
(CompositeSerializer.java:156) at org.apache.flink.contrib.streaming.state.RocksDBValueState.value( RocksDBValueState.java:89) On Thu, Sep 15, 2022 at 7:10 PM Vishal Santoshi wrote: > << How do I make sure that when reconstituting the state, kryo does not > complain? It trie

Re: A question about restoring state with an additional variable with kryo

2022-09-15 Thread Vishal Santoshi
complain? It tries to map the previous state to the new definition of Class A and complains that it cannot read the value for `long b`. Sorry a typo On Thu, Sep 15, 2022 at 7:04 PM Vishal Santoshi wrote: > I have state in rocksDB that represents say > > class A { > String a > } &g

A question about restoring state with an additional variable with kryo

2022-09-15 Thread Vishal Santoshi
I have state in rocksDB that represents say class A { String a } I now change my class and add another variable Class A { String a; long b = 0; } How do I make sure that when reconstituting the state, kryo does not complain? It tries to map the previous state to the new definition of Cla

Re: Source with S3 bucket with millions ( billions ) of object ( small files )

2022-04-04 Thread Vishal Santoshi
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileMonitoringFunction.java#L259-L261 > > On Mon, Apr 4, 2022 at 12:07 AM Vishal Santoshi > wrote: > >> Folks, >> I am doing a simple batch job that uses readFile() with >

Source with S3 bucket with millions ( billions ) of object ( small files )

2022-04-03 Thread Vishal Santoshi
Folks, I am doing a simple batch job that uses readFile() with "s3a://[bucket_name]" as the path with setNestedFileEnumeration(true). I am a little curious about a few things. In batch mode which I think is turned on by FileProcessingMode.PROCESS_ONCE mode does the source list all the S3 o

Re: enable.auto.commit=true and checkpointing turned on

2021-12-06 Thread Vishal Santoshi
from the committed offset. > > ps: If you enabled checkpointing, there is no need to enable > enable.auto.commit. The offset will be committed to Kafka when checkpoints > complete, which is the default behavior. > > Vishal Santoshi 于2021年12月4日周六 02:11写道: > >> Hello folk

enable.auto.commit=true and checkpointing turned on

2021-12-03 Thread Vishal Santoshi
Hello folks, 2 questions 1. If we have enabled enable.auto.commit and enabled checkpointing and we restart a flink application ( without checkpoint or savepoint ) , would the kafka consumer start consuming from the last offset committed to kafka. 2. What if in the above scenario, we have "auto.o

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-23 Thread Vishal Santoshi
as some more insights. I haven't worked that much >> with lateness, yet. >> >> Matthias >> >> On Thu, Apr 22, 2021 at 10:57 PM Vishal Santoshi < >> vishal.santo...@gmail.com> wrote: >> >>> << Added the Fliter upfront as belo

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
that no data is being pushed through the sideoutput and that data in *now* pulled from the simulated sideout , essentially the Process Function with a reverse predicate to the Filter Process Function. On Thu, Apr 22, 2021 at 1:56 PM Vishal Santoshi wrote: > And when I added the filter the Ex

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
= ctx.timerService().currentWatermark()) { out.collect(value); } } } I am using RocksDB as a backend if that helps. On Thu, Apr 22, 2021 at 1:50 PM Vishal Santoshi wrote: > Yes sir. The allowedLateNess and side output always existed. > > On Thu, Apr 22, 2021 at 11:47 AM Matthias Pohl

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
your pipeline when running into the > UnsupportedOperationException issue previously? > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/stream/operators/windows.html#getting-late-data-as-a-side-output > > On Thu, Apr 22, 2021 at 5:32 PM Vishal Santo

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
ger.of(CountTrigger.of(1))) .aggregate(new SortAggregate(), new SessionIdProcessWindowFunction(this.gapInMinutes, this. lateNessInMinutes)) .name("session_aggregate").uid("session_aggregate"); On Thu, Apr 22, 2021 at 9:59 AM Vishal Santoshi wrote: > I can do that, but I am not c

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
1 at 8:52 AM Vishal Santoshi wrote: > I saw > https://stackoverflow.com/questions/57334257/the-end-timestamp-of-an-event-time-window-cannot-become-earlier-than-the-current. > and this seems to suggest a straight up filter, but I am not sure how does > that filter works as in would it f

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
ate-data-as-a-side-output > > On Thu, Apr 22, 2021 at 3:22 PM Vishal Santoshi > wrote: > >> I saw >> https://stackoverflow.com/questions/57334257/the-end-timestamp-of-an-event-time-window-cannot-become-earlier-than-the-current. >> and this seems to suggest a straigh

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
pr 21, 2021 at 7:05 PM Vishal Santoshi wrote: > Hey folks, >I had a pipe with sessionization restarts and then fail > after retries with this exception. The only thing I had done was to > increase the lateness by 12 hours ( to a day ) in this pipe and restart > fr

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-22 Thread Vishal Santoshi
, 2021 at 8:24 AM Vishal Santoshi wrote: > Well it was not a solution after all. We now have a session window that is > stuck with the same issue albeit after the additional lateness. I had > increased the lateness to 2 days and that masked the issue which again > reared it's head

event-time window cannot become earlier than the current watermark by merging

2021-04-21 Thread Vishal Santoshi
Hey folks, I had a pipe with sessionization restarts and then fail after retries with this exception. The only thing I had done was to increase the lateness by 12 hours ( to a day ) in this pipe and restart from SP and it ran for 12 hours plus without issue. I cannot imagine that i

Re: 2-phase commit and kafka

2021-04-17 Thread Vishal Santoshi
removes > all pending transactions. > > On Fri, Apr 16, 2021 at 10:28 PM Vishal Santoshi < > vishal.santo...@gmail.com> wrote: > >> Thanks for the feedback and. glad I am on the right track. >> >> > Outstanding transactions should be automatically abor

Re: 2-phase commit and kafka

2021-04-16 Thread Vishal Santoshi
that > they may linger for a longer time if you stop an application entirely (for > example for an upgrade). > > On Fri, Apr 16, 2021 at 4:08 PM Vishal Santoshi > wrote: > >> Hello folks >> >> So AFAIK data loss on exactly once will happen if >> >&

2-phase commit and kafka

2021-04-16 Thread Vishal Santoshi
Hello folks So AFAIK data loss on exactly once will happen if - start a transaction on kafka. - pre commit done ( kafka is prepared for the commit ) - commit fails ( kafka went own or n/w issue or what ever ). kafka has an uncommitted transaction - pipe was down for

Re: clear() in a ProcessWindowFunction

2021-04-10 Thread Vishal Santoshi
t; https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl > > Regards, > Roman > > > On Wed, Mar 31, 2021 at 5:45 PM Vishal Santoshi > wrote: > > > > I had a query Say I have a single key with 2 live sessions ( A and B

Re: clear() in a ProcessWindowFunction

2021-03-31 Thread Vishal Santoshi
with it's merges given that the state is scoped to the key. On Fri, Mar 12, 2021 at 12:37 PM Vishal Santoshi wrote: > Yep, makes sense. > > On Fri, Mar 12, 2021 at 10:12 AM Roman Khachatryan > wrote: > >> > Want to confirm that the keys are GCed ( along with state

Re: SP with Drain and Cancel hangs after take a SP

2021-03-30 Thread Vishal Santoshi
after the job is resumed. >>> >>> For the concrete problem at hand it is difficult to say why it does not >>> stop. It would be helpful if you could provide us with the debug logs of >>> such a run. I am also pulling Arvid who works on Flink's connector >&

Re: SP with Drain and Cancel hangs after take a SP

2021-03-30 Thread Vishal Santoshi
g Arvid who works on Flink's connector > ecosystem. > > Cheers, > Till > > On Mon, Mar 29, 2021 at 11:08 PM Vishal Santoshi < > vishal.santo...@gmail.com> wrote: > >> More interested whether a StreamingFileSink without a drain >> negatively affects it

Re: SP with Drain and Cancel hangs after take a SP

2021-03-29 Thread Vishal Santoshi
n the length, or this is not an issue with StreamingFileSink. If it is the former then I would assume we should be documented and then have to look why this hang happens. On Mon, Mar 29, 2021 at 4:08 PM Vishal Santoshi wrote: > Is this a known issue. We do a stop + savepoint with drain. I see

SP with Drain and Cancel hangs after take a SP

2021-03-29 Thread Vishal Santoshi
Is this a known issue. We do a stop + savepoint with drain. I see no back pressure on our operators. It essentially takes a SP and then the SInk ( StreamingFileSink to S3 ) just stays in the RUNNING state. Without drain i stop + savepoint works fine. I would imagine drain is important ( flush the

Re: DataDog and Flink

2021-03-24 Thread Vishal Santoshi
ices/{}/metrics | jq On Wed, Mar 24, 2021 at 9:56 AM Vishal Santoshi wrote: > Yes, I will do that. > > Regarding the metrics dump through REST, it does provide for the TM > specific but refuses to do it for all jobs and vertices/operators etc > .Moreover I am not sure I have

Re: DataDog and Flink

2021-03-24 Thread Vishal Santoshi
or something else? >> >> Best, >> Matthias >> >> [1] >> https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/rest_api.html#taskmanagers-metrics >> >> On Tue, Mar 23, 2021 at 10:59 PM Vishal Santoshi < >> vishal.santo...@gmail.co

Re: DataDog and Flink

2021-03-23 Thread Vishal Santoshi
That said, is there a way to get a dump of all metrics exposed by TM. I was searching for it and I bet we could get it for ServieMonitor on k8s ( scrape ) but am missing a way to het a TM and dump all metrics that are pushed. Thanks and regards. On Tue, Mar 23, 2021 at 5:56 PM Vishal Santoshi

Re: DataDog and Flink

2021-03-23 Thread Vishal Santoshi
e size of the tags ( or keys ). On Tue, Mar 23, 2021 at 11:33 AM Vishal Santoshi wrote: > If we look at this > <https://github.com/apache/flink/blob/97bfd049951f8d52a2e0aed14265074c4255ead0/flink-metrics/flink-metrics-datadog/src/main/java/org/apache/flink/

Re: DataDog and Flink

2021-03-23 Thread Vishal Santoshi
a similar rate still pass. So I'd suspect > n/w issues. > Can you log into the TM's machine and try out manually how the system > behaves? > > On Sat, Mar 20, 2021 at 1:44 PM Vishal Santoshi > wrote: > >> Hello folks, >> This is quite stran

DataDog and Flink

2021-03-20 Thread Vishal Santoshi
Hello folks, This is quite strange. We see a TM stop reporting metrics to DataDog .The logs from that specific TM for every DataDog dispatch time out with* java.net.SocketTimeoutException: timeout *and that seems to repeat over every dispatch to DataDog. It seems it is on a 10 se

Re: Flink History server ( running jobs )

2021-03-19 Thread Vishal Santoshi
can be retrieved through Flink's REST API. > > Best, > Matthias > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/advanced/historyserver.html#overview > > On Wed, Mar 17, 2021 at 10:33 PM Vishal Santoshi < > vishal.santo...@gmail.com>

Flink History server ( running jobs )

2021-03-17 Thread Vishal Santoshi
Hello folks, Does fliink server not provide for running jobs ( like spark history does ) ? Regards.

Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

2021-03-16 Thread Vishal Santoshi
4) low number of samples > > [1] > https://github.com/ververica/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/db/version_set.cc#L919-L924 > > > > Best > Yun Tang > -- > *From:* Vishal Santoshi > *Sent:* Monday, March 15, 2021 5:48

Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

2021-03-14 Thread Vishal Santoshi
called as an "estimate" , but was not anticipating this much difference ... On Sun, Mar 14, 2021 at 5:32 PM Vishal Santoshi wrote: > The reason I ask is that I have a "Process Window Function" on that > Session Window and I keep key scoped Global State. I maintain a

Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

2021-03-14 Thread Vishal Santoshi
gt;.Context context, Iterable> elements, Collector< KeyedSessionWithSessionID> out) throws Exception { // scoped to the key if (state.value() == null) { this.newKeysInState.inc(); state.update(new IntervalList()); }else{ this.existingKeysInState.inc(); } On Sun, Mar 14, 2021 at 3:32 P

Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

2021-03-14 Thread Vishal Santoshi
Hey folks, Was looking at this very specific metric "session_aggregate.merging-window-set.rocksdb_estimate-num-keys". Does this metric also represent session windows ( it is a session window ) that have lateness on them ? In essence if the session window was closed but has a lateness of a f

Re: clear() in a ProcessWindowFunction

2021-03-12 Thread Vishal Santoshi
se TTL). > > [1] > https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl > > Regards, > Roman > > On Fri, Mar 12, 2021 at 2:16 PM Vishal Santoshi > wrote: > > > > Sometimes writing it down makes you think. I no

Re: clear() in a ProcessWindowFunction

2021-03-12 Thread Vishal Santoshi
Sometimes writing it down makes you think. I now realize that this is not the right approach, given that merging windows will have their own states..and how the merge happens is really at the key level On Fri, Mar 12, 2021 at 6:27 AM Vishal Santoshi wrote: > I intend to augment ev

Re: clear() in a ProcessWindowFunction

2021-03-12 Thread Vishal Santoshi
lain what you are trying to achieve and why do you need to > combine > sliding windows with state scoped to window+key? > > Regards, > Roman > > On Fri, Mar 12, 2021 at 5:13 AM Vishal Santoshi > wrote: > > > > Essentially, Does this code leak state > > > &g

Re: clear() in a ProcessWindowFunction

2021-03-11 Thread Vishal Santoshi
ID<>(elements.iterator().next(), uuid )); } } On Thu, Mar 11, 2021 at 11:09 PM Vishal Santoshi wrote: > Hello folks, > The suggestion is to use windowState() for a key key per > window state and clear the state explicitly. Also it seems that > getRuntime()

clear() in a ProcessWindowFunction

2021-03-11 Thread Vishal Santoshi
Hello folks, The suggestion is to use windowState() for a key key per window state and clear the state explicitly. Also it seems that getRuntime().getState() will return a globalWindow() where state is shared among windows with the same key. I desire of course to have state scope

Re: Flink and Nomad ( from Hashicorp)

2021-03-11 Thread Vishal Santoshi
ain/java/org/apache/flink/kubernetes/highavailability/KubernetesHaServices.java > > Cheers, > Till > > On Tue, Mar 9, 2021 at 7:56 PM Vishal Santoshi > wrote: > >> Is there any reason not to have Nomad HA on the lines of K8s HA ? I >> think it would depend on how puggable the

Flink and Nomad ( from Hashicorp)

2021-03-09 Thread Vishal Santoshi
Is there any reason not to have Nomad HA on the lines of K8s HA ? I think it would depend on how puggable the HA core code is ? Any links to how ZK/K8s code specifically for HA would be highly appreciated

Re: ClassLoader leak when using s3a upload through DataSet.output

2021-02-10 Thread Vishal Santoshi
As in https://github.com/aws/aws-sdk-java/blob/41a577e3f667bf5efb3d29a46aaf210bf70483a1/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/TransferManager.java#L2378 never gets called as it is never GCed... On Wed, Feb 10, 2021 at 10:47 AM Vishal Santoshi wrote: > Thank

Re: ClassLoader leak when using s3a upload through DataSet.output

2021-02-10 Thread Vishal Santoshi
the uber jar ? On Wed, Feb 10, 2021 at 9:27 AM Vishal Santoshi wrote: > We do put the flink-hdoop-uber*.jar in the flink lib ( and thus available > to the root classloader ). That still does not explain the executor > service outliving the job. > > On Tue, Feb 9, 2021 at 6:49 PM

Re: ClassLoader leak when using s3a upload through DataSet.output

2021-02-10 Thread Vishal Santoshi
We do put the flink-hdoop-uber*.jar in the flink lib ( and thus available to the root classloader ). That still does not explain the executor service outliving the job. On Tue, Feb 9, 2021 at 6:49 PM Vishal Santoshi wrote: > Hello folks, > We see threads from &

ClassLoader leak when using s3a upload through DataSet.output

2021-02-09 Thread Vishal Santoshi
Hello folks, We see threads from https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/TransferManagerUtils.java#L49 outlive a batch job that writes Parquet Files to S3, causing a ClassLoader Leak. Is this a known

Re: Implications of taskmanager.memory.process.size

2020-07-09 Thread Vishal Santoshi
ffering and JVM overhead. > > > Thank you~ > > Xintong Song > > > > On Thu, Jul 9, 2020 at 10:57 PM Vishal Santoshi > wrote: > >> ager.memory.process.size(none)MemorySizeTotal Process Memory size for >> the TaskExecutors. This includes all the memory that a Task

Re: Implications of taskmanager.memory.process.size

2020-07-09 Thread Vishal Santoshi
from the memory manager and keep their memory usage within that boundary. Is not used AFAIK . May be reduce the fraction to 0 ? We do not use offline heap ( aka batch jobs ) on our cluster ? Any help will be appreciated. On Thu, Jul 9, 2020 at 9:25 AM Vishal Santoshi wrote

Implications of taskmanager.memory.process.size

2020-07-09 Thread Vishal Santoshi
Hello folks, As established https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#memory-configuration , I set the taskmanager.memory.process.size and taskmanager.memory.task.off-heap.size in my flink-conf.yaml and I see the 2 properties being pulled in. * - Load

Re: Hadoop user jar for flink 1.9 plus

2020-03-20 Thread Vishal Santoshi
Awesome, thanks! On Tue, Mar 17, 2020 at 11:14 AM Chesnay Schepler wrote: > You can download flink-shaded-hadoop from the downloads page: > https://flink.apache.org/downloads.html#additional-components > > On 17/03/2020 15:56, Vishal Santoshi wrote: > > We have been on flink 1

Hadoop user jar for flink 1.9 plus

2020-03-17 Thread Vishal Santoshi
We have been on flink 1.8.x on production and were planning to go to flink 1.9 or above. We have always used hadoop uber jar from https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop2-uber but it seems they go up to 1.8.3 and their distribution ends 2019. How do or where do we g

Re: Mirror Maker 2.0 cluster and starting from latest offset and other queries

2019-10-17 Thread Vishal Santoshi
umerConfig values: > allow.auto.create.topics = true > auto.commit.interval.ms = 5000 > auto.offset.reset = earliest > > 2. sync.topic.acls.enabled = false > > > > > On Tue, Oct 15, 2019 at 4:00 PM Vishal Santoshi > wrote: > >> 2 queries >> >

Re: Mirror Maker 2.0 cluster and starting from latest offset and other queries

2019-10-16 Thread Vishal Santoshi
auto.commit.interval.ms = 5000 auto.offset.reset = earliest 2. sync.topic.acls.enabled = false On Tue, Oct 15, 2019 at 4:00 PM Vishal Santoshi wrote: > 2 queries > > 1. I am trying to configure MM2 to start replicating from the head ( > latest of the topic ) . Should auto.offset.reset = lat

Mirror Maker 2.0 cluster and starting from latest offset and other queries

2019-10-15 Thread Vishal Santoshi
2 queries 1. I am trying to configure MM2 to start replicating from the head ( latest of the topic ) . Should auto.offset.reset = latest in mm2.properties be enough ? Unfortunately MM2 will start from the EARLIEST. 2. I do not have "Authorizer is configured on the broker " and see this exce

Re: flink 1.9

2019-10-09 Thread Vishal Santoshi
Thanks a lot. On Wed, Oct 9, 2019, 8:55 AM Chesnay Schepler wrote: > Java 11 support will be part of Flink 1.10 (FLINK-10725). You can take the > current master and compile&run it on Java 11. > > We have not investigated later Java versions yet. > On 09/10/2019 14:14, Vi

Re: flink 1.9

2019-10-09 Thread Vishal Santoshi
Thank you. A related question, has flink been tested with jdk11 or above. ? On Tue, Oct 8, 2019, 5:18 PM Steven Nelson wrote: > > https://flink.apache.org/downloads.html#apache-flink-190 > > > Sent from my iPhone > > On Oct 8, 2019, at 3:47 PM, Vishal Santoshi > wrote:

flink 1.9

2019-10-08 Thread Vishal Santoshi
where do I get the corresponding jar for 1.9 ? flink-shaded-hadoop2-uber-2.7.5-1.8.0.jar Thanks..

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-07-03 Thread Vishal Santoshi
I guess using a session cluster rather then a job cluster will decouple the job from the container and may be the only option as of today? On Sat, Jun 29, 2019, 9:34 AM Vishal Santoshi wrote: > So there a re 2 scenerios > > 1. If JM goes down ( exits ) and k8s re launches the Job Clus

Re: Why did JM fail on K8s (see original thread below)

2019-06-29 Thread Vishal Santoshi
even though Max. number of execution retries Restart with fixed delay (24 ms). #20 restart attempts. On Sat, Jun 29, 2019 at 10:44 AM Vishal Santoshi wrote: > This is strange, the retry strategy was 20 times with 4 minute delay. > This job tried once ( we had a hadoop Name Node

Re: Why did JM fail on K8s (see original thread below)

2019-06-29 Thread Vishal Santoshi
org.apache.flink.runtime.executiongraph.ExecutionGraph - Could not restart the job Kafka-to-HDFS (0005) because the restart strategy prevented it.* On Sat, Jun 29, 2019 at 10:03 AM Vishal Santoshi wrote: > We are investigating that. But is the above theory plausible ( flink > gurus ) even though th

Re: Why did JM fail on K8s (see original thread below)

2019-06-29 Thread Vishal Santoshi
gt;> *2019-06-29 00:33:14,308 INFO >> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Terminating cluster >> entrypoint process StandaloneJobClusterEntryPoint with exit code 1443.* >> >> >> >> On Sat, Jun 29, 2019 at 9:04 AM Vishal Santoshi < >> vishal.santo...@g

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-29 Thread Vishal Santoshi
(0005) switched from state RESTARTING to CREATED. ) , * it does the right thing. Not what is the right way to handle 1. apart from spec: restartPolicy: Never and manually restart... On Sat, Jun 29, 2019 at 9:25 AM Vishal Santoshi wrote: > Another point the JM had termina

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-29 Thread Vishal Santoshi
, Jun 29, 2019 at 9:04 AM Vishal Santoshi wrote: > I have not tried on bare metal. We have no option but k8s. > > And this is a job cluster. > > On Sat, Jun 29, 2019 at 9:01 AM Timothy Victor wrote: > >> Hi Vishal, can this be reproduced on a bare metal instance as well? &

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-29 Thread Vishal Santoshi
> > On Sat, Jun 29, 2019, 7:50 AM Vishal Santoshi > wrote: > >> OK this happened again and it is bizarre ( and is definitely not what I >> think should happen ) >> >> >> >> >> The job failed and I see these logs ( In essence it is keeping t

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-29 Thread Vishal Santoshi
1 checkpoints in ZooKeeper.06.28.2019 20:33:20.5502019-06-29 00:33:20,549 INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying to fetch 1 checkpoints from storage.* This just does not make sense.... On Wed, Jun 5, 2019 at 9:29 AM Vishal Santoshi w

Re: Deletion of topic from a pattern in KafkaConsumer if auto-create is true will always recreate the topic

2019-06-14 Thread Vishal Santoshi
I am sure there was a ticket open that allowed for clean manipulation of state ( that would have saved us a whole lot ).. On Fri, Jun 14, 2019 at 1:19 PM Vishal Santoshi wrote: > Yep, but > > "Consider this example: if you had a Kafka Consumer that was consuming > from t

Re: Deletion of topic from a pattern in KafkaConsumer if auto-create is true will always recreate the topic

2019-06-14 Thread Vishal Santoshi
anges > [2] https://issues.apache.org/jira/browse/FLINK-10342 > > Am Do., 13. Juni 2019 um 22:31 Uhr schrieb Vishal Santoshi < > vishal.santo...@gmail.com>: > >> I guess, adjusting the pattern ( blacklisting the topic/s ) would >> work >> >> On Th

Re: Deletion of topic from a pattern in KafkaConsumer if auto-create is true will always recreate the topic

2019-06-13 Thread Vishal Santoshi
I guess, adjusting the pattern ( blacklisting the topic/s ) would work On Thu, Jun 13, 2019 at 3:02 PM Vishal Santoshi wrote: > Given > > > https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/conn

Deletion of topic from a pattern in KafkaConsumer if auto-create is true will always recreate the topic

2019-06-13 Thread Vishal Santoshi
Given https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java#L471 it seems that if * I have a regex pattern for consuming a bunch of topics * auto create is turned on then eve

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-05 Thread Vishal Santoshi
ecovered all 10 checkpoints, > start from the latest, and start pruning old ones as new ones were created. > > So you're running into 2 separate issues here, which is a bit odd. > > On 05/06/2019 13:44, Vishal Santoshi wrote: > > Any one? > > On Tue, Jun 4, 2019, 2:41 PM Vi

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-05 Thread Vishal Santoshi
Any one? On Tue, Jun 4, 2019, 2:41 PM Vishal Santoshi wrote: > The above is flink 1.8 > > On Tue, Jun 4, 2019 at 12:32 PM Vishal Santoshi > wrote: > >> I had a sequence of events that created this issue. >> >> * I started a job and I had the state.check

Re: Flink 1.8

2019-06-04 Thread Vishal Santoshi
y any chance did you change your > accumulator class but forgot to update the serialVersionUID? Just > wondering if it is trying to deserialize to a different class definition. > > A more detailed stscktrace (maybe with debug on) will help. > > Tim > > On Tue, Jun 4, 2019, 8

Re: Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-04 Thread Vishal Santoshi
The above is flink 1.8 On Tue, Jun 4, 2019 at 12:32 PM Vishal Santoshi wrote: > I had a sequence of events that created this issue. > > * I started a job and I had the state.checkpoints.num-retained: 5 > > * As expected I have 5 latest checkpoints retained in my hdfs backend.

Flink 1.8

2019-06-04 Thread Vishal Santoshi
I see tons of org.apache.flink.runtime.executiongraph.ExecutionGraph- Cannot update accumulators for job 7bfe57bb0ed1c5c2f4f40c2fccaab50d. java.lang.NullPointerException https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Exe

Job recovers from an old dangling CheckPoint in case of Job Cluster based Flink pipeline

2019-06-04 Thread Vishal Santoshi
I had a sequence of events that created this issue. * I started a job and I had the state.checkpoints.num-retained: 5 * As expected I have 5 latest checkpoints retained in my hdfs backend. * JM dies ( K8s limit etc ) without cleaning the hdfs directory. The k8s job restores from the latest che

Re: Version "Unknown" - Flink 1.7.0

2019-04-29 Thread Vishal Santoshi
1.8.0 git.commit.user.name=Aljoscha Krettek git.commit.time=03.04.2019 @ 13\:25\:54 PDT On Mon, Apr 29, 2019 at 8:07 AM Vishal Santoshi wrote: > Ok, I will check. > > On Fri, Apr 12, 2019, 4:47 AM Chesnay Schepler wrote: > >> have you compiled Flink yourself? >> >&g

Re: Version "Unknown" - Flink 1.7.0

2019-04-29 Thread Vishal Santoshi
Ok, I will check. On Fri, Apr 12, 2019, 4:47 AM Chesnay Schepler wrote: > have you compiled Flink yourself? > > Could you check whether the flink-dist jar contains a > ".version.properties" file in the root directory? > > On 12/04/2019 03:42, Vishal Santoshi wrote: &

Re: No zero ( 2 ) exit code on k8s StandaloneJobClusterEntryPoint when save point with cancel...

2019-04-25 Thread Vishal Santoshi
On Wed, Apr 24, 2019 at 4:58 PM Vishal Santoshi > wrote: > >> Verified, I think we just need to make sure that it is documented :) >> >> On Wed, Apr 24, 2019 at 9:47 AM Vishal Santoshi < >> vishal.santo...@gmail.com> wrote: >> >>> This makes tota

Re: QueryableState startup regression in 1.8.0 ( migration from 1.7.2 )

2019-04-25 Thread Vishal Santoshi
changing the > documentation accordingly. > > Best, > > Dawid > > [1] https://issues.apache.org/jira/browse/FLINK-12274 > On 25/04/2019 03:27, Guowei Ma wrote: > > You could try to set queryable-state.enable to true. And check again. > > Vishal Santoshi 于2019年4月25日 周四上午1

Re: QueryableState startup regression in 1.8.0 ( migration from 1.7.2 )

2019-04-24 Thread Vishal Santoshi
Any one ? On Wed, Apr 24, 2019 at 12:02 PM Vishal Santoshi wrote: > Hello folks, > > Following > https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html#querying-state > . > for setting up the Queryable Server and proxy, I have

QueryableState startup regression in 1.8.0 ( migration from 1.7.2 )

2019-04-24 Thread Vishal Santoshi
Hello folks, Following https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html#querying-state . for setting up the Queryable Server and proxy, I have my classpath ( the lib directory ) that has the required jar, But I do not see the mentioned log and

Re: No zero ( 2 ) exit code on k8s StandaloneJobClusterEntryPoint when save point with cancel...

2019-04-24 Thread Vishal Santoshi
Verified, I think we just need to make sure that it is documented :) On Wed, Apr 24, 2019 at 9:47 AM Vishal Santoshi wrote: > This makes total sense and actually is smart ( defensive ). Will test and > report. I think though that this needs to be documented :) > > On Wed, Apr 24,

Re: No zero ( 2 ) exit code on k8s StandaloneJobClusterEntryPoint when save point with cancel...

2019-04-24 Thread Vishal Santoshi
onous operations which tells Flink > that their results don't need to get delivered to some client. If you would > like to have such a feature, then please open a JIRA issue for it. > > Cheers, > Till > > On Wed, Apr 24, 2019 at 3:49 AM Vishal Santoshi > wrote: > >&

Re: No zero ( 2 ) exit code on k8s StandaloneJobClusterEntryPoint when save point with cancel...

2019-04-23 Thread Vishal Santoshi
s On Tue, Apr 23, 2019 at 3:11 PM Vishal Santoshi wrote: > I see this in the TM pod > > 2019-04-23 19:08:41,828 DEBUG > org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got > ping response for sessionid: 0x15cc7f3d88466a5 after 0ms > > 201

Re: No zero ( 2 ) exit code on k8s StandaloneJobClusterEntryPoint when save point with cancel...

2019-04-23 Thread Vishal Santoshi
. JM log has analogous.. 2019-04-23 19:10:49,218 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x25add5478fb2e7c after 0ms Does that ring a bell ? On Tue, Apr 23, 2019 at 2:04 PM Vishal Santoshi wrote: > Adding the DEBUG l

Re: No zero ( 2 ) exit code on k8s StandaloneJobClusterEntryPoint when save point with cancel...

2019-04-23 Thread Vishal Santoshi
, 2019 at 1:59 PM Vishal Santoshi wrote: > I am seeing this weird issue where I do a save point with cancel on a job > on k8s and it hangs for 5 minutes ( no INFO logs ) and then exits with code > of 2. > > > 2019-04-23 17:36:31,372 INFO > org.apache.flink.runtime.jobmaster.MiniD

No zero ( 2 ) exit code on k8s StandaloneJobClusterEntryPoint when save point with cancel...

2019-04-23 Thread Vishal Santoshi
I am seeing this weird issue where I do a save point with cancel on a job on k8s and it hangs for 5 minutes ( no INFO logs ) and then exits with code of 2. 2019-04-23 17:36:31,372 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint - Shutting down rest endpoint. 2019-04-23 17:36:

Re: status on FLINK-7129

2019-04-23 Thread Vishal Santoshi
+1 On Tue, Apr 23, 2019, 4:57 AM kant kodali wrote: > Thanks all for the reply. I believe this is one of the most important > feature that differentiates flink from other stream processing engines as > others don't even have CEP yet. so it would be great if this issue can get > more attention as

IP resolution for metrics on k8s when the JM ( job cluster ) is rolled but TMs are not

2019-04-12 Thread Vishal Santoshi
Scenerio * savepoint with Cancel followed by a restore on the Job. It brings down the JM and relaunches on a different IP, thus the resolution of dns is a new IP. * The TMs deployment is not rolled ( recreated ) * Note that `flink-conf.yaml:metrics.internal.query-service.port` is hardcoded. Re

Re: Version "Unknown" - Flink 1.7.0

2019-04-11 Thread Vishal Santoshi
t; > Is it your situation? > > Best, > tison. > > > Vishal Santoshi 于2019年2月2日周六 下午10:27写道: > >> +1 ( though testing in JOB mode on k8s ) >> >> On Fri, Feb 1, 2019 at 6:45 PM anaray wrote: >> >>> Though not a major issue. I see that Flink

Re: K8s job cluster and cancel and resume from a save point ?

2019-04-11 Thread Vishal Santoshi
I confirm that 1.8.0 fixes all the above issue . The JM process exits with code 0 and exits the pod ( TERMINATED state ) . The above is true for both PATCH cancel and POST save point with cancel as above. Thank you for fixing this issue. On Wed, Mar 13, 2019 at 10:17 AM Vishal Santoshi wrote

Re: Do we have an example of setting up Queryable state ( proxies, client etc ) on k8s ?

2019-04-01 Thread Vishal Santoshi
deeply couples the query layer with the TMs, inhibiting independent development of the query layer. Thanks. On Fri, Mar 29, 2019 at 9:08 AM Vishal Santoshi wrote: > Thanks Konstantin, > That makes sense. To give you some context, > the reason we are gravitatin

Re: Do we have an example of setting up Queryable state ( proxies, client etc ) on k8s ?

2019-03-29 Thread Vishal Santoshi
y/Jersey REST based server that sends queries to this service. > > Please le me know if this works for you. > > Hope this helps and cheers, > > Konstantin > > > On Thu, Mar 28, 2019 at 12:37 AM Vishal Santoshi < > vishal.santo...@gmail.com> wrote: > >> I t

Re: What are savepoint state manipulation support plans

2019-03-28 Thread Vishal Santoshi
+1 On Thu, Mar 28, 2019, 5:01 AM Ufuk Celebi wrote: > I think such a tool would be really valuable to users. > > @Gordon: What do you think about creating an umbrella ticket for this > and linking it in this thread? That way, it's easier to follow this > effort. You could also link Bravo and Set

Re: Do we have an example of setting up Queryable state ( proxies, client etc ) on k8s ?

2019-03-27 Thread Vishal Santoshi
ny advise/ideas on the 3 worry points ? Regards On Mon, Mar 25, 2019 at 8:57 PM Vishal Santoshi wrote: > I have 2 options > > 1. A Rest Based, in my case a Jetty/REST based QueryableStateClient in a > side car container colocated on JM ( Though it could on all TMs but that > l

  1   2   3   4   >