Re: Checkpointing in StateFun

2022-03-14 Thread Seth Wiesman
where > I can see how Raw state can be used in Flink? > > > Best Regards, > > Christopher Gustafson > -- > *Från:* Seth Wiesman > *Skickat:* den 11 mars 2022 17:57:21 > *Till:* Christopher Gustafson > *Kopia:* user@flink.apache.org >

Re: Checkpointing in StateFun

2022-03-11 Thread Seth Wiesman
I assume you are talking about the checkpointing in the feedback package? StateFun only relies on Flink checkpointing for fault tolerance. All state is stored in standard checkpoint / savepoints and can be used to restore from failure, upgrade a job, rescale, etc. Just like any other snapshot. St

Re: Possible BUG in 1.15 SQL JSON_OBJECT()

2022-02-25 Thread Seth Wiesman
Thank you for reporting! That is definitely a bug, and I have opened a ticket to fix which you can track here. https://issues.apache.org/jira/browse/FLINK-26374 Seth On Thu, Feb 24, 2022 at 4:18 PM Jonathan Weaver wrote: > Using the latest SNAPSHOT BUILD. > > If I have a column definition as >

Re: Read parquet data from S3 with Flink 1.12

2021-12-21 Thread Seth Wiesman
Hi Alexandre, You are correct, BatchTableEnvironment does not exist in 1.14 anymore. In 1.15 we will have the state processor API ported to DataStream for exactly this reason, it is the last piece to begin officially marking DataSet as deprecated. As you can understand, this has been a multi year

Re: Re: Will Flink loss some old Keyed State when changing the parallelism

2021-12-20 Thread Seth Wiesman
No. The default max parallelism of 128 will be applied. If you try to restore above that value, the restore will fail and you can simply restore at a smaller value. No data loss. On Mon, Dec 20, 2021 at 2:28 AM 杨浩 wrote: > > Thanks for your replay. If we don't set the max parallelism, and we ch

Re: UDF and Broadcast State Pattern

2021-12-15 Thread Seth Wiesman
Hi Krzysztof, There is a difference in semantics here between yourself and Caizhi. SQL UDFs can be used statefully - see AggregateFunction and TableAggregateFunction for examples. You even have access to ListView and MapView which are backed by ListState and MapState accordingly. These functions c

Re: Sending an Alert to Slack, AWS sns, mattermost

2021-12-14 Thread Seth Wiesman
Sure, Just implement `RichSinkFunction`. You will initialize your client inside the open method and then send alerts from invoke. Seth On Mon, Dec 13, 2021 at 9:17 PM Robert Cullen wrote: > Yes, That's the correct use case. Will this work with the DataStream > API? UDFs are for the Table API

Re: [DISCUSS] Deprecate MapR FS

2021-12-09 Thread Seth Wiesman
+1 I actually thought we had already dropped this FS. If anyone is still relying on it in production, the file system abstraction in Flink has been incredibly stable over the years. They should be able to use the 1.14 MapR FS with later versions of Flink. Seth On Wed, Dec 8, 2021 at 10:03 AM Mar

Re: GCS/Object Storage Rate Limiting

2021-12-08 Thread Seth Wiesman
Not sure if you've seen this, but Flinks file systems do support connection limiting. https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/common/#connection-limiting Seth On Wed, Dec 8, 2021 at 12:18 PM Kevin Lam wrote: > Hey David, > > Thanks for the response. The

Re: Dependency injection for TypeSerializer?

2021-11-10 Thread Seth Wiesman
Yes I did, thanks for sending it back :) Copying my previous reply for the ML: Hey Thomas, > > You are correct that there is no way to inject dynamic information into > the TypeSerializer configured from the TypeSerializerSnapshot, but that > should not be a problem for your use case. > > The type

Re: to join or not to join, that is the question...

2021-11-08 Thread Seth Wiesman
There is no such restriction on connected streams; either input may modify the keyed state. Regarding performance, the difference between the two should be negligible and I would go with the option with the cleanest semantics. If both streams are the same type *and* you do not care which input an e

Re: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Seth Wiesman
In general I would strongly encourage you to find a way to `key` your stream, it will make everything much simpler. On Thu, Nov 4, 2021 at 6:05 PM Seth Wiesman wrote: > Why not? > > All those classes have a Symbol attribute, why can't you use that to key > the stream? > >

Re: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Seth Wiesman
Why not? All those classes have a Symbol attribute, why can't you use that to key the stream? On Thu, Nov 4, 2021 at 5:51 PM Isidoros Ioannou wrote: > Hi Seth, > > thank you for your answer. > In this case you are right and it would solve my problem. but actually my > case is a bit more complex

Re: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Seth Wiesman
HI Isidoros, If you want the pattern to be scoped to symbols, I suggest you use a `keyBy` in your stream. Constructing the pattern will now look like this: KeyedStream keydInput = inputStream.keyBy(model -> model.getSymbol); PatternStream marketOpenPatternStream = CEP.pattern(keydInput, patter

Re: Snapshot method for custom keyed state checkpointing ?

2021-10-12 Thread Seth Wiesman
Hi Marc, I think you will find this is less efficient than just using keyed state. Remember state backends are local, reading and writing is extremely cheap. HashMapStateBackend is just an in-memory data structure and EmbeddedRocksDBStateBackend only works against local disk. Additionally, the emb

Re: How to add Flink a Flink connector to stateful functions

2021-09-28 Thread Seth Wiesman
I just want to add that the StateFun documentation does cover using custom Flink connectors[1]. [1] https://nightlies.apache.org/flink/flink-statefun-docs-release-3.1/docs/modules/io/flink-connectors/#flink-connectors On Tue, Sep 28, 2021 at 2:52 AM Christian Krudewig (Corporate Development) < c

Re: Built-in functions to manipulate MULTISET type

2021-09-20 Thread Seth Wiesman
. Seth On Mon, Sep 20, 2021 at 1:26 AM Kai Fu wrote: > Hi Seth, > > This is really helpful and inspiring, thank you for the information. > > On Sun, Sep 19, 2021 at 11:06 PM Seth Wiesman wrote: > >> Hi, >> >> I agree it would be great to see these functions

Re: Built-in functions to manipulate MULTISET type

2021-09-19 Thread Seth Wiesman
Hi, I agree it would be great to see these functions built-in, but you do not need to write a UDF for each type. You can overload a UDFs type inference and have the same capabilities as built-in functions, which means supporting generics. https://github.com/apache/flink/blob/master/flink-examples

Re: CEP library support in Python

2021-09-15 Thread Seth Wiesman
d or progress state (no counter increment or decrement) ? > > I hope this example was clear. > > Thank you for your time! > Pedro Silva > > > Em sex., 10 de set. de 2021 às 20:18, Seth Wiesman > escreveu: > >> Hi Pedro, >> >> The DataStream CEP librar

Re: Flink Kafka Partition Discovery Never Forgets Deleted Partitions

2021-09-15 Thread Seth Wiesman
I just want to chime in that if you really do need to drop a partition, Flink already supports a solution. If you manually stop the job with a savepoint and restart it with a new UID on the source operator, along with passing the --allowNonRestoredState flag to the client, the source will disregar

Re: CEP library support in Python

2021-09-10 Thread Seth Wiesman
Hi Pedro, The DataStream CEP library is not available in Python but you can use `MATCH_RECOGNIZE` in the table API which is implemented on-top of the CEP library from Python. https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/match_recognize/ Seth On Fri, Sep

Re: State processor API very slow reading a keyed state with RocksDB

2021-09-09 Thread Seth Wiesman
Hi David, I was also able to reproduce the behavior, but was able to get significant performance improvements by reducing the number of slots on each TM to 1. My suspicion, as Piotr alluded to, has to do with the different runtime execution of DataSet over DataStream. In particular, Flink's DataS

Re: State Processor API with EmbeddedRocksDBStateBackend

2021-08-11 Thread Seth Wiesman
Hi Xianwen, Looks like the State Processor API needs to be updated for the new state backend factory stack. For now, just use RocksDBStateBackend and it will work as intended. I've opened a ticket: https://issues.apache.org/jira/browse/FLINK-23728 Seth On Wed, Aug 11, 2021 at 2:08 AM xianwen j

Re: How would Flink job react to change of partitions in Kafka topic?

2021-06-23 Thread Seth Wiesman
It will just work as long as you enable partition discovery. https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/#partition-discovery On Tue, Jun 22, 2021 at 1:32 PM Thomas Wang wrote: > Hi, > > I'm wondering if anyone has changed the number of partitio

Re: Reading Flink states from svaepoint uning State Processor API

2021-06-01 Thread Seth Wiesman
Hi Min, The only requirement is that your state descriptors be configured identically as those used in your datastream API. So if you registered custom TypeInformation / serializer in your streaming job you will need those here as well. I would also look at the ExecutionConfig on your DataStream a

Re: [Statefun] Exception occurs during function chaining / Async function

2021-03-01 Thread Seth Wiesman
Hi Le, I believe the issue is the bounded source[1]. Stateful Functions only supports unbounded inputs. Additionally, you can remove all the `synchronized` blocks from your code; statefun handles all synchronization for you. Seth [1] https://gist.github.com/flint-stone/cbc60f2d41507fdf33507ba99

Re: [DISCUSS] Deprecation and removal of the legacy SQL planner

2021-02-26 Thread Seth Wiesman
Strong +1 Having two planners is confusing to users and the diverging semantics make it difficult to provide useful learning material. It is time to rip the bandage off. Seth On Fri, Feb 26, 2021 at 12:54 AM Kurt Young wrote: > change.> > > Hi Timo, > > First of all I want to thank you for in

Re: [Statefun] Dynamic behavior

2021-02-23 Thread Seth Wiesman
actionManager instances when they receive a message > of type Transaction for the first time. > > Seth Wiesman escreveu no dia terça, 23/02/2021 à(s) > 16:02: > >> Hey Miguel, >> >> What you are describing is exactly what is implemented in this repo. The >> Tr

Re: [Statefun] Dynamic behavior

2021-02-23 Thread Seth Wiesman
Hey Miguel, What you are describing is exactly what is implemented in this repo. The TransactionManager function acts as an orchestrator to work with the other functions. The repo is structured as an exercise but the full solution exists on the branch `advanced-solution`. https://github.com/verve

Re: Flink SQL OVER window

2021-01-29 Thread Seth Wiesman
You need to use TUMBLE_ROWTIME to extract a time attribute from a window, TUMBLE_END is just a timestamp. https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#selecting-group-window-start-and-end-timestamps Seth On Fri, Jan 29, 2021 at 9:14 AM Patrick Angeles wrote:

Re: Annotating AggregateFunction accumulator type with @DataTypeHint

2021-01-26 Thread Seth Wiesman
Yes, the FunctionHint annotation has an accumulator field. There is an example in its JavaDoc. Seth On Tue, Jan 26, 2021 at 6:39 AM Yuval Itzchakov wrote: > Hi, I have an aggregate function of the form: > > class Foo extends AggregateFunction[Array[Json], util.List[Json]] > > I want to treat th

Re: question about timers

2021-01-20 Thread Seth Wiesman
Yes, Processing time timers that should have fired will fire immediately in order. Event time timers are never *late*, they will just fire when the watermark advances. Seth On Tue, Jan 19, 2021 at 3:58 PM Marco Villalobos wrote: > If there are timers that have been checkpointed (we use rocksd

Re: Remote Stateful Function Scalability

2020-10-20 Thread Seth Wiesman
As a note, I wrote that concepts section before remote functions were implemented. I've made a note to myself to go through and update it. Seth On Sat, Oct 17, 2020 at 9:29 PM Tzu-Li (Gordon) Tai wrote: > Hi Elias, > > On Sun, Oct 18, 2020 at 6:16 AM Elias Levy > wrote: > >> After reading the

Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-16 Thread Seth Wiesman
+1 It has been deprecated for some time and the StreamingFileSink has stabalized with a large number of formats and features. Plus, the bucketing sink only implements a small number of stable interfaces[1]. I would expect users to continue to use the bucketing sink from the 1.11 release with futur

[Announce] Flink Forward Global Is Just Around the Corner

2020-10-09 Thread Seth Wiesman
final Keynote for Ververica will be announced next week. As a reminder, the event is virtual and free to attend[1]. There are also a limited number of paid training slots available. Looking forward to seeing everyone virtually soon! https://www.flink-forward.org/ Seth Wiesman - Committer Apache Flink

Re: [DISCUSS] Drop Scala 2.11

2020-09-10 Thread Seth Wiesman
since it's blocking us from upgrading > > certain dependencies. > > > > I would also be in favour of dropping Scala completely but that's a > > different story. > > > > Aljoscha > > > > On 10.09.20 16:51, Seth Wiesman wrote: > >

[DISCUSS] Drop Scala 2.11

2020-09-10 Thread Seth Wiesman
Hi Everyone, Think of this as a pre-flip, but what does everyone think about dropping Scala 2.11 support from Flink. The last patch release was in 2017 and in that time the scala community has released 2.13 and is working towards a 3.0 release. Apache Kafka and Spark have both dropped 2.11 suppor

Re: [DISCUSS] Deprecate and remove UnionList OperatorState

2020-09-09 Thread Seth Wiesman
Generally +1 The one use case I've seen of union state I've seen in production (outside of sources and sinks) is as a "poor mans" broadcast state. This was obviously before that feature was added which is now a few years ago so I don't know if those pipelines still exist. FWIW, if they do the stat

[Announce] Flink Forward Global Program is now Live

2020-08-13 Thread Seth Wiesman
tbrite.com/e/flink-forward-global-virtual-2020-tickets-113775477516 [3] https://www.flink-forward.org/global-2020/training-program Seth Wiesman Flink Forward Global Program Committee Chair Committer Apache Flink

Re: Please help, I need to bootstrap keyed state into a stream

2020-08-12 Thread Seth Wiesman
Just to summarize the conversation so far: The state processor api reads data from a 3rd party system - such as JDBC in this example - and generates a savepoint file that is written out to some DFS. This savepoint can then be used to when starting a flink streaming application. It is a two-step p

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-10 Thread Seth Wiesman
I think this sounds good. +1 On Wed, Aug 5, 2020 at 8:37 PM jincheng sun wrote: > Hi David, Thank you for sharing the problems with the current document, > and I agree with you as I also got the same feedback from Chinese users. I > am often contacted by users to ask questions such as whether Py

Re: [Announce] Flink Forward Global Lineup Released

2020-08-03 Thread Seth Wiesman
+ link https://www.flink-forward.org/global-2020/speakers On Mon, Aug 3, 2020 at 11:25 AM Seth Wiesman wrote: > Hi everyone! > > I'm very excited to announce that the speaker lineup for Flink Forward > Global has been released. This is a fully online conference on October &g

[Announce] Flink Forward Global Lineup Released

2020-08-03 Thread Seth Wiesman
s for training are limited. Thank you to everyone who submitted a talk along with our amazing Program Committee who helped put this lineup together. Best, Seth Wiesman - Program Committee Chair - Flink Forward Global - Committer Apache Flink

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-03 Thread Seth Wiesman
Hi Jincheng, I'm very excited to see the enthusiasm for documentation work but I am concerned about the communities long term ability to maintain this contribution. In particular, I'm concerned that this proposal duplicates a lot of content that will quickly get out of sync. So far the community d

Re: Flink state reconciliation

2020-07-30 Thread Seth Wiesman
That is doable via the state processor API, though Arvid's idea does sound simpler :) You could read the operator with the rules, change the data as necessary and then rewrite it out as a new savepoint to start the job. On Thu, Jul 30, 2020 at 5:24 AM Arvid Heise wrote: > Another idea: since y

Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-07-30 Thread Seth Wiesman
+1 Its time to drop DataSet Flavio, those issues are expected. This FLIP isn't just to drop DataSet but to also add the necessary enhancements to DataStream such that it works well on bounded input. On Thu, Jul 30, 2020 at 8:49 AM Flavio Pompermaier wrote: > Just to contribute to the discussion

Re: Is Flink HIPAA certified

2020-07-01 Thread Seth Wiesman
Hi Prasanna, There are Flink use cases in the US healthcare space, unfortunately, I do not have any public references that I will be able to provide. Some important Flink features that are relevant when working in a field that requires compliance: - SSL: https://ci.apache.org/projects/fli

Re: Dynamic partitioner for Flink based on incoming load

2020-06-24 Thread Seth Wiesman
You can achieve this in Flink 1.10 using the StreamingFileSink. I’d also like to note that Flink 1.11 (which is currently going through release testing and should be available imminently) has support for exactly this functionality in the table API. https://ci.apache.org/projects/flink/flink-docs-

Re: RichAggregationFunction

2020-06-24 Thread Seth Wiesman
Hi Steven, AggregationFunctions (along with Reduce and other “pre aggregation” functions) are not allowed to be Rich. In general if you need to go outside the predefined bounds of what the window operator provides I’d encourage you to take a look at a KeyedProcessFunction. Seth On Wed, Jun 24,

Re: [Announce] Flink Forward Call for Proposals Extended

2020-06-24 Thread Seth Wiesman
https://dev.to/anavasiliuk/5-reasons-why-you-should-consider-presenting-at-flink-forward-global-virtual-2020-4jk Seth On Fri, Jun 19, 2020 at 10:07 AM Israel Ekpo wrote: > Thanks Seth for sharing this. > > I am looking forward to the event. > > On Fri, Jun 19, 2020 at 10:54 AM

[Announce] Flink Forward Call for Proposals Extended

2020-06-19 Thread Seth Wiesman
Hi Everyone! The Call for Presentations for Flink Forward has been extended until *Sunday, June 28, 11:59 pm PST*. We know that tech conferences are not a priority for everyone at this moment, so we wanted to ensure everyone has time to work on their ideas. As a reminder, Flink Forward Global C

Re: Understading Flink statefun deployment

2020-06-11 Thread Seth Wiesman
Hi Francesco, No, that architecture is not possible. I'm not sure if you've used Flink's DataStream API but embedded functions under the hood are very much like lightweight process functions. If you have a single DataStream application with two process functions you cannot scale their workers inde

Re: Stateful functions Harness

2020-05-27 Thread Seth Wiesman
Hi Boris, Example usage of flink sources and sink is available in the documentation[1]. [1] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/io-module/flink-connectors.html On Wed, May 27, 2020 at 1:08 PM Boris Lublinsky < boris.lublin...@lightbend.com> wrote: > Thats not ex

[Announce] Flink Forward Global 2020 - Call for Proposals

2020-05-14 Thread Seth Wiesman
Hi Everyone! After a successful Virtual Flink Forward in April, we have decided to present our October edition in the same way. In these uncertain times, we are conscious of everyone's health and safety and want to make sure our events are accessible for everyone. Flink Forward Global Conferenc

Re: Assertion failed: (last_ref), function ~ColumnFamilySet, file db/column_family.cc, line 1238

2020-05-08 Thread Seth Wiesman
Gordon is correct. Additionally, if you are using flink 1.10 you may be running into a known bug that has been resolved in 1.10.1 which will be released soon. Seth https://issues.apache.org/jira/browse/FLINK-16313 On Fri, May 8, 2020 at 5:19 AM Tzu-Li (Gordon) Tai wrote: > Hi, > > The last ti

Re: How to list timers registered in timer service?

2020-05-04 Thread Seth Wiesman
Hi Lasse, In the state processor api, KeyedStateReaderFunction#readKey has a parameter called `Context` which you can use to read the registered event time and proc time timers for a key. Best, Seth On Fri, May 1, 2020 at 2:57 AM Lasse Nedergaard < lassenedergaardfl...@gmail.com> wrote: > Hi.

Re: Change to StreamingFileSink in Flink 1.10

2020-04-21 Thread Seth Wiesman
Hi All, There is a bug in the builder that prevents it from compiling in scala due to differences in type inference between java and scala[1]. It as already been resolved for 1.10.1 and 1.11. In the meantime, just go ahead and use casts or construct the object in a java class. Seth [1] https://i

Re: Cannot register for Flink Forward Conference

2020-04-20 Thread Seth Wiesman
Hi Eleanore, There was a misconfiguration on the website if you try again everything should work. Seth On Mon, Apr 20, 2020 at 1:39 PM Eleanore Jin wrote: > Hi community, > > My colleagues tried to register for the Flink forward conference: > https://www.bigmarker.com/series/flink-forward-virt

Re: TypeInformation composition ?

2020-04-11 Thread Seth Wiesman
If the type information for T is stored in a member variable called myTypeInfo you can do something like this. import org.apache.flink.api.common.typeinfo.Types; Types.TUPLE(Types.LONG, myTypeInfo); Seth > On Apr 11, 2020, at 11:06 AM, Laurent Exsteens > wrote: > >  > Hello, > > I have a

Re: [PROPOSAL] Contribute training materials to Apache Flink

2020-04-09 Thread Seth Wiesman
Hi David, +1 to add to the project. I agree that flink.apache.org and flink playgrounds respectively are the best places to host this content. On Thu, Apr 9, 2020 at 2:56 PM Niels Basjes wrote: > Hi, > > Sounds like a very nice thing to have as part of the project ecosystem. > > Niels > > On T

Re: ListState with millions of elements

2020-04-08 Thread Seth Wiesman
There is a limitation in RocksDB's JNI bridge that will cause applications to fail if list state exceeds 2GB. I am not aware of anyone working on this issue. Seth. [1] https://github.com/facebook/rocksdb/issues/2383 On Wed, Apr 8, 2020 at 12:02 PM Aaron Levin wrote: > Hello friendly Flink comm

Re: Creating singleton objects per task manager

2020-04-07 Thread Seth Wiesman
Hi Kristoff, You are correct that, that was a typo :) At most one instance per slot. Seth > On Apr 7, 2020, at 9:41 AM, KristoffSC wrote: > > Hi Seth, > I would like to piggyback on this question :) > > You wrote: > "I would strongly encourage you to create one instance of your object per

Re: State Processor API with Beam

2020-04-06 Thread Seth Wiesman
Hi Stephen, You will need to implement a custom operator and user the `transform` method. It's not just that you need to specify the namespace type but you will also need to look into the beam internals to see how it stores data in flink state, how it translates between beam serializers and flink

Re: Creating singleton objects per task manager

2020-04-06 Thread Seth Wiesman
Hi Salva, One TaskManager == One JVM. There is nothing Flink specific here, you can just create a singleton how you would in any other JVM application. But be careful, if your singleton does any sort of locking/coordination it will quickly become the bottleneck in your application. I would strongl

Re: PartitionNotFoundException when restarting from checkpoint

2018-03-14 Thread Seth Wiesman
@01D3BB7E.BAFAAC20] Seth Wiesman| Software Engineer
4 World Trade Center, 46th Floor, New York, NY 10007
swies...@mediamath.com<mailto:fl...@mediamath.com> From: Seth Wiesman Date: Wednesday, March 14, 2018 at 10:14 AM To: Fabian Hueske , Stefan Richter Cc: "user@flink.apache.org&qu

Re: PartitionNotFoundException when restarting from checkpoint

2018-03-14 Thread Seth Wiesman
/runtime/io/network/partition/consumer/LocalInputChannel.java#L151 [cid:image001.png@01D3BB7D.472CF0B0] Seth Wiesman| Software Engineer
4 World Trade Center, 46th Floor, New York, NY 10007
swies...@mediamath.com<mailto:fl...@mediamath.com> From: Fabian Hueske Date: Tuesday, March 13, 201

Re: PartitionNotFoundException when restarting from checkpoint

2018-03-13 Thread Seth Wiesman
. [cid:image001.png@01D3BAAE.915F15C0] Seth Wiesman| Software Engineer
4 World Trade Center, 46th Floor, New York, NY 10007
swies...@mediamath.com<mailto:fl...@mediamath.com> From: Seth Wiesman Date: Friday, March 9, 2018 at 11:53 AM To: "user@flink.apache.or

PartitionNotFoundException when restarting from checkpoint

2018-03-09 Thread Seth Wiesman
with 3) Right now we are logging everything under org.apache.flink.runtime.io.network, is there anywhere else to look Thank you, [cid:image001.png@01D3B79D.36E45B00] Seth Wiesman| Software Engineer
4 World Trade Center, 46th Floor, New York, NY 10007
swies...@mediamath.com<mailto

Re: 'Custom' mapping function on keyed WindowedStream

2018-02-26 Thread Seth Wiesman
I had to solve a similar problem, we use a process function with rocksdb and map state for the sub keys. So while we hit rocks on every element, only the specified sub keys are ever read from disk. Seth Wiesman| Software Engineer4 World Trade Center, 46th Floor, New York, NY 10007swies

Re: CaseClassTypeInfo fails to deserialize in flink 1.4 with Parent First Class Loading

2018-01-12 Thread Seth Wiesman
org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:290) at org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:248) at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:220) ... 4 more Seth

CaseClassTypeInfo fails to deserialize in flink 1.4 with Parent First Class Loading

2018-01-11 Thread Seth Wiesman
1940180]<http://www.mediamath.com/> Seth Wiesman | Software Engineer, Data
 4 World Trade Center, 46th Floor, New York, NY 10007


Re: Watermark in broadcast

2017-12-13 Thread Seth Wiesman
Quick follow up question. Is there some way to notify a TimestampAssigner that is consuming from an idle source? [cid:image001.png@01D3740B.CADE87C0]<http://www.mediamath.com/> Seth Wiesman | Software Engineer, Data
 4 World Trade Center, 46th Floor, New York, NY 10007
 From: Seth W

Re: Watermark in broadcast

2017-12-13 Thread Seth Wiesman
@01D3740A.880106E0]<http://www.mediamath.com/> Seth Wiesman | Software Engineer, Data
 4 World Trade Center, 46th Floor, New York, NY 10007
 From: Timo Walther Date: Wednesday, December 13, 2017 at 11:46 AM To: "user@flink.apache.org" Subject: Re: Watermark in broadcast Hi Set

Watermark in broadcast

2017-12-13 Thread Seth Wiesman
. Is this expected behavior? Currently I have overridden processWatermark1 to unconditionally call processWatermark but that does not seem like an ideal solution. Thank you, [cid:image001.png@01D37402.F5C0B480]<http://www.mediamath.com/> Seth Wiesman | Software Engineer, Data
 4 World

Re: DataStream to Table Api idioms

2017-11-06 Thread Seth Wiesman
Not a problem, thanks for the quick feedback. https://issues.apache.org/jira/browse/FLINK-7999 Seth Wiesman From: Fabian Hueske Date: Monday, November 6, 2017 at 9:14 AM To: Seth Wiesman Cc: user Subject: Re: DataStream to Table Api idioms Hi Seth, I think the Table API is not there yet to

DataStream to Table Api idioms

2017-11-06 Thread Seth Wiesman
but would like to receive partial results say every hour. 3) Do window join time intervals have to be constant or can they depend on row attributes. I am running campaigns that have start and end dates and so I would like my join window to be that interval. Thank you, Seth Wiesman

Re: serialization error when using multiple metrics counters

2017-10-09 Thread Seth Wiesman
A scala class contains a single lazy val it is implemented using a boolean flag to track if the field has been evaluated. When a class contains, multiple lazy val’s it is implemented as a bit mask shared amongst the variables. This can lead to inconsistencies as to whether serialization forces e

Re: [POLL] Who still uses Java 7 with Flink ?

2017-07-13 Thread Seth Wiesman
+1 for dropping java 7 On 7/13/17, 4:59 AM, "Konstantin Knauf" wrote: +1 for dropping Java 7 On 13.07.2017 10:11, Niels Basjes wrote: > +1 For dropping java 1.7 > > On 13 Jul 2017 04:11, "Jark Wu" wrote: > >> +1 for dropping Java 7 >> >> 2017-07-13 9:

Re: Amazon Athena

2017-06-07 Thread Seth Wiesman
Seems straight forward. The biggest challenge is that that you don’t want Athena picking up on partially written files or for whatever reason corrupt files. The issue with S3 is you cannot allow Flink to perform delete, truncate, or rename operations because it moves faster than S3 can become co

OperatorState partioning when recovering from failure

2017-05-04 Thread Seth Wiesman
are uniformly distributed in a temporal manner or if someone had other ideas of how I could mitigate the problem. Thank you, Seth Wiesman

Re: Checkpointing with RocksDB as statebackend

2017-02-27 Thread Seth Wiesman
pipeline that will in general only hold for my pipeline. This is because there were still some open questions that I had about how to solve consistency issues in the general case. I will comment on the Jira issue with more specific. Seth Wiesman From: vinay patil Reply-To: "

Re: Checkpointing with RocksDB as statebackend

2017-02-27 Thread Seth Wiesman
, Seth Wiesman From: vinay patil Reply-To: "user@flink.apache.org" Date: Saturday, February 25, 2017 at 10:50 AM To: "user@flink.apache.org" Subject: Re: Checkpointing with RocksDB as statebackend HI Stephan, Just to avoid the confusion here, I am using S3 sink for writing

Re: List State in RichWindowFunction leads to RocksDb memory leak

2017-02-24 Thread Seth Wiesman
Also while I’ve got you, is it possible to get the job id from the runtime context? Seth Wiesman From: Seth Wiesman Reply-To: "user@flink.apache.org" Date: Friday, February 24, 2017 at 2:51 PM To: "user@flink.apache.org" Subject: Re: List State in RichWindowFunction lea

Re: List State in RichWindowFunction leads to RocksDb memory leak

2017-02-24 Thread Seth Wiesman
ProcessWindowFunction eventually have access to scoped partitioned state or just timing? There are several things I have coming down the pipeline that require coordination between window evaluations. Thank you again for all the help. Seth Wiesman From: Aljoscha Krettek Reply-To: "

List State in RichWindowFunction leads to RocksDb memory leak

2017-02-23 Thread Seth Wiesman
seem to fail every 5th-7th checkpoint. I am curious if anyone here has any ideas of what I might be able to do to solve this problem. Thank you, Seth Wiesman

Re: state size in relation to cluster size and processing speed

2016-12-23 Thread Seth Wiesman
ContinousFileReaderOperator. Certainly with a larger cluster splits would be processed more quickly and as such the watermark would advance at a quicker pace. Why do you think a more quickly advancing watermark would affect state size in this case? Seth Wiesman From: Aljoscha Krettek Reply-To: "

state size in relation to cluster size and processing speed

2016-12-16 Thread Seth Wiesman
insights you may have. Seth Wiesman

Custom Window Assigner With Lateness

2016-11-08 Thread Seth Wiesman
element arrived after the watermark. Is this currently possible to do in flink? Thank you, Seth Wiesman

ValueState in RichCoFlatMap, possible 1.2-SNAPSHOT regression

2016-10-20 Thread Seth Wiesman
b.com/apache/flink/blob/6f0faf9bb35e7cac3a38ed792cdabd6400fc4c79/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapValueState.java#L88> on updates. Seth Wiesman