Re: DI in flink

2023-02-14 Thread Austin Cawley-Edwards
(note: please keep user@flink.apache.org included in replies) Ah, I see. Then no, this is not provided by Flink. When I've used dependency inject with Flink in the past, I instantiated everything in the `open()` method of the Flink Rich* classes. Could you solve this by having a common base Sink c

Re: DI in flink

2023-02-14 Thread Austin Cawley-Edwards
What would be the benefits and features over what can be done in user land? On Tue, Feb 14, 2023 at 10:41 Yashoda Krishna T wrote: > Hi Austin > > Yes this can be done in Usrr land. > Can we do it in flink land too? > > Thanks > Yashoda > > On Tue, 14 Feb 2023, 9:

Re: DI in flink

2023-02-14 Thread Austin Cawley-Edwards
Hey Yashoda, This can be done in userland (eg with Dagger ) unless you're wanting Flink to do something in addition? Best, Austin On Tue, Feb 14, 2023 at 10:01 AM Yashoda Krishna T < yashoda.kris...@unbxd.com> wrote: > Does flink support dependency injection in flink task f

Re: How to process mini-batch events in Flink with Datastream API

2023-02-10 Thread Austin Cawley-Edwards
It's been a while, but I think I've done something similar before with Async I/O [1] and batching records with a window. This was years ago, so no idea if this was/is good practice, but essentially it was: -> Window by batch size (with a timeout trigger to maintain some SLA) -> Process function t

Re: Deploying Jobmanager on k8s as a Deployment

2022-09-07 Thread Austin Cawley-Edwards
modes >>> and in both cases the JM is deployed as k8s Deployment. >>> >>> During upgrade Flink/operator deletes the deployment after savepoint and >>> waits for termination before it creates a new one with the updated spec. >>> >>> Cheers, >>

Re: Deploying Jobmanager on k8s as a Deployment

2022-09-07 Thread Austin Cawley-Edwards
n both cases the JM is deployed as k8s Deployment. > > During upgrade Flink/operator deletes the deployment after savepoint and > waits for termination before it creates a new one with the updated spec. > > Cheers, > Gyula > > On Mon, 5 Sep 2022 at 07:41, Austin Cawley-Edward

Re: Deploying Jobmanager on k8s as a Deployment

2022-09-05 Thread Austin Cawley-Edwards
Hey Marco, Unfortunately there is no built in k8s API that models an application mode JM exactly but Deployments should be fine, in general. As Gyula notes, where they can be difficult is during application upgrades as Deployments never let their pods exit, even if successful, so there is no way t

Re: Flink config driven tool ?

2022-06-07 Thread Austin Cawley-Edwards
but looks like a spark tool is there something similar in flink? > > Thanks > Sri > > On Tue, Jun 7, 2022 at 12:07 PM Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > >> Hey there, >> >> No idea if it's any good, but just saw Apache

Re: Flink config driven tool ?

2022-06-07 Thread Austin Cawley-Edwards
Hey there, No idea if it's any good, but just saw Apache SeaTunnel[1] today which seems to fit your requirements. Best, Austin [1]: https://seatunnel.apache.org/ On Tue, Jun 7, 2022 at 2:19 PM sri hari kali charan Tummala < kali.tumm...@gmail.com> wrote: > Hi Flink Community, > > can someone p

Re: HTTP REST API as Ingress/Egress

2022-05-19 Thread Austin Cawley-Edwards
Hi Himanshu, Unfortunately, this is not supported by Statefun, though this type of application should not be too difficult to using something like the Kafka Request/Reply pattern[1], and putting that in front of a Statefun cluster. Best, Austin [1]: https://dzone.com/articles/synchronous-kafka-u

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-13 Thread Austin Cawley-Edwards
Hi all, Would just like to share an interesting article from the dbt community[1], which in part describes some of their challenges in managing Slack in a large community. The biggest point it seems to make is that their Slack has become a marketing tool for dbt/data vendors instead of a community

Re: Migrating Flink apps across cloud with state

2022-05-03 Thread Austin Cawley-Edwards
Hey Hemanga, That's quite annoying of MirrorMaker to change the offsets on you. One solution would be to use the State Processor API[1] to read the savepoint and update the offsets to the new ones — does MirrorMaker give you these ahead of time? There might also be more specific tricks people coul

Re: how to initialize few things at task managers

2022-04-18 Thread Austin Cawley-Edwards
If you are using Kubernetes to deploy Flink, you could think about an initContainer on the TMs or a custom Docker entry point that does this initialization. Best, Austin On Mon, Apr 18, 2022 at 7:49 AM huweihua wrote: > Hi, Init stuff when task manager comes up is not an option. > But if the Ke

Re: DataStream request / response

2022-04-08 Thread Austin Cawley-Edwards
table/ > > Regards, > Roman > > On Fri, Apr 8, 2022 at 6:56 PM Austin Cawley-Edwards > wrote: > > > > Hi Jason, > > > > No, there is no HTTP source/ sink support that I know of for Flink. > Would running the Spring + Kafka solution in front of Flink

Re: DataStream request / response

2022-04-08 Thread Austin Cawley-Edwards
Hi Jason, No, there is no HTTP source/ sink support that I know of for Flink. Would running the Spring + Kafka solution in front of Flink work for you? On a higher level, what drew you to migrating the microservice to Flink? Best, Austin On Fri, Apr 8, 2022 at 12:35 PM Jason Thomas wrote: > I

Re: Cannot upgrade helm chart

2022-02-21 Thread Austin Cawley-Edwards
Hey Marco, There’s unfortunately no perfect fit here, at least that I know of. A Deployment will make it possible to upgrade the image, but does not support container exits (eg if the Flink job completes, even successfully, K8s will still restart the container). If you are only running long lived

Re: RMQSource non-parallel, seems inconsistent with documentation

2022-02-18 Thread Austin Cawley-Edwards
Hey Daniel, I think you're right that the docs are misleading in this case – anything that extends SourceFunction will always execute at parallelism 1, set parallelism is ignored. Explicitly setting parallelism in the example in the docs is unnecessary and confusing. I personally have only used th

Re: No effect from --allowNonRestoredState or "execution.savepoint.ignore-unclaimed-state" in K8S application mode

2022-02-18 Thread Austin Cawley-Edwards
Hi Andrey, It's unclear to me from the docs[1] if the flink native-kubernetes integration supports setting arbitrary config keys via the CLI. I'm cc'ing Yang Wang, who has worked a lot in this area and can hopefully help us out. Best, Austin [1]: https://nightlies.apache.org/flink/flink-docs-rel

Re: Flink Overwrite parameters in ExecutorUtils

2022-02-18 Thread Austin Cawley-Edwards
Hi Dan, I'm not exactly sure why, but could you help me understand the use case for changing these parameters in Flink SQL? Thanks, Austin On Fri, Feb 18, 2022 at 8:01 AM Zou Dan wrote: > Hi, > I am using Flink Batch SQL in version 1.11. I find that Flink will > overwrite some configurations i

Re: stream consume from kafka after DB scan is done

2021-11-05 Thread Austin Cawley-Edwards
Hey Qihua, If I understand correctly, you should be able to with the HybridSource, released in 1.14 [1] Best, Austin [1]: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/hybridsource/ On Fri, Nov 5, 2021 at 3:53 PM Qihua Yang wrote: > Hi, > > Our stream h

Re: Running a performance benchmark load test of Flink on new CPUs

2021-11-05 Thread Austin Cawley-Edwards
Hi Vijay, I'm not too familiar with the subject, but maybe you could have a look at the flink-faker[1], which generates fake data. I would think you could use it to write to kafka in one Flink job, and then have another Flink job to ingest and run your benchmarks. There is also this microbenchmar

Re: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Austin Cawley-Edwards
st, Austin On Thu, Nov 4, 2021 at 4:09 PM Isidoros Ioannou wrote: > > > -- Forwarded message - > Από: Isidoros Ioannou > Date: Πέμ, 4 Νοε 2021 στις 10:01 μ.μ. > Subject: Re: IterativeCondition instead of SimpleCondition not matching > pattern > To: Aust

Re: GenericWriteAheadSink, declined checkpoint for a finished source

2021-11-04 Thread Austin Cawley-Edwards
Hi James, You are correct that since Flink 1.14 [1] (which included FLIP-147 [2]) there is support for checkpointing after some tasks has finished, which sounds like it will solve this use case. You may also want to look at the JDBC sink[3] which also supports batching, as well as some other nice

Re: IterativeCondition instead of SimpleCondition not matching pattern

2021-11-04 Thread Austin Cawley-Edwards
Hi Isidoros, Thanks for reaching out to the mailing list. I haven't worked with the CEP library in a long time but can try to help. I'm having a little trouble understanding the desired output + rules. Can you mock up the desired output like you have for the fulfilled pattern sequence? Best, Aust

Re: Flink + K8s

2021-11-02 Thread Austin Cawley-Edwards
Hi Rommel, That’s correct that K8s will restart the JM pod (assuming it’s been created by a K8s Job or Deployment), and it will pick up the HA data and resume work. The only use case for having multiple replicas is faster failover, so you don’t have to wait for K8s to provision that new pod (which

Re: HTTP or REST SQL Client

2021-10-02 Thread Austin Cawley-Edwards
Hi Declan, I think the Flink-sql-gateway[1] is what you’re after, though I’m not sure of its current state. I’m cc’ing Ingo, who may be able to help direct us. Best, Austin [1]: https://github.com/ververica/flink-sql-gateway On Sat, Oct 2, 2021 at 10:56 AM Declan Harrison wrote: > Hi All > >

Re: Can we release new flink-connector-redis? Thanks!

2021-08-17 Thread Austin Cawley-Edwards
Hi Hongbo, Thanks for your interest in the Redis connector! I'm not entirely sure what the release process is like for Bahir, but I've pulled in @Robert Metzger who has been involved in the project in the past and can give an update there. Best, Austin On Tue, Aug 17, 2021 at 10:41 AM Hongbo Mi

Re: Recover from savepoints with Kubernetes HA

2021-07-23 Thread Austin Cawley-Edwards
ckpoint pointers) stored in the ConfigMap/ZNode will be deleted. >> >> But it is strange that the "-s/--fromSavepoint" does not take effect when >> redeploying the Flink application. The JobManager logs could help a lot to >> find the root cause. >> >>

Re: Recover from savepoints with Kubernetes HA

2021-07-22 Thread Austin Cawley-Edwards
down, update the start > parameter with the recent savepoint and renamed ‚kubernetes.cluster-id‘ as > well as ‚high-availability.storageDir‘. > > When I trigger a savepoint with cancel, I also see that the HA config maps > are cleaned up. > > > Kr Thomas > > Austi

Re: Recover from savepoints with Kubernetes HA

2021-07-21 Thread Austin Cawley-Edwards
Hi Thomas, I've got a few questions that will hopefully help get to find an answer: What job properties are you trying to change? Something like parallelism? What mode is your job running in? i.e., Session, Per-Job, or Application? Can you also describe how you're redeploying the job? Are you u

Re: Running two versions of Flink with testcontainers

2021-07-13 Thread Austin Cawley-Edwards
Great, glad it worked out for you! On Tue, Jul 13, 2021 at 10:32 AM Farouk wrote: > Thanks > > Finally I tried by running docker commands (thanks for the documentation) > and it works fine. > > Thanks > Farouk > > Le mar. 13 juil. 2021 à 15:48, Austin Cawley-Edwards

Re: Running two versions of Flink with testcontainers

2021-07-13 Thread Austin Cawley-Edwards
You might also be able to put them in separate networks[1] to get around changing all the ports and still ensuring that they don't see eachother. [1]: https://www.testcontainers.org/features/networking#advanced-networking On Tue, Jul 13, 2021 at 9:07 AM Chesnay Schepler wrote: > It is possible

Re: Jobmanagers are in a crash loop after upgrade from 1.12.2 to 1.13.1

2021-07-01 Thread Austin Cawley-Edwards
Hi Shilpa, >> >> JobType was introduced in 1.13. So I guess the cause is that the client >> which creates and submit >> the job is still 1.12.2. The client generates a outdated job graph which >> does not have its JobType >> set and resulted in this NPE

Re: Jobmanagers are in a crash loop after upgrade from 1.12.2 to 1.13.1

2021-06-30 Thread Austin Cawley-Edwards
Hi Shilpa, Thanks for reaching out to the mailing list and providing those logs! The NullPointerException looks odd to me, but in order to better guess what's happening, can you tell me a little bit more about what your setup looks like? How are you deploying, i.e., standalone with your own manife

Re: Protobuf + Confluent Schema Registry support

2021-06-30 Thread Austin Cawley-Edwards
Hi Vishal, I don't believe there is another way to solve the problem currently besides rolling your own serializer. For the Avro + Schema Registry format, is this Table API format[1] what you're referring to? It doesn't look there have been discussions around adding a similar format for Protobuf

Re: Flink PrometheusReporter support for HTTPS

2021-06-16 Thread Austin Cawley-Edwards
>> have multiple JMs & TMs and cannot use static scrape targets >> >> Regards, >> Ashutosh >> >> On Sun, Jun 13, 2021 at 2:25 AM Austin Cawley-Edwards < >> austin.caw...@gmail.com> wrote: >> >>> Hi Ashutosh, >>> >>>

Re: Flink PrometheusReporter support for HTTPS

2021-06-12 Thread Austin Cawley-Edwards
Hi Ashutosh, How are you deploying your Flink apps? Would running a reverse proxy like Nginx or Envoy that handles the HTTPS connection work for you? Best, Austin On Sat, Jun 12, 2021 at 1:11 PM Ashutosh Uttam wrote: > Hi All, > > Does PrometheusReporter provide support for HTTPS?. I couldn't

Re: RabbitMQ source does not stop unless message arrives in queue

2021-05-18 Thread Austin Cawley-Edwards
ing an exception in the Flink job is not the same as triggering a > stateful shutdown, but it might be hitting similar unack issues. > > John > > -- > *From:* Austin Cawley-Edwards > *Sent:* Thursday 13 May 2021 16:49 > *To:* Jose Vargas ; John Morr

Re: Helm chart for Flink

2021-05-18 Thread Austin Cawley-Edwards
be possible to use Helm pre-upgrade hook to take > savepoint and stop currently running job and then Helm will upgrade image > tags. The problem is that if you hit timeout while taking savepoint, it is > not clear how to recover from this situation > > Alexey > ---------

Re: Helm chart for Flink

2021-05-17 Thread Austin Cawley-Edwards
Hi Pedro, There is currently no official Kubernetes Operator for Flink and, by extension, there is no official Helm chart. It would be relatively easy to create a chart for simply deploying standalone Flink resources via the Kubernetes manifests described here[1], though it would leave out the abi

Re: RabbitMQ source does not stop unless message arrives in queue

2021-05-13 Thread Austin Cawley-Edwards
Hey Jose, Thanks for bringing this up – it indeed sounds like a bug. There is ongoing work to update the RMQ source to the new interface, which might address some of these issues (or should, if it is not already), tracked in FLINK-20628[1]. Would you be able to create a JIRA issue for this, or wou

Re: Apache Flink - A question about Tables API and SQL interfaces

2021-05-13 Thread Austin Cawley-Edwards
Please let > me know if I missed anything. > > Thanks again. > > > On Wednesday, May 12, 2021, 04:36:06 PM EDT, Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > > > Hi Mans, > > I don't believe there are explicit triggers/evictors/timers in the

Re: Regarding Stateful Functions

2021-05-12 Thread Austin Cawley-Edwards
Hey Jessy, I'm not a Statefun expert but, hopefully, I can point you in the right direction for some of your questions. I'll also cc Gordan, who helps to maintain Statefun. *1. Is the stateful function a good candidate for a system(as above) that > should process incoming requests at the rate of

Re: Apache Flink - A question about Tables API and SQL interfaces

2021-05-12 Thread Austin Cawley-Edwards
Hi Mans, I don't believe there are explicit triggers/evictors/timers in the Table API/ SQL, as that is abstracted away from the lower-level DataStream API. If you need to get into the fine-grained details, Flink 1.13 has made some good improvements in going from the Table API to the DataStream API

Re: Setup of Scala/Flink project using Bazel

2021-05-12 Thread Austin Cawley-Edwards
I know @Aaron Levin is using `rules_scala` for building Flink apps, perhaps he can help us out here (and hope he doesn't mind the ping). On Wed, May 12, 2021 at 4:13 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Yikes, I see what you mean. I also can not get

Re: Setup of Scala/Flink project using Bazel

2021-05-12 Thread Austin Cawley-Edwards
Yikes, I see what you mean. I also can not get `neverlink` or adding the org.scala.lang artifacts to the deploy_env to remove them from the uber jar. I'm not super familiar with sbt/ scala, but do you know how exactly the assembly `includeScala` works? Is it just a flag that is passed to scalac?

Re: Setup of Scala/Flink project using Bazel

2021-05-12 Thread Austin Cawley-Edwards
Hi Salva, I think you're almost there. Confusion is definitely not helped by the ADDONS/ PROVIDED_ADDONS thingy – I think I tried to get too fancy with that in the linked thread. I think the only thing you have to do differently is to adjust the target you are building/ deploying – instead of `//

Re: StopWithSavepoint() method doesn't work in Java based flink native k8s operator

2021-05-04 Thread Austin Cawley-Edwards
Hey all, Thanks for the ping, Matthias. I'm not super familiar with the details of @Yang Wang 's operator, to be honest :(. Can you share some of your FlinkApplication specs? For the `kubectl logs` command, I believe that just reads stdout from the container. Which logging framework are you using

Re: Setup of Scala/Flink project using Bazel

2021-05-04 Thread Austin Cawley-Edwards
Great! Feel free to post back if you run into anything else or come up with a nice template – I agree it would be a nice thing for the community to have. Best, Austin On Tue, May 4, 2021 at 12:37 AM Salva Alcántara wrote: > Hey Austin, > > There was no special reason for vendoring using `bazel-

Re: Setup of Scala/Flink project using Bazel

2021-05-03 Thread Austin Cawley-Edwards
Hey Salva, This appears to be a bug in the `bazel-deps` tool, caused by mixing scala and Java dependencies. The tool seems to use the same target name for both, and thus produces duplicate targets (one for scala and one for java). If you look at the dict lines that are reported as conflicting, yo

Re: Alternatives to JDBCAppendTableSink in Flink 1.11

2021-04-21 Thread Austin Cawley-Edwards
t; > On Tue, Apr 20, 2021 at 8:11 PM Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > >> Hi Sambaran, >> >> I'm not sure if this is the best approach, though I don't know your full >> use case/ implementation. >> >> What

Re: Alternatives to JDBCAppendTableSink in Flink 1.11

2021-04-20 Thread Austin Cawley-Edwards
tStreamOperator . > Can you please let me know if this is the right approach and if yes, how do > we map the SingleOutputStreamOperator to the preparedstatement in > JdbcStatementBuilder? > > Thanks for your help! > > Regards > Sambaran > > On Tue, Apr 20, 2021 at 6:3

Re: Max-parellelism limitation

2021-04-20 Thread Austin Cawley-Edwards
Hi Olivier, Someone will correct me if I'm wrong, but I believe the max-parallelism limitation, where you cannot scale up past the previously defined max-parallelism, applies to all stateful jobs no matter which type of state you are using. If you haven't seen it already, I think the Production R

Re: Are configs stored as part of savepoints

2021-04-20 Thread Austin Cawley-Edwards
Hi Guarav, Which configs are you referring to? Everything usually stored in `flink-conf.yaml`[1]? The State Processor API[2] is also a good resource to understand what is actually stored, and how you can access it outside of a running job. The SavepointMetadata class[3] is another place to referen

Re: Alternatives to JDBCAppendTableSink in Flink 1.11

2021-04-20 Thread Austin Cawley-Edwards
Hey Sambaran, I'm not too familiar with the 1.7 JDBCAppendTableSink, but to make sure I understand what you're current solution looks like, it's something like the following, where you're triggering a procedure on each element of a stream? JDBCAppendTableSink sink = JDBCAppendTableSink.buil

Re: Flink support for Kafka versions

2021-04-20 Thread Austin Cawley-Edwards
Hi Prasanna, It looks like the Kafka 2.5.0 connector upgrade is tied to dropping support for Scala 2.11. The best place to track that would be the ticket for Scala 2.13 support, FLINK-13414 [1], and its subtask FLINK-20845 [2]. I have listed FLINK-20845 as a blocker for FLINK-19168 for better vis

Re: CRD compatible with native and standalone mode

2021-04-20 Thread Austin Cawley-Edwards
Hi Gaurav, I think the name "Native Kubernetes" is a bit misleading – this just means that you can use the Flink CLI/ scripts to run Flink applications on Kubernetes without using the Kubernetes APIs/ kubectl directly. What features are you looking to use in the native mode? I think it would be d

Re: Async + Broadcast?

2021-04-07 Thread Austin Cawley-Edwards
Hey Alex, I'm not sure if there is a best practice here, but what I can tell you is that I worked on a job that did exactly what you're suggesting with a non-async operator to create a [record, config] tuple, which was then passed to the async stage. Our config objects were also not tiny (~500kb)

Re: Flink - Pod Identity

2021-04-06 Thread Austin Cawley-Edwards
for your help and support. Especially Austin, he stands > out due to his interest in the issue and helping to find ways to resolve it. > > Regards, > Swagat > > On Tue, Apr 6, 2021 at 2:35 AM Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > >> And ac

Re: Flink - Pod Identity

2021-04-05 Thread Austin Cawley-Edwards
e.org/jira/browse/FLINK-18676 On Mon, Apr 5, 2021 at 4:53 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > That looks interesting! I've also found the full list of S3 properties[1] > for the version of presto-hive bundled with Flink 1.12 (see [2]), which > include

Re: Flink - Pod Identity

2021-04-05 Thread Austin Cawley-Edwards
at are your thoughts on this? > > presto.s3.credentials-provider > > > On Tue, Apr 6, 2021 at 12:43 AM Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > >> I've confirmed that for the bundled + shaded aws dependency, the only way >> to upgrade i

Re: Flink - Pod Identity

2021-04-05 Thread Austin Cawley-Edwards
>> The webhook can make the STS call for the service account to role mapping. >> A side car container to which the main container has no access can even >> renew credentials becoz STS returns temp credentials. >> >> Sent from my iPhone >> >> On Apr 3, 2021, at 10:

Re: Flink - Pod Identity

2021-04-03 Thread Austin Cawley-Edwards
://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html [2]: https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#kubernetes-service-account On Sat, Apr 3, 2021 at 10:18 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Can you desc

Re: Flink - Pod Identity

2021-04-03 Thread Austin Cawley-Edwards
s access to S3, shouldn't we have a > simpler mechanism to connect to underlying resources based on the service > account authorization? > > On Sat, Apr 3, 2021, 10:10 PM Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > >> Hi Swagat, >> >> I’ve used

Re: Flink - Pod Identity

2021-04-03 Thread Austin Cawley-Edwards
Hi Swagat, I’ve used kube2iam[1] for granting AWS access to Flink pods in the past with good results. It’s all based on mapping pod annotations to AWS IAM roles. Is this something that might work for you? Best, Austin [1]: https://github.com/jtblin/kube2iam On Sat, Apr 3, 2021 at 10:40 AM Swaga

Re: Print on screen DataStream content

2020-11-23 Thread Austin Cawley-Edwards
Hey Simone, I'd suggest trying out the `DataStream#print()` function to start, but there are a few other easy-to-integrate sinks for testing that you can check out in the docs here[1] Best, Austin [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/datastream_api.html#data-sink

Re: Un-ignored Parsing Exceptions in the CsvFormat

2020-10-22 Thread Austin Cawley-Edwards
> > > On Fri, Oct 16, 2020 at 8:32 PM Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > >> Hey all, >> >> I'm ingesting CSV files with Flink 1.10.2 using SQL and the CSV >> Format[1]. >> >> Even with the `ignoreParseE

Re: Flink Kubernetes / Helm

2020-10-16 Thread Austin Cawley-Edwards
We use the Ververica Platform and have built an operator for it here[1] :) and use Helm with it as well. Best, Austin [1]: https://github.com/fintechstudios/ververica-platform-k8s-operator On Fri, Oct 16, 2020 at 3:12 PM Dan Hill wrote: > What libraries do people use for running Flink on Kube

Un-ignored Parsing Exceptions in the CsvFormat

2020-10-16 Thread Austin Cawley-Edwards
Hey all, I'm ingesting CSV files with Flink 1.10.2 using SQL and the CSV Format[1]. Even with the `ignoreParseErrors()` set, the job fails when it encounters some types of malformed rows. The root cause is indeed a `ParseException`, so I'm wondering if there's anything more I need to do to ignore

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-09 Thread Austin Cawley-Edwards
e MAX_WATERMARK when the > pipeline shuts down even without having a timestamp assigned in the > StreamRecord. Watermark will leave SQL also without a time attribute as > far as I know. > > Regards, > Timo > > > On 08.10.20 17:38, Austin Cawley-Edwards wrote: > > He

Re: Issue with Flink - unrelated executeSql causes other executeSqls to fail.

2020-10-08 Thread Austin Cawley-Edwards
y absolute path >>> reference. However, the actual log calls are not printing to the console. >>> Only errors appear in my terminal window and the test logs. Maybe console >>> logger does not work for this junit setup. I'll see if the file version >>>

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-08 Thread Austin Cawley-Edwards
; > > > > > On 05.10.20 09:25, Till Rohrmann wrote: > >> Hi Austin, > >> > >> thanks for offering to help. First I would suggest asking Timo whether > >> this is an aspect which is still missing or whether we overlooked it. > >> Based on tha

Re: Issue with Flink - unrelated executeSql causes other executeSqls to fail.

2020-10-06 Thread Austin Cawley-Edwards
> ["-Dlog4j.configuration=file:///Users/danhill/code/src/ai/promoted/logprocessor/batch/log4j.properties"], > >> ) > >> > >> # log4j.properties > >> status = error > >> name = Log4j2PropertiesConfig > >> appenders = console > >>

Re: Issue with Flink - unrelated executeSql causes other executeSqls to fail.

2020-10-06 Thread Austin Cawley-Edwards
82.html On Tue, Oct 6, 2020 at 6:32 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Unless it's related to this issue[1], which was w/ my JOIN and time > characteristics, though not sure that applies for batch. > > Best, > Austin > > [1]: > apache-fli

Re: Issue with Flink - unrelated executeSql causes other executeSqls to fail.

2020-10-06 Thread Austin Cawley-Edwards
20 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Hey Dan, > > We use Junit5 and Bazel to run Flink SQL tests on a mini cluster and > haven’t had issues, though we’re only testing on streaming jobs. > > Happy to help setting up logging with that if you’d like.

Re: Issue with Flink - unrelated executeSql causes other executeSqls to fail.

2020-10-06 Thread Austin Cawley-Edwards
Hey Dan, We use Junit5 and Bazel to run Flink SQL tests on a mini cluster and haven’t had issues, though we’re only testing on streaming jobs. Happy to help setting up logging with that if you’d like. Best, Austin On Tue, Oct 6, 2020 at 6:02 PM Dan Hill wrote: > I don't think any of the gotch

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-02 Thread Austin Cawley-Edwards
/streaming/time_attributes.html > > Cheers, > Till > > On Fri, Oct 2, 2020 at 12:10 AM Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > >> Hey Till, >> >> Just a quick question on time characteristics -- this should work for >> Ingestio

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-01 Thread Austin Cawley-Edwards
conversion? I'm currently getting the `Record has Long.MIN_VALUE timestamp (= no timestamp marker)` error, though I'm seeing timestamps being assigned if I step through the AutomaticWatermarkContext. Thanks, Austin On Thu, Oct 1, 2020 at 10:52 AM Austin Cawley-Edwards < austin.caw...@gma

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-01 Thread Austin Cawley-Edwards
s, > Till > > On Thu, Oct 1, 2020 at 12:01 AM Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > >> Hey all, >> >> Thanks for your patience. I've got a small repo that reproduces the issue >> here: https://github.com/austince/flink-1.10-sql-w

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-09-30 Thread Austin Cawley-Edwards
Hey all, Thanks for your patience. I've got a small repo that reproduces the issue here: https://github.com/austince/flink-1.10-sql-windowing-error Not sure what I'm doing wrong but it feels silly. Thanks so much! Austin On Tue, Sep 29, 2020 at 3:48 PM Austin Cawley-Edwards &l

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-09-29 Thread Austin Cawley-Edwards
om window trigger)? This would help us to better understand your > problem. > > I am also pulling in Klou and Timo who might help with the windowing logic > and the Table to DataStream conversion. > > Cheers, > Till > > On Mon, Sep 28, 2020 at 11:49 PM Austin Cawley-Edwards

Streaming SQL Job Switches to FINISHED before all records processed

2020-09-28 Thread Austin Cawley-Edwards
Hey all, I'm not sure if I've missed something in the docs, but I'm having a bit of trouble with a streaming SQL job that starts w/ raw SQL queries and then transitions to a more traditional streaming job. I'm on Flink 1.10 using the Blink planner, running locally with no checkpointing. The job l

Re: Apache Qpid connector.

2020-09-25 Thread Austin Cawley-Edwards
Hey (Master) Parag, I don't know anything about Apache Qpid, but from the homepage[1], it looks like the protocol is just AMQP? Are there more specifics than that? If it is just AMQP would the RabbitMQ connector[2] work for you? Best, Austin [1]: https://qpid.apache.org/ [2]: https://ci.apache.o

Re: Flink SQL Streaming Join Creates Duplicates

2020-08-31 Thread Austin Cawley-Edwards
could > you please update your current progress? > > Best, > > Arvid > > On Thu, Aug 27, 2020 at 11:41 PM Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > >> Ah, I think the "Result Updating" is what got me -- INNER joins do the

Re: Flink SQL Streaming Join Creates Duplicates

2020-08-27 Thread Austin Cawley-Edwards
Ah, I think the "Result Updating" is what got me -- INNER joins do the job! On Thu, Aug 27, 2020 at 3:38 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > oops, the example query should actually be: > > SELECT table_1.a, table_1.b, table_2.c > FROM table_1

Re: Flink SQL Streaming Join Creates Duplicates

2020-08-27 Thread Austin Cawley-Edwards
data a 1", b = "data b 1", c = null) Record(a = "data a 2", b = "data b 2", c = "data c 2") Record(a = "data a 2", b = "data b 2", c = null) On Thu, Aug 27, 2020 at 3:34 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: &

Flink SQL Streaming Join Creates Duplicates

2020-08-27 Thread Austin Cawley-Edwards
Hey all, I've got a Flink 1.10 Streaming SQL job using the Blink Planner that is reading from a few CSV files and joins some records across them into a couple of data streams (yes, this could be a batch job won't get into why we chose streams unless it's relevant). These joins are producing some d

Re: K8s operator for dlink

2020-08-14 Thread Austin Cawley-Edwards
Hey Narasimha, We use an operator at FinTech Studios[1] (built by me) to deploy Flink via the Ververica Platform[2]. We've been using it in production for the past 7 months with no "show-stopping" bugs, and know some others have been experimenting with bringing it to production as well. Best, Aus

Re: Two Queries and a Kafka Topic

2020-08-10 Thread Austin Cawley-Edwards
for each operator that needs the *data** 🤦‍♂️ * On Mon, Aug 10, 2020 at 3:58 PM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Hey all, > > A bit late here and I’m not sure it’s totally valuable, but we have a > similar job where we need to query an external data

Re: Two Queries and a Kafka Topic

2020-08-10 Thread Austin Cawley-Edwards
Hey all, A bit late here and I’m not sure it’s totally valuable, but we have a similar job where we need to query an external data source on startup before processing the main stream as well. Instead of making that request in the Jobmanager process when building the graph, we make those requests

Re: Decompressing Tar Files for Batch Processing

2020-07-07 Thread Austin Cawley-Edwards
On Tue, Jul 7, 2020 at 10:53 AM Austin Cawley-Edwards < austin.caw...@gmail.com> wrote: > Hey Xiaolong, > > Thanks for the suggestions. Just to make sure I understand, are you saying > to run the download and decompression in the Job Manager before executing > the job? >

Re: Decompressing Tar Files for Batch Processing

2020-07-07 Thread Austin Cawley-Edwards
me supported filesystem. Decompressing the file is supported for > selected formats (deflate, gzip, bz2, xz), but this seems to be an > undocumented feature, so I'm not sure how usable it is in reality. > > On 07/07/2020 01:30, Austin Cawley-Edwards wrote: > > Hey all,

Decompressing Tar Files for Batch Processing

2020-07-06 Thread Austin Cawley-Edwards
Hey all, I need to ingest a tar file containing ~1GB of data in around 10 CSVs. The data is fairly connected and needs some cleaning, which I'd like to do with the Batch Table API + SQL (but have never used before). I've got a small prototype loading the uncompressed CSVs and applying the necessar

Re: Does anyone have an example of Bazel working with Flink?

2020-06-18 Thread Austin Cawley-Edwards
a` and `rules_java` are enough for us at > this point. > > It's entirely possible I'm not thinking far enough into the future, > though, so don't take our lack of investment as a sign it's not worth > investing in :) > > Best, > > Aaron Levin > > On Thu

Re: Does anyone have an example of Bazel working with Flink?

2020-06-18 Thread Austin Cawley-Edwards
ave to add a MANIFEST.MF >> file to your jar under META-INF and this file needs to contain `Main-Class: >> org.a.b.Foobar`. >> >> Cheers, >> Till >> >> On Fri, Jun 12, 2020 at 12:30 AM Austin Cawley-Edwards < >> austin.caw...@gmail.com> wrote: &

Re: Does anyone have an example of Bazel working with Flink?

2020-06-11 Thread Austin Cawley-Edwards
Hey all, Adding to Aaron's response, we use Bazel to build our Flink apps. We've open-sourced some of our setup here[1] though a bit outdated. There are definitely rough edges/ probably needs a good deal of work to fit other setups. We have written a wrapper around the `java_library` and `java_bin

Re: Single stream, two sinks

2020-03-05 Thread Austin Cawley-Edwards
We have the same setup and it works quite well. One thing to take into account is that your HTTP call may happen multiple times if you’re using checkpointing/ fault tolerance mechanism, so it’s important that those calls are idempotent and won’t duplicate data. Also we’ve found that it’s important

Re: StreamingFileSink Not Flushing All Data

2020-03-04 Thread Austin Cawley-Edwards
ed. > > Cheers, > Kostas > > [1] https://issues.apache.org/jira/browse/FLINK-13634 > > > On Tue, Mar 3, 2020 at 11:13 PM Austin Cawley-Edwards < > austin.caw...@gmail.com> wrote: > >> Hi all, >> >> Thanks for the docs pointer/ FLIP Rafi, and th

Re: StreamingFileSink Not Flushing All Data

2020-03-03 Thread Austin Cawley-Edwards
ere's a FLIP >> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-46%3A+Graceful+Shutdown+Handling+by+UDFs> >> and >> an issue <https://issues.apache.org/jira/browse/FLINK-13103> about >> fixing this. I'm not sure what's the status though,

Re: StreamingFileSink Not Flushing All Data

2020-03-02 Thread Austin Cawley-Edwards
s flushed. > > > > I think you can also adjust that behavior with: > > > > forBulkFormat(...) > > > > .withRollingPolicy(/* your custom logic */) > > > > I also cc Kostas who should be able to correct me if I am wrong. > > > > Best, > > > >

  1   2   >