Re: [ANNOUNCE] Welcome Dmitri Bourlatchkov, Dennis Huo and Yufei Gu to the Apache Polaris PPMC

2025-03-27 Thread Talat Uyarer
Congrats folks! On Thu, Mar 27, 2025 at 10:35 AM Rulin Xing wrote: > Congrats folks! > > Rulin >

Re: [URGENT] Re: [Action Required] GSoC 2025 is coming

2025-02-12 Thread Talat Uyarer via dev
e some progress on > managed IO but there is still more to be done there. > > > > Adding folks, who can help you with these questions and see if these > projects still makes sense this year as well. : @Danny McCormick > @Talat Uyarer @Chamikara > Jayalath > > > >

Apache Beam Summit 2024 is September 4 2024!

2024-08-15 Thread Talat Uyarer via user
Hey Flink Users, Heads up! The Apache Beam Summit 2024 is happening next month( September 04, 2024) at the Google Campus in Sunnyvale, CA. It's two days packed with learning and networking with the Beam community. Key highlights: - Improving Stability for Running Python SDK with Flink Runner: im

Re: Issue in PrefetchCount

2024-05-07 Thread Talat Uyarer
Hi ajay, When you have 3 parallelisms you will have 3 independent clients. If you want to keep prefetch count 3 you need to set setRequestedChannelMax as 1 and setParallelism 3. So All 3 clients can have one connection. Talat On Tue, May 7, 2024 at 5:52 AM ajay pandey wrote: > Hi Flink Team, >

Re: Issue in PrefetchCount

2024-05-07 Thread Talat Uyarer via user
Hi ajay, When you have 3 parallelisms you will have 3 independent clients. If you want to keep prefetch count 3 you need to set setRequestedChannelMax as 1 and setParallelism 3. So All 3 clients can have one connection. Talat On Tue, May 7, 2024 at 5:52 AM ajay pandey wrote: > Hi Flink Team, >

Re: Exception in Flink 1.18 (Time should be non negative)

2024-05-07 Thread Talat Uyarer via user
Hi Lasse, If there's a significant difference in the system time between Flink TaskManagers, it can lead to negative time calculations when comparing timestamps from different sources. On Mon, May 6, 2024 at 5:40 AM Lasse Nedergaard < lassenedergaardfl...@gmail.com> wrote: > Hi. > > In Flink job

Re: Evolving Flink SQL statement set and restoring from savepoints

2024-05-07 Thread Talat Uyarer via user
Hi Keith, When you add a new insert statement to your EXECUTE STATEMENT you change your job graph with independent two graphs.Unfortunately, Flink doesn't currently provide a way to directly force specific UIDs for operators through configuration or SQL hints. This is primarily due to how Flink's

[jira] [Created] (FLINK-33530) Add entropy to Google Cloud Storage path for better scalability

2023-11-12 Thread Talat Uyarer (Jira)
Talat Uyarer created FLINK-33530: Summary: Add entropy to Google Cloud Storage path for better scalability Key: FLINK-33530 URL: https://issues.apache.org/jira/browse/FLINK-33530 Project: Flink

[jira] [Created] (FLINK-33530) Add entropy to Google Cloud Storage path for better scalability

2023-11-12 Thread Talat Uyarer (Jira)
Talat Uyarer created FLINK-33530: Summary: Add entropy to Google Cloud Storage path for better scalability Key: FLINK-33530 URL: https://issues.apache.org/jira/browse/FLINK-33530 Project: Flink

Re: Count(distinct) not working in beam sql

2023-11-11 Thread Talat Uyarer via user
Hi, I saw this a little bit late. I implement a custom count distinct for our streaming use case. If you are looking for something close enough but not exact you can use my UDF. It uses the HyperLogLogPlus algorithm, which is an efficient and scalable way to estimate cardinality with a controlled

[jira] [Commented] (FLINK-32818) Support region failover for adaptive scheduler

2023-09-12 Thread Talat Uyarer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-32818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764298#comment-17764298 ] Talat Uyarer commented on FLINK-32818: -- Thank you [~fanrui] I look forward you

[jira] [Commented] (FLINK-32818) Support region failover for adaptive scheduler

2023-09-09 Thread Talat Uyarer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-32818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763366#comment-17763366 ] Talat Uyarer commented on FLINK-32818: -- Hey [~fanrui] At Palo Alto Network

[jira] [Commented] (FLINK-32700) Support job drain for Savepoint upgrade mode jobs in Flink Operator

2023-07-30 Thread Talat Uyarer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-32700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748908#comment-17748908 ] Talat Uyarer commented on FLINK-32700: -- Our customers are ok to lose data if

[jira] [Comment Edited] (FLINK-32700) Support job drain for Savepoint upgrade mode jobs in Flink Operator

2023-07-28 Thread Talat Uyarer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-32700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748759#comment-17748759 ] Talat Uyarer edited comment on FLINK-32700 at 7/29/23 1:3

[jira] [Commented] (FLINK-32700) Support job drain for Savepoint upgrade mode jobs in Flink Operator

2023-07-28 Thread Talat Uyarer (Jira)
[ https://issues.apache.org/jira/browse/FLINK-32700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748759#comment-17748759 ] Talat Uyarer commented on FLINK-32700: -- [~gyfora] Currently there is an issu

Re: Scaling Flink Jobs without Restarting Job

2023-07-24 Thread Talat Uyarer via dev
rtitions (for Kafka sources for example) and requires some coordination > works. > > Best, > Zhanghao Chen > -- > *发件人:* Talat Uyarer via dev > *发送时间:* 2023年7月23日 15:28 > *收件人:* dev > *主题:* Scaling Flink Jobs without Restarting Job > > HI

Scaling Flink Jobs without Restarting Job

2023-07-23 Thread Talat Uyarer via dev
HI, We are using Flink with Adaptive Scheduler(Reactive Mode) on Kubernetes with Standalone deployment Application mode for our streaming infrastructure. Our autoscaler is scaling up or down our jobs. However, each scale action causes a job restart. Our customers complain about fluctuating traffi

Local Combiner for GroupByKey on Flink Streaming jobs

2023-05-23 Thread Talat Uyarer via dev
Sorry for cross posting -- Forwarded message - From: Talat Uyarer Date: Fri, May 19, 2023, 2:25 AM Subject: Local Combiner for GroupByKey on Flink Streaming jobs To: Hi, I have a stream aggregation job which is running on Flink 1.13 I generate DAG by using Beam SQL. My SQL

Re: Watermark Alignment on Flink Runner's UnboundedSourceWrapper

2023-05-23 Thread Talat Uyarer via dev
.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_FLINK_FLIP-2D182-253A-2BSupport-2Bwatermark-2Balignment-2Bof-2BFLIP-2D27-2BSources&d=DwMDaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=aecDGh6mOmdF13fznxqZ2eCSP--lTT02C2dNHJ

Watermark Alignment on Flink Runner's UnboundedSourceWrapper

2023-05-22 Thread Talat Uyarer via dev
Maybe the User list does not have knowledge about this. That's why I also resend on the Dev list. Sorry for cross posting Hi All, I have a stream aggregation job which reads from Kafka and writes some Sinks. When I submit my job Flink checkpoint size keeps increasing if I use unaligned checkpoi

Watermark Alignment on Flink Runner's UnboundedSourceWrapper

2023-05-19 Thread Talat Uyarer via user
Hi All, I have a stream aggregation job which reads from Kafka and writes some Sinks. When I submit my job Flink checkpoint size keeps increasing if I use unaligned checkpoint settings and it does not emit any window results. If I use an aligned checkpoint, size is somewhat under control(still bi

Local Combiner for GroupByKey on Flink Streaming jobs

2023-05-19 Thread Talat Uyarer via user
Hi, I have a stream aggregation job which is running on Flink 1.13 I generate DAG by using Beam SQL. My SQL query has a TUMBLE window. Basically My pipeline reads from kafka aggregate, counts/sums some values by streamin aggregation and writes a Sink. BeamSQl uses Groupbykey for the aggregation p

Flink Kubernetes Operator Scale Issue

2023-04-27 Thread Talat Uyarer via user
Hi All, We are using Flink Kubernetes Operator on our production. We have 3k+ jobs in standalone mode. But after 2.5k jobs operator getting slow. Now when we submit a job it takes 10+ minutes to the job runs. Does anyone use similar scale or more job ? Now we run as a single pod. Does operator su

Re: Task Failure Strategy for Adaptive Scheduler

2023-04-18 Thread Talat Uyarer via user
; It's doable but might not be straightforward since the AS recycles > ExecutionGraph during restart. It has been a low priority so far because > it's mainly valuable for batch jobs, but we might reconsider it if there > are enough use cases. > > Best, > D. > > On Sat, Apr

Re: Task Failure Strategy for Adaptive Scheduler

2023-04-14 Thread Talat Uyarer via user
, > D. > > On Tue, Apr 11, 2023 at 4:35 AM Weihua Hu wrote: > >> Hi, >> >> AFAIK, the reactive mode always restarts the whole pipeline now. >> >> Best, >> Weihua >> >> >> On Tue, Apr 11, 2023 at 8:38 AM Talat Uyarer via user < >&

Task Failure Strategy for Adaptive Scheduler

2023-04-10 Thread Talat Uyarer via user
Hi All, We use Flink 1.13 with reactive mode for our streaming jobs. When we have an issue/exception on our pipeline. Flink rescheduled all tasks. Is there any way to reschedule only task that had exceptions ? Thanks

Re: Beam SQL Alias issue while using With Clause

2023-03-02 Thread Talat Uyarer via dev
Vr&s=g36wnBGvi7DQG7gvljaG08vXIhROyCoz5vWBBRS43Ag&e= > > I'm experimenting with ideas on how to work around this but a fix will > likely require a Calcite upgrade, which is not something I'd have time > to help with. (I'm not on the Google Beam team anymore.) >

Re: Beam SQL Alias issue while using With Clause

2023-03-02 Thread Talat Uyarer via user
Vr&s=g36wnBGvi7DQG7gvljaG08vXIhROyCoz5vWBBRS43Ag&e= > > I'm experimenting with ideas on how to work around this but a fix will > likely require a Calcite upgrade, which is not something I'd have time > to help with. (I'm not on the Google Beam team anymore.) >

Re: Review ask for Flink Runner Backlog Metric Bug Fix

2023-02-23 Thread Talat Uyarer via dev
Would you like to be a volunteer +Andrew Pilloud :) On Thu, Feb 23, 2023 at 4:51 PM Andrew Pilloud wrote: > The bot says there are no reviewers for Flink. Possibly you'll find a > volunteer to review it here? > > On Thu, Feb 23, 2023 at 4:47 PM Talat Uyarer via dev > wro

Review ask for Flink Runner Backlog Metric Bug Fix

2023-02-23 Thread Talat Uyarer via dev
Hi, I created a bugfix for Flink Runner backlog metrics. I asked OWNERs and try to run assign reviewer command. But I am not sure. I pressed the right button :) If you know some who can review this change https://github.com/apache/beam/pull/25554 Could you assign him/her to this mr ? Thanks

Re: Beam SQL Alias issue while using With Clause

2023-02-22 Thread Talat Uyarer via user
ovider); > +PCollection stream = > +BeamSqlRelUtils.toPCollection( > +pipeline, sqlEnv.parseQuery("WITH tempTable AS (SELECT * FROM > basicRowTestTable WHERE basicRowTestTable.col.string_field = 'innerStr') > SELECT * FROM tempTable")); > +Schema outputS

Re: Beam SQL Alias issue while using With Clause

2023-02-22 Thread Talat Uyarer via dev
ovider); > +PCollection stream = > +BeamSqlRelUtils.toPCollection( > +pipeline, sqlEnv.parseQuery("WITH tempTable AS (SELECT * FROM > basicRowTestTable WHERE basicRowTestTable.col.string_field = 'innerStr') > SELECT * FROM tempTable")); > +Schema outputS

Re: Beam SQL Alias issue while using With Clause

2023-02-03 Thread Talat Uyarer via dev
t;>> >>> One minor issue I encountered. It took me a while to get your test case >>> running, it doesn't appear there are any calcite gradle rules to run >>> CoreQuidemTest and constructing the classpath manually was tedious. Did I >>> miss something? >&g

Re: Beam SQL Alias issue while using With Clause

2023-02-03 Thread Talat Uyarer via user
t;>> >>> One minor issue I encountered. It took me a while to get your test case >>> running, it doesn't appear there are any calcite gradle rules to run >>> CoreQuidemTest and constructing the classpath manually was tedious. Did I >>> miss something? >&g

Re: How to submit beam python pipeline to GKE flink cluster

2023-02-03 Thread Talat Uyarer via user
Hi, Do you use Flink operator or manually deployed session cluster ? Thanks On Fri, Feb 3, 2023, 4:32 AM P Singh wrote: > Hi Team, > > I have set up a flink cluster on GKE and am trying to submit a beam > pipeline with below options. I was able to run this on a local machine but > I don't unde

Re: Reducing Checkpoint Count for Chain Operator

2023-02-02 Thread Talat Uyarer via user
Dataflow has similar behavior with checkpoint support they call pipeline fusion. [1] [1] https://cloud.google.com/dataflow/docs/pipeline-lifecycle#fusion_optimization Thanks On Thu, Feb 2, 2023 at 9:25 AM Schwalbe Matthias < matthias.schwa...@viseca.ch> wrote: > Hi Talat Uyarer, > > >

Beam Portable Framework Question for Same SDK and Runner Language

2023-02-02 Thread Talat Uyarer via user
Hi, I know we use the portability framework when the sdk language (python) is different from the runner's language(java) . If my runner is Java based and I want to use the portability framework for Java SDK. Is there any optimization on the Beam side rather than running two separate docker images

Re: Reducing Checkpoint Count for Chain Operator

2023-02-02 Thread Talat Uyarer via user
t 7 > > > > Hope we shed a little light on this > > > > Best regards > > > > Thias > > > > > > > > *From:* Kishore Pola > *Sent:* Thursday, February 2, 2023 4:12 AM > *To:* weijie guo ; Talat Uyarer < > tuya...@paloaltonetwor

Re: Reducing Checkpoint Count for Chain Operator

2023-02-01 Thread Talat Uyarer via user
o checkpoint > independently for each task, this is not supported. > > > Best regards, > > Weijie > > > Talat Uyarer via user 于2023年2月1日周三 15:34写道: > >> Hi, >> >> We have a job that is reading from kafka and writing some endpoints. The >> job does no

Reducing Checkpoint Count for Chain Operator

2023-01-31 Thread Talat Uyarer via user
Hi, We have a job that is reading from kafka and writing some endpoints. The job does not have any shuffling steps. I implement it with multiple steps. Flink chained those operators in one operator in submission time. However I see all operators are doing checkpointing. Is there any way to crea

Re: Beam SQL Alias issue while using With Clause

2023-01-27 Thread Talat Uyarer via user
suspect the problem lies there. I spent some time looking but wasn't able > to find it by code inspection, it looks like this code path is doing the > right thing with names. I'll spend some time tomorrow trying to reproduce > this on pure Calcite. > > Andrew > > > O

Re: Beam SQL Alias issue while using With Clause

2023-01-27 Thread Talat Uyarer via dev
suspect the problem lies there. I spent some time looking but wasn't able > to find it by code inspection, it looks like this code path is doing the > right thing with names. I'll spend some time tomorrow trying to reproduce > this on pure Calcite. > > Andrew > > > O

Re: Beam SQL Alias issue while using With Clause

2023-01-24 Thread Talat Uyarer via user
; > Schema.builder().addInt32Field("fout_int").addStringField("fout_string").build(); > assertThat(result.getSchema(), equalTo(output_schema)); > > Row output = Row.withSchema(output_schema).addValues(1, > "strstr").build(); > PAssert.that(

Re: Beam SQL Alias issue while using With Clause

2023-01-24 Thread Talat Uyarer via dev
; > Schema.builder().addInt32Field("fout_int").addStringField("fout_string").build(); > assertThat(result.getSchema(), equalTo(output_schema)); > > Row output = Row.withSchema(output_schema).addValues(1, > "strstr").build(); > PAssert.that(

Re: Beam SQL Alias issue while using With Clause

2023-01-24 Thread Talat Uyarer via user
tes Java bytecode to directly execute the DSL. >> >> Problem: it looks like the CalcRel has output columns with aliases "id" >> and "v" where it should have output columns with aliases "id" and "value". >> >> Kenn >> >&

Beam SQL Alias issue while using With Clause

2023-01-12 Thread Talat Uyarer via user
Hi All, I am using Beam 2.43 with Calcite SQL with Java. I have a query with a WITH clause and some aliasing. Looks like Beam Query optimizer after optimizing my query, it drops Select statement's aliases. Can you help me to identify where the problem is ? This is my query INFO: SQL: WITH `tempT

Re: About Native Deployment's Autoscaling implementation

2022-05-31 Thread Talat Uyarer
ers. > > This way users can also integrate easily with the custom Flink metrics > too. > > > > maxReplicas: We could add this easily to the taskManager resource specs > > > > Nice workflow picture, I would love to include this in the docs later. > One > >

[jira] [Created] (BEAM-14538) Support Flink Operator on Flink Runner

2022-05-31 Thread Talat Uyarer (Jira)
Talat Uyarer created BEAM-14538: --- Summary: Support Flink Operator on Flink Runner Key: BEAM-14538 URL: https://issues.apache.org/jira/browse/BEAM-14538 Project: Beam Issue Type: Improvement

Re: About Native Deployment's Autoscaling implementation

2022-05-25 Thread Talat Uyarer
we might need some integration with the k8s metrics >> system. >> In any case whether we need a FLIP or not depends on the complexity, if >> it's simple then we can go without a FLIP. >> >> Cheers, >> Gyula >> >> On Tue, May 24, 2022 at 12:26 PM T

Re: About Native Deployment's Autoscaling implementation

2022-05-25 Thread Talat Uyarer
on the complexity, if > it's simple then we can go without a FLIP. > > Cheers, > Gyula > > On Tue, May 24, 2022 at 12:26 PM Talat Uyarer < > tuya...@paloaltonetworks.com> wrote: > >> Hi Gyula, >> >> This seems very promising for initial scaling. W

Re: About Native Deployment's Autoscaling implementation

2022-05-24 Thread Talat Uyarer
auto-scaler would do). >> >> Looking forward to your thoughts >> >> [1] https://github.com/tillrohrmann/flink/commits/autoscaling >> <https://urldefense.com/v3/__https://github.com/tillrohrmann/flink/commits/autoscaling__;!!Mt_FR42WkD9csi9Y!ZNBiCduZFUmuQI7_9M48gQ

About Native Deployment's Autoscaling implementation

2022-05-22 Thread Talat Uyarer
Hi, I am working on auto scaling support for native deployments. Today Flink provides Reactive mode however it only runs on standalone deployments. We use Kubernetes native deployment. So I want to increase or decrease job resources for our streamin jobs. Recent Flip-138 and Flip-160 are very usefu

[jira] [Updated] (BEAM-14438) Beam slowness compared to flink-native

2022-05-11 Thread Talat Uyarer (Jira)
[ https://issues.apache.org/jira/browse/BEAM-14438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Talat Uyarer updated BEAM-14438: Labels: performance (was: ) > Beam slowness compared to flink-nat

Re: Beam slowness compared to flink-native

2022-05-10 Thread Talat Uyarer
HI Ifat, Did you enable fasterCopy parameter ? Please look at this issue: https://issues.apache.org/jira/browse/BEAM-11146 Thanks On Mon, May 2, 2022 at 12:57 AM Afek, Ifat (Nokia - IL/Kfar Sava) < ifat.a...@nokia.com> wrote: > Hi, > > > > I’m investigating a slowness of our beam pipelines, an

[jira] [Updated] (BEAM-13577) Beam Select's uniquifyNames function loses nullability of Complex types while inferring schema

2022-01-20 Thread Talat Uyarer (Jira)
[ https://issues.apache.org/jira/browse/BEAM-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Talat Uyarer updated BEAM-13577: Affects Version/s: 2.35.0 2.36.0 > Beam Select's uniquifyNames

[jira] [Commented] (BEAM-13577) Beam Select's uniquifyNames function loses nullability of Complex types while inferring schema

2022-01-13 Thread Talat Uyarer (Jira)
[ https://issues.apache.org/jira/browse/BEAM-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475982#comment-17475982 ] Talat Uyarer commented on BEAM-13577: - [~kenn] I am waiting [~ibzib] '

Review ask for my pull request

2022-01-06 Thread Talat Uyarer
Hi, I created a pull request. Could you review my pr ? This is url: https://github.com/apache/beam/pull/16380 Thanks

[jira] [Created] (BEAM-13577) Beam Select's uniquifyNames function loses nullability of Complex types while inferring schema

2021-12-28 Thread Talat Uyarer (Jira)
Talat Uyarer created BEAM-13577: --- Summary: Beam Select's uniquifyNames function loses nullability of Complex types while inferring schema Key: BEAM-13577 URL: https://issues.apache.org/jira/browse/BEAM-

[jira] [Updated] (BEAM-13577) Beam Select's uniquifyNames function loses nullability of Complex types while inferring schema

2021-12-28 Thread Talat Uyarer (Jira)
[ https://issues.apache.org/jira/browse/BEAM-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Talat Uyarer updated BEAM-13577: Description: We use BeamSQL in our project. When we use any JOIN. SQL generates BeamCoGBKJoinRel

Re: Filtering support on Fetch API

2021-11-30 Thread Talat Uyarer
or basic > comparisons of the built-in Serdes might be simple enough, but anything > more complex or involving custom Serdes would probably require a new > plug-in type on the broker. > > On Mon, Nov 29, 2021 at 10:49 AM Talat Uyarer < > tuya...@paloaltonetworks.com> > wrot

Filtering support on Fetch API

2021-11-29 Thread Talat Uyarer
Hi All, I want to get your advice about one subject. I want to create a KIP for message header base filtering on Fetch API. Our current use case We have 1k+ topics and per topic, have 10+ consumers for different use cases. However all consumers are interested in different sets of messages on the

Slow Samza beam Job

2021-11-11 Thread Talat Uyarer
Hi, I am doing some kind of performance testing. I submitted two Beam jobs, one is running with a Dataflow worker and the one Samza runner. My Samza deployment is standalone. I have 10 workers for each job. My DAG is very basic *Read From Kafka -> BeamSQL filtter -> Write GCS* However I have th

Re: Kafka Offset Commit

2021-08-12 Thread Talat Uyarer
afka consumer group doesn't apply to samza as well since > Samza manages the assignments of Kafka partitions to tasks and doesn't > leverage Kafka's high level consumer behavior to assign its partition to > different consumers. > > Hope that answers your question. >

Re: Kafka Offset Commit

2021-08-10 Thread Talat Uyarer
DaqSy0r4X9x43flCgkiuMeZhtbLCX9uwEMkqxtvQ2g&s=vlTnWi-4Xxnk52pXrMZTfQekBoDWp66hovL2E_qi-Z8&e= > > has details on the configs we need to add to enable checkpointing to kafka > for a job. > > thanks > > > On Tue, Aug 10, 2021 at 5:03 PM Talat Uyarer > > wrote: >

Kafka Offset Commit

2021-08-10 Thread Talat Uyarer
Hi Samza Community, This is my first email. Forgive my lack of knowledge about samza. I am running a testing job in my environment. I run in local model but somehow my job is processing data however it does not commit offset on Kafka side. I use an apache beam samza runner. My pipeline is simply

Re: Calculating Top K element per window with Beam SQL

2021-07-29 Thread Talat Uyarer
l_src_main_java_org_apache_beam_sdk_extensions_sql_impl_rel_BeamSortRel.java-23L200&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=t64ngtbWsSfQC4rND6Nc4ri8I--Ey30-KIuKISgxxI0&s=f5C-HFf6aqOpMxRyaDjkMGNb7PsX-puF6K4cW-b_gTI&e=> >

Calculating Top K element per window with Beam SQL

2021-07-09 Thread Talat Uyarer
Hi, I am trying to calculate the top k element based on a streaming aggregation per window. Do you know if there is any support for it on BeamSQL or How can achieve this goal with BeamSQL on stream ? Sample Query SELECT customer_id, app, sum(bytes_total) as bytes_total FROM PCOLLECTION GROUP BY c

How Beam SQL Side Input refresh/update

2021-05-07 Thread Talat Uyarer
Hi, Based on Join documentation. If I have a Join with Unbounded and Bounded > For this type of JOIN bounded input is treated as a side-input by the > implementation. This means that window/trigger is inherented from upstreams. On my pipeline I dont have any triggering or window. I use a global

Re: Apache Beam SQL and UDF

2021-02-10 Thread Talat Uyarer
t update data with any mechanisim. Thanks On Wed, Feb 10, 2021 at 12:41 PM Talat Uyarer wrote: > Does beam create UDF function for every bundle or in setup of pipeline ? > > I will keep internal state in memory. The Async thread will update that in > memory state based on an interva

Re: Apache Beam SQL and UDF

2021-02-10 Thread Talat Uyarer
:37 PM Rui Wang wrote: > The problem that I can think of is maybe before the async call is > completed, the UDF life cycle has reached to the end. > > > -Rui > > On Wed, Feb 10, 2021 at 12:34 PM Talat Uyarer < > tuya...@paloaltonetworks.com> wrote: > >> Hi,

Apache Beam SQL and UDF

2021-02-10 Thread Talat Uyarer
Hi, We plan to use UDF on our sql. We want to achieve some kind of filtering based on internal states. We want to update that internal state with a separate async thread in UDF. Before implementing that thing I want to get your options. Is there any limitation for UDF to have multi-thread implemen

Re: About Beam SQL Schema Changes and Code generation

2020-12-08 Thread Talat Uyarer
>>> >>>>>> Reuven >>>>>> >>>>>> On Tue, Dec 8, 2020 at 9:33 AM Brian Hulette >>>>>> wrote: >>>>>> >>>>>>> Reuven, could you clarify what you have in mind? I know multiple >>>>>>

Re: About Beam SQL Schema Changes and Code generation

2020-12-08 Thread Talat Uyarer
compatibility >>>>>> support to SchemaCoder, including support for certain schema changes >>>>>> (field >>>>>> additions/deletions) - I think the most recent discussion was here [1]. >>>>>> >>>>>> But it sounds

Re: About Beam SQL Schema Changes and Code generation

2020-12-08 Thread Talat Uyarer
might be theoretically possible to do this (at least for the > case where existing fields do not change). Whether anyone currently has > available time to do this is a different question, but it's something that > can be looked into. > > On Mon, Dec 7, 2020 at 9:29 PM Talat Uyar

Re: About Beam SQL Schema Changes and Code generation

2020-12-07 Thread Talat Uyarer
ges, are these new fields being added to the > schema? Or are you making changes to the existing fields? > > On Mon, Dec 7, 2020 at 9:02 PM Talat Uyarer > wrote: > >> Hi, >> For sure let me explain a little bit about my pipeline. >> My Pipeline is actually simple &g

Re: About Beam SQL Schema Changes and Code generation

2020-12-07 Thread Talat Uyarer
m unless you are > also changing the SQL statement? > > Is this a case where you have a SELECT *, and just want to make sure those > fields are included? > > Reuven > > On Mon, Dec 7, 2020 at 6:31 PM Talat Uyarer > wrote: > >> Hi Andrew, >> >> I assume

Re: About Beam SQL Schema Changes and Code generation

2020-12-07 Thread Talat Uyarer
QL to try to produce > the same plan for an updated SQL query. > > Andrew > > On Mon, Dec 7, 2020 at 5:44 PM Talat Uyarer > wrote: > >> Hi, >> >> We are using Beamsql on our pipeline. Our Data is written in Avro format. >> We generate our rows bas

About Beam SQL Schema Changes and Code generation

2020-12-07 Thread Talat Uyarer
Hi, We are using Beamsql on our pipeline. Our Data is written in Avro format. We generate our rows based on our Avro schema. Over time the schema is changing. I believe Beam SQL generates Java code based on what we define as BeamSchema while submitting the pipeline. Do you have any idea How can we

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-03 Thread Talat Uyarer
GC thrashing. > Memory is used/total/max = 4112/5994/5994 MB, GC last/max = 97.36/97.36 %, > #pushbacks=3, gc thrashing=true. Heap dump not written. And also my process rate is 4kps per instance. I would like to hear your suggestions if you have any. Thanks On Wed, Sep 2, 2020 at 6:22 PM Tal

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Talat Uyarer
I also tried Brian's suggestion to clear stringbuilder by calling delete with stringbuffer length. No luck. I am still getting the same error message. Do you have any suggestions ? Thanks On Wed, Sep 2, 2020 at 3:33 PM Talat Uyarer wrote: > If I'm understanding Talat's log

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Talat Uyarer
to-reuse-a-stringbuilder-in-a-loop >> <https://urldefense.proofpoint.com/v2/url?u=https-3A__stackoverflow.com_questions_242438_is-2Dit-2Dbetter-2Dto-2Dreuse-2Da-2Dstringbuilder-2Din-2Da-2Dloop&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3V

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Talat Uyarer
VYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=EFsNtBMh3aQSH1MXWx0-YmpRIgUHj6EfHvulHdoBkdw&s=H5Fupf1d3R199PF73T8D8YYAnipbS3YdJKj_4Ep2-DU&e=> > > On Wed, Sep 2, 2020 at 3:02 PM Talat Uyarer > wrote: > >> Sorry for the wrong import. You can see on the code I am using >> Stri

Re: OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Talat Uyarer
AhADvjz4ucjndSmzyOZ8FPBvJ_0oZQ&e=> > > Could you try using StringBuilder instead since the usage is not > appropriate for a StringWriter? > > > On Wed, Sep 2, 2020 at 2:49 PM Talat Uyarer > wrote: > >> Hi, >> >> I have an issue with String Co

OOM issue on Dataflow Worker by doing string manipulation

2020-09-02 Thread Talat Uyarer
Hi, I have an issue with String Concatenating. You can see my code below.[1] I have a step on my df job which is concatenating strings. But somehow when I use that step my job starts getting jvm restart errors. Shutting down JVM after 8 consecutive periods of measured GC thrashing. > Memory is u

Re: Resource Consumption increase With TupleTag

2020-08-20 Thread Talat Uyarer
? (Even if MessageExtractor seems simple it isn't free, You > have to now write to two GCS locations instead of one for each work item > that you process so your doing more network calls) > > > On Wed, Aug 19, 2020 at 8:36 PM Talat Uyarer > wrote: > >> Filter

Re: Resource Consumption increase With TupleTag

2020-08-19 Thread Talat Uyarer
you can have a single Pipeline and do as > many reads as you want and the'll all get executed in the same job. > > On Wed, Aug 19, 2020 at 4:14 PM Talat Uyarer > wrote: > > > > Hi Robert, > > > > I calculated process speed based on worker count. When I have sep

Re: Resource Consumption increase With TupleTag

2020-08-19 Thread Talat Uyarer
7pAR8g&m=Erfg03JLKLNG3lT2ejqq7_fbvfL95-wSxZ5hFKqzyKU&s=JsWPJxBXopYYenfBAp6nkwfB0Q1Dhs1d4Yi41fBY3a8&e= > ). > > - Robert > > On Wed, Aug 19, 2020 at 3:37 PM Talat Uyarer > wrote: > > > > Hi, > > > > I have a very simple DAG on my dataflow job. > (KafkaIO->Filter-&g

Resource Consumption increase With TupleTag

2020-08-19 Thread Talat Uyarer
Hi, I have a very simple DAG on my dataflow job. (KafkaIO->Filter->WriteGCS). When I submit this Dataflow job per topic it has 4kps per instance processing speed. However I want to consume two different topics in one DF job. I used TupleTag. I created TupleTags per message type. Each topic has dif

Re: Calcite and Enum Type

2020-07-15 Thread Talat Uyarer
25YQP0vDyhmVKo11YVqx3G8KeT1tYc&e= > > -Rui > > On Wed, Jul 15, 2020 at 3:55 PM Talat Uyarer > > wrote: > > > Hi Julian, > > > > Thanks for your answer. I dont know other dbs but Also Postgresql > support > > enum too[1]. Do you think sup

Re: Calcite and Enum Type

2020-07-15 Thread Talat Uyarer
ave a given set of > values. Then hopefully a storage system would compress repeated values > to a few bits each. > > Feel free to log a JIRA with your requirements. We'll see if anyone > else wants this, and is prepared to implement it. > > Julian > > On Tue

Calcite and Enum Type

2020-07-14 Thread Talat Uyarer
Hi, I am using Beam SQL which uses calcite. IN Beam we have logical types which we can use for Enumeration. But looks like they did not implement enumeration support for calcite. I want to give that support. But I could not find the right way to implement it. In my Enumeration is a HashMap. My que

Re: KafkaIO does not support add or remove topics

2020-06-29 Thread Talat Uyarer
PJNRn4tog4I&e=> > [3] https://issues.apache.org/jira/browse/BEAM-9977 > <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_BEAM-2D9977&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=RU

KafkaIO does not support add or remove topics

2020-06-29 Thread Talat Uyarer
Hi, I am using Dataflow. When I update my DF job with a new topic or update partition count on the same topic. I can not use DF's update function. Could you help me to understand why I can not use the update function for that ? I checked the Beam source code but I could not find the right place t

Re: Building Dataflow Worker

2020-06-15 Thread Talat Uyarer
eam 2.19 is using gradle 5.2.1, is the installed version > compatible with that? > > Try > ./gradlew :runners:google-cloud-dataflow-java:worker:shadowJar > in a clean workspace. > > On Fri, Jun 12, 2020 at 4:30 PM Talat Uyarer > wrote: > >> Hi, >> >>

Building Dataflow Worker

2020-06-12 Thread Talat Uyarer
Hi, I want to build the dataflow worker on apache beram 2.19. However I faced a grpc issue. I did not change anything. Just checked release-2.19.0 branch and run build command. Could you help me understand why it does not build. [1] Additional information, Based on my limited knowledge Looks like

Re: KafkaIO Read Latency

2020-06-10 Thread Talat Uyarer
ng and closing bundle*”? > > Sometimes, starting a KafkaReader can take time since it will seek for a > start offset for each assigned partition but it happens only once at > pipeline start-up and mostly depends on network conditions. > > On 9 Jun 2020, at 23:05, Talat Uyarer > wrote

KafkaIO Read Latency

2020-06-09 Thread Talat Uyarer
Hi, I added some metrics on a step right after KafkaIO. When I compare the read time difference between producer and KafkaIO it is 800ms for P99. However somehow that step's opening and closing bundle difference is 18 seconds for p99. The step itself does not do any specific thing. Do you have any

Re: Pipeline Processing Time

2020-06-09 Thread Talat Uyarer
e.proofpoint.com/v2/url?u=https-3A__beam.apache.org_releases_javadoc_2.21.0_org_apache_beam_sdk_transforms_DoFn.ProcessElement.html&d=DwMFaQ&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=KuUWakZ-xaVGYfsw7YGz1WBOLIlpBHikvRxgZs9vWn0&s=nkJq_weo7lr

Re: Pipeline Processing Time

2020-06-01 Thread Talat Uyarer
Q&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g&m=1202mTv7BP1KzcBJECS98dr7u5riw0NHdl8rT8I6Ego&s=cPdnrK4r-tVd0iAO6j7eAAbDPISOdazEYBrPoC9cQOo&e=> > > > On Thu, May 28, 2020 at 1:12 PM Talat Uyarer > wrote: > >> Yes I am trying to track how long

Re: Pipeline Processing Time

2020-05-28 Thread Talat Uyarer
essing takes? > Do you care about how much CPU time is being consumed in aggregate for all > the processing that your pipeline is doing? > > > On Thu, May 28, 2020 at 11:01 AM Talat Uyarer < > tuya...@paloaltonetworks.com> wrote: > >> I am using Dataflow Runner.

  1   2   3   4   5   6   7   8   9   10   >