Congrats folks!
On Thu, Mar 27, 2025 at 10:35 AM Rulin Xing wrote:
> Congrats folks!
>
> Rulin
>
e some progress on
> managed IO but there is still more to be done there.
>
>
>
> Adding folks who can help you with these questions and see if these
> projects still make sense this year as well: @Danny McCormick
> @Talat Uyarer @Chamikara
> Jayalath
>
>
>
>
Hey Flink Users,
Heads up! The Apache Beam Summit 2024 is happening next month (September
04, 2024) at the Google Campus in Sunnyvale, CA. It's two days packed with
learning and networking with the Beam community.
Key highlights:
- Improving Stability for Running Python SDK with Flink Runner: im
Hi ajay,
When your parallelism is 3 you will have 3 independent clients. If you
want to keep the prefetch count at 3, you need to set setRequestedChannelMax to 1
and setParallelism to 3, so all 3 clients can share one connection.
Talat
On Tue, May 7, 2024 at 5:52 AM ajay pandey wrote:
> Hi Flink Team,
>
Hi Lasse,
If there's a significant difference in the system time between Flink
TaskManagers, it can lead to negative time calculations when comparing
timestamps from different sources.
On Mon, May 6, 2024 at 5:40 AM Lasse Nedergaard <
lassenedergaardfl...@gmail.com> wrote:
> Hi.
>
> In Flink job
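To make the failure mode concrete, here is a minimal sketch (hypothetical timestamps and helper names, not code from this thread): a latency computed by subtracting an event timestamp stamped on one machine from another machine's clock can go negative when the clocks disagree, so it should be clamped or measured against a single clock.

```java
public class ClockSkewDemo {
    // Latency computed naively from two different machines' clocks.
    static long naiveLatencyMs(long eventTsMs, long localNowMs) {
        return localNowMs - eventTsMs;
    }

    // Defensive variant: clamp negative values caused by clock skew to zero.
    static long clampedLatencyMs(long eventTsMs, long localNowMs) {
        return Math.max(0L, localNowMs - eventTsMs);
    }

    public static void main(String[] args) {
        long localNow = 1_700_000_000_000L; // this TaskManager's clock
        long eventTs = localNow + 2_000L;   // stamped by a machine whose clock is 2s ahead

        System.out.println(naiveLatencyMs(eventTs, localNow));   // -2000: bogus negative latency
        System.out.println(clampedLatencyMs(eventTs, localNow)); // 0: clamped
    }
}
```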
Hi Keith,
When you add a new insert statement to your EXECUTE STATEMENT, you change
your job graph into two independent graphs. Unfortunately, Flink doesn't
currently provide a way to directly force specific UIDs for operators
through configuration or SQL hints. This is primarily due to how Flink's
Talat Uyarer created FLINK-33530:
Summary: Add entropy to Google Cloud Storage path for better
scalability
Key: FLINK-33530
URL: https://issues.apache.org/jira/browse/FLINK-33530
Project: Flink
Hi,
I saw this a little bit late. I implemented a custom count distinct for our
streaming use case. If you are looking for something close enough but not
exact, you can use my UDF. It uses the HyperLogLogPlus algorithm, which is
an efficient and scalable way to estimate cardinality with a controlled
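The UDF itself is not shown in the thread; as an illustration of the idea, here is a simplified linear-counting sketch (a stand-in for HyperLogLogPlus, with hypothetical names): an approximate count-distinct keeps a fixed-size bit array instead of the full set of seen values, trading a small, controllable error for bounded memory.

```java
import java.util.BitSet;

// Simplified linear-counting sketch: a stand-in for the HyperLogLogPlus-based
// UDF mentioned above, not the actual implementation from the thread.
public class ApproxDistinct {
    private final BitSet bits;
    private final int m;

    public ApproxDistinct(int m) {
        this.m = m;
        this.bits = new BitSet(m);
    }

    public void add(Object value) {
        int h = value.hashCode();
        h ^= (h >>> 16);                 // spread high bits into low bits
        bits.set(Math.floorMod(h, m));   // map into [0, m)
    }

    // Linear-counting estimate: n ~ -m * ln(V), where V is the fraction of zero bits.
    public long estimate() {
        int zeros = m - bits.cardinality();
        if (zeros == 0) {
            return m; // sketch saturated; a real implementation would resize or switch algorithms
        }
        return Math.round(-m * Math.log((double) zeros / m));
    }
}
```

Adding the same value twice does not change the estimate, which is what makes the sketch usable as a streaming count-distinct accumulator.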
[
https://issues.apache.org/jira/browse/FLINK-32818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764298#comment-17764298
]
Talat Uyarer commented on FLINK-32818:
--
Thank you [~fanrui] I look forward you
[
https://issues.apache.org/jira/browse/FLINK-32818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763366#comment-17763366
]
Talat Uyarer commented on FLINK-32818:
--
Hey [~fanrui] At Palo Alto Network
[
https://issues.apache.org/jira/browse/FLINK-32700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748908#comment-17748908
]
Talat Uyarer commented on FLINK-32700:
--
Our customers are ok to lose data if
[
https://issues.apache.org/jira/browse/FLINK-32700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748759#comment-17748759
]
Talat Uyarer edited comment on FLINK-32700 at 7/29/23 1:3
[
https://issues.apache.org/jira/browse/FLINK-32700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748759#comment-17748759
]
Talat Uyarer commented on FLINK-32700:
--
[~gyfora] Currently there is an issu
rtitions (for Kafka sources for example) and requires some coordination
> works.
>
> Best,
> Zhanghao Chen
> --
> *From:* Talat Uyarer via dev
> *Sent:* July 23, 2023, 15:28
> *To:* dev
> *Subject:* Scaling Flink Jobs without Restarting Job
>
> HI
Hi,
We are using Flink with Adaptive Scheduler (Reactive Mode) on Kubernetes,
with standalone-deployment Application mode, for our streaming
infrastructure. Our autoscaler scales our jobs up or down. However,
each scale action causes a job restart.
Our customers complain about fluctuating traffi
Sorry for cross-posting
-- Forwarded message -
From: Talat Uyarer
Date: Fri, May 19, 2023, 2:25 AM
Subject: Local Combiner for GroupByKey on Flink Streaming jobs
To:
Hi,
I have a stream aggregation job which is running on Flink 1.13. I generate
the DAG by using Beam SQL. My SQL
https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
Maybe the User list does not have knowledge about this. That's why I am also
resending on the Dev list. Sorry for cross-posting.
Hi All,
I have a stream aggregation job which reads from Kafka and writes some
Sinks.
When I submit my job Flink checkpoint size keeps increasing if I use
unaligned checkpoi
Hi All,
I have a stream aggregation job which reads from Kafka and writes some
Sinks.
When I submit my job Flink checkpoint size keeps increasing if I use
unaligned checkpoint settings and it does not emit any window results.
If I use an aligned checkpoint, the size is somewhat under control (still bi
Hi,
I have a stream aggregation job which is running on Flink 1.13. I generate
the DAG by using Beam SQL. My SQL query has a TUMBLE window. Basically, my
pipeline reads from Kafka, aggregates (counts/sums some values by streaming
aggregation), and writes to a sink.
BeamSQL uses GroupByKey for the aggregation p
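For context, a Beam SQL streaming aggregation over a TUMBLE window generally looks like the following (illustrative field names, not the exact query from the thread):

```sql
SELECT
  customer_id,
  app,
  SUM(bytes_total) AS bytes_total
FROM PCOLLECTION
GROUP BY
  customer_id,
  app,
  TUMBLE(event_ts, INTERVAL '1' MINUTE)
```

Each TUMBLE group becomes a keyed, windowed GroupByKey in the generated Beam DAG, which is the stage being discussed here.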
Hi All,
We are using the Flink Kubernetes Operator in production. We have 3k+ jobs
in standalone mode, but after 2.5k jobs the operator gets slow. Now when we
submit a job, it takes 10+ minutes until the job runs. Does anyone use a
similar scale or more jobs?
Now we run it as a single pod. Does the operator su
> It's doable but might not be straightforward since the AS recycles
> ExecutionGraph during restart. It has been a low priority so far because
> it's mainly valuable for batch jobs, but we might reconsider it if there
> are enough use cases.
>
> Best,
> D.
>
> On Sat, Apr
,
> D.
>
> On Tue, Apr 11, 2023 at 4:35 AM Weihua Hu wrote:
>
>> Hi,
>>
>> AFAIK, the reactive mode always restarts the whole pipeline now.
>>
>> Best,
>> Weihua
>>
>>
>> On Tue, Apr 11, 2023 at 8:38 AM Talat Uyarer via user <
>&
Hi All,
We use Flink 1.13 with reactive mode for our streaming jobs. When we have
an issue/exception in our pipeline, Flink reschedules all tasks. Is there
any way to reschedule only the tasks that had exceptions?
Thanks
>
> I'm experimenting with ideas on how to work around this but a fix will
> likely require a Calcite upgrade, which is not something I'd have time
> to help with. (I'm not on the Google Beam team anymore.)
>
Would you like to be a volunteer +Andrew Pilloud :)
On Thu, Feb 23, 2023 at 4:51 PM Andrew Pilloud wrote:
> The bot says there are no reviewers for Flink. Possibly you'll find a
> volunteer to review it here?
>
> On Thu, Feb 23, 2023 at 4:47 PM Talat Uyarer via dev
> wro
Hi,
I created a bugfix for the Flink Runner backlog metrics. I asked OWNERS and
tried to run the assign-reviewer command, but I am not sure I pressed the
right button :)
If you know someone who can review this change,
https://github.com/apache/beam/pull/25554
could you assign him/her to this PR?
Thanks
ovider);
> +PCollection stream =
> +BeamSqlRelUtils.toPCollection(
> +pipeline, sqlEnv.parseQuery("WITH tempTable AS (SELECT * FROM
> basicRowTestTable WHERE basicRowTestTable.col.string_field = 'innerStr')
> SELECT * FROM tempTable"));
> +Schema outputS
>>>
>>> One minor issue I encountered. It took me a while to get your test case
>>> running, it doesn't appear there are any calcite gradle rules to run
>>> CoreQuidemTest and constructing the classpath manually was tedious. Did I
>>> miss something?
>&g
Hi,
Do you use the Flink operator or a manually deployed session cluster?
Thanks
On Fri, Feb 3, 2023, 4:32 AM P Singh wrote:
> Hi Team,
>
> I have set up a flink cluster on GKE and am trying to submit a beam
> pipeline with below options. I was able to run this on a local machine but
> I don't unde
Dataflow has similar
behavior with checkpoint support, which they call pipeline fusion. [1]
[1]
https://cloud.google.com/dataflow/docs/pipeline-lifecycle#fusion_optimization
Thanks
On Thu, Feb 2, 2023 at 9:25 AM Schwalbe Matthias <
matthias.schwa...@viseca.ch> wrote:
> Hi Talat Uyarer,
>
>
>
Hi,
I know we use the portability framework when the SDK language (Python) is
different from the runner's language (Java).
If my runner is Java-based and I want to use the portability framework for
the Java SDK, is there any optimization on the Beam side rather than running
two separate docker images
t 7
>
>
>
> Hope we shed a little light on this
>
>
>
> Best regards
>
>
>
> Thias
>
>
>
>
>
>
>
> *From:* Kishore Pola
> *Sent:* Thursday, February 2, 2023 4:12 AM
> *To:* weijie guo ; Talat Uyarer <
> tuya...@paloaltonetwor
o checkpoint
> independently for each task, this is not supported.
>
>
> Best regards,
>
> Weijie
>
>
> Talat Uyarer via user wrote on Wednesday, February 1, 2023, at 15:34:
>
>> Hi,
>>
>> We have a job that is reading from kafka and writing some endpoints. The
>> job does no
Hi,
We have a job that is reading from kafka and writing some endpoints. The
job does not have any shuffling steps. I implemented it with multiple
steps. Flink chained those operators into one operator at submission time.
However, I see all operators are doing checkpointing.
Is there any way to crea
suspect the problem lies there. I spent some time looking but wasn't able
> to find it by code inspection, it looks like this code path is doing the
> right thing with names. I'll spend some time tomorrow trying to reproduce
> this on pure Calcite.
>
> Andrew
>
>
> O
;
> Schema.builder().addInt32Field("fout_int").addStringField("fout_string").build();
> assertThat(result.getSchema(), equalTo(output_schema));
>
> Row output = Row.withSchema(output_schema).addValues(1,
> "strstr").build();
> PAssert.that(
tes Java bytecode to directly execute the DSL.
>>
>> Problem: it looks like the CalcRel has output columns with aliases "id"
>> and "v" where it should have output columns with aliases "id" and "value".
>>
>> Kenn
>>
>&
Hi All,
I am using Beam 2.43 with Calcite SQL with Java.
I have a query with a WITH clause and some aliasing. It looks like the Beam
query optimizer drops the Select statement's aliases after optimizing my
query. Can you help me identify where the problem is?
This is my query
INFO: SQL:
WITH `tempT
ers.
> > This way users can also integrate easily with the custom Flink metrics
> too.
> >
> > maxReplicas: We could add this easily to the taskManager resource specs
> >
> > Nice workflow picture, I would love to include this in the docs later.
> One
> >
Talat Uyarer created BEAM-14538:
---
Summary: Support Flink Operator on Flink Runner
Key: BEAM-14538
URL: https://issues.apache.org/jira/browse/BEAM-14538
Project: Beam
Issue Type: Improvement
we might need some integration with the k8s metrics
>> system.
>> In any case whether we need a FLIP or not depends on the complexity, if
>> it's simple then we can go without a FLIP.
>>
>> Cheers,
>> Gyula
>>
>> On Tue, May 24, 2022 at 12:26 PM T
on the complexity, if
> it's simple then we can go without a FLIP.
>
> Cheers,
> Gyula
>
> On Tue, May 24, 2022 at 12:26 PM Talat Uyarer <
> tuya...@paloaltonetworks.com> wrote:
>
>> Hi Gyula,
>>
>> This seems very promising for initial scaling. W
auto-scaler would do).
>>
>> Looking forward to your thoughts
>>
>> [1] https://github.com/tillrohrmann/flink/commits/autoscaling
Hi,
I am working on auto scaling support for native deployments. Today Flink
provides Reactive mode however it only runs on standalone deployments. We
use Kubernetes native deployment, so I want to increase or decrease job
resources for our streaming jobs. The recent FLIP-138 and FLIP-160 are very
usefu
[
https://issues.apache.org/jira/browse/BEAM-14438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Talat Uyarer updated BEAM-14438:
Labels: performance (was: )
> Beam slowness compared to flink-nat
Hi Ifat,
Did you enable the fasterCopy parameter?
Please look at this issue: https://issues.apache.org/jira/browse/BEAM-11146
Thanks
On Mon, May 2, 2022 at 12:57 AM Afek, Ifat (Nokia - IL/Kfar Sava) <
ifat.a...@nokia.com> wrote:
> Hi,
>
>
>
> I’m investigating a slowness of our beam pipelines, an
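For reference, fasterCopy is a Flink runner pipeline option that skips defensive object copying between chained operators (the option added under BEAM-11146 linked above); it can be passed when launching the pipeline, e.g.:

```
--runner=FlinkRunner --fasterCopy=true
```

This is a config fragment only; the rest of the launch command depends on your job.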
[
https://issues.apache.org/jira/browse/BEAM-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Talat Uyarer updated BEAM-13577:
Affects Version/s: 2.35.0
2.36.0
> Beam Select's uniquifyNames
[
https://issues.apache.org/jira/browse/BEAM-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475982#comment-17475982
]
Talat Uyarer commented on BEAM-13577:
-
[~kenn] I am waiting [~ibzib] '
Hi,
I created a pull request. Could you review my PR?
This is the URL: https://github.com/apache/beam/pull/16380
Thanks
Talat Uyarer created BEAM-13577:
---
Summary: Beam Select's uniquifyNames function loses nullability of
Complex types while inferring schema
Key: BEAM-13577
URL: https://issues.apache.org/jira/browse/BEAM-
[
https://issues.apache.org/jira/browse/BEAM-13577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Talat Uyarer updated BEAM-13577:
Description:
We use BeamSQL in our project. When we use any JOIN, SQL generates
BeamCoGBKJoinRel
or basic
> comparisons of the built-in Serdes might be simple enough, but anything
> more complex or involving custom Serdes would probably require a new
> plug-in type on the broker.
>
> On Mon, Nov 29, 2021 at 10:49 AM Talat Uyarer <
> tuya...@paloaltonetworks.com>
> wrot
Hi All,
I want to get your advice about one subject. I want to create a KIP for
message header base filtering on Fetch API.
Our current use case: we have 1k+ topics and, per topic, 10+ consumers
for different use cases. However, all consumers are interested in different
sets of messages on the
Hi,
I am doing some performance testing. I submitted two Beam jobs: one
running with a Dataflow worker and one with the Samza runner.
My Samza deployment is standalone. I have 10 workers for each job. My DAG
is very basic:
*Read From Kafka -> BeamSQL filter -> Write GCS*
However I have th
afka consumer group doesn't apply to samza as well since
> Samza manages the assignments of Kafka partitions to tasks and doesn't
> leverage Kafka's high level consumer behavior to assign its partition to
> different consumers.
>
> Hope that answers your question.
>
>
> has details on the configs we need to add to enable checkpointing to kafka
> for a job.
>
> thanks
>
>
> On Tue, Aug 10, 2021 at 5:03 PM Talat Uyarer >
> wrote:
>
Hi Samza Community,
This is my first email, so forgive my lack of knowledge about Samza. I am
running a test job in my environment. I run in local mode, and somehow
my job is processing data; however, it does not commit offsets on the Kafka
side. I use the Apache Beam Samza runner.
My pipeline is simply
>
Hi,
I am trying to calculate the top-k elements based on a streaming
aggregation per window. Do you know if there is any support for this
in BeamSQL, or how can I achieve this goal with BeamSQL on a stream?
Sample Query
SELECT customer_id, app, sum(bytes_total) as bytes_total FROM PCOLLECTION
GROUP BY c
Hi,
Based on the Join documentation, if I have a Join with unbounded and bounded inputs:
> For this type of JOIN bounded input is treated as a side-input by the
> implementation. This means that window/trigger is inherited from upstreams.
In my pipeline I don't have any triggering or windowing. I use a global
t update data with any mechanism.
Thanks
On Wed, Feb 10, 2021 at 12:41 PM Talat Uyarer
wrote:
> Does Beam create the UDF function for every bundle or in the setup of the pipeline?
>
> I will keep internal state in memory. The async thread will update that
> in-memory state based on an interva
:37 PM Rui Wang wrote:
> The problem that I can think of is maybe before the async call is
> completed, the UDF life cycle has reached to the end.
>
>
> -Rui
>
> On Wed, Feb 10, 2021 at 12:34 PM Talat Uyarer <
> tuya...@paloaltonetworks.com> wrote:
>
>> Hi,
Hi,
We plan to use a UDF in our SQL. We want to achieve some kind of
filtering based on internal state, and we want to update that internal state
with a separate async thread in the UDF. Before implementing it, I want
to get your opinions. Is there any limitation on a UDF having a multi-threaded
implemen
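A sketch of the pattern being proposed (hypothetical names, not Beam-specific API): the UDF reads an immutable snapshot through an AtomicReference while a scheduled background thread swaps in refreshed state, so the filter call itself stays lock-free.

```java
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class AllowListFilter implements AutoCloseable {
    private final AtomicReference<Set<String>> allowList;
    private final ScheduledExecutorService refresher =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "allowlist-refresher");
            t.setDaemon(true); // don't block JVM shutdown
            return t;
        });

    public AllowListFilter(Set<String> initial, Supplier<Set<String>> loader, long periodSeconds) {
        this.allowList = new AtomicReference<>(initial);
        // Background thread periodically swaps in a fresh immutable snapshot.
        refresher.scheduleAtFixedRate(
            () -> allowList.set(loader.get()), periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    // The "UDF" call: lock-free read of the current snapshot.
    public boolean eval(String key) {
        return allowList.get().contains(key);
    }

    @Override
    public void close() {
        refresher.shutdownNow();
    }
}
```

The open question in the thread (whether the UDF's life cycle outlives the refresher thread) still applies; this only shows how the two threads would share state safely.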
>>>
>>>>>> Reuven
>>>>>>
>>>>>> On Tue, Dec 8, 2020 at 9:33 AM Brian Hulette
>>>>>> wrote:
>>>>>>
>>>>>>> Reuven, could you clarify what you have in mind? I know multiple
>>>>>>
compatibility
>>>>>> support to SchemaCoder, including support for certain schema changes
>>>>>> (field
>>>>>> additions/deletions) - I think the most recent discussion was here [1].
>>>>>>
>>>>>> But it sounds
might be theoretically possible to do this (at least for the
> case where existing fields do not change). Whether anyone currently has
> available time to do this is a different question, but it's something that
> can be looked into.
>
> On Mon, Dec 7, 2020 at 9:29 PM Talat Uyar
ges, are these new fields being added to the
> schema? Or are you making changes to the existing fields?
>
> On Mon, Dec 7, 2020 at 9:02 PM Talat Uyarer
> wrote:
>
>> Hi,
>> For sure let me explain a little bit about my pipeline.
>> My Pipeline is actually simple
&g
m unless you are
> also changing the SQL statement?
>
> Is this a case where you have a SELECT *, and just want to make sure those
> fields are included?
>
> Reuven
>
> On Mon, Dec 7, 2020 at 6:31 PM Talat Uyarer
> wrote:
>
>> Hi Andrew,
>>
>> I assume
QL to try to produce
> the same plan for an updated SQL query.
>
> Andrew
>
> On Mon, Dec 7, 2020 at 5:44 PM Talat Uyarer
> wrote:
>
>> Hi,
>>
>> We are using Beamsql on our pipeline. Our Data is written in Avro format.
>> We generate our rows bas
Hi,
We are using BeamSQL in our pipeline. Our data is written in Avro format.
We generate our rows based on our Avro schema. Over time the schema is
changing. I believe Beam SQL generates Java code based on what we define as
the BeamSchema while submitting the pipeline. Do you have any idea how we can
GC thrashing.
> Memory is used/total/max = 4112/5994/5994 MB, GC last/max = 97.36/97.36 %,
> #pushbacks=3, gc thrashing=true. Heap dump not written.
Also, my processing rate is 4kps per instance. I would like to hear your
suggestions if you have any.
Thanks
On Wed, Sep 2, 2020 at 6:22 PM Tal
I also tried Brian's suggestion to clear the StringBuilder by calling delete
with the StringBuilder's length. No luck; I am still getting the same error
message. Do you have any suggestions?
Thanks
On Wed, Sep 2, 2020 at 3:33 PM Talat Uyarer
wrote:
> If I'm understanding Talat's log
to-reuse-a-stringbuilder-in-a-loop
>> <https://stackoverflow.com/questions/242438/is-it-better-to-reuse-a-stringbuilder-in-a-loop>
>
> On Wed, Sep 2, 2020 at 3:02 PM Talat Uyarer
> wrote:
>
>> Sorry for the wrong import. You can see in the code that I am using
>> Stri
>
> Could you try using StringBuilder instead since the usage is not
> appropriate for a StringWriter?
>
>
> On Wed, Sep 2, 2020 at 2:49 PM Talat Uyarer
> wrote:
>
>> Hi,
>>
>> I have an issue with String Co
Hi,
I have an issue with string concatenation. You can see my code below. [1] I
have a step in my DF job which concatenates strings. But somehow, when I
use that step, my job starts getting JVM restart errors.
Shutting down JVM after 8 consecutive periods of measured GC thrashing.
> Memory is u
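The suggestion in the surrounding replies is to reuse one builder and reset it, rather than letting a long-lived StringBuilder/StringWriter grow across elements. A minimal sketch of that pattern (hypothetical helper, not the original job code):

```java
public class ConcatDemo {
    // Reuse one StringBuilder and reset it with setLength(0) per call,
    // instead of letting a long-lived builder grow unboundedly.
    private final StringBuilder sb = new StringBuilder(256);

    public String concat(String... parts) {
        sb.setLength(0); // reset content without discarding the backing array
        for (String p : parts) {
            sb.append(p).append(',');
        }
        if (sb.length() > 0) {
            sb.setLength(sb.length() - 1); // drop trailing comma
        }
        return sb.toString();
    }
}
```

One caveat: in a Beam DoFn this instance must not be shared across threads, since the builder itself is mutable state.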
? (Even if MessageExtractor seems simple, it isn't free. You
> have to now write to two GCS locations instead of one for each work item
> that you process, so you're doing more network calls.)
>
>
> On Wed, Aug 19, 2020 at 8:36 PM Talat Uyarer
> wrote:
>
>> Filter
you can have a single Pipeline and do as
> many reads as you want and they'll all get executed in the same job.
>
> On Wed, Aug 19, 2020 at 4:14 PM Talat Uyarer
> wrote:
> >
> > Hi Robert,
> >
> > I calculated process speed based on worker count. When I have sep
> ).
>
> - Robert
>
> On Wed, Aug 19, 2020 at 3:37 PM Talat Uyarer
> wrote:
> >
> > Hi,
> >
> > I have a very simple DAG on my dataflow job.
> (KafkaIO->Filter->
Hi,
I have a very simple DAG in my dataflow job (KafkaIO->Filter->WriteGCS).
When I submit this Dataflow job per topic, it has 4kps per-instance
processing speed. However, I want to consume two different topics in one DF
job. I used TupleTag: I created TupleTags per message type. Each topic has
dif
>
> -Rui
>
> On Wed, Jul 15, 2020 at 3:55 PM Talat Uyarer >
> wrote:
>
> > Hi Julian,
> >
> > Thanks for your answer. I don't know about other DBs, but PostgreSQL
> > supports enums too [1]. Do you think sup
ave a given set of
> values. Then hopefully a storage system would compress repeated values
> to a few bits each.
>
> Feel free to log a JIRA with your requirements. We'll see if anyone
> else wants this, and is prepared to implement it.
>
> Julian
>
> On Tue
Hi,
I am using Beam SQL, which uses Calcite. In Beam we have logical types which
we can use for enumeration, but it looks like they did not implement
enumeration support for Calcite. I want to add that support, but I could
not find the right way to implement it. In my case the enumeration is a
HashMap.
My que
> [3] https://issues.apache.org/jira/browse/BEAM-9977
Hi,
I am using Dataflow. When I update my DF job with a new topic, or update the
partition count on the same topic, I cannot use DF's update function.
Could you help me understand why I cannot use the update function for
that?
I checked the Beam source code but I could not find the right place t
eam 2.19 is using gradle 5.2.1, is the installed version
> compatible with that?
>
> Try
> ./gradlew :runners:google-cloud-dataflow-java:worker:shadowJar
> in a clean workspace.
>
> On Fri, Jun 12, 2020 at 4:30 PM Talat Uyarer
> wrote:
>
>> Hi,
>>
>>
Hi,
I want to build the Dataflow worker on Apache Beam 2.19. However, I faced a
gRPC issue. I did not change anything; I just checked out the release-2.19.0
branch and ran the build command. Could you help me understand why it does
not build? [1]
Additional information: based on my limited knowledge, it looks like
ng and closing bundle*”?
>
> Sometimes, starting a KafkaReader can take time since it will seek for a
> start offset for each assigned partition but it happens only once at
> pipeline start-up and mostly depends on network conditions.
>
> On 9 Jun 2020, at 23:05, Talat Uyarer
> wrote
Hi,
I added some metrics on a step right after KafkaIO. When I compare the read
time difference between the producer and KafkaIO, it is 800ms for P99. However,
somehow that step's opening and closing bundle difference is 18 seconds for
P99. The step itself does not do anything specific. Do you have any
https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html
>
>
> On Thu, May 28, 2020 at 1:12 PM Talat Uyarer
> wrote:
>
>> Yes I am trying to track how long
essing takes?
> Do you care about how much CPU time is being consumed in aggregate for all
> the processing that your pipeline is doing?
>
>
> On Thu, May 28, 2020 at 11:01 AM Talat Uyarer <
> tuya...@paloaltonetworks.com> wrote:
>
>> I am using Dataflow Runner.
1 - 100 of 1007 matches