Re: Apache BEAM on Flink in production

2019-05-28 Thread Kaymak, Tobias
We (Ricardo.ch) are using it in production and I will share our use-case during the upcoming Apache Beam Summit in Berlin in a few weeks. I don't know if there will be recordings, but I will share the slides afterwards. :) https://beamsummit.org/schedule/2019-06-20?sessionId=114625 On Thu, May 9,

Using Beam 2.14.0 and Flink 1.8.1 - getting {"errors":["Not found."]} in the web UI

2019-08-05 Thread Kaymak, Tobias
Hello, I've upgraded my dependencies from Beam 2.12 to Beam 2.14 and from Flink 1.7.2 to 1.8.1. I've lowered my log level to INFO for all components; the Flink job manager looks fine. I can launch jobs via the cmdline and I can see them when I run `flink list` - but the web interface is ret

Re: Using Beam 2.14.0 and Flink 1.8.1 - getting {"errors":["Not found."]} in the web UI

2019-08-06 Thread Kaymak, Tobias
I had a fat JAR that was using Flink 1.7 in its POM. After bumping that to 1.8 it works :) On Mon, Aug 5, 2019 at 4:49 PM Kaymak, Tobias wrote: > Hello, > > I've upgraded my dependencies from Beam 2.12 to Beam 2.14 and from Flink > 1.7.2 to 1.8.1. I've lowered my log
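A hedged sketch of the fix described here - the fat JAR's pom.xml must pin the same Flink version the cluster runs, e.g. via a version property (the property name is illustrative):

    <properties>
      <!-- must match the Flink version of the cluster the JAR is submitted to -->
      <flink.version>1.8.1</flink.version>
    </properties>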

Using Beam 2.14.0 + Flink 1.8 with RocksDB state backend - perhaps missing dependency?

2019-08-06 Thread Kaymak, Tobias
Hello, since version 1.8 Flink requires that, if one wants to use RocksDB as a state backend, that dependency has to be added to the pom.xml file. [0] My cluster stopped working with RocksDB, so I added this dependency to the pom.xml of my Beam project (I've tried 1.8.1 and 1.8.0):
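For reference, a sketch of the dependency in question, assuming Flink's Scala 2.11 build (the version must match the cluster):

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
      <version>1.8.1</version>
    </dependency>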

Re: Using Beam 2.14.0 + Flink 1.8 with RocksDB state backend - perhaps missing dependency?

2019-08-06 Thread Kaymak, Tobias
the Flink Runner). On Tue, Aug 6, 2019 at 4:35 PM Kaymak, Tobias wrote: > Hello, > > Flink requires in version 1.8, that if one wants to use RocksDB as a state > backend, that dependency has to be added to the pom.xml file. [0] > > My cluster stopped working with RocksDB so I

Trying to rebuild the Timely/Stateful Processing example from the Beam Blog

2019-08-07 Thread Kaymak, Tobias
Hello, during a hackathon we are trying to rebuild the Batched RPC call example [0] written by Kenneth Knowles. There is a question on SO about it that I am trying to answer (as I think the windowing is not done correctly) [1]. While doing so I discovered that it is unclear to me how the `staleSet
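For context, a minimal self-contained sketch of the pattern that blog post describes - buffering values in a BagState and flushing them from an event-time timer. All names are illustrative, and the size-threshold flush of the original is omitted:

    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.state.*;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;
    import org.joda.time.Duration;

    class BufferAndFlushFn extends DoFn<KV<String, String>, String> {

      @StateId("buffer")
      private final StateSpec<BagState<String>> bufferSpec =
          StateSpecs.bag(StringUtf8Coder.of());

      @TimerId("stale")
      private final TimerSpec staleSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

      @ProcessElement
      public void process(
          @Element KV<String, String> element,
          @StateId("buffer") BagState<String> buffer,
          @TimerId("stale") Timer stale) {
        // (Re)set a timer so a partially filled buffer is flushed eventually.
        stale.offset(Duration.standardMinutes(1)).setRelative();
        buffer.add(element.getValue());
      }

      @OnTimer("stale")
      public void onStale(
          @StateId("buffer") BagState<String> buffer,
          OutputReceiver<String> out) {
        for (String v : buffer.read()) {
          out.output(v); // in the blog post: batch these into a single RPC
        }
        buffer.clear();
      }
    }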

WordCount example breaks for FlinkRunner (local) for 2.14.0

2019-08-07 Thread Kaymak, Tobias
Hello, after generating the examples in 2.14.0 via: mvn archetype:generate \ -DarchetypeGroupId=org.apache.beam \ -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \ -DarchetypeVersion=2.14.0 \ -DgroupId=org.example \ -DartifactId=word-count-beam \
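The command is cut off above; for reference, the full invocation from the Beam Java quickstart looks like:

    mvn archetype:generate \
      -DarchetypeGroupId=org.apache.beam \
      -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
      -DarchetypeVersion=2.14.0 \
      -DgroupId=org.example \
      -DartifactId=word-count-beam \
      -Dversion="0.1" \
      -Dpackage=org.apache.beam.examples \
      -DinteractiveMode=false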

Re: WordCount example breaks for FlinkRunner (local) for 2.14.0

2019-08-07 Thread Kaymak, Tobias
e quickstart guide or documentation which told you to use mvn > package and then java -jar beam.jar that we should be updating? > > On Wed, Aug 7, 2019 at 9:27 AM Kaymak, Tobias > wrote: > >> Hello, >> >> after generating the examples in 2.14.0

Re: Using Beam 2.14.0 + Flink 1.8 with RocksDB state backend - perhaps missing dependency?

2019-08-12 Thread Kaymak, Tobias
relevant changes to the RocksDB state > backend in 1.8.1, but I couldn't spot anything. Could it be that an old > version of RocksDB is still in the Flink cluster path? > > Cheers, > Max > > On 06.08.19 16:43, Kaymak, Tobias wrote: > > And of course the moment I click

Re: Using Beam 2.14.0 + Flink 1.8 with RocksDB state backend - perhaps missing dependency?

2019-08-12 Thread Kaymak, Tobias
* each time :) On Mon, Aug 12, 2019 at 9:48 PM Kaymak, Tobias wrote: > I've checked multiple times now and it breaks as with the 1.8.1 image - > I've completely rebuilt the Docker image and torn down the testing > cluster. > > Best, > Tobi > > On Mon,

Re: Using Beam 2.14.0 + Flink 1.8 with RocksDB state backend - perhaps missing dependency?

2019-08-13 Thread Kaymak, Tobias
. Any idea why this could happen? On Mon, Aug 12, 2019 at 9:49 PM Kaymak, Tobias wrote: > * each time :) > > On Mon, Aug 12, 2019 at 9:48 PM Kaymak, Tobias > wrote: > >> I've checked multiple times now and it breaks as with the 1.8.1 image - >> I've completel

Re: Using Beam 2.14.0 + Flink 1.8 with RocksDB state backend - perhaps missing dependency?

2019-08-13 Thread Kaymak, Tobias
This is a major issue for us as we are no longer able to do a clean shutdown of the pipelines right now - only cancelling them hard is possible. On Tue, Aug 13, 2019 at 2:46 PM Kaymak, Tobias wrote: > I just rolled out the upgraded and working 1.8.0/2.14.0 combination to > producti

Re: Using Beam 2.14.0 + Flink 1.8 with RocksDB state backend - perhaps missing dependency?

2019-08-13 Thread Kaymak, Tobias
table/ops/state/state_backends.html#setting-the-per-job-state-backend [1] https://hub.docker.com/_/flink [2] https://ci.apache.org/projects/flink/flink-docs-stable/release-notes/flink-1.8.html#rocksdb-version-bump-and-switch-to-frocksdb-flink-10471 On Tue, Aug 13, 2019 at 2:50 PM Kaymak, Tobias wrote: > Thi

Re: Using Beam 2.14.0 + Flink 1.8 with RocksDB state backend - perhaps missing dependency?

2019-08-14 Thread Kaymak, Tobias
ug 13, 2019 at 3:13 PM Kaymak, Tobias wrote: > Ok I think I have an understanding of what happens - somehow. > Flink switched their RocksDB fork in the 1.8 release, this is why the > dependency must now be explicitly added to a project. [0] > I did both actually, adding this dependency t

Beam 2.19.0 / Flink 1.9.1 - Session cluster error when submitting job "Multiple environments cannot be created in detached mode"

2020-02-20 Thread Kaymak, Tobias
Hello, I am trying to upgrade from a Flink session cluster 1.8 to 1.9 and from Beam 2.16.0 to 2.19.0. Everything went quite smoothly: the local runner and the local Flink runner work flawlessly. However, when I: 1. Generate a Beam jar for the FlinkRunner via Maven (mvn package -PFlinkRunner) 2
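A sketch of the submission flow being described, with a hypothetical main class and JAR name (flink run -d is the detached mode the error in the subject refers to):

    mvn package -PFlinkRunner
    flink run -d -c org.example.MyPipeline \
      target/my-pipeline-bundled.jar \
      --runner=FlinkRunner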

Re: Beam 2.19.0 / Flink 1.9.1 - Session cluster error when submitting job "Multiple environments cannot be created in detached mode"

2020-02-23 Thread Kaymak, Tobias
check is removed in Flink 1.10: > https://issues.apache.org/jira/browse/FLINK-15201 > > Thanks for reporting. > Kyle > > On Thu, Feb 20, 2020 at 4:10 AM Kaymak, Tobias > wrote: > >> Hello, >> >> I am trying to upgrade from a Flink session cluster 1.8 to 1.9 a

Re: Beam 2.19.0 / Flink 1.9.1 - Session cluster error when submitting job "Multiple environments cannot be created in detached mode"

2020-02-26 Thread Kaymak, Tobias
> BEAM-9299 Upgrade Flink Runner to 1.8.3 and 1.9.2 > > In any case if you have cycles to help test any of the related tickets > PRs that would help too. > > > > On Mon, Feb 24, 2020 at 8:47 AM Kaymak, Tobias > <mailto:tobias.kay...@rica

Re: Beam 2.19.0 / Flink 1.9.1 - Session cluster error when submitting job "Multiple environments cannot be created in detached mode"

2020-02-26 Thread Kaymak, Tobias
Hello, we fixed the issue and are ready to test :) - is there an RC already available? Best, Tobi On Wed, Feb 26, 2020 at 12:59 PM Kaymak, Tobias wrote: > Hello, > > happy to help testing! I am currently fixing a networking issue between > our dev cluster for integration tests and

Re: Beam 2.19.0 / Flink 1.9.1 - Session cluster error when submitting job "Multiple environments cannot be created in detached mode"

2020-02-26 Thread Kaymak, Tobias
If I am not running in detached mode (so that my pipeline starts) I am unable to Stop it in the web interface. The only option available is to cancel it. Is this expected? [image: Screenshot 2020-02-26 at 16.34.08.png] On Wed, Feb 26, 2020 at 4:16 PM Kaymak, Tobias wrote: > Hello, > >

Re: Beam 2.19.0 / Flink 1.9.1 - Session cluster error when submitting job "Multiple environments cannot be created in detached mode"

2020-02-27 Thread Kaymak, Tobias
0-SNAPSHOT Best, Tobi On Wed, Feb 26, 2020 at 5:07 PM Ismaël Mejía wrote: > Since it was merged yesterday you can test with the 2.20.0-SNAPSHOT until > the first candidate is out. > > On Wed, Feb 26, 2020 at 4:37 PM Kaymak, Tobias > wrote: > >> If I am not ru

Re: Beam 2.19.0 / Flink 1.9.1 - Session cluster error when submitting job "Multiple environments cannot be created in detached mode"

2020-02-27 Thread Kaymak, Tobias
Copy paste error, sorry: 2.20.0-SNAPSHOT in combination with beam-runners-flink-1.10 or beam-runners-flink-1.10-SNAPSHOT didn't work either for me. On Thu, Feb 27, 2020 at 11:50 AM Kaymak, Tobias wrote: > I can confirm that the pipeline behaves as expected with 2.20.0-SNAPSHOT >

Re: Beam 2.19.0 / Flink 1.9.1 - Session cluster error when submitting job "Multiple environments cannot be created in detached mode"

2020-02-27 Thread Kaymak, Tobias
.org/jira/browse/BEAM-9299 > > Regards, > Ismaël > > > On Thu, Feb 27, 2020 at 11:53 AM Kaymak, Tobias > wrote: > >> Copy paste error, sorry: >> >> 2.20.0-SNAPSHOT in combination with beam-runners-flink-1.10 >> or beam-runners-flink-1.10-SNAPSHO

Re: Beam 2.19.0 / Flink 1.9.1 - Session cluster error when submitting job "Multiple environments cannot be created in detached mode"

2020-03-02 Thread Kaymak, Tobias
se. If so, we may consider adding it again. > > Cheers, > Max > > PS: Concerning the web interface with 1.9.2, I'm not sure what changes > your Jar contain but we'll have to look into this when we upgrade to > 1.9.2 in Beam. > > On 28.02.20 14:59, Kaymak, Tobi

Re: Hello Beam Community!

2020-03-17 Thread Kaymak, Tobias
Welcome Brittany! :) On Fri, Mar 13, 2020 at 7:30 PM Robert Bradshaw wrote: > Welcome! > > On Fri, Mar 13, 2020 at 7:41 AM Ismaël Mejía wrote: > >> Welcome ! >> >> On Fri, Mar 13, 2020 at 3:00 PM Connell O'Callaghan >> wrote: >> >>> Welcome Brittany >>> >>> On Fri, Mar 13, 2020 at 6:45 AM

Enriching a stream by looking up from a huge table (2 TiB)+

2020-05-13 Thread Kaymak, Tobias
Hi, First of all thank you for the Webinar Beam Sessions this month. They are super helpful, especially for getting people excited and on-boarded with Beam! We are currently trying to promote Beam with more use cases within our company and are tackling a problem where we have to join a stream of arti

[Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-06-26 Thread Kaymak, Tobias
I have the following class definition: public class EnrichedArticle implements Serializable { // ArticleEnvelope is generated via Protobuf private ArticleProto.ArticleEnvelope article; // Asset is a Java POJO private List<Asset> assets; public EnrichedArticle(ArticleProto.ArticleEnvelope arti
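For the pure-POJO part, schema inference normally only needs an annotation; a hedged sketch with illustrative fields (the protobuf-backed field is precisely what breaks this, see the replies below):

    import java.io.Serializable;
    import java.util.List;
    import org.apache.beam.sdk.schemas.JavaFieldSchema;
    import org.apache.beam.sdk.schemas.annotations.DefaultSchema;

    @DefaultSchema(JavaFieldSchema.class)
    public class AssetsOnly implements Serializable {
      public List<Asset> assets; // works because Asset is itself schema-inferable

      @DefaultSchema(JavaFieldSchema.class)
      public static class Asset implements Serializable {
        public String url; // illustrative field
      }
    }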

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-06-27 Thread Kaymak, Tobias
to convert a > PCollection to a PCollection with Convert.toRows [1]. > > Brian > > [1] > https://beam.apache.org/releases/javadoc/2.22.0/org/apache/beam/sdk/schemas/transforms/Convert.html#toRows-- > > On Fri, Jun 26, 2020 at 3:19 AM Kaymak, Tobias > wrote: >

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-06-27 Thread Kaymak, Tobias
JavaBeanSchema, or make >> it into an AutoValue and use AutoValueSchema. >> >> Once you do that you should be able to convert a >> PCollection to a PCollection with Convert.toRows [1]. >> >> Brian >> >> [1] >> https://beam.apach

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-06-30 Thread Kaymak, Tobias
o this since >>> it's generated code and you can't use @DefaultSchema (+Reuven Lax >>> and +Alex Van Boxel in case >>> they have better advice), you might try just registering a provider >>> manually when you create the pipeline, something like >>&g
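The quote above cuts off at the code; a minimal sketch of that manual registration, assuming the beam-sdks-java-extensions-protobuf module is on the classpath:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.extensions.protobuf.ProtoMessageSchema;

    Pipeline p = Pipeline.create(options);
    p.getSchemaRegistry().registerSchemaProvider(
        ArticleProto.ArticleEnvelope.class, new ProtoMessageSchema());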

Recommended way to generate a TableRow from Json without using TableRowJsonCoder context OUTER?

2020-07-08 Thread Kaymak, Tobias
As a workaround I am currently using the following code to generate a TableRow object from a Java Protobuf class - as I am facing a problem with Beam schemas ( https://www.mail-archive.com/user@beam.apache.org/msg05799.html). It relies on the TableRowJsonCoder: String json = JsonFormat.prin
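A sketch of the workaround as described - protobuf to JSON to TableRow via the coder (exception handling omitted; `message` stands in for the protobuf instance):

    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import com.google.api.services.bigquery.model.TableRow;
    import com.google.protobuf.util.JsonFormat;
    import org.apache.beam.sdk.coders.Coder;
    import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;

    String json = JsonFormat.printer().omittingInsignificantWhitespace().print(message);
    TableRow row = TableRowJsonCoder.of().decode(
        new ByteArrayInputStream(json.getBytes(StandardCharsets.UTF_8)),
        Coder.Context.OUTER);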

Re: Recommended way to generate a TableRow from Json without using TableRowJsonCoder context OUTER?

2020-07-09 Thread Kaymak, Tobias
Base64.getEncoder().encodeToString(((BytesValue) x)

Re: Beam supports Flink Async IO operator

2020-07-09 Thread Kaymak, Tobias
Hi Eleanore, Maybe batched RPC is what you are looking for? https://beam.apache.org/blog/timely-processing/ On Wed, Jul 8, 2020 at 6:20 PM Eleanore Jin wrote: > Thanks Luke and Max for the information. > > We have the use case that inside a DoFn, we will need to call external > services to trig

Re: Registering Protobuf schema

2020-07-12 Thread Kaymak, Tobias
This sounds like it is related to the problem I'm trying to solve. (In my case having a Java POJO containing a protobuf-backed class and trying to generate a Beam Schema from it.) I would be very interested in a solution to this as well :) On Tue, Jul 7, 2020 at 2:22 PM wrote: > Hi All, > > > >

Running a batch pipeline on the Classic Java Flink Runner - pipeline starts, but shell blocks

2020-07-16 Thread Kaymak, Tobias
Hello, I have a batch pipeline with Beam 2.22.0 reading about 200 GiB from BigQuery, mapping the data and writing it out via CassandraIO. When I run the pipeline via the Classic Java Flink runner on a 1.10.1 Flink cluster I face the following issue: When launching the pipeline via bin/flink run

CassandraIO - random failures in batch mode while writing - how to recover?

2020-07-16 Thread Kaymak, Tobias
Hello, I am trying to load a table that is about 200 GiB in size in BigQuery to Cassandra via a batch job in Beam 2.22.0 on Flink 1.10.1 - the job runs but fails at random points in time throwing different errors each time - and not always at the same points in the data (which comes in pretty clea

Re: CassandraIO - random failures in batch mode while writing - how to recover?

2020-07-23 Thread Kaymak, Tobias
Ok, so the easiest way out of this was to set the consistency level to `LOCAL_QUORUM` in the CassandraIO; this way everything went smoothly. On Thu, Jul 16, 2020 at 9:09 PM Kaymak, Tobias wrote: > Hello, > > I am trying to load a table that is about 200 GiB in size in BigQuery to > C
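For reference, a sketch of where that setting goes (entity type, hosts and keyspace are placeholders):

    CassandraIO.<MyEntity>write()
        .withHosts(Collections.singletonList("cassandra.example.com"))
        .withPort(9042)
        .withKeyspace("my_keyspace")
        .withEntity(MyEntity.class)
        .withConsistencyLevel("LOCAL_QUORUM");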

Running a Beam batch job in Flink runner leads to a successful outcome, but Flink does not consider it "Completed"

2020-07-23 Thread Kaymak, Tobias
Hello, After running a successful batch job on Flink 1.10.1 with Beam 2.22.0 I was expecting to see the job in the "completed" section of the Flink web interface. That was not the case, the following log of the taskmanager at DEBUG level shows that something within the shutdown of the job might wen

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-08-27 Thread Kaymak, Tobias
> > > Luke: Regarding recursive schemas, Reuven and I have had some discussions > about it offline. I think he said it should be feasible but I don't know > much beyond that. > > Brian > > [1] https://issues.apache.org/jira/browse/BEAM-10765 > > On Tue, Jun 30, 202

Java/Flink - Flink's shaded Netty and Beam's Netty clash

2020-10-01 Thread Kaymak, Tobias
Hello, when deploying a Beam streaming pipeline on Flink and canceling it after some time, the following can be seen in the logs: 2020-10-01 07:36:47,605 WARN io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop- Unexpected exception in the selector loop. flink-taskmanager-7695c66775-x

Re: Java/Flink - Flink's shaded Netty and Beam's Netty clash

2020-10-02 Thread Kaymak, Tobias
fy which JAR file contains >> io.grpc.netty.shaded.io.netty.util.collection.IntObjectHashMap. Can you >> check which version of which artifact (I suspect io.grpc:grpc-netty) has >> the class in your runtime? >> >> As far as I know, Beam's vendored (shaded) class files have the package >> name "org

UnalignedFixedWindows - how to join two streams with unaligned fixed windows / full source code from the Streaming Systems book?

2020-10-02 Thread Kaymak, Tobias
Hello, In chapter 4 of the Streaming Systems book by Tyler, Slava and Reuven there is an example 4-6 on page 111 about custom windowing that deals with UnalignedFixedWindows: https://www.oreilly.com/library/view/streaming-systems/9781491983867/ch04.html Unfortunately that example is abbreviated a

Re: Java/Flink - Flink's shaded Netty and Beam's Netty clash

2020-10-02 Thread Kaymak, Tobias
I think this was caused by having the flink-runner defined twice in my pom. Oo (once with scope runtime, and once without) On Fri, Oct 2, 2020 at 9:38 AM Kaymak, Tobias wrote: > Sorry that I forgot to include the versions, currently I'm on Beam 2.23.0 > / Flink

Re: Java/Flink - Flink's shaded Netty and Beam's Netty clash

2020-10-02 Thread Kaymak, Tobias
No, that was not the case. I'm still seeing this message when canceling a pipeline. Sorry for the spam. On Fri, Oct 2, 2020 at 12:22 PM Kaymak, Tobias wrote: > I think this was caused by having the flink-runner defined twice in my > pom. Oo > (once with scope runtime

Re: UnalignedFixedWindows - how to join two streams with unaligned fixed windows / full source code from the Streaming Systems book?

2020-10-02 Thread Kaymak, Tobias
This is what I came up with: https://gist.github.com/tkaymak/1f5eccf8633c18ab7f46f8ad01527630 The first run looks okay (in my use case size and offset are the same), but I will need to add tests to prove my understanding of this. On Fri, Oct 2, 2020 at 12:05 PM Kaymak, Tobias wrote: > He
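For readers without the gist at hand, a self-contained sketch of one way to implement the idea - fixed-size windows whose alignment is shifted deterministically per key. The hash-based shift is an assumption for illustration, not necessarily what the gist does:

    import java.util.Collection;
    import java.util.Collections;
    import org.apache.beam.sdk.coders.Coder;
    import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
    import org.apache.beam.sdk.transforms.windowing.NonMergingWindowFn;
    import org.apache.beam.sdk.transforms.windowing.WindowFn;
    import org.apache.beam.sdk.transforms.windowing.WindowMappingFn;
    import org.apache.beam.sdk.values.KV;
    import org.joda.time.Duration;
    import org.joda.time.Instant;

    public class UnalignedFixedWindows<V>
        extends NonMergingWindowFn<KV<String, V>, IntervalWindow> {

      private final Duration size;

      public UnalignedFixedWindows(Duration size) {
        this.size = size;
      }

      @Override
      public Collection<IntervalWindow> assignWindows(AssignContext c) {
        long sizeMillis = size.getMillis();
        // Deterministic per-key shift in [0, size), so each key gets its own alignment.
        long shift = Math.floorMod(c.element().getKey().hashCode(), sizeMillis);
        long ts = c.timestamp().getMillis();
        // Largest window start <= ts that is congruent to the key's shift.
        long start = ts - Math.floorMod(ts - shift, sizeMillis);
        return Collections.singletonList(new IntervalWindow(new Instant(start), size));
      }

      @Override
      public Coder<IntervalWindow> windowCoder() {
        return IntervalWindow.getCoder();
      }

      @Override
      public boolean isCompatible(WindowFn<?, ?> other) {
        return other instanceof UnalignedFixedWindows
            && size.equals(((UnalignedFixedWindows<?>) other).size);
      }

      @Override
      public WindowMappingFn<IntervalWindow> getDefaultWindowMappingFn() {
        throw new UnsupportedOperationException("Not usable as a side-input window in this sketch");
      }
    }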

Re: UnalignedFixedWindows - how to join two streams with unaligned fixed windows / full source code from the Streaming Systems book?

2020-10-02 Thread Kaymak, Tobias
PM Kaymak, Tobias wrote: > This is what I came up with: > > https://gist.github.com/tkaymak/1f5eccf8633c18ab7f46f8ad01527630 > > The first run looks okay (in my use case size and offset are the same), > but I will need to add tests to prove my understanding of this. > > On F

Re: UnalignedFixedWindows - how to join two streams with unaligned fixed windows / full source code from the Streaming Systems book?

2020-10-05 Thread Kaymak, Tobias
ssion windows? The window would start at the > timestamp of the article, and the Session gap duration would be the > (event-time) timeout after which you stop waiting for assets to join that > article. > > On Fri, Oct 2, 2020 at 3:05 AM Kaymak, Tobias > wrote: > >> Hello

Re: UnalignedFixedWindows - how to join two streams with unaligned fixed windows / full source code from the Streaming Systems book?

2020-10-06 Thread Kaymak, Tobias
to join A with B should the session windows then overlap? On Mon, Oct 5, 2020 at 11:10 AM Kaymak, Tobias wrote: > Hi Reuven, > Thank you for your response. > > Yes, I've tested session windows with a gap of 10 minutes as I thought > this should work in this scenario. >

Beam 2.25.0 / Flink 1.11.2 - Job failing after upgrading from 2.24.0 / Flink 1.10.2

2020-11-04 Thread Kaymak, Tobias
When running our Kafka-To-BigQuery pipeline with the Flink 1.11.2 Docker image, the following exception is visible for the failing job on the *job manager*: 2020-11-04 09:27:14 java.lang.RuntimeException: Failed to cleanup global state. at org.apache.beam.runners.flink.translation.wrappers.str

Re: Beam 2.25.0 / Flink 1.11.2 - Job failing after upgrading from 2.24.0 / Flink 1.10.2

2020-11-04 Thread Kaymak, Tobias
I see the same problem for Beam 2.25.0 / Flink 1.11.1 and 1.10.2, so it seems to be related to the upgrade from Beam 2.24.0 to 2.25.0

Beam streaming BigQueryIO pipeline on a Flink cluster misses elements

2020-11-04 Thread Kaymak, Tobias
Hello, while investigating potential benefits of switching BigQueryIO from FILE_LOADS to streaming inserts, I found a potential edge case that might be related to the way the BigQueryIO is being handled on a Flink cluster: Flink's task managers run as preemptible instances in a GKE cluster's

Re: Beam 2.25.0 / Flink 1.11.2 - Job failing after upgrading from 2.24.0 / Flink 1.10.2

2020-11-04 Thread Kaymak, Tobias
) might have issues related to rocksdb, can you > file a Jira for that, please? > > Thanks, > > Jan > On 11/4/20 9:50 AM, Kaymak, Tobias wrote: > > When running our Kafka-To-BigQuery pipeline with the Flink 1.11.2 Docker > image, > the following exception is visible

Re: Beam streaming BigQueryIO pipeline on a Flink cluster misses elements

2020-11-06 Thread Kaymak, Tobias
at 1:36 PM Kaymak, Tobias wrote: > Hello, > > while investigating potential benefits of switching BigQueryIO from > FILE_LOADS to streaming inserts, I found a potential edge case that might > be related to the way the BigQueryIO is being handled on a Flink cluster: > > Fli

Re: Java/Flink - Flink's shaded Netty and Beam's Netty clash

2020-12-04 Thread Kaymak, Tobias
n/IntObjectHashMap$PrimitiveIterator.class >> io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap.class >> io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$1.class >> >> io/grpc/netty/shaded/io/netty/util/collection/IntObjectHashMap$KeySet$1.class &

Testing a DoFn with a Create.of() and a KafkaRecord

2020-12-09 Thread Kaymak, Tobias
According to the documentation [0], Create.of() works only for "Standard" types, but shouldn't it in theory also work for non-standard types when the Coder is specified? I want to test a DoFn that receives KafkaRecord as an input: KafkaRecord input = new KafkaRecord(topic, partition, offset
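A hedged sketch of how this can work by supplying the coder explicitly (constructor argument order follows Beam's KafkaRecord around 2.26; timestamp, headers and types are illustrative):

    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.io.kafka.KafkaRecord;
    import org.apache.beam.sdk.io.kafka.KafkaRecordCoder;
    import org.apache.beam.sdk.io.kafka.KafkaTimestampType;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.values.KV;
    import org.apache.kafka.common.header.internals.RecordHeaders;

    KafkaRecord<String, String> input =
        new KafkaRecord<>("topic", 0, 0L, 0L,
            KafkaTimestampType.NO_TIMESTAMP_TYPE,
            new RecordHeaders(), KV.of("key", "value"));

    pipeline.apply(
        Create.of(input)
            .withCoder(KafkaRecordCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())));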

Re: Testing a DoFn with a Create.of() and a KafkaRecord

2020-12-09 Thread Kaymak, Tobias
5:42 PM Kaymak, Tobias wrote: > According to the documentation [0] the Create.of() works only for > "Standard" types, but shouldn't it in theory also work for non-standard > types when the Coder is specified? > > I want to test a DoFn that receives KafkaRecord as

Beam 2.27.0 - Flink 1.11.3 - watermark not progressing?

2021-02-12 Thread Kaymak, Tobias
Hello, I am currently updating from Beam 2.25.0 to Beam 2.27.0 and from Flink 1.10.3 to Flink 1.11.3. My pipeline reads from 2 Kafka topics and windows them via: Window.<…>into( FixedWindows.of(Duration.standardMinutes(1))) .withAllowedLateness(Duration.standa

Re: Beam stuck when trying to run on remote Flink cluster

2021-02-12 Thread Kaymak, Tobias
Hey Nir, Could you elaborate on how your setup looks? Are you using Java or Python? Which Flink version? Is the cluster running on K8s? Do you use the portable runner or the classic one? On Mon, Feb 8, 2021 at 5:41 PM Joseph Zack wrote: > unsubscribe > > On Mon, Feb 8, 2021 at 5:06 AM Nir Gazit

How to use BigQueryIO Method.FILE_LOADS when reading from an unbounded source?

2018-10-10 Thread Kaymak, Tobias
I am trying to read from an unbounded source and use FILE_LOADS instead of streaming inserts towards BigQuery. If I don't have the following two lines .withMethod(BigQueryIO.Write.Method.FILE_LOADS) .withTriggeringFrequency(Duration.standardMinutes(10)) my code works just fine, but uses str
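For an unbounded source the load-job path additionally needs a shard count, and the load jobs stage files in a GCS temp location; a hedged sketch (table and bucket are placeholders):

    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
    import org.joda.time.Duration;

    BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.my_table")
        .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
        .withTriggeringFrequency(Duration.standardMinutes(10))
        .withNumFileShards(10) // required when a triggering frequency is set
        .withCustomGcsTempLocation(StaticValueProvider.of("gs://my-bucket/bq-tmp"));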

Re: How to use BigQueryIO Method.FILE_LOADS when reading from an unbounded source?

2018-10-10 Thread Kaymak, Tobias
; e); > } > } > > > > are you sure your templocation is set correctly? I guess it’s needed for > staging a bigquery load job instead of streaming. > > > > Wout > > > > > > > > *From: *"Kaymak, Tobias" > *Reply-To:

KafkaIO - Deadletter output

2018-10-23 Thread Kaymak, Tobias
Hi, Is there a way to get a Deadletter Output from a pipeline that uses a KafkaIO connector for its input? As `TimestampPolicyFactory.withTimestampFn()` takes only a SerializableFunction and not a ParDo, how would I be able to produce a Deadletter output from it? I have the following pipeline de
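The pattern suggested later in this thread is to keep the timestamp function trivial and do the fallible parsing in a multi-output ParDo right after the source; a hedged sketch (parse() and the element types are placeholders):

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.PCollectionTuple;
    import org.apache.beam.sdk.values.TupleTag;
    import org.apache.beam.sdk.values.TupleTagList;

    final TupleTag<MyEvent> parsedTag = new TupleTag<MyEvent>() {};
    final TupleTag<String> deadLetterTag = new TupleTag<String>() {};

    PCollectionTuple results =
        rawRecords.apply("Parse", ParDo.of(new DoFn<String, MyEvent>() {
          @ProcessElement
          public void process(@Element String raw, MultiOutputReceiver out) {
            try {
              out.get(parsedTag).output(parse(raw)); // parse() is a placeholder
            } catch (Exception e) {
              out.get(deadLetterTag).output(raw);    // failures go to the dead-letter output
            }
          }
        }).withOutputTags(parsedTag, TupleTagList.of(deadLetterTag)));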

Re: KafkaIO - Deadletter output

2018-10-29 Thread Kaymak, Tobias
On Wed, Oct 24, 2018 at 7:19 AM Chamikara Jayalath <chamik...@google.com> wrote:

Experience with KafkaIO -> BigQueryIO

2018-11-06 Thread Kaymak, Tobias
Hi, I am sharing my experience with you after trying to use the following pipeline logic (with Beam 2.6.0 - running on Flink 1.5): 1. Reading from KafkaIO, attaching a timestamp from each parsed element 2. Filtering bad records 3. Writing to a partitioned table in BigQuery with FILE_LOADS (batch

Re: Experience with KafkaIO -> BigQueryIO

2018-11-07 Thread Kaymak, Tobias
ssign it to a partition. Is this better? > On Tue, Nov 6, 2018 at 3:44 AM Kaymak, Tobias > wrote: > >> Hi, >> >> I am sharing my experience with you after trying to use the following >> pipeline >> logic (with Beam 2.6.0 - running on Flink 1.5): >&g

Re: Experience with KafkaIO -> BigQueryIO

2018-11-07 Thread Kaymak, Tobias
ig through the temporary file structure in the GCS bucket used by BigQuery.IO before loading the tables, but that is quite a challenge. > On Tue, Nov 6, 2018 at 3:44 AM Kaymak, Tobias > wrote: > >> Hi, >> >> I am sharing my experience with you after trying to use the f

Re: Experience with KafkaIO -> BigQueryIO

2018-11-08 Thread Kaymak, Tobias
On Wed, Nov 7, 2018 at 6:49 PM Raghu Angadi wrote: > > On Wed, Nov 7, 2018 at 5:04 AM Kaymak, Tobias > wrote: > >> >> On Tue, Nov 6, 2018 at 6:58 PM Raghu Angadi wrote: >> >>> You seem to be reading from multiple topics and your timestamp policy is &g

Re: Experience with KafkaIO -> BigQueryIO

2018-11-09 Thread Kaymak, Tobias
ank you, >> >> Tobi >> >> >> >>> Raghu. >>> >>> >>>> I could also not fiddle with the timestamp at all and let the system >>>> decide and >>>> then in the BigQuery.IO partitioning step parse it and assign it to

Re: Beam Metrics using FlinkRunner

2018-12-07 Thread Kaymak, Tobias
I am using the Flink Prometheus connector and I get metrics in Prometheus for my running pipelines - I am not looking at metrics in the Flink dashboard directly. Have you tried that? (Talk with background and example: https://github.com/mbode/flink-prometheus-example) On Tue, Dec 4, 2018 at 7:49
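For reference, the Flink side of that setup is a reporter entry in flink-conf.yaml (the port range is illustrative; the reporter JAR ships in Flink's opt/ directory and has to be copied to lib/):

    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    metrics.reporter.prom.port: 9250-9260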

No checkpoints for KafkaIO to BigQueryIO pipeline on Flink runner?

2019-01-25 Thread Kaymak, Tobias
Hi, I am trying to migrate my existing KafkaToGCS pipeline to a KafkaToBigQuery pipeline to skip the loading step from GCS, which is currently handled externally from Beam. I noticed that the pipeline, written in Beam 2.9.0 (Java), does not trigger any checkpoint on Flink (1.5.5), even though its c

Is a window function necessary when using a KafkaIO source and a BigQueryIO sink?

2019-01-28 Thread Kaymak, Tobias
I was spending some time with the "Streaming Systems" [0] book over the weekend and I thought that my pipeline might be doing something "too much", as the BigQuery sink should already partition the data by day and put it in the right place - so can my windowing function in the following pipeline be

Re: Is a window function necessary when using a KafkaIO source and a BigQueryIO sink?

2019-01-28 Thread Kaymak, Tobias
;> >> https://cloud.google.com/bigquery/docs/partitioned-tables >> >> Cheers >> >> Reza >> >> >> On Mon, 28 Jan 2019 at 17:11, Kaymak, Tobias >> wrote: >> >>> I was spending some time with the "Streaming Systems" [0] book

Re: No checkpoints for KafkaIO to BigQueryIO pipeline on Flink runner?

2019-01-28 Thread Kaymak, Tobias
`runner` to `FlinkRunner`? > > If possible, could you share parts of the Flink logs? > > Thanks, > Max > > On 25.01.19 15:14, Kaymak, Tobias wrote: > > Hi, > > > > I am trying to migrate my existing KafkaToGCS pipeline to a > KafkaToBigQuery > >

Re: Is a window function necessary when using a KafkaIO source and a BigQueryIO sink?

2019-01-28 Thread Kaymak, Tobias
But it seems that it fails when I remove the windowing from the pipeline, so I guess the answer is no. On Mon, Jan 28, 2019 at 11:36 AM Kaymak, Tobias wrote: > Yes I am making use of partitioned tables, that's why I was wondering if > the windowing step could be skipped. :)

Re: No checkpoints for KafkaIO to BigQueryIO pipeline on Flink runner?

2019-01-29 Thread Kaymak, Tobias
rting at offset 13478 2019-01-29 09:21:58,145 INFO org.apache.beam.sdk.io.kafka.KafkaUnboundedSource - Reader-0: reading from ratings-9 starting at offset 12966 On Mon, Jan 28, 2019 at 3:36 PM Kaymak, Tobias wrote: > Hi Maximilian, > > yes, I've set the --runner to FlinkRunn

Re: Is a window function necessary when using a KafkaIO source and a BigQueryIO sink?

2019-01-29 Thread Kaymak, Tobias
s with a PubSub source and a partitioned >> table sink and was able to push things through, it was a very simple >> pipeline though, with BigQueryIO.to() set to simple string. >> >> Cheers >> >> Reza >> >> On Mon, 28 Jan 2019 at 22:39, Kaymak, Tobia

Re: Is a window function necessary when using a KafkaIO source and a BigQueryIO sink?

2019-01-29 Thread Kaymak, Tobias
; something together over the next week or so and post back here, might see > if it's something that would suit a short blog as well. > > Cheers > > Reza > > > > On Tue, 29 Jan 2019 at 19:56, Kaymak, Tobias > wrote: > >> I am using FILE_LOADS and no timestamp co

Re: Tuning BigQueryIO.Write

2019-01-30 Thread Kaymak, Tobias
Hi, I am currently playing around with BigQueryIO options, and I am not an expert on it, but 60 workers sounds like a lot to me (or expensive computation) for 10k records hitting 2 tables each. Could you maybe share the code of your pipeline? Cheers, Tobi On Tue, Jan 22, 2019 at 9:28 PM Jeff Klu

Re: No checkpoints for KafkaIO to BigQueryIO pipeline on Flink runner?

2019-01-30 Thread Kaymak, Tobias
t; > That means you have to wait 5 minutes until the first checkpoint will be > taken. > You should be seeing an INFO message like this: "INFO: Triggering > checkpoint 1 @ > 1548775459114 for job 3b5bdb811f1923bf49db24403e9c1ae9." > > Thanks, > Max > > On 29.0

Dealing with "large" checkpoint state of a Beam pipeline in Flink

2019-02-12 Thread Kaymak, Tobias
Hi, my Beam 2.10-SNAPSHOT pipeline has a KafkaIO as input and a BigQueryIO configured with FILE_LOADS as output. What bothers me is that even if I configure in my Flink 1.6 configuration state.backend: rocksdb state.backend.incremental: true I see states that are as big as 230 MiB and checkpoint
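The configuration in question, consolidated (the checkpoint directory is a placeholder):

    state.backend: rocksdb
    state.backend.incremental: true
    state.checkpoints.dir: file:///flink/checkpoints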

Re: Dealing with "large" checkpoint state of a Beam pipeline in Flink

2019-02-12 Thread Kaymak, Tobias
> goes up to 8 GB of data on a savepoint. >> >> I am on Flink 1.5.x, on premises, also using RocksDB and incremental. >> >> So far my only solution to avoid errors while checkpointing or >> savepointing is to make sure the checkpoint timeout is high enough, like 20m >>

Beam pipeline should fail when one FILE_LOAD fails for BigQueryIO on Flink

2019-02-13 Thread Kaymak, Tobias
Hello, I have a BigQuery table which ingests a Protobuf stream from Kafka with a Beam pipeline. The Protobuf has a `log Map` column which translates to a field "log" of type RECORD with unknown fields in BigQuery. So I scanned my whole stream to know which schema fields to expect and created an e

Re: Beam pipeline should fail when one FILE_LOAD fails for BigQueryIO on Flink

2019-02-14 Thread Kaymak, Tobias
table. Dropping those values > may or may not be acceptable for your use case. > > [0] > https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#complex_types > > On Wed, Feb 13, 2019 at 9:27 AM Kaymak, Tobias > wrote: > >> Hello, >> >&

Re: Beam pipeline should fail when one FILE_LOAD fails for BigQueryIO on Flink

2019-02-22 Thread Kaymak, Tobias
/BigQueryIO.java#L1901 On Thu, Feb 14, 2019 at 1:05 PM Kaymak, Tobias wrote: > Thank you Jeff, > > I think 1 is a bug and I am planning to report it in the bug tracker; > regarding 2, I have now a fixed TableSchema supplied to this pipeline with > the expected fields, and I am ignoring

Flink 1.7.2 + Beam 2.11 error: The transform beam:transform:create_view:v1 is currently not supported.

2019-03-28 Thread Kaymak, Tobias
Hello, I just upgraded to Flink 1.7.2 from 1.6.2 with my dev cluster and from Beam 2.10 to 2.11 and I am seeing this error when starting my pipelines: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error. at org.apache.flink.client.program.PackagedPr

Re: Flink 1.7.2 + Beam 2.11 error: The transform beam:transform:create_view:v1 is currently not supported.

2019-03-28 Thread Kaymak, Tobias
KafkaToBigQuery::convertUserEventToTableRow) .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)); On Thu, Mar 28, 2019 at 5:13 PM Kaymak, Tobias wrote: > Hello, > > I just upgraded to Flink 1.7.2

Re: Flink 1.7.2 + Beam 2.11 error: The transform beam:transform:create_view:v1 is currently not supported.

2019-03-29 Thread Kaymak, Tobias
ou set > the "streaming" flag to true. Will be fixed for the 2.12.0 release. > > Thanks, > Max > > On 28.03.19 17:28, Lukasz Cwik wrote: > > +dev <mailto:d...@beam.apache.org> > > > > On Thu, Mar 28, 2019 at 9:13 AM Kaymak, Tobias > <mailto:t
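A sketch of forcing streaming mode from code, per the workaround mentioned above (option names per the Flink runner's FlinkPipelineOptions):

    import org.apache.beam.runners.flink.FlinkPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    FlinkPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation().as(FlinkPipelineOptions.class);
    options.setStreaming(true);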

Re: Running Beam 2.11 in Flink 1.6.2: All metrics 0 for Solr pipeline

2019-04-06 Thread Kaymak, Tobias
sk. There is no downstream task in case of this chained > operator because it writes directly to Solr. > > I'm thinking, we might expose a Flink pipeline option to control > chaining in Beam. However, users usually want to apply chaining because > it is a great optimization techniqu

Beam/Flink's netty versions seems to clash (2.32.0 / 1.13.1)

2021-09-22 Thread Kaymak, Tobias
Hello, while upgrading our cluster and pipelines to the new versions I noticed: java.lang.NoClassDefFoundError: io/grpc/netty/shaded/io/netty/channel/FailedChannelFuture so I checked the Maven dependency tree and found that the: org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.32.0:
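One way to narrow such clashes down is to filter the dependency tree for the offending group, e.g.:

    mvn dependency:tree -Dincludes=io.grpc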

Re: Beam/Flink's netty versions seems to clash (2.32.0 / 1.13.1)

2021-09-23 Thread Kaymak, Tobias
r than netty. > > Stacktrace and the dependency tree would be helpful to understand why the > discrepancy occurred. > > Regards, > Tomo > > On Wed, Sep 22, 2021 at 12:21 Reuven Lax wrote: > >> Are you using any of the GCP IOs in your pipeline? >> >

Re: Beam/Flink's netty versions seems to clash (2.32.0 / 1.13.1)

2021-09-23 Thread Kaymak, Tobias
delivered (and ignore this warning for now, and wait for the next release and then upgrade to it)? Best, Tobi On Thu, Sep 23, 2021 at 2:51 PM Kaymak, Tobias wrote: > Hello, > I am using the GCP IOs to connect to BigQuery and Bigtable. I seem to have > improved the situation by including

Re: Beam/Flink's netty versions seems to clash (2.32.0 / 1.13.1)

2021-09-23 Thread Kaymak, Tobias
> On Thu, Sep 23, 2021 at 6:22 AM Kaymak, Tobias > wrote: > >> Hi +Reuven Lax - >> >> I saw your post when researching the error above: >> https://stackoverflow.com/a/69111493 :) >> >> As we are on Flink 1.13.1 in the middle of a move, it would be tough to

Kafka -> BigQueryIO Beam/Dataflow job ends up with weird encoding

2022-01-10 Thread Kaymak, Tobias
Hello and Happy New Year! I am migrating a Java Beam pipeline from 2.27.0 to 2.34.0 and from Flink to Dataflow. I have unit tests for the easy ParDo transforms but along the way somehow my encoding gets screwed up. I replaced my JSON to TableRow step with the one from the official Google/Teleport

Re: Kafka -> BigQueryIO Beam/Dataflow job ends up with weird encoding

2022-01-10 Thread Kaymak, Tobias
Hello Tobias, Have you thought about the encoding of the String? No, you have not - adding String input = new String(data, StandardCharsets.UTF_8); solved the issue for me. Have a great 2022! On Mon, Jan 10, 2022 at 9:19 PM Kaymak, Tobias wrote: > Hello and Happy New Year! >