Windows were processed out of order

2019-06-03 Thread Juan Carlos Garcia
Hi Folks, my team and I have a situation that we cannot explain and would like to hear your thoughts on. We have a pipeline which enriches the incoming messages and writes them to BigQuery; the pipeline looks like this: Apache Beam 2.12.0 / GCP Dataflow - ReadFromKafka (with withCreateTime and 1

Re: Windows were processed out of order

2019-06-03 Thread Juan Carlos Garcia
Mon, Jun 3, 2019 at 1:17 PM Juan Carlos Garcia wrote: > Hi Folks, my team and I have a situation that we cannot explain and would like to hear your thoughts on. We have a pipeline which enriches the incoming messages and writes them to Bi

Re: Why is my RabbitMq message never acknowledged?

2019-06-14 Thread Juan Carlos Garcia
In my opinion, for such crucial behavior I would expect the pipeline to fail with a clear message stating the reason, just like when you implement a new Coder and forget to override the verifyDeterministic method. Just my 2 cents. Maximilian Michels

Re: Design question regarding streaming and sorting

2019-08-24 Thread Juan Carlos Garcia
Hi, the main puzzle here is how to deliver the priority change of a row during a given window. My best shot would be to have a side input (PCollectionView) containing the priority changes; then, in the slow-worker Beam transform, extract this side input and update the corresponding row with the ne
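
A minimal sketch of that idea, assuming the priority changes arrive as KV<rowId, priority> pairs (all names here are hypothetical):

```java
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

// Hypothetical inputs: 'rows' carries rowId -> current priority,
// 'priorityChanges' carries rowId -> newly assigned priority.
PCollectionView<Map<String, Integer>> priorityView =
    priorityChanges.apply("PrioritiesAsMap", View.asMap());

PCollection<KV<String, Integer>> reprioritized = rows.apply("ApplyPriorityChange",
    ParDo.of(new DoFn<KV<String, Integer>, KV<String, Integer>>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        Integer changed = c.sideInput(priorityView).get(c.element().getKey());
        // Keep the original priority unless a change was published for this row.
        c.output(KV.of(c.element().getKey(),
            changed != null ? changed : c.element().getValue()));
      }
    }).withSideInputs(priorityView));
```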

Re: Design question regarding streaming and sorting

2019-08-24 Thread Juan Carlos Garcia
Sorry, I hit send too soon. After updating the priority I would execute a group-by DoFn and then use an external sort (by this priority key) to guarantee that the next DoFn receives a sorted Iterable. JC Juan Carlos Garcia wrote on Sat., Aug 24, 2019, 11:08: > Hi, > The ma

Re: Looping in Dataflow (Creating multiple jobs for a while loop)

2019-09-16 Thread Juan Carlos Garcia
Hi Anjana, you need to separate the pipeline definition from what happens when you call *run* on the pipeline; given that, you need to externalize the scheduling using something like a crontab, Jenkins, or another mechanism. Best regards, JC On Mon, Sep 16, 2019 at 7:57 P

Re: Looping in Dataflow (Creating multiple jobs for a while loop)

2019-09-16 Thread Juan Carlos Garcia
> Thanks for the reply! I want to know if there is any way in Dataflow to achieve this before trying an external scheduler. > Regards, Anjana > From: Juan Carlos Garcia [jcgarc...@gmail.com] > Sent: Monday

Generating a window identifier while using a Trigger AfterProcessingTime.pastFirstElementInPane()

2018-07-19 Thread Juan Carlos Garcia
Hi Folks, I would like to ask if it's possible to be notified when a window is created or closed while processing a batch of data. (Sorry for the long post.) My scenario: I am using a Session window with a GapDuration of 2 minutes (for testing), and during this processing we are assigning a Session id
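
For reference, a sketch of the windowing setup being described, matching the trigger in the subject line (Event and the 30-second trigger delay are assumptions, not from the original mail):

```java
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Sessions;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Session windows with a 2-minute gap, fired periodically on processing time.
PCollection<KV<String, Event>> sessioned = events.apply("Sessionize",
    Window.<KV<String, Event>>into(Sessions.withGapDuration(Duration.standardMinutes(2)))
        .triggering(Repeatedly.forever(
            AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(30))))
        .withAllowedLateness(Duration.ZERO)
        .discardingFiredPanes());
```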

Beam SparkRunner and Spark KryoSerializer problem

2018-07-19 Thread Juan Carlos Garcia
Folks, is anyone out there using the SparkRunner with the Spark KryoSerializer? We are being forced to use the less efficient 'JavaSerializer' with Spark because we face the following exception: Exception in thread "main" java.lang.RuntimeException: org.apache.spark.SparkException: Job abor

Re: Generating a window identifier while using a Trigger AfterProcessingTime.pastFirstElementInPane()

2018-07-20 Thread Juan Carlos Garcia
gs (e.g. watermark trigger with no speculative or late firings). > On Thu, Jul 19, 2018 at 6:12 AM Juan Carlos Garcia wrote: > Hi Folks, I would like to ask if it's possible to be notified when a window is created or closed while processing a batch of

Re: Beam SparkRunner and Spark KryoSerializer problem

2018-07-30 Thread Juan Carlos Garcia
Bump! Do any of the core devs roam around here? Can someone provide feedback about BEAM-4597 <https://issues.apache.org/jira/browse/BEAM-4597>? Thanks and regards, On Thu, Jul 19, 2018 at 3:41 PM, Juan Carlos Garcia wrote: > Folks, is anyone using the SparkRunner out

Re: Beam SparkRunner and Spark KryoSerializer problem

2018-07-30 Thread Juan Carlos Garcia
Hi Jean, thanks for taking a look. On Mon, Jul 30, 2018 at 2:49 PM, Jean-Baptiste Onofré wrote: > Hi Juan, it seems that has been introduced by the metrics layer in the core runner API. Let me check. Regards, JB > On 30/07/2018 14:47, J

Regression of (BEAM-2277) - IllegalArgumentException when using Hadoop file system for WordCount example.

2018-07-30 Thread Juan Carlos Garcia
Hi Folks, I experienced the issue described in (BEAM-2277), which shows it was fixed in v2.0.0. However, using versions 2.4.0 and 2.6.0 (another user reported it) shows the same error. So either it was not 100% fixed, or the bug has reappeared. Th

Re: Python-Beam input from Port

2018-08-15 Thread Juan Carlos Garcia
The way I see it, a SocketIO.read() implementation should just emit String (or byte[]) back. So create a PTransform that opens the port and then in the expand() method create a loop to just read the String (or byte[]) and emit the values into the pipeline (returning a PCollection). Hope it

Re: Regression of (BEAM-2277) - IllegalArgumentException when using Hadoop file system for WordCount example.

2018-08-20 Thread Juan Carlos Garcia
25 PM Juan Carlos Garcia wrote: > Hi Folks, I experienced the issue described in (BEAM-2277 <https://issues.apache.org/jira/browse/BEAM-2277>), which shows it was fixed in v2.0.0. However, using versions 2.4.0 and 2.6.0 (another user reported it) shows the

Re: Regression of (BEAM-2277) - IllegalArgumentException when using Hadoop file system for WordCount example.

2018-08-20 Thread Juan Carlos Garcia
issue, and will reopen the issue. > Thank you for reminding us on this, Tim > On Mon, Aug 20, 2018 at 4:44 PM Juan Carlos Garcia wrote: > BUMP!!! There are people reporting the problem for BEAM 2.6.0 <https://i

Re: Problem with KafkaIO

2018-09-19 Thread Juan Carlos Garcia
Don't know if it's related, but we have seen our pipeline dying (using SparkRunner) when there is a problem with Kafka (network interruptions), with errors like: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata. Maybe this will fix it as well, thanks Raghu fo

Re: Problem with KafkaIO

2018-09-19 Thread Juan Carlos Garcia
Sorry for hijacking the thread; we are running Spark on top of YARN, and YARN retries multiple times until it reaches its max attempts and then gives up. Raghu Angadi wrote on Wed., Sep 19, 2018, 18:58: > On Wed, Sep 19, 2018 at 7:26 AM Juan Carlos Garcia wrote: > Don't

Re: Problem with KafkaIO

2018-09-19 Thread Juan Carlos Garcia
Sorry, I hit the send button too fast... The error occurs in the worker. Juan Carlos Garcia wrote on Wed., Sep 19, 2018, 20:22: > Sorry for hijacking the thread; we are running Spark on top of YARN, and YARN retries multiple times until it reaches its max attempts and then gives up. >

KafkaIO needs access to the brokers even before the pipeline reaches the worker

2018-09-19 Thread Juan Carlos Garcia
Hi folks, we have a pipeline for Dataflow, and our Google Cloud environment has a private network (where the pipeline should run; this network interconnects via IPsec to an AWS environment where the Kafka brokers are running). We have found that in order to be able to submit the pipeline we have

AvroIO - failure using direct runner with java.nio.file.FileAlreadyExistsException when moving from temp to destination

2018-09-20 Thread Juan Carlos Garcia
I am writing a pipeline that will read from Kafka and convert the data into Avro files with fixed windows of 10 minutes. I am using a *DynamicAvroDestinations* in order to build a dynamic path and select the corresponding schema based on the incoming data. 1.) While testing on my machine (with Direc

Re: AvroIO - failure using direct runner with java.nio.file.FileAlreadyExistsException when moving from temp to destination

2018-09-26 Thread Juan Carlos Garcia
at 12:54 PM Juan Carlos Garcia wrote: > I am writing a pipeline that will read from Kafka and convert the data into Avro files with fixed windows of 10 minutes. I am using a *DynamicAvroDestinations* in order to build a dynamic path and select the corresponding schema based

Re: AvroIO - failure using direct runner with java.nio.file.FileAlreadyExistsException when moving from temp to destination

2018-09-26 Thread Juan Carlos Garcia
o be implemented when user-defined objects are used for the dynamic destination. Would you mind opening a Jira for that please? > I hope this helps a little, and thanks again, Tim > On Wed, Sep 26, 2018 at 11:24 AM Juan Carlos Garc

Re: Modular IO presentation at Apachecon

2018-09-26 Thread Juan Carlos Garcia
I'm really looking forward to a way to monitor the results of an IO module in a consistent way (like which batch of elements was written per destination, if possible 🙆). Nice presentation. Thomas Weise wrote on Thu., Sep 27, 2018, 06:35: > Thanks for sharing. I'm looking forward to see the re

Ingesting from KafkaIO into HDFS, problems with checkpoint and data loss on top of YARN + Spark

2018-10-01 Thread Juan Carlos Garcia
Hi folks, as the subject says, we are running a pipeline with which we are having issues with data loss. Using KafkaIO (2.0.4 due to the version of our brokers) with commitOnFinalize, we would like to understand how this finalize works together with a FileIO. I studied the KafkaIO source and saw that the re
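
For context, in current Beam releases the hook in question is KafkaIO's commitOffsetsInFinalize(), which commits offsets back to Kafka only once the runner finalizes a checkpoint; a minimal read sketch (broker address, topic, and types are assumptions):

```java
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringDeserializer;

// Offsets are committed to Kafka only after the runner has durably
// checkpointed the read position, not when each record is emitted.
PCollection<KV<String, String>> records = pipeline.apply(KafkaIO.<String, String>read()
    .withBootstrapServers("broker-1:9092")            // assumption
    .withTopic("events")                              // assumption
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    .commitOffsetsInFinalize()
    .withoutMetadata());
```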

Re: Ingesting from KafkaIO into HDFS, problems with checkpoint and data loss on top of YARN + Spark

2018-10-02 Thread Juan Carlos Garcia
ion to hdfs) is something Beam could achieve without losing data from KafkaIO. > Yes, reading from any supported source and writing to any supported sink is supported. Otherwise, it would be a bug. > On Mon, Oct 1, 2018 at 10:25 PM Juan Carlos Garcia wrote: > Hi

Re: Beam SparkRunner and Spark KryoSerializer problem

2018-10-03 Thread Juan Carlos Garcia
Bump, can someone from the core devs provide feedback about: https://issues.apache.org/jira/browse/BEAM-4597 Thanks On Mon, Jul 30, 2018 at 3:15 PM Juan Carlos Garcia wrote: > Hi Jean, thanks for taking a look. > On Mon, Jul 30, 2018 at 2:49 PM, Jean-Baptis

Re: Beam SparkRunner and Spark KryoSerializer problem

2018-10-04 Thread Juan Carlos Garcia
Hi Jean, thank you! On Thu, Oct 4, 2018 at 7:54 AM Jean-Baptiste Onofré wrote: > Hi Juan, I'm on it. Regards, JB > On Oct 4, 2018, at 07:19, Juan Carlos Garcia wrote: > Bump, can someone from the core devs provide feedback

Re: Apache Beam UI job creator

2018-10-08 Thread Juan Carlos Garcia
I think (maybe I am wrong) there is already a project within the Google products that aims for this and is still in beta/alpha... (I can't recall the name). And personally I would definitely like to see something like this. Karan Kumar wrote on Mon., Oct 8, 2018, 11:24: > Hello > We w

Re: Spark storageLevel not taking effect

2018-10-12 Thread Juan Carlos Garcia
Hi Mike, from the documentation at https://beam.apache.org/documentation/runners/spark/#pipeline-options-for-the-spark-runner: storageLevel - The StorageLevel to use when caching RDDs in batch pipelines. The Spark Runner automatically caches RDDs that are evaluated repeatedly. This is a batch-only
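
Setting it programmatically looks roughly like this (equivalent to --storageLevel=MEMORY_AND_DISK on the command line; the level chosen here is just an example, and per the docs quoted above it only applies to batch pipelines):

```java
import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

SparkPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).withValidation().as(SparkPipelineOptions.class);
// Controls the StorageLevel used when the Spark runner caches RDDs (batch only).
options.setStorageLevel("MEMORY_AND_DISK");
```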

Error with FlinkRunner: No translator known for org.apache.beam.sdk.io.Read$Unbounded

2018-10-16 Thread Juan Carlos Garcia
Hi Folks, I am switching some pipelines from the SparkRunner to the FlinkRunner on Beam 2.7. I started with a very simple pipeline which just reads from multiple Kafka sources, flattens those and then applies a regular DoFn. On my first try of the pipeline (from the command line using a fat jar) like: ```

Re: KafkaIO - Deadletter output

2018-10-23 Thread Juan Carlos Garcia
As Raghu said, just apply a regular ParDo and return a PCollectionTuple; after that you can extract your success records (TupleTag) and your dead-letter records (TupleTag) and do whatever you want with them. Raghu Angadi wrote on Wed., Oct 24, 2018, 05:18: > User can read serialized bytes from
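
A minimal sketch of that pattern (MyRecord and parse() are hypothetical stand-ins for the real payload type and parsing logic):

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

final TupleTag<MyRecord> successTag = new TupleTag<MyRecord>() {};
final TupleTag<byte[]> deadLetterTag = new TupleTag<byte[]>() {};

PCollectionTuple outputs = rawRecords.apply("ParseWithDeadLetter",
    ParDo.of(new DoFn<byte[], MyRecord>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        try {
          c.output(parse(c.element()));            // main output: parsed records
        } catch (Exception e) {
          c.output(deadLetterTag, c.element());    // side output: unparseable records
        }
      }
    }).withOutputTags(successTag, TupleTagList.of(deadLetterTag)));

PCollection<MyRecord> parsed = outputs.get(successTag);
PCollection<byte[]> deadLetters = outputs.get(deadLetterTag);
```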

No filesystem found for scheme hdfs - with the FlinkRunner

2018-10-26 Thread Juan Carlos Garcia
Hi Folks, I have a strange situation while running Beam 2.7.0 with the FlinkRunner. My setup consists of an HA Flink cluster (1.5.4) writing its checkpoints to HDFS. Flink is able to correctly write its checkpoints / savepoints to HDFS without any problems. However, my pipeline has to write to HDFS a

Re: No filesystem found for scheme hdfs - with the FlinkRunner

2018-10-26 Thread Juan Carlos Garcia
.matchNewResource("hdfs://ha-nn/tmp/beam-avro", true))); > Thanks, Tim > On Fri, Oct 26, 2018 at 11:47 AM Juan Carlos Garcia wrote: > Hi Folks, I have a strange situation while running Beam 2.7.0 with the FlinkRunner,

Re: No filesystem found for scheme hdfs - with the FlinkRunner

2018-10-26 Thread Juan Carlos Garcia
.to(FileSystems.matchNewResource(options.getTarget(), true)) // BEAM-2277 workaround .withTempDirectory(FileSystems.matchNewResource("hdfs://ha-nn/tmp/beam-avro", true))); > Thanks, Tim > On Fri, Oc
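
Reassembled from the quotes above, the workaround amounts to resolving both the target and the temp directory through FileSystems (schema, records, and options are assumed to exist in the surrounding pipeline):

```java
import org.apache.beam.sdk.io.AvroIO;
import org.apache.beam.sdk.io.FileSystems;

records.apply(AvroIO.writeGenericRecords(schema)
    .to(FileSystems.matchNewResource(options.getTarget(), true))
    // BEAM-2277 workaround: pin the temp directory to an explicit hdfs:// resource.
    .withTempDirectory(
        FileSystems.matchNewResource("hdfs://ha-nn/tmp/beam-avro", true)));
```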

Re: Disable a DoFn for a specific runner

2018-11-11 Thread Juan Carlos Garcia
You can use the PipelineOptions, which can be accessed from within a DoFn by invoking getPipelineOptions() on the DoFn.ProcessContext object (or the same from the StartBundleContext), and from there you can access the runner options you are passing from the command line. JC
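
A sketch of that lookup inside a DoFn (the DirectRunner check is just an example of branching on the runner):

```java
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.transforms.DoFn;

class RunnerAwareFn extends DoFn<String, String> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    PipelineOptions opts = c.getPipelineOptions();
    // Skip the side effect when running locally, for example.
    if (!"DirectRunner".equals(opts.getRunner().getSimpleName())) {
      c.output(c.element());
    }
  }
}
```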

Re: Multiple concurrent transforms with SparkRunner

2018-11-13 Thread Juan Carlos Garcia
I suggest playing around with some Spark configurations, like the dynamic allocation parameters: https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation Jiayue Zhang (Bravo) wrote on Wed., Nov 14, 2018, 02:28: > Hi, > I'm writing Java Beam code to run with both Datafl

Re:

2018-11-29 Thread Juan Carlos Garcia
If you are using Gradle for packaging, make sure your final jar (fat jar) contains all the service files merged. Using the Gradle shadowJar plugin, include the "*mergeServiceFiles()*" instruction like: apply plugin: 'com.github.johnrengelman.shadow' shadowJar { mergeServiceFiles() zip64 true
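
Spelled out, the truncated snippet above amounts to a build.gradle fragment like this (a sketch; zip64 is only needed for very large jars):

```groovy
apply plugin: 'com.github.johnrengelman.shadow'

shadowJar {
    // Merge META-INF/services entries from all dependencies so Beam's
    // ServiceLoader-based discovery (filesystems, runners, coders) keeps working.
    mergeServiceFiles()
    zip64 true
}
```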

Re: Graceful shutdown of long-running Beam pipeline on Flink

2018-12-02 Thread Juan Carlos Garcia
Hi Wayne, we have the same setup and we do daily updates to our pipeline. The way we do it is using the flink tool via Jenkins. Basically our deployment job does as follows: 1. Detect if the pipeline is running (it matches via job name) 2. If found, do a flink cancel with a savepoint (we use h

Re: No Translator Found issue

2018-12-03 Thread Juan Carlos Garcia
Hi Vinay, when generating your fat jar, make sure you are merging the service files (META-INF/services) of your dependencies. Apache Beam relies heavily on the Java ServiceLoader to discover / register its components. JC Vinay Patil wrote on Tue., Dec 4, 2018, 03:18: > Hi, > I am

Re: KafkaIO and added partitions

2019-01-02 Thread Juan Carlos Garcia
+1 Abdul Qadeer wrote on Wed., Jan 2, 2019, 14:34: > +1 > On Tue, 1 Jan 2019 at 12:45, wrote: > +1 from my side too :-) And ideally I would want to have some hooks to let me know the extra partitions have been picked up (or a way to query it). Although if that can't b

Re: KafkaIO not committing offsets when using withMaxReadTime

2019-01-09 Thread Juan Carlos Garcia
I also experience the same. As per the documentation, **withMaxReadTime** and **withMaxNumRecords** are mainly used for demo purposes, so I guess it is beyond the scope of the current KafkaIO to behave as a bounded source with offset management, or something is just missing in the current implementation (Waterma

Re: KafkaIO not committing offsets when using withMaxReadTime

2019-01-09 Thread Juan Carlos Garcia
Just so you can have a look at where this happens: https://github.com/apache/beam/blob/dffe2c1a2bd95f78869b266d3e1ea3f8ad8c323d/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java#L584 Cheers On Wed, Jan 9, 2019 at 5:09 PM Juan Carlos Garcia wrote: > I a

Re: Spark

2019-01-17 Thread Juan Carlos Garcia
Hi Matt, during the time we were using Spark with Beam, the solution was always to package the jar and use the spark-submit command pointing to your main class, which will do `pipeline.run`. The spark-submit command has a flag to decide how to run it (--deploy-mode), whether to launch the job on the

Re: Spark

2019-01-18 Thread Juan Carlos Garcia
lution Architect, Kettle Project Founder > On Thu, Jan 17, 2019 at 15:30 Juan Carlos Garcia <jcgarc...@gmail.com> wrote: > Hi Matt, during the time we were using Spark with Beam, the solution was always to package the jar and use the spark-submit

Re: KafkaIO Consumer Rebalance with Spark Runner

2019-01-28 Thread Juan Carlos Garcia
Hi Rick, you can limit your Spark processing by passing the following option to your Beam pipeline: *maxRecordsPerBatch*; see https://beam.apache.org/releases/javadoc/2.9.0/org/apache/beam/runners/spark/SparkPipelineOptions.html#getMaxRecordsPerBatch-- Hope it helps. JC On Mon, Jan 28,
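
Programmatically that is (equivalent to --maxRecordsPerBatch=100000 on the command line; the value is an example):

```java
import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

SparkPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
// Caps how many records each Spark micro-batch pulls from the source.
options.setMaxRecordsPerBatch(100000L);
```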

Re: Spark progress feedback

2019-01-28 Thread Juan Carlos Garcia
Matt, is the machine from which you are launching the pipeline different from where it should run? If that's the case, make sure the launching machine has all the HDFS environment variables set, as the pipeline is configured on the launching machine before it hits the worker machines.

Re: Spark: No TransformEvaluator registered

2019-01-29 Thread Juan Carlos Garcia
Hi Matt, are the META-INF/services files merged correctly in the fat jar? On Tue, Jan 29, 2019 at 2:33 PM Matt Casters wrote: > Hi Beam! > After I have my pipeline created and the execution started on the Spark master I get this strange nugget: > java.lang.IllegalStateException: No Tran

Re: How to call Oracle stored proc in JdbcIO

2019-02-05 Thread Juan Carlos Garcia
I believe this is not a missing feature; the question is more about what you expect from this procedure: reading back a ref cursor into a PCollection, or just doing an insert / update via the stored procedure? Going forward in the JDBC realm, you just need to create a prepared statement wit
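
For the insert/update case, a sketch using JDBC's {call ...} escape syntax through JdbcIO (driver, URL, procedure name, and element type are all hypothetical):

```java
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.values.KV;

rows.apply("CallStoredProc", JdbcIO.<KV<String, Integer>>write()
    .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
        "oracle.jdbc.OracleDriver", "jdbc:oracle:thin:@//db-host:1521/SERVICE"))
    // JDBC escape syntax for invoking a stored procedure per element.
    .withStatement("{call update_priority(?, ?)}")
    .withPreparedStatementSetter((element, statement) -> {
      statement.setString(1, element.getKey());
      statement.setInt(2, element.getValue());
    }));
```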

Re: Some questions about ensuring correctness with windowing and triggering

2019-02-12 Thread Juan Carlos Garcia
In my experience ordering is not guaranteed; you may need to apply a transformation that sorts the elements and then dispatches them in order, or use the Sorter extension for this: https://github.com/apache/beam/tree/master/sdks/java/extensions/sorter Steve Niemitz wrote on Tue., Feb 12, 2019, 1

Re: Dealing with "large" checkpoint state of a Beam pipeline in Flink

2019-02-12 Thread Juan Carlos Garcia
Hi Tobias, I think this can happen when there is a lot of backpressure on the pipeline. I don't know if it's normal, but I have a pipeline reading from KafkaIO and pushing to BigQuery in streaming mode, and I have seen checkpoints of almost 1 GB; whenever I am doing a savepoint for updating the pipel

Re: Dealing with "large" checkpoint state of a Beam pipeline in Flink

2019-02-12 Thread Juan Carlos Garcia
I forgot to mention that we use HDFS as storage for checkpoints / savepoints. Juan Carlos Garcia wrote on Tue., Feb 12, 2019, 18:03: > Hi Tobias, I think this can happen when there is a lot of backpressure on the pipeline. Don't know if it's normal but I

FileIO with GCP - KMS support

2019-02-20 Thread Juan Carlos Garcia
Hi Folks, is there any discussion going on regarding support for writing to a GCP bucket protected with KMS? Thanks and regards, -- JC

Re: FileIO with GCP - KMS support

2019-02-20 Thread Juan Carlos Garcia
corresponding callback *and it should work as expected. Cheers! On Wed, Feb 20, 2019 at 11:11 AM Juan Carlos Garcia wrote: > Hi Folks, is there any discussion going on regarding support for writing to a GCP bucket protected with KMS? > Thanks and regards, -- JC > -- JC

Re: FileIO with GCP - KMS support

2019-02-20 Thread Juan Carlos Garcia
Sorry, I hit send before verifying the right name of the method: the correct method name is *enqueueCopy*. On Wed, Feb 20, 2019 at 4:39 PM Juan Carlos Garcia wrote: > For anyone interested in the same while waiting for KMS support, just place the class in your own p

Re: FileIO with GCP - KMS support

2019-02-20 Thread Juan Carlos Garcia
e (2.11). > On Wed, Feb 20, 2019 at 7:43 AM Juan Carlos Garcia wrote: > Sorry, I hit send before verifying the right name of the method: > The correct method name is: *enqueueCopy* > On Wed, Feb 20, 2019 at 4:39 PM Juan Carlos Garcia w

Re: Scaling Beam pipeline on Dataflow - Join bounded and non-bounded source

2019-03-19 Thread Juan Carlos Garcia
Hi Maulik, have you submitted your job with the correct configuration to enable autoscaling? --autoscalingAlgorithm= --maxWorkers= I am on my phone right now and can't tell if the flag names are 100% correct. Maulik Gandhi wrote on Tue., Mar 19, 2019, 18:13: > Maulik Gandhi > 10:19 AM (
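
For the record, the Dataflow flags are --autoscalingAlgorithm=THROUGHPUT_BASED and --maxNumWorkers=N; set programmatically (worker count here is an example):

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

DataflowPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
// Same as --autoscalingAlgorithm=THROUGHPUT_BASED --maxNumWorkers=10.
options.setAutoscalingAlgorithm(AutoscalingAlgorithmType.THROUGHPUT_BASED);
options.setMaxNumWorkers(10);
```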

Re: Scaling Beam pipeline on Dataflow - Join bounded and non-bounded source

2019-03-20 Thread Juan Carlos Garcia
anks. > - Maulik > On Tue, Mar 19, 2019 at 2:53 PM Juan Carlos Garcia wrote: > Hi Maulik, have you submitted your job with the correct configuration to enable autoscaling? > --autoscalingAlgorithm= --maxWorkers=

Re: Scaling Beam pipeline on Dataflow - Join bounded and non-bounded source

2019-03-20 Thread Juan Carlos Garcia
AVRO files, input counts, etc), rather than just a high-level count of input and output elements from ParDo. > Thanks. - Maulik > On Wed, Mar 20, 2019 at 3:17 AM Juan Carlos Garcia wrote: > Your autoscaling algorithm is THROUGHPUT_BASED; it will kick in onl

GroupByKey and SortValues Transformation

2019-04-18 Thread Juan Carlos Garcia
Hi Folks, I have a question regarding *GroupBy* and *SortValue* (via SecondaryKey). The pipeline looks like: ...source+initialTransformations .apply(Window.into(FixedWindows.of(Duration.standardMinutes(10)))) .apply("GroupByKey", GroupByKey.create()) *// using my primaryKey (userId)* .apply("Sort
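
A sketch of that shape with the sorter extension (beam-sdks-java-extensions-sorter), assuming elements are KV<userId, KV<secondaryKey, value>> with String/Integer/String types chosen purely for illustration:

```java
import org.apache.beam.sdk.extensions.sorter.BufferedExternalSorter;
import org.apache.beam.sdk.extensions.sorter.SortValues;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

PCollection<KV<String, Iterable<KV<Integer, String>>>> sorted = input
    .apply(Window.<KV<String, KV<Integer, String>>>into(
        FixedWindows.of(Duration.standardMinutes(10))))
    .apply("GroupByKey", GroupByKey.create())
    // Sorts each per-key Iterable by the secondary key, spilling to disk if needed.
    .apply("SortValues", SortValues.<String, Integer, String>create(
        BufferedExternalSorter.options()));
```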

Re: GroupByKey and SortValues Transformation

2019-04-21 Thread Juan Carlos Garcia
key & window. For a given key & window the outputs contain a "paneIndex" in their PaneInfo metadata that tells you the order in which they were output. Again, for different key & window there are no ordering restrictions. > Kenn > On Thu, A

Re: Transform a PCollection> into PCollection (Java)

2019-05-01 Thread Juan Carlos Garcia
Hi Andres, you are missing the call to the pipeline's *run()* method. JC On Wed, May 1, 2019 at 4:35 PM Andres Angel wrote: > Hello everyone, > Guys, I'm trying to consume a Kafka topic within my job pipeline. The main idea is first to read the payload from the Kafka topic using KafkaIO; this read w

Re: Transform a PCollection> into PCollection (Java)

2019-05-01 Thread Juan Carlos Garcia
ate my code guys, here is the new version: > https://pastebin.com/UdT4D7VW > On Wed, May 1, 2019 at 11:10 AM Juan Carlos Garcia wrote: > Hi Andres, you are missing the call to the pipeline's *run()* method. > JC

Re: kafka client interoperability

2019-05-02 Thread Juan Carlos Garcia
Downgrade only the KafkaIO module to the version that works for you (also excluding any transitive dependencies of it); that works for us. JC. Lukasz Cwik wrote on Thu., May 2, 2019, 20:05: > +dev > On Thu, May 2, 2019 at 10:34 AM Moorhead,Richard <richard.moorhe...@cerner.com> wrote: > I
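
In Gradle terms the pinning could look like this (versions and exclusions are examples; adjust to whatever matches your brokers):

```groovy
dependencies {
    // Pin only the Kafka IO module; exclude its transitive Beam core so the
    // project's own Beam version keeps winning dependency resolution.
    implementation('org.apache.beam:beam-sdks-java-io-kafka:2.4.0') {
        exclude group: 'org.apache.beam', module: 'beam-sdks-java-core'
    }
}
```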

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-06 Thread Juan Carlos Garcia
As everyone has pointed out, there will be a small overhead added by the abstraction, but in my own experience it's totally worth it. Almost two years ago we decided to jump on the Beam wagon, first deploying into an on-premises Hadoop cluster with the Spark engine (just because Spark was alread

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-06 Thread Juan Carlos Garcia
> On Mon, May 6, 2019 at 12:24 PM kant kodali wrote: > On Mon, May 6, 2019 at 12:09 PM Juan Carlos Garcia wrote: > As everyone has pointed out, there will be a small overhead added by the abstraction, but in my own experien