Hello,
I am currently evaluating Apache Beam (later executing on Google DataFlow),
and for the first use-case I am working on, I have a design question and
would like to know whether any of you have already faced a similar one.
Namely, we have a DB describing dashboard views, and for each view, we
would like to perfor
As a bonus, here is a simplified diagram view of the use-case:
Cheers,
Pascal
On Thu, Aug 16, 2018 at 3:12 PM, Pascal Gula wrote:
> Hello,
> I am currently evaluating Apache Beam (later executing on Google
> DataFlow), and for the first use-case I am working on, I have a kinda
> design questio
Raghu, based upon your description, do you think it would be good for
KafkaIO to checkpoint on the first read without producing any actual
records?
On Wed, Aug 15, 2018 at 11:49 AM Raghu Angadi wrote:
>
> It is due to "enable.auto.commit=true". Auto-commit is an option of the
> Kafka client and ho
Hi Pascal,
As far as I know, you can't create a sub-pipeline within a DoFn, i.e.
nested pipelines are not supported.
Best,
Robin
On Thu, Aug 16, 2018 at 7:03 AM Pascal Gula wrote:
> As a bonus, here is a simplified diagram view of the use-case:
>
> Cheers,
> Pascal
>
>
> On Thu, Aug 16, 2018 at
I am not sure if it helps much. External checkpoints like Kafka
'autocommit' outside Beam's own checkpoint domain will always have quirks
like this. I wanted to ask what their use case for using autocommit was
(over commitOffsetsInFinalize()).
If we wanted to, it is not clear how an unbounded so
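For reference, a minimal sketch of reading with commitOffsetsInFinalize()
rather than the Kafka client's autocommit; the broker address, topic, and
key/value types below are placeholders, not taken from the thread:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.kafka.common.serialization.LongDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaReadSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply(KafkaIO.<Long, String>read()
        .withBootstrapServers("broker:9092")          // placeholder
        .withTopic("my-topic")                        // placeholder
        .withKeyDeserializer(LongDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        // Commit offsets when Beam finalizes its own checkpoint,
        // instead of relying on the client's enable.auto.commit.
        .commitOffsetsInFinalize());
    p.run();
  }
}
```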
Hi Robin,
this is unfortunate news, but I had already anticipated such an answer and
prepared an alternative implementation.
It would however be interesting to support such a feature, since I am
probably not the first person asking for this.
Best regards,
Pascal
On Thu, Aug 16, 2018 at 6:20 PM, Robin Qiu wrote:
>
Hello everyone.
Thanks for your answers. We've managed to solve our problem using the
solutions proposed here.
First, there was no need to use autocommit. But switching to
"commitOffsetsInFinalize()" also didn't help. In our example, if a failure
occurred when processing message #25, the runner
Hello all -
I am trying to run a bare-bones Beam pipeline to understand the "combine"
logic. I come from the Python world and am trying to learn the Java Beam
SDK due to my use case of ETL with a Spark cluster. So, pardon me for my
grotesque Java code :)
I would appreciate it if you could nudge me down the right path with this
On Thu, Aug 16, 2018 at 10:39 AM André Missaglia <
andre.missag...@arquivei.com.br> wrote:
> Hello everyone.
>
> Thanks for your answers. We've managed to solve our problem using the
> solutions proposed here.
>
> First, there was no need for using autocommit. But switching to
> "commitOffsetsInFi
Hello Mahesh,
You can add "implements Serializable" to the Accum class; then it should
work.
By the way, in Java, String is immutable, so in order to change, for
example, accum.line, you need to write accum.line = accum.line.concat(line).
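A minimal sketch of both suggestions together; the field names on Accum
are assumed for illustration, not taken from Mahesh's actual code:

```java
import java.io.Serializable;

// Implements Serializable so Beam can ship the accumulator between workers.
public class Accum implements Serializable {
  String line = "";  // assumed field name
  int count = 0;

  void add(String input) {
    // String is immutable: concat() returns a NEW String,
    // so the result must be assigned back to the field.
    line = line.concat(input);
    count++;
  }

  public static void main(String[] args) {
    Accum accum = new Accum();
    accum.add("foo");
    accum.add("bar");
    System.out.println(accum.line + " " + accum.count);  // foobar 2
  }
}
```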
Best,
Robin
On Thu, Aug 16, 2018 at 10:42 AM Mahesh Vanga
You can launch another Dataflow job from within an existing Dataflow job.
For all intents and purposes, Dataflow won't know that the jobs are related
in any way, so they will only be "nested" in the sense that your outer
pipeline knows about the inner pipeline.
You should be able to do this for all runners (gr
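A hedged sketch of what launching an inner job from within a running job
might look like; the options setup and the inner transform are placeholders,
not a tested recipe:

```java
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;

public class LaunchInnerJobFn extends DoFn<String, Void> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    PipelineOptions options = PipelineOptionsFactory.create();
    options.setRunner(DataflowRunner.class);
    // Build and submit a second pipeline. Dataflow sees it as an
    // unrelated job; only this outer DoFn knows the two are connected.
    Pipeline inner = Pipeline.create(options);
    inner.apply(Create.of(c.element()));  // placeholder transform
    inner.run();
  }
}
```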
Hello Robin -
Thank you so much for your help.
I added Serializable to Accum, and I got the following error. (Sorry for
being a pain. I hope once I get past the initial hump ...)
Aug 16, 2018 3:00:09 PM org.apache.beam.sdk.io.FileBasedSource
getEstimatedSizeBytes
INFO: Filepattern test_in.csv m
Hi Mahesh,
I think I hit the same NPE when I explored a self-defined CombineFn. I
think your CombineFn might still need to define a coder to help Beam run it
in a distributed environment. Beam tries to invoke the coder somewhere and
then throws an NPE because none is defined.
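A hedged sketch of that suggestion: override getAccumulatorCoder on the
CombineFn so Beam knows how to encode accumulators in a distributed run.
The class and field names here are assumed from the thread, not taken from
the PR:

```java
import java.io.Serializable;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.CoderRegistry;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.transforms.Combine;

public class ConcatFn extends Combine.CombineFn<String, ConcatFn.Accum, String> {
  public static class Accum implements Serializable {
    String line = "";
  }

  @Override
  public Accum createAccumulator() { return new Accum(); }

  @Override
  public Accum addInput(Accum accum, String input) {
    accum.line = accum.line.concat(input);
    return accum;
  }

  @Override
  public Accum mergeAccumulators(Iterable<Accum> accums) {
    Accum merged = createAccumulator();
    for (Accum a : accums) {
      merged.line = merged.line.concat(a.line);
    }
    return merged;
  }

  @Override
  public String extractOutput(Accum accum) { return accum.line; }

  // Without an accumulator coder, Beam can fail (e.g. with an NPE)
  // when it needs to serialize accumulators between workers.
  @Override
  public Coder<Accum> getAccumulatorCoder(CoderRegistry registry, Coder<String> inputCoder) {
    return SerializableCoder.of(Accum.class);
  }
}
```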
Here is a PR I wrote that d
Sorry I forgot to attach the PR link:
https://github.com/apache/beam/pull/6154/files#diff-7358f3f0511940ea565e6584f652ed02R342
-Rui
On Thu, Aug 16, 2018 at 12:13 PM Rui Wang wrote:
> Hi Mahesh,
>
> I think I had the same NPE when I explored self defined combineFn. I think
> your combineFn might
I just watched the excellent presentation by Markku Leppisto in Singapore.
I consult for a financial broker in London.
Our use case is streaming financial data on which various analytics are
performed to find relative value trading opportunities in the fixed income
markets.
I have two questions a
The only way reshuffle could help is if you are able to separate processing
that is prone to errors from processing that produces side effects, i.e.
IO --> DoFn_B_prone_to_exceptions --> Reshuffle --> (B) DoFn_B_producing_side_effects
This way, it might look like (B) processes each record exactly o
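The shape Raghu describes, sketched as a fragment; Reshuffle.viaRandomKey()
is the stock Beam transform, while the element type and DoFn names are
placeholders:

```java
// Failure-prone processing happens BEFORE the Reshuffle; the
// side-effecting stage happens AFTER it, so a retry of the first stage
// does not replay the side-effecting stage's output.
PCollection<String> prone =
    input.apply("ProneToExceptions", ParDo.of(new ParseFn()));
PCollection<String> stable =
    prone.apply(Reshuffle.viaRandomKey());
stable.apply("ProducesSideEffects", ParDo.of(new WriteFn()));
```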
Raghu, yes, I was thinking that advance would continuously return false.
This would force a runner to checkpoint and then resume at which point we
would have a stable reading point.
On Thu, Aug 16, 2018 at 3:17 PM Raghu Angadi wrote:
> The only way reshuffle could help is if you are able to sepa
On Thu, Aug 16, 2018 at 4:54 PM Lukasz Cwik wrote:
> Raghu, yes, I was thinking that advance would continuously return false.
> This would force a runner to checkpoint and then resume at which point we
> would have a stable reading point.
>
When the actual checkpoint occurs is still transparent