Hi,

yes, we had a similar question some days ago.

We could imagine having a user callback function fired when the sink batch is complete.

Let me think about that.

Regards
JB

On 12/01/2017 09:04 AM, Philip Chan wrote:
Hey JB,

Thanks for getting back so quickly.
I suppose in that case I would need a way of monitoring when the ES transform completes successfully before I can proceed with the swap. The problem is that I can't think of a good way to determine that termination state, short of polling the new index and comparing its document count against the size of the input PCollection. Alternatively, I could use an external system, as you mentioned, to poll the state of the pipeline (I'm using Google Dataflow, so maybe there's an API for this). But I would have expected an easy way of simply saying "do not run this transform until this other transform completes". Is there no established way of signaling between pipelines when one completes, or of declaring a dependency of one transform on another?

Thanks again,
Philip

On Thu, Nov 30, 2017 at 11:44 PM, Jean-Baptiste Onofré <[email protected]> wrote:

    Hi Philip,

    You won't be able to do (3) in the same pipeline as the Elasticsearch Sink
    PTransform ends the pipeline with PDone.

    So, (3) has to be done in another pipeline (using a DoFn) or in another
    "system" (like Camel for instance). I would do a check of the data in the
    index and then trigger the swap there.
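    (For concreteness, an outside sketch rather than anything from this thread: the atomic swap is Elasticsearch's `_aliases` endpoint, which applies a list of remove/add actions in a single atomic request. Building that request body in plain Java, with made-up index and alias names:)

```java
// Sketch: build the body for POST /_aliases, which atomically moves an
// alias from the old index to the new one. Names below are illustrative.
public class AliasSwap {

    // One request, two actions: remove the alias from the old index,
    // add it to the new index. Elasticsearch applies both atomically.
    static String buildSwapBody(String alias, String oldIndex, String newIndex) {
        return "{\"actions\":["
            + "{\"remove\":{\"index\":\"" + oldIndex + "\",\"alias\":\"" + alias + "\"}},"
            + "{\"add\":{\"index\":\"" + newIndex + "\",\"alias\":\"" + alias + "\"}}"
            + "]}";
    }

    public static void main(String[] args) {
        // Once the write pipeline has finished, this body would be
        // POSTed to http://<es-host>:9200/_aliases.
        System.out.println(buildSwapBody("docs", "docs_v1", "docs_v2"));
    }
}
```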

    Regards
    JB

    On 12/01/2017 08:41 AM, Philip Chan wrote:

        Hi,

        I'm pretty new to Beam, and I've been trying to use the ElasticsearchIO
        sink to write docs into ES.
        With this, I want to be able to
        1. ingest and transform rows from DB (done)
        2. write JSON docs/strings into a new ES index (done)
        3. After (2) is complete and all documents are written into a new index,
        trigger an atomic index swap under an alias to replace the current
        aliased index with the new index generated in step 2. This is basically
        a single POST request to the ES cluster.

        The problem I'm facing is that I can't find a way to make (3) happen
        after step (2) is complete.

        The ElasticsearchIO.Write transform returns a PDone, and I'm not sure
        how to proceed from there because it doesn't seem to let me do another
        apply on it to "define" a dependency.
        
https://beam.apache.org/documentation/sdks/javadoc/2.1.0/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.Write.html

        Is there a recommended way to construct pipeline workflows like this?

        Thanks in advance,
        Philip


    --
    Jean-Baptiste Onofré
    [email protected]
    http://blog.nanthrax.net
    Talend - http://www.talend.com



--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
