On Mon, May 10, 2021 at 2:09 PM Boyuan Zhang <boyu...@google.com> wrote:
> Hi Evan,
>
> What do you mean startup delay? Is it the time that from you start the
> pipeline to the time that you notice the first output record from PubSub?
>

Yes that's what I meant, the seemingly idle system waiting for pubsub
output despite data being in the subscription at pipeline start time.

On Sat, May 8, 2021 at 12:50 AM Ismaël Mejía <ieme...@gmail.com> wrote:

>
>> Can you try running direct runner with the option
>> `--experiments=use_deprecated_read`
>>
>

This seems to work for me, thanks for this! 👍

>> Seems like an instance of
>> https://issues.apache.org/jira/browse/BEAM-10670?focusedCommentId=17316858&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17316858
>> also reported in
>> https://lists.apache.org/thread.html/re6b0941a8b4951293a0327ce9b25e607cafd6e45b69783f65290edee%40%3Cdev.beam.apache.org%3E
>>
>> We should rollback using the SDF wrapper by default because of the
>> usability and performance issues reported.
>>
>>
>> On Sat, May 8, 2021 at 12:57 AM Evan Galpin <evan.gal...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I’m experiencing very slow performance and startup delay when testing a
>>> pipeline locally. I’m reading data from a Google PubSub subscription as the
>>> data source, and before each pipeline execution I ensure that data is
>>> present in the subscription (readable from GCP console).
>>>
>>> I’m seeing startup delay on the order of minutes with DirectRunner (5-10
>>> min). Is that expected? I did find a Jira ticket[1] that at first seemed
>>> related, but I think it has more to do with BQ than DirectRunner.
>>>
>>> I’ve run the pipeline with a debugger connected and confirmed that it’s
>>> minutes before the first DoFn in my pipeline receives any data. Is there a
>>> way I can profile the direct runner to see what it’s churning on?
>>>
>>> Thanks,
>>> Evan
>>>
>>> [1]
>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/BEAM-4548
>>>
>>
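
For anyone wanting to try the option suggested in this thread, here is a minimal sketch of passing it on the command line. It assumes a Java pipeline launched through Maven with the direct runner dependency already on the classpath; the main class `com.example.MyPubsubPipeline` is a hypothetical placeholder, and only `--experiments=use_deprecated_read` comes from the thread itself.

```shell
# Hypothetical main class: substitute your own pipeline entry point.
# --runner=DirectRunner selects the local runner discussed in this thread;
# --experiments=use_deprecated_read is the option suggested above.
mvn compile exec:java \
  -Dexec.mainClass=com.example.MyPubsubPipeline \
  -Dexec.args="--runner=DirectRunner --experiments=use_deprecated_read"
```

Any other options your pipeline needs (for example, the subscription to read from) would go inside the same `-Dexec.args` string.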