[ https://issues.apache.org/jira/browse/BEAM-14449?focusedWorklogId=775265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-775265 ]
ASF GitHub Bot logged work on BEAM-14449:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/May/22 22:19
            Start Date: 26/May/22 22:19
    Worklog Time Spent: 10m
      Work Description:
KevinGG commented on code in PR #17736:
URL: https://github.com/apache/beam/pull/17736#discussion_r883124064

##########
sdks/python/apache_beam/runners/interactive/interactive_beam.py:
##########
@@ -418,15 +424,19 @@ def create(
           raise ValueError(
               'Unknown cluster identifier: %s. Cannot create or reuse'
               'a Dataproc cluster.')
-      elif cluster_metadata.region == 'global':
-        # The global region is unsupported as it will be eventually deprecated.
-        raise ValueError('Clusters in the global region are not supported.')
-      elif not cluster_metadata.region:
+      if not cluster_metadata.region:
         _LOGGER.info(
             'No region information was detected, defaulting Dataproc cluster '
             'region to: us-central1.')
         cluster_metadata.region = 'us-central1'
+      elif cluster_metadata.region == 'global':
+        # The global region is unsupported as it will be eventually deprecated.
+        raise ValueError('Clusters in the global region are not supported.')
       # else use the provided region.
+      if cluster_metadata.num_workers and cluster_metadata.num_workers < 2:

Review Comment:
   The magic number is required by Dataproc but is not documented. To avoid failing cluster creation with such an error, we do this check and override the value early. I'll move it to a constant with comments.

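A minimal sketch of what moving the magic number into a commented constant might look like. The names `MIN_NUM_WORKERS`, `DEFAULT_REGION`, `ClusterMetadata` (a simplified stand-in), and `normalize_cluster_metadata` are hypothetical and are not taken from the PR; only the region handling and the `num_workers < 2` condition come from the diff above.

```python
import logging
from dataclasses import dataclass
from typing import Optional

_LOGGER = logging.getLogger(__name__)

# Hypothetical constant name. Per the review comment, Dataproc requires at
# least this many workers but does not document the limit.
MIN_NUM_WORKERS = 2
# Default region used in the diff when none is provided.
DEFAULT_REGION = 'us-central1'


@dataclass
class ClusterMetadata:
    """Simplified stand-in for the real metadata class (assumption)."""
    region: Optional[str] = None
    num_workers: Optional[int] = None


def normalize_cluster_metadata(meta: ClusterMetadata) -> ClusterMetadata:
    """Applies the region default and the minimum-worker override early."""
    if not meta.region:
        _LOGGER.info(
            'No region information was detected, defaulting Dataproc cluster '
            'region to: %s.', DEFAULT_REGION)
        meta.region = DEFAULT_REGION
    elif meta.region == 'global':
        # The global region is unsupported as it will be eventually deprecated.
        raise ValueError('Clusters in the global region are not supported.')
    # else use the provided region.

    # A falsy num_workers (None or 0) is left untouched, matching the
    # condition in the diff; only explicit values below the minimum change.
    if meta.num_workers and meta.num_workers < MIN_NUM_WORKERS:
        _LOGGER.info(
            'num_workers=%s is below the minimum of %s; overriding.',
            meta.num_workers, MIN_NUM_WORKERS)
        meta.num_workers = MIN_NUM_WORKERS
    return meta
```

Under these assumptions, `normalize_cluster_metadata(ClusterMetadata(num_workers=1))` would yield a cluster in `us-central1` with two workers instead of failing later at cluster creation.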
Issue Time Tracking
-------------------

    Worklog Id:     (was: 775265)
    Time Spent: 1.5h  (was: 1h 20m)

> Support cluster provisioning when using Flink on Dataproc
> ---------------------------------------------------------
>
>                 Key: BEAM-14449
>                 URL: https://issues.apache.org/jira/browse/BEAM-14449
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-py-interactive
>            Reporter: Ning
>            Assignee: Ning
>            Priority: P2
>         Attachments: image-2022-05-16-11-25-32-904.png, image-2022-05-16-11-28-12-702.png
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Provide the capability for the user to explicitly provision a cluster.
> Current implementation provisions each cluster at the location specified by GoogleCloudOptions using 3 worker nodes. There is no explicit API to configure the number or shape of workers.
> We could use the WorkerOptions to allow customers to explicitly provision a cluster and expose an explicit API (with UX in notebook extension) for customers to change the size of a cluster connected with their notebook (until we have an auto scaling solution with Dataproc for Flink).
> The API looks like this when configuring the workers for a dataproc cluster when creating it:
> !image-2022-05-16-11-25-32-904.png!
> An example request setting the masterConfig and workerConfig:
> !image-2022-05-16-11-28-12-702.png!

--
This message was sent by Atlassian Jira
(v8.20.7#820007)