[ https://issues.apache.org/jira/browse/BEAM-14449?focusedWorklogId=775259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-775259 ]
ASF GitHub Bot logged work on BEAM-14449:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/May/22 22:10
            Start Date: 26/May/22 22:10
    Worklog Time Spent: 10m
      Work Description: rohdesamuel commented on code in PR #17736:
URL: https://github.com/apache/beam/pull/17736#discussion_r883119026


##########
sdks/python/apache_beam/runners/interactive/interactive_beam.py:
##########

@@ -418,15 +424,19 @@ def create(
         raise ValueError(
             'Unknown cluster identifier: %s. Cannot create or reuse'
             'a Dataproc cluster.')
-      elif cluster_metadata.region == 'global':
-        # The global region is unsupported as it will be eventually deprecated.
-        raise ValueError('Clusters in the global region are not supported.')
-      elif not cluster_metadata.region:
+      if not cluster_metadata.region:
         _LOGGER.info(
             'No region information was detected, defaulting Dataproc cluster '
             'region to: us-central1.')
         cluster_metadata.region = 'us-central1'
+      elif cluster_metadata.region == 'global':
+        # The global region is unsupported as it will be eventually deprecated.
+        raise ValueError('Clusters in the global region are not supported.')
       # else use the provided region.
+      if cluster_metadata.num_workers and cluster_metadata.num_workers < 2:

Review Comment:
   Why are two workers required? Also can you please make the magic number a constant?


##########
sdks/python/apache_beam/runners/interactive/dataproc/types.py:
##########

@@ -50,5 +79,8 @@ def __eq__(self, other):
       return self.__key() == other.__key()
     return False
 
+  def rename(self):

Review Comment:
   Somewhat of an awkward name for something that sets the cluster name to the default. Maybe "reset_name" or something similar would be better?


##########
sdks/python/apache_beam/runners/interactive/interactive_beam.py:
##########

@@ -475,7 +485,9 @@ def cleanup(
             'options is deprecated since First stable release. References to '
             '<pipeline>.options will not be supported',
             category=DeprecationWarning)
-        p.options.view_as(FlinkRunnerOptions).flink_master = '[auto]'
+        p_flink_options = p.options.view_as(FlinkRunnerOptions)
+        p_flink_options.flink_master = '[auto]'
+        p_flink_options.flink_version = None

Review Comment:
   Why do we reset the flink version here?


Issue Time Tracking
-------------------

    Worklog Id:     (was: 775259)
    Time Spent: 1h  (was: 50m)
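Editor's note: for readers without the PR open, here is a minimal sketch of what the two review suggestions above (a named constant for the worker minimum, and "reset_name" instead of "rename") could look like. The names _MINIMUM_WORKER_COUNT, _DEFAULT_CLUSTER_NAME, the ClusterMetadata stub, and the ValueError behavior are all illustrative assumptions, not code from PR #17736.

    # Illustrative sketch only: the names and the ValueError behavior below are
    # assumptions made for this note, not code taken from PR #17736.

    # A Flink session on Dataproc is assumed to need more than one worker, so the
    # lower bound is spelled out as a named constant instead of a magic number.
    _MINIMUM_WORKER_COUNT = 2

    _DEFAULT_CLUSTER_NAME = 'interactive-beam-cluster'  # placeholder default


    class ClusterMetadata:
      def __init__(self, cluster_name=None, num_workers=None):
        self.cluster_name = cluster_name or _DEFAULT_CLUSTER_NAME
        self.num_workers = num_workers

      def reset_name(self):
        # Per the review suggestion: "reset_name" rather than "rename", since the
        # method sets the cluster name back to the default.
        self.cluster_name = _DEFAULT_CLUSTER_NAME


    def validate_num_workers(metadata):
      # Reject an explicitly configured worker count below the minimum.
      if metadata.num_workers and metadata.num_workers < _MINIMUM_WORKER_COUNT:
        raise ValueError(
            'A Dataproc cluster for interactive Flink needs at least '
            f'{_MINIMUM_WORKER_COUNT} workers; got {metadata.num_workers}.')

Keeping the bound in a single constant also lets the validation check and its error message stay in sync.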
> Support cluster provisioning when using Flink on Dataproc
> ----------------------------------------------------------
>
>                 Key: BEAM-14449
>                 URL: https://issues.apache.org/jira/browse/BEAM-14449
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-py-interactive
>            Reporter: Ning
>            Assignee: Ning
>            Priority: P2
>         Attachments: image-2022-05-16-11-25-32-904.png, image-2022-05-16-11-28-12-702.png
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Provide the capability for the user to explicitly provision a cluster.
> The current implementation provisions each cluster at the location specified by GoogleCloudOptions, using 3 worker nodes; there is no explicit API to configure the number or shape of the workers.
> We could use WorkerOptions to let customers explicitly provision a cluster and expose an explicit API (with UX in the notebook extension) for customers to change the size of the cluster connected to their notebook (until we have an autoscaling solution with Dataproc for Flink).
> The API looks like this when configuring the workers for a Dataproc cluster at creation time:
> !image-2022-05-16-11-25-32-904.png!
> An example request setting the masterConfig and workerConfig:
> !image-2022-05-16-11-28-12-702.png!
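Editor's note: the attached screenshot of the example request is not reproduced in this text, so the following is an approximate sketch of a Dataproc create-cluster request that sets masterConfig and workerConfig via the google-cloud-dataproc Python client. The project ID, cluster name, machine types, and optional components are placeholders, not values taken from the issue or its attachments.

    # Approximate sketch only; project, region, cluster name, machine types and
    # optional components are placeholders, not values from the attachments.
    from google.cloud import dataproc_v1

    project_id = 'my-project'
    region = 'us-central1'

    client = dataproc_v1.ClusterControllerClient(
        client_options={'api_endpoint': f'{region}-dataproc.googleapis.com:443'})

    cluster = {
        'project_id': project_id,
        'cluster_name': 'interactive-beam-cluster',
        'config': {
            # masterConfig: a single master node.
            'master_config': {
                'num_instances': 1,
                'machine_type_uri': 'n1-standard-4',
            },
            # workerConfig: the issue notes that 3 workers are provisioned today.
            'worker_config': {
                'num_instances': 3,
                'machine_type_uri': 'n1-standard-4',
            },
            'software_config': {
                # Flink and Docker are optional components on Dataproc 2.x images.
                'optional_components': ['DOCKER', 'FLINK'],
            },
        },
    }

    operation = client.create_cluster(
        request={'project_id': project_id, 'region': region, 'cluster': cluster})
    operation.result()  # Block until the cluster is provisioned.

Exposing num_instances and machine_type_uri through WorkerOptions is the kind of explicit sizing control the description argues for.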