[
https://issues.apache.org/jira/browse/BEAM-14332?focusedWorklogId=765729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-765729
]
ASF GitHub Bot logged work on BEAM-14332:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 03/May/22 22:30
Start Date: 03/May/22 22:30
Worklog Time Spent: 10m
Work Description: rohdesamuel commented on code in PR #17402:
URL: https://github.com/apache/beam/pull/17402#discussion_r864307237
##########
sdks/python/apache_beam/runners/interactive/testing/mock_env.py:
##########
@@ -0,0 +1,90 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module of mocks to isolate the test environment for each Interactive Beam
+test.
+"""
+
+import unittest
+import uuid
+from unittest.mock import patch
+
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.dataproc.dataproc_cluster_manager import DataprocClusterManager
+from apache_beam.runners.interactive.interactive_environment import InteractiveEnvironment
+from apache_beam.runners.interactive.testing.mock_ipython import mock_get_ipython
+
+
+def isolated_env(cls: unittest.TestCase):
Review Comment:
Maybe instead of a decorator, have our tests subclass this
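A minimal sketch of the subclass approach suggested in the review, using a stand-in module-level singleton in place of Beam's real `interactive_environment.current_env()` (the class names `IsolatedEnvTestCase` and `_global_env` are illustrative, not part of the actual PR):

```python
import unittest
from unittest.mock import MagicMock

# Stand-in for the process-wide singleton that, in Beam, is returned by
# interactive_environment.current_env(). Used here only to demonstrate
# the per-test isolation pattern.
_global_env = MagicMock(name='global_env')


class IsolatedEnvTestCase(unittest.TestCase):
  """Base class: each test sees a fresh environment; tearDown restores
  the original singleton so tests cannot leak state into each other."""

  def setUp(self):
    global _global_env
    self._saved_env = _global_env
    # A fresh mock environment per test, analogous to what the
    # isolated_env decorator in mock_env.py sets up.
    self.current_env = MagicMock(name='fresh_env')
    _global_env = self.current_env

  def tearDown(self):
    global _global_env
    _global_env = self._saved_env


class MyFeatureTest(IsolatedEnvTestCase):
  def test_uses_fresh_env(self):
    # The test interacts only with its own isolated environment.
    self.current_env.watch('main')
    self.current_env.watch.assert_called_once_with('main')
```

Compared with a decorator, the base class keeps the patch lifetime tied to `setUp`/`tearDown`, so `unittest` guarantees cleanup even when a test fails.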
Issue Time Tracking
-------------------
Worklog Id: (was: 765729)
Time Spent: 50m (was: 40m)
> Improve the workflow of cluster management for Flink on Dataproc
> ----------------------------------------------------------------
>
> Key: BEAM-14332
> URL: https://issues.apache.org/jira/browse/BEAM-14332
> Project: Beam
> Issue Type: Improvement
> Components: runner-py-interactive
> Reporter: Ning
> Assignee: Ning
> Priority: P2
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Improve the workflow of cluster management.
> There is an option to configure a default [cluster name|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_beam.py#L366].
> The existing user flows are:
> # Use the default cluster name to create a new cluster if none is in use;
> # Reuse a created cluster that has the default cluster name;
> # If the default cluster name is configured to a new value, re-apply 1 and 2.
> A better solution is to:
> # Create a new cluster implicitly if there is none or explicitly if the user
> wants one with specific provisioning;
> # Always default to using the last created cluster.
> The reasons are:
> * A cluster name is meaningless to the user when a cluster is just a medium
> to run OSS runners (as applications) such as Flink or Spark. The cluster
> could also run anywhere on GCP, such as Dataproc, k8s, or even Dataflow
> itself.
> * Clusters should be uniquely identified and thus should always have a
> distinct name. Clusters are managed (created/reused/deleted) behind the
> scenes by the notebook runtime when the user doesn’t explicitly do so (the
> capability to explicitly manage clusters is still available). Reusing the
> same default cluster name is risky: a cluster could be deleted by one
> notebook runtime while another cluster with the same name is created by a
> different notebook runtime.
> * Provide the capability for the user to explicitly provision a cluster.
> The current implementation provisions each cluster at the location specified
> by GoogleCloudOptions with 3 worker nodes. There is no explicit API to
> configure the number or shape of workers.
> We could use the WorkerOptions to allow customers to explicitly provision a
> cluster and expose an explicit API (with UX in the notebook extension) for
> customers to change the size of a cluster connected to their notebook
> (until we have an auto-scaling solution with Dataproc for Flink).
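The proposed flow in the description above (generated unique cluster names, defaulting to the last created cluster, explicit provisioning) can be sketched roughly as follows; `ClusterRegistry` and `ClusterMetadata` are hypothetical names for illustration, not Beam's actual API:

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ClusterMetadata:
  """Describes one cluster; the name is always generated, never user-chosen."""
  region: str
  num_workers: int = 3  # mirrors the current 3-worker default provisioning
  name: str = field(
      default_factory=lambda: 'interactive-' + uuid.uuid4().hex[:8])


class ClusterRegistry:
  """Tracks clusters created by a notebook runtime and always defaults to
  the most recently created one, so no two runtimes can collide on a
  shared default name."""

  def __init__(self):
    self._clusters: Dict[str, ClusterMetadata] = {}
    self._last_created: Optional[str] = None

  def create(self, region: str, num_workers: int = 3) -> ClusterMetadata:
    # Explicit path: the user asks for a cluster with specific
    # provisioning (e.g. worker count taken from WorkerOptions).
    meta = ClusterMetadata(region=region, num_workers=num_workers)
    self._clusters[meta.name] = meta
    self._last_created = meta.name
    return meta

  def default(self, region: str) -> ClusterMetadata:
    # Implicit path: reuse the last created cluster, or lazily create
    # one with default provisioning if none exists yet.
    if self._last_created is None:
      return self.create(region)
    return self._clusters[self._last_created]
```

Because every cluster gets a distinct generated name, deleting a cluster in one runtime can never race with another runtime that happens to reuse the same default name.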
--
This message was sent by Atlassian Jira
(v8.20.7#820007)