[ https://issues.apache.org/jira/browse/SPARK-51206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon reassigned SPARK-51206:
------------------------------------

    Assignee: Haoyu Weng

> Python Data Sources incorrectly imports from Spark Connect
> ----------------------------------------------------------
>
>                 Key: SPARK-51206
>                 URL: https://issues.apache.org/jira/browse/SPARK-51206
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 4.0.0, 4.1.0, 4.1
>            Reporter: Haoyu Weng
>            Assignee: Haoyu Weng
>            Priority: Major
>              Labels: pull-request-available
>
> The Python Data Source workers plan_data_source_read and write_into_data_source import Arrow/row conversion helpers from pyspark.sql.connect.conversion. The imported helpers do not actually need any Spark Connect dependency, but pyspark.sql.connect.conversion checks for the Spark Connect dependencies at import time. As a result, using a Python Data Source fails whenever any of those optional dependencies (pandas, grpcio, etc.) is missing:
> {code:python}
> .venv/lib/python3.9/site-packages/pyspark/sql/worker/plan_data_source_read.py:34: in <module>
>     from pyspark.sql.connect.conversion import ArrowTableToRowsConversion, LocalDataToArrowConversion
> .venv/lib/python3.9/site-packages/pyspark/sql/connect/conversion.py:20: in <module>
>     check_dependencies(__name__)
> .venv/lib/python3.9/site-packages/pyspark/sql/connect/utils.py:37: in check_dependencies
>     require_minimum_grpc_version()
> .venv/lib/python3.9/site-packages/pyspark/sql/connect/utils.py:49: in require_minimum_grpc_version
>     raise PySparkImportError(
> E   pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] grpcio >= 1.48.1 must be installed; however, it was not found.
> {code}

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
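The failure mode described in the ticket can be sketched in a small self-contained example. All names below (check_dependencies, GRPC_MISSING, arrow_table_to_rows, the two import_helper_* functions) are hypothetical stand-ins, not the real PySpark code: the point is only to show the pattern where a module runs a dependency check at import time, so a dependency-free helper defined in that module cannot be imported without the optional dependency, while exposing the same helper from an unguarded module works.

```python
class PackageNotInstalledError(ImportError):
    """Stand-in for pyspark.errors PySparkImportError."""


def check_dependencies(missing: bool) -> None:
    # Stand-in for the module-level guard in pyspark.sql.connect.conversion,
    # which requires grpcio/pandas even for helpers that never use them.
    if missing:
        raise PackageNotInstalledError(
            "[PACKAGE_NOT_INSTALLED] grpcio >= 1.48.1 must be installed"
        )


GRPC_MISSING = True  # simulate an environment without grpcio


def import_helper_via_connect():
    # Mimics importing the helper through the "connect" module:
    # the guard runs first, so the import fails even though the
    # helper itself needs no Spark Connect dependency.
    check_dependencies(GRPC_MISSING)

    def arrow_table_to_rows(table):
        return list(table)

    return arrow_table_to_rows


def import_helper_directly():
    # Mimics the shape of a fix: expose the dependency-free helper
    # from a module that has no Connect dependency check.
    def arrow_table_to_rows(table):
        return list(table)

    return arrow_table_to_rows


try:
    import_helper_via_connect()
    failed = False
except PackageNotInstalledError:
    failed = True

helper = import_helper_directly()
print(failed, helper([1, 2]))
```

Under these assumptions, the guarded path raises while the direct path returns a working helper, which mirrors why moving the import off pyspark.sql.connect.conversion avoids the error.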