Technically speaking, it is possible in the stock distribution (I can't speak for Databricks) and not particularly hard to do (just check how we initialize sessions), but it is definitely not something we test or support, especially in the scenario you described.

If you want to achieve concurrent execution, multithreading is normally more than sufficient and avoids problems with the context.
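For instance, something along these lines (a rough, untested sketch, assuming PySpark and an existing session in the driver process) runs several jobs concurrently while sharing a single context:

from concurrent.futures import ThreadPoolExecutor

from pyspark.sql import SparkSession

# getOrCreate() returns the session that already exists in this process
# (e.g. the one created by the platform) instead of starting a new context.
spark = SparkSession.builder.getOrCreate()

def count_multiples(n):
    # Each thread submits its own job against the shared context.
    return spark.range(1_000_000).filter(f"id % {n} = 0").count()

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(count_multiples, [2, 3, 5, 7]))

print(results)

Threads share the driver, so the session and its context are visible to all of them; a separate Python process gets neither.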



On 12/13/22 00:40, Kevin Su wrote:
I ran my Spark job as a Databricks job with a single Python script.
IIUC, the Databricks platform creates a Spark context for this Python script. However, I create a new subprocess in this script and run some Spark code in it, but the subprocess can't find the context created by Databricks.
I'm not sure if there is any API I can use to get the default context.

bo yang <bobyan...@gmail.com> wrote on Mon, Dec 12, 2022 at 3:27 PM:

    In theory, maybe a Jupyter notebook or something similar could
    achieve this? e.g., running a Jupyter kernel inside the Spark driver,
    then another Python process could connect to that kernel.

    But in the end, this is like Spark Connect :)


    On Mon, Dec 12, 2022 at 2:55 PM Kevin Su <pings...@gmail.com> wrote:

        Also, is there any way to work around this issue without
        using Spark Connect?

        Kevin Su <pings...@gmail.com> wrote on Mon, Dec 12, 2022 at 2:52 PM:

            nvm, I found the ticket.
            Also, is there any way to work around this issue without
            using Spark Connect?

            Kevin Su <pings...@gmail.com> wrote on Mon, Dec 12, 2022 at 2:42 PM:

                Thanks for the quick response! Do we have any PR or JIRA
                ticket for it?

                Reynold Xin <r...@databricks.com> wrote on Mon, Dec 12, 2022 at 2:39 PM:

                    Spark Connect :)

                    (It’s a work in progress.)


                    On Mon, Dec 12, 2022 at 2:29 PM, Kevin Su <pings...@gmail.com> wrote:

                        Hey there, how can I get the same Spark context
                        in two different Python processes?
                        Let's say I create a context in Process A, and
                        then I want to use Python subprocess B to get
                        the Spark context created by Process A. How can
                        I achieve that?

                        I've tried
                        pyspark.sql.SparkSession.builder.appName("spark").getOrCreate(),
                        but it creates a new Spark context.


--
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC
