[DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-01 Thread Hyukjin Kwon
Hi all,

I would like to discuss moving the Spark Connect server to the builtin package.
Right now, users have to supply the server jar via --jars or --packages when they
run the Spark Connect server script, for example:

./sbin/start-connect-server.sh --jars `ls
connector/connect/server/target/**/spark-connect*SNAPSHOT.jar`

or

./sbin/start-connect-server.sh --packages
org.apache.spark:spark-connect_2.12:3.5.1

which is a little odd, since sbin scripts should not need extra jars provided just to start.

Moving it to the builtin package is pretty straightforward because most of the jars
are shaded, and the impact would be minimal. I have a prototype here:
apache/spark#47157. This also simplifies the Python local running logic a lot.
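
For illustration (assuming the prototype above is merged), the server script could then be started without any extra jar options:

./sbin/start-connect-server.sh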

The user-facing API layer, the Spark Connect client, stays external, but I would
like the internal/admin server layer, the Spark Connect server implementation,
to be built into Spark.

Please let me know if you have thoughts on this!


Re: [External Email] [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-01 Thread yangjie01
I'm supportive of this initiative. However, if the purpose is just to avoid the 
additional `--packages` option, it seems that making some adjustments to the 
`assembly/pom.xml` could potentially meet our goal. Is it really necessary to 
restructure the code directory?

Jie Yang
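
As a minimal sketch of the kind of adjustment being suggested here (hypothetical, not taken from any actual patch), the assembly module could declare a dependency on the Connect server module so that its jar ends up in the distribution:

<!-- assembly/pom.xml: hypothetical sketch, not the actual change -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-connect_${scala.binary.version}</artifactId>
  <version>${project.version}</version>
</dependency>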


Re: [External Email] [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-01 Thread Hyukjin Kwon
My concern is that the `connector` directory is really for
external/optional packages (and they aren't included in the assembly, IIRC), so
I am hesitant to just change the assembly.
The actual changes are not that large; they mostly move files around.


Re: [External Email] [DISCUSS] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-01 Thread yangjie01
I have manually tried modifying only the `assembly/pom.xml` and examined the
result of running `dev/make-distribution.sh --tgz`: the
`spark-connect_2.13-4.0.0-SNAPSHOT.jar` is indeed included in the jars directory.
However, if rearranging the directories would result in a clearer project
structure, I believe that would also be a viable approach.
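
A rough sketch of that verification flow, assuming an assembly/pom.xml change along the lines sketched earlier and the default dist/ output of make-distribution.sh:

dev/make-distribution.sh --tgz
ls dist/jars | grep spark-connect
# expected to list spark-connect_2.13-4.0.0-SNAPSHOT.jar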
