Re: [DISCUSS] Support spark.ml on Spark Connect

2024-10-14 Thread Bobby
Thank you for your kind response. I will prepare a formal PR for Spark. Niranjan Jayakar wrote on Fri, Oct 11, 2024 at 22:45: > +1 > > On Thu, Oct 10, 2024 at 5:28 PM Xiao Li wrote: > >> Thank you for working on this! >> >> Xiao >> >> Martin Grund on

Re: [DISCUSS] Support spark.ml on Spark Connect

2024-11-13 Thread Bobby
ial to note > that, as with any advice, quote "one test result is worth one-thousand > expert opinions (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)". > > > On Tue, 15 Oct 2024 at 08:24, Bobby

Re: [DISCUSS] Support spark.ml on Spark Connect

2025-01-20 Thread Bobby
50812 <https://issues.apache.org/jira/browse/SPARK-50812>. Thank you again for your feedback and for merging these PRs! Best Regards, Bobby Wang Mich Talebzadeh wrote on Wed, Nov 13, 2024 at 21:17: > OK I added a comment to PR > > HTH, > > Mich Talebzadeh, > > Architect | Data En

Re: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-15 Thread Bobby
I also have one: https://github.com/apache/spark/pull/49503 I would like to support plugins for Connect ML in 4.0. Thx Ruifeng Zheng wrote on Thu, Jan 16, 2025 at 09:10: > This one: SPARK-50812 > We want to support more ML algorithms on Connect in 4.0. > But

Re: [VOTE] Release Spark 4.0.0 (RC2)

2025-03-10 Thread Bobby
ted in expressions "unresolvedstarwithcolumns(explode(array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)))". SQLSTATE: 42K0E It's a regression. The same code passes on Spark 3.5.5 Connect but fails on Spark 4.0.0-rc2 and the latest 4.1.0. Bobby wrote on Mon, Mar 10, 2025 at 16:30: > I ran into an exce

Re: [VOTE] Release Spark 4.0.0 (RC2)

2025-03-11 Thread Bobby
I ran into an exception when playing around with Spark Connect; more details can be found at https://issues.apache.org/jira/browse/SPARK-51451 pyspark.errors.exceptions.connect.AnalysisException: [UNSUPPORTED_GENERATOR.NESTED_IN_EXPRESSIONS] The generator is not supported: nested in expressions "

Re: [VOTE] Release Spark 4.0.0 (RC3)

2025-03-22 Thread Bobby
han spending the > weekend binge-watching on Netflix 😅🤣). > > @Bobby , thanks a lot, not only for reporting the > issue but also for providing a time-saving project for testing. > > On Sat, 22 Mar 2025 at 1:11, Bobby () wrote: > >> I think https://issues.apache.org

Re: [VOTE] Release Spark 4.0.0 (RC3)

2025-03-21 Thread Bobby
I think https://issues.apache.org/jira/browse/SPARK-51537 should be fixed before the 4.0.0 release; otherwise, the executor side will not use global jars added by `--jars` in Connect mode, which could result in deserialization exceptions.

[DISCUSS] Spark Columnar Processing

2019-03-25 Thread Bobby Evans
, any feedback is welcome, and we will file a SPIP on it once we feel like the major changes we are proposing are acceptable. Thanks, Bobby Evans

Re: [DISCUSS] Spark Columnar Processing

2019-03-26 Thread Bobby Evans
odegen, and convert rows to > columnar batches when communicating with external systems. > > On Mon, Mar 25, 2019 at 1:05 PM Bobby Evans wrote: > >> This thread is to discuss adding in support for data frame processing >> using an in-memory columnar format compatible with Apa

Re: [DISCUSS] Spark Columnar Processing

2019-03-26 Thread Bobby Evans
Please let me know if I missed any of your concerns, or if I misunderstood any of them. Thanks, Bobby On Tue, Mar 26, 2019 at 12:21 PM Reynold Xin wrote: > 26% improvement is underwhelming if it requires massive refactoring of the > codebase. Also you can't just add the bene

Re: [DISCUSS] Spark Columnar Processing

2019-03-27 Thread Bobby Evans
a benefit to more than just GPU accelerated queries. Thanks, Bobby On Tue, Mar 26, 2019 at 11:59 PM Kazuaki Ishizaki wrote: > Looks interesting discussion. > Let me describe the current structure and remaining issues. This is > orthogonal to cost-benefit trade-off discussion. > &

Re: [DISCUSS] Spark Columnar Processing

2019-04-02 Thread Bobby Evans
k and discussion. Thanks again, Bobby On Mon, Apr 1, 2019 at 5:09 PM Reynold Xin wrote: > I just realized I didn't make it very clear my stance here ... here's > another try: > > I think it's a no brainer to have a good columnar UDF interface. This > would f

Re: [DISCUSS] Spark Columnar Processing

2019-04-03 Thread Bobby Evans
a design document really fits in here because from http://spark.apache.org/improvement-proposals.html and http://spark.apache.org/contributing.html it does not mention a design anywhere. I am happy to put one up, but I was hoping the API concept would cover most of that. Thanks, Bobby On Tue

Re: [DISCUSS] Spark Columnar Processing

2019-04-05 Thread Bobby Evans
I just filed SPARK-27396 as the SPIP for this proposal. Please use that JIRA for further discussions. Thanks for all of the feedback, Bobby On Wed, Apr 3, 2019 at 7:15 PM Bobby Evans wrote: > I am still working on the SPIP and should get it up in the next few days. > I have the basi

Re: [DISCUSS] Spark Columnar Processing

2019-04-11 Thread Bobby Evans
this. Thanks, Bobby On Fri, Apr 5, 2019 at 2:24 PM Bobby Evans wrote: > I just filed SPARK-27396 as the SPIP for this proposal. Please use that > JIRA for further discussions. > > Thanks for all of the feedback, > > Bobby > > On Wed, Apr 3, 2019 at 7:15 PM Bobby Evans w

Re: [DISCUSS] Spark Columnar Processing

2019-04-13 Thread Bobby Evans
ColumnBatch API, but also provide utilities to directly convert > from/to Arrow. > > > On Thu, Apr 11, 2019 at 7:13 AM, Bobby Evans wrote: > >> The SPIP has been up for almost 6 days now with really no discussion on >> it. I am hopeful that means it's okay an

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-16 Thread Bobby Evans
I am +1, I better be because I am proposing the SPIP. Thanks, Bobby On Tue, Apr 16, 2019 at 10:38 AM Tom Graves wrote: > Hi everyone, > > I'd like to call for a vote on SPARK-27396 - SPIP: Public APIs for > extended Columnar Processing Support. The proposal is to extend

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-20 Thread Bobby Evans
I think you misunderstood the point of this SPIP. I responded to your comments in the SPIP JIRA. On Sat, Apr 20, 2019 at 12:52 AM Xiangrui Meng wrote: > I posted my comment in the JIRA >

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-20 Thread Bobby Evans
s, we’d still use our own classes to > manipulate the data internally, and end users could use the Arrow library > if they want it). > > Matei > > > On Apr 20, 2019, at 8:38 AM, Bobby Evans wrote: > > > > I think you misunderstood the point of this SPIP. I responded

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
n the future and some libraries using >> this feature are beginning to use the new Arrow code. >> >> Matei >> >> > On Apr 20, 2019, at 1:39 PM, Bobby Evans wrote: >> > >> > I want to be clear that this SPIP is not proposing exposing Arrow >> API

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
to clarify exactly what I am proposing, and then restart the vote after we have gotten more agreement on what APIs should be exposed. Thanks, Bobby On Mon, Apr 22, 2019 at 10:49 AM Xiangrui Meng wrote: > Per Robert's comment on the JIRA, ETL is the main use case for the SPIP. I >

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
t on what APIs should be exposed" > > That'd be very useful. At least I was confused by what the SPIP was about. > No point voting on something when there is still a lot of confusion about > what it is. > > > On Mon, Apr 22, 2019 at 10:58 AM, Bobby Evans wrote: > >

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-30 Thread Bobby Evans
work with the Arrow community to get some form of guarantees about the stability of the standard. That should hopefully unblock stable APIs so end users can write columnar UDFs in scala/java and ideally get efficient Arrow based batch data transfers to external tools as well. Thanks, Bobby On Tue

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-07 Thread Bobby Evans
I am +1 On Tue, May 7, 2019 at 1:37 PM Thomas Graves wrote: > Hi everyone, > > I'd like to call for another vote on SPARK-27396 - SPIP: Public APIs > for extended Columnar Processing Support. The proposal is to extend > the support to allow for more columnar processing. We had previous > vote

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-15 Thread Bobby Evans
It would allow for the columnar processing to be extended through the shuffle. So if I were doing, say, an FPGA-accelerated extension, it could replace the ShuffleExchangeExec with one that can take a ColumnarBatch as input instead of a Row. The extended version of the ShuffleExchangeExec could then

Re: [RESULT][VOTE] SPIP: Public APIs for extended Columnar Processing Support

2019-05-30 Thread Bobby Evans
Let me put up an initial patch probably around the beginning of next week and we can talk about the maintenance involved with it there when you have something more concrete to look at. Thanks, Bobby On Wed, May 29, 2019 at 5:04 PM Reynold Xin wrote: > Thanks Tom. > > I finally ha

Re: DSV2 API Question

2019-06-27 Thread Bobby Evans
expanded so you can do some of the things you are requesting. I hope this helps, Bobby On Tue, Jun 25, 2019 at 4:24 PM Andrew Melo wrote: > Hello, > > I've (nearly) implemented a DSV2-reader interface to read particle physics > data stored in the ROOT (https://root.cern.ch/) fil

Re: [VOTE] [SPARK-27495] SPIP: Support Stage level resource configuration and scheduling

2019-09-11 Thread Bobby Evans
, Bobby On Wed, Sep 4, 2019 at 9:24 AM Thomas Graves wrote: > Hey everyone, > > I'd like to call for a vote on SPARK-27495 SPIP: Support Stage level > resource configuration and scheduling > > This is for supporting stage level resource configuration and > scheduling. The

Re: Contract for PartitionReader/InputPartition for ColumnarBatch?

2020-06-29 Thread Bobby Evans
e for closing the incoming batch, but our batch sizes are a lot larger so GC pressure is less of an issue. The only thing for us is that we have to manage the transition between the spark columnar model and our plugin's internal columnar model. Not a big deal though. Thanks, Bobby On Sat, Ju

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Bobby Chowdary
Hive Context works on RC3 for MapR after adding spark.sql.hive.metastore.sharedPrefixes as suggested in SPARK-7819. However, there still seem to be some other issues with native libraries; I get the warning below: WARN NativeCodeLoader: Unable to load

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Bobby Chowdary
Hi Patrick, Thanks for clarifying. No issues with functionality. +1 (non-binding) Thanks Bobby On Mon, Jun 1, 2015 at 9:41 PM, Patrick Wendell wrote: > Hey Bobby, > > Those are generic warnings that the hadoop libraries throw. If you are > using MapRFS they shou

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Bobby Chowdary
0 JDK8 make-distribution.sh --tgz -Pmapr4 -Phive -Pnetlib-lgpl -Phive-thriftserver didn’t have this issue in RC3 and tried it on scala as well. Thanks Bobby

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-05 Thread Bobby Chowdary
Thanks Yin! Everything else works great! +1 (non-binding) On Fri, Jun 5, 2015 at 2:11 PM, Yin Huai wrote: > Hi Bobby, > > sqlContext.table("test.test1") is not officially supported in 1.3. For > now, please use the "use database" as a workaround. We will ad

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-30 Thread Bobby Chowdary
+1 Tested on CentOS 7 On Jun 30, 2015 19:38, "Joseph Bradley" wrote: > +1 > > On Tue, Jun 30, 2015 at 5:27 PM, Reynold Xin wrote: > >> +1 >> >> On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell >> wrote: >> >>> Please vote on releasing the following candidate as Apache Spark version >>> 1.4.1!