Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Jia Yu
Thanks for putting this together. +0 (non-binding) from my side. Happy to see geospatial data getting attention, but we need to get it right. Jia Yu On Mon, May 5, 2025 at 12:15 PM Szehon Ho wrote: > +1 (non binding) > > Thanks > Szehon > > On Mon, May 5, 2025 at 11:17

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-30 Thread Jia Yu
reinvention. Thanks, Jia On Sat, Mar 29, 2025 at 11:02 PM Ángel Álvarez Pascua < angel.alvarez.pas...@gmail.com> wrote: > > * 1. Domain types evolve quickly.* > It has taken years for Parquet to include these new types in its format... > We could evolve alongside Parq

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Jia Yu
/sedona_sql/expressions Happy to chat further and hear your thoughts. Again, Sedona is an Apache project — Spark is welcome to depend on Sedona and re-use any of our work if helpful. Thanks, Jia On Sat, Mar 29, 2025 at 2:41 PM Reynold Xin wrote: > > While I don’t think Spark should become a

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Jia Yu
building geospatial support around Spark — not inside it — and we’d love to continue collaborating in this spirit. Happy to work together on providing Geo support in Parquet! Best, Jia References [1] GeoParquet project: https://github.com/opengeospatial/geoparquet [2] Sedona’s GeoParquet DataSource

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Jia Yu
Dear Menelaos, Thanks for bringing this up again. I’ve seen similar proposals come up on the mailing list before, and I’d like to offer some thoughts. For full transparency, I’m Jia Yu, PMC Chair of Apache Sedona (https://github.com/apache/sedona), a widely used open-source cluster computing

[ANNOUNCE] Apache Sedona 1.7.1 released

2025-03-16 Thread Jia Yu
Dear all, We are happy to report that we have released Apache Sedona 1.7.1. Thank you again for your help. Apache Sedona is a cluster computing system for processing large-scale spatial data on top of Apache Spark, Flink and Snowflake. Vote thread (Permalink from https://lists.apache.org/list.ht

[ANNOUNCE] Apache Sedona 1.7.0 released

2024-12-03 Thread Jia Yu
Dear all, We are happy to report that we have released Apache Sedona 1.7.0. Thank you again for your help. Apache Sedona is a cluster computing system for processing large-scale spatial data. Vote thread (Permalink from https://lists.apache.org/list.html): https://lists.apache.org/thread/5hvcr80

SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia
<https://issues.apache.org/jira/browse/SPARK-47773> SPIP Doc <https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing> Your feedback and comments are welcome and appreciated. Thanks. Thanks, Jia Ke

Re: [VOTE] Release Spark 3.4.2 (RC1)

2023-11-30 Thread Jia Fan
+1 L. C. Hsieh wrote on Thu, Nov 30, 2023 at 12:33: > +1 > > Thanks Dongjoon! > > On Wed, Nov 29, 2023 at 7:53 PM Mridul Muralidharan > wrote: > > > > +1 > > > > Signatures, digests, etc check out fine. > > Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes > > > > Regards, > > Mridul

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-24 Thread Jia Fan
+1 L. C. Hsieh wrote on Tue, Oct 24, 2023 at 13:23: > +1 > > On Mon, Oct 23, 2023 at 6:31 PM Anish Shrigondekar > wrote: > > > > +1 (non-binding) > > > > Thanks, > > Anish > > > > On Mon, Oct 23, 2023 at 5:01 PM Wenchen Fan wrote: > >> > >> +1 > >> > >> On Mon, Oct 23, 2023 at 4:03 PM Jungtaek Lim < > kabh

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Jia Fan
+1 Ruifeng Zheng wrote on Tue, Sep 12, 2023 at 08:46: > +1 > > On Tue, Sep 12, 2023 at 7:24 AM Hyukjin Kwon wrote: > >> +1 >> >> On Tue, Sep 12, 2023 at 7:05 AM Xiao Li wrote: >> >>> +1 >>> >>> Xiao >>> >>> Yuanjian Li wrote on Mon, Sep 11, 2023 at 10:53: >>> @Peter Toth I've looked into the details of this i

Re: [DISCUSS] Incremental statistics collection

2023-08-28 Thread Jia Fan
For databases with automatic deduplication capabilities, such as HBase: suppose we insert 100 rows with the same rowkey, but in fact only one row exists in HBase. Is the new statistical value we add 100 or 1? Or, if HBase already contains this rowkey, would the value be 0? How should we handle thi

Re: Some questions about Spark github action

2023-08-24 Thread Jia Fan
Thanks Xinrong and Jack. I will take a look. I also found that https://github.com/apache/spark/pull/32092 is what I was looking for. Thanks a lot. Xinrong Meng wrote on Fri, Aug 25, 2023 at 04:30: > Hi Jia, > > Consider reviewing GitHub Action variables like > $GITHUB_REPOSITORY. Detailed information can be f

Some questions about Spark github action

2023-08-24 Thread Jia Fan
, all developers can control the starting and retrying of GitHub Actions for their own PRs. I checked Spark's GitHub Actions configuration and found nothing special. Is there any key point that I haven't noticed? I'd be very grateful if anyone could help. Best regards, Jia Fan

Re: [VOTE] Release Apache Spark 3.3.3 (RC1)

2023-08-13 Thread Jia Fan
+1 Mridul Muralidharan wrote on Fri, Aug 11, 2023 at 15:57: > > +1 > > Signatures, digests, etc check out fine. > Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes > > Regards, > Mridul > > > On Fri, Aug 11, 2023 at 2:00 AM Cheng Pan wrote: > >> +1 (non-binding) >> >> Passed integratio

Re: What else could be removed in Spark 4?

2023-08-07 Thread Jia Fan
d practice to use the deprecated method of 2.x on 4.x. 3. For Mesos, I think we should remove it from the docs first. ____ Jia Fan > On Aug 8, 2023, at 05:47, Sean Owen wrote: > > While we're noodling on the topic, what else might be worth removing in Spark > 4? >

Re: Welcome two new Apache Spark committers

2023-08-06 Thread Jia Fan
Congratulations! Jia Fan > On Aug 7, 2023, at 11:28, Ye Xianjin wrote: > > Congratulations! > > Sent from my iPhone > >> On Aug 7, 2023, at 11:16 AM, Yuming Wang wrote: >> >> >> >> Congratulations! >> >> On Mon

Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Jia Fan
+1 > On Jul 29, 2023, at 13:06, Adrian Pop-Tifrea wrote: > > +1, the more data source formats, the better, and if the solution is already > thoroughly tested, I say we should go for it. > > On Sat, Jul 29, 2023, 06:35 Xiao Li > wrote: >> +1 >> >> On Fri, Jul 28, 2023 at 15

Re: [Reminder] Spark 3.5 Branch Cut

2023-07-14 Thread Jia Fan
Can we put [SPARK-44262][SQL] Add `dropTable` and `getInsertStatement` to JdbcDialect into 3.5.0? https://github.com/apache/spark/pull/41855 Since this is the last major version update of 3.x, I think we need to make sure JdbcDialect can support more databases. Gengliang Wang wrote on Sat, Jul 15, 2023 at 05:20

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Jia Fan
+1 Maxim Gekk wrote on Tue, Jul 4, 2023 at 17:23: > +1 > > On Tue, Jul 4, 2023 at 11:55 AM Kent Yao wrote: > >> +1, thank you >> >> Kent >> >> On 2023/07/04 05:32:52 Dongjoon Hyun wrote: >> > +1 >> > >> > Thank you, Yuanjian >> > >> > Dongjoon >> > >> > On Tue, Jul 4, 2023 at 1:03 AM Hyukjin Kwon >> wrote:

Re: Beginner - Looking for starter issues

2023-06-29 Thread Jia Fan
Hi Harry, Maybe you can start with https://issues.apache.org/jira/browse/SPARK-37935 Jia Fan > On Jun 28, 2023, at 08:09, Harry wrote: > > Hi, > > I am looking to pick up some tasks on ASF Jira. > I have a basic understanding of how things work in t

Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-19 Thread Jia Fan
+1 Dongjoon Hyun wrote on Tue, Jun 20, 2023 at 10:41: > Please vote on releasing the following candidate as Apache Spark version > 3.4.1. > > The vote is open until June 23rd 1AM (PST) and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 votes. > > [ ] +1 Release this package as Apache Spa

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Jia Fan
By the way, as Holden said, what is the big feature for 4.0.0? I think a major version change always brings some differences. Jia Fan wrote on Tue, Jun 13, 2023 at 08:25: > +1 > > ________ > > Jia Fan > > > > On Jun 13, 2023, at 03:51, Chao Sun wrote: > > +1 >

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Jia Fan
+1 Jia Fan > On Jun 13, 2023, at 03:51, Chao Sun wrote: > > +1 > > On Mon, Jun 12, 2023 at 12:50 PM kazuyuki tanimura > wrote: >> +1 (non-binding) >> >> Thank you! >> Kazu >> >> >>> On Jun 12, 2023, at

Re: Apache Spark 3.4.1 Release?

2023-06-08 Thread Jia Fan
+1 Jia Fan > On Jun 9, 2023, at 08:00, Yuming Wang wrote: > > +1. > > On Fri, Jun 9, 2023 at 7:14 AM Chao Sun <mailto:sunc...@apache.org>> wrote: >> +1 too >> >> On Thu, Jun 8, 2023 at 2:34 PM kazuyuki tanimura >> wrote

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-28 Thread Jia Fan
Thanks Dongjoon! There are some tickets I want to share. SPARK-39420 Support ANALYZE TABLE on v2 tables SPARK-42750 Support INSERT INTO by name SPARK-43521 Support CREATE TABLE LIKE FILE Dongjoon Hyun wrote on Mon, May 29, 2023 at 08:42: > Hi, All. > > Apache Spark 3.5.0 is scheduled for August (1st Release Ca

Re: [DISCUSS] Add SQL functions into Scala, Python and R API

2023-05-24 Thread Jia Fan
+1 It is important that different APIs can be used to call the same function. Ryan Berti wrote on Thu, May 25, 2023 at 01:48: > During my recent experience developing functions, I found that identifying > locations (sql + connect functions.scala + functions.py, FunctionRegistry, > + whatever is required for R)

Re: [CONNECT] New Clients for Go and Rust

2023-05-19 Thread Jia Fan
ClickHouse. It uses different repositories for JDBC, ODBC, and C++. Please refer to: https://github.com/ClickHouse/clickhouse-java https://github.com/ClickHouse/clickhouse-odbc https://github.com/ClickHouse/clickhouse-cpp PS: I'm looking forward to the JavaScript Connect client! Thanks Regards Jia Fan M

Re: The Spark email setting should be update

2023-04-19 Thread Jia Fan
if I click the Reply arrow button to the right of each > message, it responds only to the person who sent that message. > > In order to respond to the list, I had to click "Reply All", move the list > to the To field and remove everybody else. > > Is this the same issue

The Spark email setting should be update

2023-04-17 Thread Jia Fan
m, because when I reply to emails from other communities, the default reply address is d...@xxx.apache.org. Can Spark modify the corresponding settings to reduce the chance of developers replying incorrectly? Thanks ____ Jia Fan

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-11 Thread Jia Fan
+1 Wenchen Fan wrote on Tue, Apr 11, 2023 at 14:32: > +1 > > On Tue, Apr 11, 2023 at 9:57 AM Yuming Wang wrote: > >> +1. >> >> On Tue, Apr 11, 2023 at 9:14 AM Yikun Jiang wrote: >> >>> +1 (non-binding) >>> >>> Also ran the docker image related test (signatures/standalone/k8s) with >>> rc7: https://github.co

Re: Undelivered Mail Returned to Sender

2023-03-08 Thread Jia Fan
The mail system > > : Host or domain name not found. Name > service > error for name=databricks.com.invalid type=: Host not found > > > > -- Forwarded message -- > From: Jia Fan > To: Herman van Hovell > Cc: Hyukjin Kwon , dev@

Re: [Question] Can't start Spark Connect

2023-03-08 Thread Jia Fan
Hi Herman, I just used ./build/mvn -DskipTests clean package, and I also tried ./build/mvn -DskipTests clean install. Herman van Hovell wrote on Wed, Mar 8, 2023 at 21:17: > Hi Jia, > > How are you building connect? > > Kind regards, > Herman > > On Wed, Mar 8, 2023 a

Re: [Question] Can't start Spark Connect

2023-03-08 Thread Jia Fan
a test case like > `SparkConnectServiceSuite` in IntelliJ should work. > > On Wed, 8 Mar 2023 at 15:02, Jia Fan wrote: > >> Hi developers, >>I want to contribute some code for Spark Connect. Any doc for >> starters? I want to debug SimpleSparkConnectService but I can't

[Question] Can't start Spark Connect

2023-03-07 Thread Jia Fan
Hi developers, I want to contribute some code for Spark Connect. Any doc for starters? I want to debug SimpleSparkConnectService but I can't start it with IDEA. I would appreciate any help. Thanks ____ Jia Fan

RE: How to convert InternalRow to Row.

2020-11-30 Thread Jia, Ke A
The fromRow method was removed in Spark 3.0. The new API is: val encoder = RowEncoder(schema) val row = encoder.createDeserializer().apply(internalRow) Thanks, Jia Ke From: Wenchen Fan Sent: Friday, November 27, 2020 9:32 PM To: Jason Jun Cc: Spark dev list Subject: Re: How to convert

RE: Adaptive Query Execution performance results in 3TB TPC-DS

2020-02-13 Thread Jia, Ke A
runtime statistics for further optimization, not table stats. So it seems the effect of table stats on these benchmark tests may be small. Thanks. Regards, Jia Ke From: Amogh Margoor Sent: Friday, February 14, 2020 5:02 AM To: Wenchen Fan Cc: Jia, Ke A ; dev@spark.apache.org Subject: Re: Adapti

Adaptive Query Execution performance results in 3TB TPC-DS

2020-02-11 Thread Jia, Ke A
bug or improvement when enabling AQE, please help file the related JIRAs. Thanks. Regards, Jia Ke

RE: Enabling fully disaggregated shuffle on Spark

2019-12-04 Thread Jia, Ke A
Hi Ben and Felix, This is Jia Ke from the Intel Big Data Team. I'm also interested in this. Would you please add me to the invite? Thanks a lot. Best regards, Jia Ke From: Qi,He Sent: Thursday, December 05, 2019 11:12 AM To: Saisai Shao Cc: Liu,Linhong ; Aniket Mokashi ; Felix Cheung

Reuse Executor JVM across different JobContext

2016-01-17 Thread Jia Zou
Dear all, Is there a way to reuse an executor JVM across different JobContexts? Thanks. Best Regards, Jia

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jia
Thanks, Annabel, but I may need to clarify that I have no intention of writing and running Spark UDFs in C++; I'm just wondering whether Spark can read data from and write data to a C++ process with zero copy. Best Regards, Jia On Dec 7, 2015, at 12:26 PM, Annabel Melongo wrote: > My guess is

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jia
Hi, Kazuaki, It’s very similar to my requirement, thanks! It seems they want to write to a C++ process with zero copy, and I want to do both read and write with zero copy. Does anyone know how to obtain more information, such as the current status of this JIRA entry? Best Regards, Jia On Dec 7, 2015

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jia
more carefully, in case it has a very efficient C++ binding mechanism. Best Regards, Jia On Dec 7, 2015, at 11:46 AM, Dewful wrote: > Maybe looking into something like Tachyon would help, I see some sample c++ > bindings, not sure how much of the current functionality they support...

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jia
why we want shared memory. Suggestions will be highly appreciated! Best Regards, Jia On Dec 7, 2015, at 10:54 AM, Robin East wrote: > -dev, +user (this is not a question about development of Spark itself so > you’ll get more answers in the user mailing list) > > First up let me say

Shared memory between C++ process and Spark

2015-12-06 Thread Jia
to do this, but I wonder whether there are any existing efforts or a more efficient approach to do this? Thank you very much! Best Regards, Jia - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-26 Thread Calvin Jia
+1, tested that 1.5.0-RC2 works with Tachyon 0.7.1 as external block store.

Jenkins HiveCompatibilitySuite Test Failures

2015-07-24 Thread Calvin Jia
Hi, I've been seeing errors with org.apache.spark.sql.hive.execution.HiveCompatibilitySuite from the Jenkins tests in a PR I proposed as well as in PRs from other members of the community. Is this test stable at the moment? Thanks, Calvin

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-04 Thread Calvin Jia
+1 Tested with input from Tachyon and persist off heap. On Thu, Jun 4, 2015 at 6:26 PM, Timothy Chen wrote: > +1 > > Been testing cluster mode and client mode with mesos with 6 nodes cluster. > > Everything works so far. > > Tim > > On Jun 4, 2015, at 5:47 PM, Andrew Or wrote: > > +1 (binding)

Re: Spark SQL 1.3.1 "saveAsParquetFile" will output tachyon file with different block size

2015-04-28 Thread Calvin Jia
Hi, You can apply this patch and recompile. Hope this helps, Calvin On Tue, Apr 28, 2015 at 1:19 PM, sara mustafa wrote: > Hi Zhang, > > How did you compile Spark 1.3.1 with Tachyon? When I changed Tachyon > version > to 0.6.3 in core/pom.xml, make-d