Re: Allowing Unicode Whitespace in Lexer

2024-03-25 Thread Alex Cruise
While we're at it, maybe consider allowing "smart quotes" too :) -0xe1a On Sat, Mar 23, 2024 at 5:29 PM serge rielau.com wrote: > Hello, > > I have a PR https://github.com/apache/spark/pull/45620 ready to go that > will extend the definition of whitespace (what separates tokens) from the > smal
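As a rough illustration of what widening the whitespace definition means for tokenizing, here is a minimal Python sketch. It is not Spark's lexer, and which characters the PR actually covers is not visible in this excerpt; the helper names `split_ascii` and `split_unicode` and the sample query are made up for the example.

```python
import re

# ASCII-only definition of whitespace: space, tab, carriage return, newline.
ASCII_WS = re.compile(r"[ \t\r\n]+")


def split_ascii(text):
    """Split on ASCII whitespace only (hypothetical 'narrow' lexer rule)."""
    return [tok for tok in ASCII_WS.split(text) if tok]


def split_unicode(text):
    """Split on any character Python treats as Unicode whitespace."""
    return text.split()


# Separators here are NO-BREAK SPACE, EM SPACE and IDEOGRAPHIC SPACE.
query = "SELECT\u00a01\u2003AS\u3000x"

print(split_ascii(query))    # the whole string stays one token
print(split_unicode(query))  # four tokens: SELECT, 1, AS, x
```

With the narrow rule, a query pasted from a word processor or web page can fail to tokenize even though it looks correctly spaced on screen.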

Re: [QUESTION] Legal dependency with Oracle JDBC driver

2024-01-29 Thread Alex Porcelli
m/Mich_Talebzadeh > > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no

[QUESTION] Legal dependency with Oracle JDBC driver

2024-01-29 Thread Alex Porcelli
ith such a dependency and being pointed out that you may have a solution to share). [1] - https://github.com/apache/spark/blob/master/sql/core/pom.xml#L187 [2] - https://issues.apache.org/jira/browse/LEGAL-526 [3] - https://issues.apache.org/jira/browse/LEGAL-663 Regards, Alex

Query hints visible to DSV2 connectors?

2023-08-02 Thread Alex Cruise
Hey folks, I'm adding an optional feature to my DSV2 connector where it can choose between a row-based or columnar PartitionReader dynamically depending on a query's schema. I'd like to be able to supply a hint at query time that's visible to the connector, but at the moment I can't see any way to

Re: Late materialization?

2023-05-31 Thread Alex Cruise
DML or compactions are happening behind the query's back, but presumably Spark users already have this class of problem, it's just less serious when the end-to-end execution time of a query is shorter. WDYT? -0xe1a On Wed, May 31, 2023 at 11:03 AM Alex Cruise wrote: > Hey folks, I

Late materialization?

2023-05-31 Thread Alex Cruise
Hey folks, I'm building a Spark connector for my company's proprietary data lake... That project is going fine despite the near total lack of documentation. ;) In parallel, I'm also trying to figure out a better story for when humans inevitably `select * from 100_trillion_rows`, glance at the firs

planInputPartitions being called twice

2023-05-12 Thread Alex Cruise
(I posted this on Slack originally) Hey folks, I’m writing a batch connector for an in-house data lake and doing some performance work now… I’ve noticed my ScanBuilder creates a Scan exactly once, but its toBatch method is being called three times, returning the identical object every time, then t

Recent paper that might be relevant to pushdown and other optimizations

2023-04-21 Thread Alex Cruise
Optimizing Query Predicates with Disjunctions for Column Stores https://arxiv.org/pdf/2002.00540.pdf [abstract at the end of my message] I just googled [predicate pushdown cnf] and it's WILD to me that this paper came up in the first page of search results, and was published last year. It mention

Re: Adding new connectors

2023-03-27 Thread Alex Cruise
On Fri, Mar 24, 2023 at 11:23 AM Alex Cruise wrote: > I found ExternalCatalog a few days ago and have been implementing one of > those, but it seems like DataSourceRegister / SupportsCatalogOptions is > another popular approach. I'm not sure offhand how they overlap/intersect

Re: Adding new connectors

2023-03-24 Thread Alex Cruise
On Fri, Mar 24, 2023 at 3:18 PM John Zhuge wrote: > Is this similar to Iceberg's hidden partitioning > ? > Check out the details in the spec: > https://iceberg.apache.org/spec/#partition-transforms > Yes, it's ver

Re: Adding new connectors

2023-03-24 Thread Alex Cruise
On Fri, Mar 24, 2023 at 1:46 PM John Zhuge wrote: > Have you checked out SparkCatalog > > in > Apache Iceberg project? More docs at > https://iceberg.apache.org/docs/latest/s

Adding new connectors

2023-03-24 Thread Alex Cruise
Hey folks, please let me know if this is more of a user@ post! I'm building a Spark connector for my company's data-lake-ish product, and it looks like there's very little documentation about how to go about it. I found ExternalCatalog a few days ago and have been implementing one of those, but it s

Re: [VOTE][RESULT] Release Spark 3.2.0 (RC7)

2021-10-17 Thread Alex Ott
ming Wang >> >> - Reynold Xin * >> >> - Cheng Su >> >> - Peter Toth >> >> - Mich Talebzadeh >> >> - Maxim Gekk >> >> - Chao Sun >> >> - Xinli Shang >> >> - Huaxin Gao >> >> - Kent Yao >> >> - Liang-Chi Hsieh * >> >> - Kousuke Saruta * >> >> - Ye Zhou >> >> - Cheng Pan >> >> - Angers Zhu >> >> - Wenchen Fan * >> >> - Holden Karau * >> >> - Yi Wu >> >> - Ricardo Almeida >> >> - DB Tsai * >> >> - Thomas Graves * >> >> - Terry Kim >> >> >> >> +0: None >> >> -1: None >> >> -- With best wishes, Alex Ott http://alexott.net/ Twitter: alexott_en (English), alexott (Russian)

Re: Binary compatibility issues in 3.1.1?

2021-02-08 Thread Alex Ott
although no, additional constructor won't work... On Mon, Feb 8, 2021 at 7:01 PM Alex Ott wrote: > Hi all > > I've noticed following SO question about Spark 3.1.1 not working with > Delta 0.7.0: > https://stackoverflow.com/questions/66106096/delta-lake-insert-into-sql-i

Binary compatibility issues in 3.1.1?

2021-02-08 Thread Alex Ott
sn't work without changes: https://github.com/datastax/spark-cassandra-connector/pull/1280 -- With best wishes, Alex Ott http://alexott.net/ Twitter: alexott_en (English), alexott (Russian)

Re: [Spark Core] Merging PR #23340 for New Executor Memory Metrics

2020-06-30 Thread Alex Scammon
Thank you! I just reached out to the original author and we'll get the conflicts sorted out amongst us, I'm sure. Thanks again, -Alex From: Dongjoon Hyun Sent: Tuesday, June 30, 2020 9:01 PM To: Alex Scammon Cc: Michel Sumbul ; dev@spark.apache.org

Re: [Spark Core] Merging PR #23340 for New Executor Memory Metrics

2020-06-30 Thread Alex Scammon
Can I buymeacoffee.com for someone to take a look at PR#23340<https://github.com/apache/spark/pull/23340>? I'm totally not above outright bribery to get some eyes on this PR. Thanks, -Alex From: Michel Sumbul Sent: Thursday, June 25, 2020 11:48

[Spark Core] Merging PR #23340 for New Executor Memory Metrics

2020-06-22 Thread Alex Scammon
/SPARK-23206 Any help getting #23340 opened back up and moving again would be very much appreciated. Cheers, Alex Scammon Head of Open Source Engineering G-Research

Re: Auto-linking from PRs to Jira tickets

2020-03-10 Thread Alex Ott
yes - it's https://issues.apache.org/jira/browse/INFRA-19934 Nicholas Chammas at "Tue, 10 Mar 2020 13:52:23 -0400" wrote: NC> Could you point us to the ticket? I'd like to follow along. NC> On Tue, Mar 10, 2020 at 9:13 AM Alex Ott wrote: NC> For Zeppelin

Re: Auto-linking from PRs to Jira tickets

2020-03-10 Thread Alex Ott
ack to the relevant Jira tickets. (We NC> already have auto-linking from Jira to PRs.) NC> Has someone looked into this already, or should I file a ticket with INFRA and see what they say? NC> Nick -- With best wishes, Alex Ott http://alexott.net/ Twitte

Re: LICENSE and NOTICE file content

2018-06-25 Thread Alex Harui
meet the spirit of an open source release, either, but better than shipping executable code in a source package. On the other hand, I would not hold up a release for an issue like this. Fix it in some future release. My 2 cents, -Alex From: Sean Owen Reply-To: "legal-disc...@apach

Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-01 Thread Alex Kozlov
Looked at the paper: while we can argue about the performance side, I think semantically the Scala pattern matching is much more expressive. Time will tell. On Tue, Mar 1, 2016 at 9:07 AM, Jerry Lam wrote: > Hi Alex, > > We went through this path already :) This is the reason we

Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-01 Thread Alex Kozlov
erman van Hövell >>> >>> 2016-03-01 15:16 GMT+01:00 Jerry Lam : >>> >>>> Hi Spark developers, >>>> >>>> Will you consider to add support for implementing "Pattern matching in >>>> sequences of rows"? More specifically

Re: Enabling mapreduce.input.fileinputformat.list-status.num-threads in Spark?

2016-01-12 Thread Alex Nastetsky
Thanks. I was actually able to get mapreduce.input.fileinputformat.list-status.num-threads working in Spark against a regular fileset in S3, in Spark 1.5.2 ... looks like the issue is isolated to Hive. On Tue, Jan 12, 2016 at 6:48 PM, Cheolsoo Park wrote: > Alex, see this jira- >

Re: Enabling mapreduce.input.fileinputformat.list-status.num-threads in Spark?

2016-01-12 Thread Alex Nastetsky
Ran into this need myself. Does Spark have an equivalent of "mapreduce.input.fileinputformat.list-status.num-threads"? Thanks. On Thu, Jul 23, 2015 at 8:50 PM, Cheolsoo Park wrote: > Hi, > > I am wondering if anyone has successfully enabled > "mapreduce.input.fileinputformat.list-status.num-t

Re: Need suggestions on monitor Spark progress

2015-11-30 Thread Alex Rovner
(http://opentsdb.net/) and monitor the progress through the UI provided by the DB. *Alex Rovner* *Director, Data Engineering * *o:* 646.759.0052 * <http://www.magnetic.com/>* On Mon, Nov 30, 2015 at 1:43 PM, Jacek Laskowski wrote: > Hi, > > My limited understanding of Spark tells me that

Re: Sort Merge Join from the filesystem

2015-11-16 Thread Alex Nastetsky
Done, thanks. On Mon, Nov 9, 2015 at 7:23 PM, Cheng, Hao wrote: > Yes, we definitely need to think how to handle this case, probably even > more common than both sorted/partitioned tables case, can you jump to the > jira and leave comment there? > > > > *

Re: Sort Merge Join from the filesystem

2015-11-09 Thread Alex Nastetsky
, Hao wrote: > Yes, we probably need more change for the data source API if we need to > implement it in a generic way. > > BTW, I create the JIRA by copy most of words from Alex. :) > > > > https://issues.apache.org/jira/browse/SPARK-11512 > > > > > >

Sort Merge Join from the filesystem

2015-11-04 Thread Alex Nastetsky
(this is kind of a cross-post from the user list) Does Spark support doing a sort merge join on two datasets on the file system that have already been partitioned the same with the same number of partitions and sorted within each partition, without needing to repartition/sort them again? This fun
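The idea in the question, joining two inputs that are already co-partitioned and sorted without another shuffle or sort, can be sketched in plain Python. This is only an illustration of the merge step under stated assumptions, not Spark's implementation: partitions are modeled as lists of `(key, value)` pairs, partition `i` on the left is assumed to hold the same key range as partition `i` on the right, and `merge_join` and `join_copartitioned` are names invented for the example.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs, each already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-side rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][0] == lk:
                for r in range(j, j_end):
                    out.append((lk, left[i][1], right[r][1]))
                i += 1
            j = j_end
    return out


def join_copartitioned(left_parts, right_parts):
    """Join partition i of the left input with partition i of the right input."""
    if len(left_parts) != len(right_parts):
        raise ValueError("inputs must have the same number of partitions")
    rows = []
    for lp, rp in zip(left_parts, right_parts):
        rows.extend(merge_join(lp, rp))
    return rows


left_parts = [[(1, "a"), (3, "b")], [(2, "c"), (2, "d")]]
right_parts = [[(1, "x"), (1, "y")], [(2, "z"), (4, "w")]]
result = join_copartitioned(left_parts, right_parts)
```

Each partition pair is read once, front to back, which is why matching partitioning and sort order make a second repartition or sort unnecessary in principle.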

SPARK-10617

2015-10-12 Thread Alex Rovner
Would someone mind reviewing? https://github.com/apache/spark/pull/9004 <https://github.com/apache/spark/pull/9004#issuecomment-146031856> -- *Alex Rovner* *Director, Data Engineering * *o:* 646.759.0052 * <http://www.magnetic.com/>*

Re: SparkR DataFrame fail to return data of Decimal type

2015-08-14 Thread Shkurenko, Alex
Created https://issues.apache.org/jira/browse/SPARK-9982, working on the PR On Fri, Aug 14, 2015 at 12:43 PM, Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: > Thanks for the catch. Could you send a PR with this diff ? > > On Fri, Aug 14, 2015 at 10:30 AM, Shkurenko

SparkR DataFrame fail to return data of Decimal type

2015-08-14 Thread Shkurenko, Alex
.Float" =>
   writeType(dos, "double")
   writeDouble(dos, value.asInstanceOf[Float].toDouble)
+case "decimal" | "java.math.BigDecimal" =>
+  writeType(dos, "double")
+  writeDouble(dos, scala.math.BigDecimal(value.asInstanceOf[java.math.BigDecimal]).toDouble)
 case "double" | "java.lang.Double" =>
   writeType(dos, "double")
   writeDouble(dos, value.asInstanceOf[Double])

Thanks, Alex
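The patch above sends decimal values across as doubles. As a small illustration of what that conversion implies, here is a Python sketch using the standard `decimal` module; it does not touch SparkR, and `decimal_to_double` and the sample values are hypothetical stand-ins for the serializer's behavior.

```python
from decimal import Decimal


def decimal_to_double(value: Decimal) -> float:
    """Convert an arbitrary-precision decimal to a 64-bit float."""
    return float(value)


small = Decimal("12.50")
large = Decimal("1234567890123456789.123456789")

# Converting back shows whether the double kept the original value.
small_roundtrip = Decimal(decimal_to_double(small))
large_roundtrip = Decimal(decimal_to_double(large))
```

Values that fit within a double's roughly 15 to 17 significant digits survive the trip; wider decimals come back rounded, which may or may not matter for a given use.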

Re: Writing to multiple outputs in Spark

2015-08-14 Thread Alex Angelini
Speaking about Shopify's deployment, this would be a really nice to have feature. We would like to write data to folders with the structure `//` but have had to hold off on that because of the lack of support for MultipleOutputs. On Fri, Aug 14, 2015 at 10:56 AM, Silas Davis wrote: > Would it b

RE: Spark build time

2015-04-21 Thread Alex
If you are using MVN there are some parameters (MAVEN_OPTS) which need to be set in order to give the underlying environment enough memory. See the instructions here: https://spark.apache.org/docs/latest/building-spark.html -Original Message- From: "Reynold Xin" Sent: 4/21/2015 4:21

Contributing algorithms to MLlib

2014-06-09 Thread Alex Levin
ought of starting with an implementation of the *Fuzzy k-means* algorithm and continuing with the *Hidden Markov Model* algorithm. Do you know if anyone is currently working on an implementation of these algorithms for MLlib? Regards, Alex