Re: Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-28 Thread Jacek Laskowski
> Thanks for reporting the issue! Did you hit the same problem when you set
> the `spark.jars.ivy` config with Spark 3.5? If this config never worked
> with a relative path, we should change the wording in the migration guide.
>
> Thanks,
> Wenchen
>
> On Sun, Apr 27, 2025 at

Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-27 Thread Jacek Laskowski
kSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) A workaround is to use an absolute path. Is this a known issue? Should I report it against rc4? Please guide. Thanks! Po
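
The workaround mentioned in the message above (using an absolute path) could be sketched as follows. This is only an illustration, not taken from the thread: the helper names `absolute_ivy_dir` and `ivy_conf_args` are hypothetical, and the sketch assumes the value is passed to `spark-submit` through its standard `--conf key=value` option.

```python
import os


def absolute_ivy_dir(path):
    """Expand a leading '~' and resolve a possibly relative path to an absolute one.

    Hypothetical helper: it only normalizes the string and does not touch Spark.
    """
    return os.path.abspath(os.path.expanduser(path))


def ivy_conf_args(path):
    """Build spark-submit arguments that set spark.jars.ivy to an absolute path.

    Assumption: the config is handed over via the generic --conf option.
    """
    return ["--conf", "spark.jars.ivy=" + absolute_ivy_dir(path)]


if __name__ == "__main__":
    # Prints the two arguments to append to a spark-submit command line.
    print(ivy_conf_args("~/.ivy2.5.2"))
```

The resulting list can be appended to a `spark-submit` command line, so the path Spark receives is already absolute regardless of the current working directory.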

Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-22 Thread Jacek Laskowski
... Tests passed in 28 second Pozdrawiam, Jacek Laskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Tue, Jun 20, 2023 at 4:41 AM Dongjoon Hyun wrote: > Ple

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Jacek Laskowski
+0 Pozdrawiam, Jacek Laskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Wed, Jun 21, 2023 at 5:11 PM Amanda Liu wrote: > Hi all, > > I'd like

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Jacek Laskowski
at Python devs would like to work on new data sources but support their wishes wholeheartedly :) Pozdrawiam, Jacek Laskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-10 Thread Jacek Laskowski
+1 * Built fine with Scala 2.13 and -Pkubernetes,hadoop-cloud,hive,hive-thriftserver,scala-2.13,volcano * Ran some demos on Java 17 * Mac mini / Apple M2 Pro / Ventura 13.3.1 Pozdrawiam, Jacek Laskowski "The Internals Of" Online Books <https://books.japila.pl/> Fo

Re: [VOTE] Release Apache Spark 3.4.0 (RC5)

2023-04-03 Thread Jacek Laskowski
+1 Compiled on Java 17 with Scala 2.13 on macos and ran some basic code. Pozdrawiam, Jacek Laskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Thu, Mar 30, 20

Re: starter tasks for new contributors

2023-03-17 Thread Jacek Laskowski
Hey Maxim, Very great kudos for the idea! Pozdrawiam, Jacek Laskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Fri, Mar 17, 2023 at 2:18 PM Maxim Gekk wrote:

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Jacek Laskowski
Yoohoo! Thanks Yuming for driving this release. A tiny step for Spark, a huge one for my clients (who are still on 3.2.1 or even older :)) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on htt

Re: [VOTE] Release Spark 3.2.0 (RC6)

2021-09-30 Thread Jacek Laskowski
Hi, I don't want to hijack the voting thread but given I faced https://issues.apache.org/jira/browse/SPARK-36904 in RC6 I wonder if it's -1. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.

v3.2.0-rc6 and org.postgresql.Driver was not found in the CLASSPATH

2021-09-30 Thread Jacek Laskowski
Hi, Just ran a freshly-built 3.2.0 RC6 and faced an issue (that seems to be reported earlier on SO): > The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH More details in https://issues.apache.org/jira/browse/SPARK-36904 Pozdrawiam, Ja

Re: [SQL][AQE] Advice needed: a trivial code change with a huge reading impact?

2021-09-08 Thread Jacek Laskowski
Hi Sean! Thank you very much, my friend Sean! That's exactly the answer I hoped for. Thank you! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski

[SQL][AQE] Advice needed: a trivial code change with a huge reading impact?

2021-09-08 Thread Jacek Laskowski
seem to think straight. Would a PR with such a change be acceptable? (Sean I'm looking at you :D) [1] https://github.com/apache/spark/blob/8d817dcf3084d56da22b909d578a644143f775d5/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeShuffleWithLocalRead.scala#L89-L9

Re: [SQL] s.s.a.coalescePartitions.parallelismFirst true but recommends false

2021-09-07 Thread Jacek Laskowski
Thanks Wenchen. If it's ever asked on SO I'm simply gonna quote you :) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/ja

[SQL] s.s.a.coalescePartitions.parallelismFirst true but recommends false

2021-09-03 Thread Jacek Laskowski
he/spark/blob/54cca7f82ecf23e062bb4f6d68697abec2dbcc5b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L519-L530 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitt

[SQL] When SQLConf vals gets own accessor defs?

2021-09-03 Thread Jacek Laskowski
[3] https://github.com/apache/spark/blob/54cca7f82ecf23e062bb4f6d68697abec2dbcc5b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L638 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-24 Thread Jacek Laskowski
Hi Yi Wu, Looks like the issue has been resolved as Won't Fix. How about your -1? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Jacek Laskowski
] Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Sun, Aug 22, 2021 at 12:45 PM Jacek Laskowski wrote: > Hi Gengl

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Jacek Laskowski
uild 25.292-b10, mixed mode) BTW, Shouldn't the page [1] be updated to reflect this? This is what I followed. [1] https://spark.apache.org/docs/latest/building-spark.html#setting-up-mavens-memory-usage Thanks Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internal

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-21 Thread Jacek Laskowski
$MAVEN_OPTS -Xmx8g -XX:ReservedCodeCacheSize=1g Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Fri, Aug 20

TreeNode.exists?

2021-08-11 Thread Jacek Laskowski
33671/files#diff-4d16a733f8741de9a4b839ee7c356c3e9b439b4facc70018f5741da1e930c6a8R51-R54 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: [ANNOUNCE] Apache Spark 3.1.2 released

2021-06-02 Thread Jacek Laskowski
Big shout-out to you, Dongjoon! Thank you. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Wed, Jun 2, 20

Should AggregationIterator.initializeBuffer be moved down to SortBasedAggregationIterator?

2021-05-25 Thread Jacek Laskowski
awiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Purpose of OffsetHolder as a LeafNode?

2021-05-15 Thread Jacek Laskowski
#L633 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: Welcoming six new Apache Spark committers

2021-03-30 Thread Jacek Laskowski
Hi, Congrats to all of you committers! Wishing you all the best (commits)! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jacek

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-19 Thread Jacek Laskowski
Hi Hyukjin, FYI: cloud-fan commented 3 hours ago: thanks, merging to master/3.1! https://github.com/apache/spark/pull/31550#issuecomment-781977920 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Fo

Re: [DISCUSS] Add RocksDB StateStore

2021-02-08 Thread Jacek Laskowski
Hi, I'm "okay to add RocksDB StateStore as external module". See no reason not to. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <

Re: How to contribute the code

2021-02-01 Thread Jacek Laskowski
Hi, http://spark.apache.org/contributing.html ? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Sat, J

[K8S] ExecutorPodsWatchSnapshotSource with no spark-exec-inactive label in 3.1?

2021-01-23 Thread Jacek Laskowski
/ExecutorPodsPollingSnapshotSource.scala#L62 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: Should 3.1.0 config props be 3.1.1 (as s.k.executor.missingPodDetectDelta)?

2021-01-23 Thread Jacek Laskowski
Hi Hyukjin, Agreed. I asked to see if I'm not missing anything. Thank you. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/

Should 3.1.0 config props be 3.1.1 (as s.k.executor.missingPodDetectDelta)?

2021-01-23 Thread Jacek Laskowski
(We could leave it as is as an "easter egg"-like thing too) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-20 Thread Jacek Laskowski
orPodsAllocator: ResourceProfile Id: 0 pod allocation status: 2 running, 0 pending. 0 unacknowledged. 21/01/19 12:23:29 DEBUG ExecutorPodsAllocator: ResourceProfile Id: 0 pod allocation status: 2 running, 0 pending. 0 unacknowledged. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "

[K8S] KUBERNETES_EXECUTOR_REQUEST_CORES

2021-01-12 Thread Jacek Laskowski
utorFeatureStep.scala#L72 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-07 Thread Jacek Laskowski
Hi Sean, +1 to leave it. Makes so much more sense (as that's what really happened and the history of Apache Spark is...irreversible). Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me o

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-07 Thread Jacek Laskowski
Hi, BTW, wondering aloud. Since it was agreed to skip 3.1.0 and go ahead with 3.1.1, what's gonna happen with v3.1.0 tag [1]? Is it going away and we'll see 3.1.1-rc1? [1] https://github.com/apache/spark/tree/v3.1.0-rc1 Pozdrawiam, Jacek Laskowski https://about.me/JacekLask

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-07 Thread Jacek Laskowski
Hi, I'm just reading this now. I'm for 3.1.1 with no 3.1.0 but the news that we're skipping that particular release. Gonna be more fun! :) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> F

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-06 Thread Jacek Laskowski
Hi, I'm curious why Spark 3.1.0 is already available in repo1.maven.org? https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.12/3.1.0/ Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/>

Re: [3.0.1] ExecutorMonitor.onJobStart and StageInfo.shuffleDepId that's never used?

2020-12-30 Thread Jacek Laskowski
ache/spark/blob/094563384478a402c36415edf04ee7b884a34fc9/core/src/main/scala/org/apache/spark/scheduler/StageInfo.scala#L108 [2] https://github.com/apache/spark/blob/78df2caec8c94c31e5c9ddc30ed8acb424084181/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L179 Pozdrawiam, Jacek Lasko

[3.0.1] ExecutorMonitor.onJobStart and StageInfo.shuffleDepId that's never used?

2020-12-30 Thread Jacek Laskowski
2caec8c94c31e5c9ddc30ed8acb424084181/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L179 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: Incorrect Scala version for Spark 2.4.x releases in the docs?

2020-09-17 Thread Jacek Laskowski
Thanks Sean for such a quick response! Let me propose a fix for the docs. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklasko

Incorrect Scala version for Spark 2.4.x releases in the docs?

2020-09-17 Thread Jacek Laskowski
uld be compiled with Scala 2.12, but that requires scala-2.12 profile [2] to be enabled) [1] https://github.com/apache/spark/blob/v2.4.6/pom.xml#L158 [2] https://github.com/apache/spark/blob/v2.4.6/pom.xml#L2830 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of"

Why is V2SessionCatalog not a CatalogExtension?

2020-08-08 Thread Jacek Laskowski
, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Why time difference while registering a new BlockManager (using BlockManagerMasterEndpoint)?

2020-06-12 Thread Jacek Laskowski
/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala#L481 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: BlockManager and ShuffleManager = can getLocalBytes be ever used for shuffle blocks?

2020-04-26 Thread Jacek Laskowski
Thanks Yi Wu! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Sat, Apr 18, 2020 at 12:17 PM wuyi wrote: >

ShuffleMapStage and pendingPartitions vs isAvailable or findMissingPartitions?

2020-04-26 Thread Jacek Laskowski
since isAvailable or findMissingPartitions (using MapOutputTrackerMaster) know it already and I think are even more up-to-date. Why is there this extra registry? [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala#L60 Pozdraw

BlockManager and ShuffleManager = can getLocalBytes be ever used for shuffle blocks?

2020-04-16 Thread Jacek Laskowski
spark/blob/31734399d57f3c128e66b0f97ef83eb4c9165978/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L382 [2] https://github.com/apache/spark/blob/31734399d57f3c128e66b0f97ef83eb4c9165978/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L637 Pozdrawiam, Jacek Laskowski https://about.me/JacekLask

Re: InferFiltersFromConstraints logical optimization rule and Optimizer.defaultBatches?

2020-04-16 Thread Jacek Laskowski
Hi Jungtaek, Thanks a lot for your answer. What you're saying reflects my understanding perfectly. It's a small change, but it makes understanding where rules are used much simpler (= less confusing). I'll propose a PR and see where it goes from there. Thanks! Pozdrawiam,

InferFiltersFromConstraints logical optimization rule and Optimizer.defaultBatches?

2020-04-12 Thread Jacek Laskowski
blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L115 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: [DOCS] Spark SQL Upgrading Guide

2020-02-16 Thread Jacek Laskowski
nal/SQLConf.scala#L1306-L1307 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Sat, Feb 15, 2020 at 7:44 PM Jacek Laskowsk

[DOCS] Spark SQL Upgrading Guide

2020-02-15 Thread Jacek Laskowski
/github.com/apache/spark/blob/master/docs/sql-migration-guide.md#upgrading-from-spark-sql-244-to-245 [5] http://spark.apache.org/releases/spark-release-2-4-5.html Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl

Does StreamingSymmetricHashJoinExec work with watermark? I don't think so

2019-11-11 Thread Jacek Laskowski
b.com/apache/spark/blob/v3.0.0-preview/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala#L156-L164 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of

Re: [SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

2019-10-14 Thread Jacek Laskowski
ner and get
> more information about batches.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Tue, Oct 8, 2019 at 6:12 PM Jacek Laskowski wrote:
>
>> Hi,
>>
>> I haven't spent much time on it, but the following DEBUG message
>> from WatermarkTracker spa

Re: [SS] number of output rows metric for streaming aggregation (StateStoreSaveExec) in Append output mode not measured?

2019-10-13 Thread Jacek Laskowski
on why we don't deal with
>> it. I'll file and submit a patch.
>>
>> Btw, there's a metric bug with empty batch as well - see SPARK-29314 [1]
>> which I've submitted a patch recently.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>

[SS] number of output rows metric for streaming aggregation (StateStoreSaveExec) in Append output mode not measured?

2019-10-12 Thread Jacek Laskowski
r the other modes - Complete and Update. See https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L329-L365 Is this intentional? Why? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of

Re: [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-10 Thread Jacek Laskowski
Hi, Thanks much for such a thorough conversation. Enjoyed it very much. > Source/Sink traits are in org.apache.spark.sql.execution and thus they are private. That would explain why I couldn't find scaladocs. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals

[SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

2019-10-08 Thread Jacek Laskowski
t, but am hoping to get some more info before. Thanks! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of Spark Structured Streaming https://bit.ly/spark-structured-streaming The Internals of Apache Kafka htt

Re: [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-02 Thread Jacek Laskowski
at SparkAISummit in two weeks!) Gonna be challenging! Hope I won't spread a wrong word. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of Spark Structured Streaming https://bit.ly/spark-structured-st

[SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-01 Thread Jacek Laskowski
pache/spark/sql/SQLContext.scala#L422-L428 [3] https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L62-L81 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals

Re: Welcoming some new committers and PMC members

2019-09-12 Thread Jacek Laskowski
Hi, What great news! Congrats to all awarded and the community for voting them in! p.s. I think it should go to the user mailing list too. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of Spark

Re: Why two netty libs?

2019-09-05 Thread Jacek Laskowski
Hi, Thanks much for the answers. Learning Spark every day! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of Spark Structured Streaming https://bit.ly/spark-structured-streaming The Internals of Apache

Why two netty libs?

2019-09-03 Thread Jacek Laskowski
Hi, Just noticed that Spark 2.4.x uses two netty deps of different versions. Why? jars/netty-all-4.1.17.Final.jar jars/netty-3.9.9.Final.jar Shouldn't one be excluded or perhaps shaded? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://b

Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-09-03 Thread Jacek Laskowski
Hi Devs, Thanks all for a very prompt response! That was insanely quick. Thank you very much! :) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of Spark Structured Streaming https://bit.ly/spark-structured

[SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-08-26 Thread Jacek Laskowski
github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala#L102 [2] https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala#L281 Pozdrawiam, Jacek

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-11 Thread Jacek Laskowski
Hi, Thanks Dongjoon Hyun for stepping up as a release manager! Much appreciated. If there's a volunteer to cut a release, I'm always ready to support it. In addition, the more frequent the releases, the better for end users, so they have a choice to upgrade and have all the latest fixes or wait. It's their

Re: [SS] Why EventTimeStatsAccum for event-time watermark not a named accumulator?

2019-06-11 Thread Jacek Laskowski
UI is meant for). With that being said, I'm wondering why EventTimeStatsAccum is not a SQL metric then. With that, it'd be in the web UI, but just in the physical plan of a streaming query. WDYT? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https

[SS] Why EventTimeStatsAccum for event-time watermark not a named accumulator?

2019-06-10 Thread Jacek Laskowski
d send a pull request for review? Please guide as I found it very helpful (and surprisingly easy to implement so I'm worried I'm missing something important). Thanks. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internal

[SS] ContinuousExecution.commit and excessive JSON serialization?

2019-06-03 Thread Jacek Laskowski
/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala#L341 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of Spark Structured Streaming https://bit.ly/spark-structured

FileSourceScanExec.doExecute - when is this executed if ever?

2019-04-26 Thread Jacek Laskowski
Hi, I may have asked this question before, but it seems I forgot/can't find the answer. When is FileSourceScanExec.doExecute executed, if ever? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming

Re: Is there a way to read a Parquet File as ColumnarBatch?

2019-04-22 Thread Jacek Laskowski
I'm exploring parquet data source in more detail as we speak). Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-

Re: Sort order in bucketing in a custom datasource

2019-04-16 Thread Jacek Laskowski
Hi, I don't think so. I can't think of an interface (trait) that would give that information to the Catalyst optimizer. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bi

Re: Static functions

2019-02-11 Thread Jacek Laskowski
Hi Jean, I thought the functions have already been tagged? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering

Re: [SS] FlatMapGroupsWithStateExec with no commitTimeMs metric?

2018-11-26 Thread Jacek Laskowski
Thanks Jungtaek Lim! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow me at https

[SS] FlatMapGroupsWithStateExec with no commitTimeMs metric?

2018-11-25 Thread Jacek Laskowski
/StreamingGlobalLimitExec.scala#L87 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow me

Re: Is spark.sql.codegen.factoryMode property really for tests only?

2018-11-16 Thread Jacek Laskowski
Hi Marco, Many thanks for such a quick response. With that, I'll direct my curiosity into a different direction. Thanks! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/

Is spark.sql.codegen.factoryMode property really for tests only?

2018-11-16 Thread Jacek Laskowski
c/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L758-L767 [2] https://github.com/apache/spark/blob/v2.4.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Projection.scala#L159 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit

How to know all the issues resolved for 2.4.0?

2018-11-07 Thread Jacek Laskowski
ution statuses: Resolved, Done, Fixed? When is an issue marked as either of them? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams

Why does spark.range(1).write.mode("overwrite").saveAsTable("t1") throw an Exception?

2018-10-30 Thread Jacek Laskowski
or users, so accept my apologies if sent to the wrong mailing list. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow me at https://twitter.com/jaceklaskowski

Re: welcome a new batch of committers

2018-10-06 Thread Jacek Laskowski
Wow! That's a nice bunch of contributors. Congrats to all new committers. I've had a tough time following all the contributions, but with this crew it's gonna be nearly impossible. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/

Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-10-01 Thread Jacek Laskowski
Hi, OK. Sorry for the noise. I don't know why it started working, but I cannot reproduce it anymore. Sorry for a false alarm (but I could promise it didn't work and I changed nothing). Back to work... Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark

Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-09-30 Thread Jacek Laskowski
d anything relevant. I'm surprised nobody's reported it before. That worries me (or simply says that all the enterprise deployments simply use YARN with Hive?) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark

Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-09-30 Thread Jacek Laskowski
or.java:624) at java.lang.Thread.run(Thread.java:748) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-stream

saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-09-29 Thread Jacek Laskowski
.java:455) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) While it works fine in 2.3.1. Could anybody explain the change in behaviour in 2.3.2? The commit / the JIRA issue would be even nicer. Thanks. Pozdrawiam, Jacek Laskows

Re: Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap

2018-09-08 Thread Jacek Laskowski
something that I should not have been bothered much with. Thanks Russ and Herman for your help to get my thinking right. That will also help my Spark clients, esp. during Spark SQL workshops! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mast

Re: Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap

2018-09-08 Thread Jacek Laskowski
Thanks Russ! That helps a lot. On the other hand, it makes reviewing the codebase of Spark SQL slightly harder since Java code generation is so much about string concatenation :( p.s. Should all the code in doExecute be considered and marked @deprecated? Pozdrawiam, Jacek Laskowski https

Re: Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap

2018-09-07 Thread Jacek Laskowski
e generation is enabled and is currently the proper execution path? p.s. This SparkPlan.doExecute is used to trigger whole-stage code gen by WholeStageCodegenExec (and InputAdapter), but that's all the code that is to be executed by doExecute, isn't it? Pozdrawiam, Jacek Laskowski https:/

Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap

2018-09-07 Thread Jacek Laskowski
hat uses createHashMap or finishAggregate). Is that correct? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-k

Why is View logical operator not a UnaryNode explicitly?

2018-08-27 Thread Jacek Laskowski
463 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow me at https://twitter.com/jaceklaskowski

Same code in DataFrameWriter.runCommand and Dataset.withAction?

2018-08-14 Thread Jacek Laskowski
) or even remove runCommand altogether. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Fo

Re: Why is SQLImplicits an abstract class rather than a trait?

2018-08-05 Thread Jacek Laskowski
uldn't that import work? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow me

Re: Am I crazy, or does the binary distro not have Kafka integration?

2018-08-04 Thread Jacek Laskowski
've brought it up since I think Kafka data source is so important that it should be included in spark-shell and spark-submit by default. THANKS! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https:

Qs on Dataset API -- groups of createXXXTempViews and XXXcheckpoint methods

2018-07-26 Thread Jacek Laskowski
f help would be very helpful. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow me at

Re: JDBC Data Source and customSchema option but DataFrameReader.assertNoSpecifiedSchema?

2018-07-19 Thread Jacek Laskowski
/jdbc/JDBCRelation.scala?utf8=%E2%9C%93#L116-L118 [2] https://github.com/apache/spark/blob/v2.3.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L785-L788 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https:

JDBC Data Source and customSchema option but DataFrameReader.assertNoSpecifiedSchema?

2018-07-16 Thread Jacek Laskowski
/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala#L167 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka St

Re: [ANNOUNCE] Announcing Apache Spark 2.3.1

2018-06-14 Thread Jacek Laskowski
Hi Marcelo, How to announce it on twitter @ https://twitter.com/apachespark? How to make it part of the release process? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark

Re: [SQL] Purpose of RuntimeReplaceable unevaluable unary expressions?

2018-05-31 Thread Jacek Laskowski
Yay! That's right!!! Thanks Reynold. Such a short answer with so much information. Thanks. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering

[SQL] Purpose of RuntimeReplaceable unevaluable unary expressions?

2018-05-30 Thread Jacek Laskowski
e? [1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L275 [2] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L266-L267 Pozd

Re: [SQL] Understanding RewriteCorrelatedScalarSubquery optimization (and TreeNode.transform)

2018-05-28 Thread Jacek Laskowski
nstead be focusing on the methods of Expression or even QueryPlan to understand the various methods (as that's what triggered my question). Thanks. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streami

[SQL] Two ScalarSubquery expressions?! Could we have ScalarSubqueryExec instead?

2018-05-27 Thread Jacek Laskowski
e/src/main/scala/org/apache/spark/sql/execution/subquery.scala?utf8=%E2%9C%93#L46 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams http

[SQL] Understanding RewriteCorrelatedScalarSubquery optimization (and TreeNode.transform)

2018-05-27 Thread Jacek Laskowski
tf8=%E2%9C%93#L290-L299 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow me at https://twitter.com/jaceklaskowski

Re: Spark version for Mesos 0.27.0

2018-05-25 Thread Jacek Laskowski
Hi, Mesos 0.27.0?! That's been a while. I'd search for the changes to pom.xml and see when the mesos dependency version changed. That'd give you the most precise answer. I think it could've been 1.5 or older. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski
