Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-19 Thread Jacek Laskowski
Hi Hyukjin, FYI: cloud-fan commented 3 hours ago: thanks, merging to master/3.1! https://github.com/apache/spark/pull/31550#issuecomment-781977920 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Fo

Re: Welcoming six new Apache Spark committers

2021-03-30 Thread Jacek Laskowski
Hi, Congrats to all of you committers! Wishing you all the best (commits)! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jacek

Purpose of OffsetHolder as a LeafNode?

2021-05-15 Thread Jacek Laskowski
#L633 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Should AggregationIterator.initializeBuffer be moved down to SortBasedAggregationIterator?

2021-05-25 Thread Jacek Laskowski
awiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: [ANNOUNCE] Apache Spark 3.1.2 released

2021-06-02 Thread Jacek Laskowski
Big shout-out to you, Dongjoon! Thank you. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Wed, Jun 2, 20

TreeNode.exists?

2021-08-11 Thread Jacek Laskowski
33671/files#diff-4d16a733f8741de9a4b839ee7c356c3e9b439b4facc70018f5741da1e930c6a8R51-R54 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-21 Thread Jacek Laskowski
$MAVEN_OPTS -Xmx8g -XX:ReservedCodeCacheSize=1g Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Fri, Aug 20

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Jacek Laskowski
uild 25.292-b10, mixed mode) BTW, Shouldn't the page [1] be updated to reflect this? This is what I followed. [1] https://spark.apache.org/docs/latest/building-spark.html#setting-up-mavens-memory-usage Thanks Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internal

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Jacek Laskowski
] Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Sun, Aug 22, 2021 at 12:45 PM Jacek Laskowski wrote: > Hi Gengl

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-24 Thread Jacek Laskowski
Hi Yi Wu, Looks like the issue has got resolution: Won't Fix. How about your -1? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.

[SQL] When SQLConf vals gets own accessor defs?

2021-09-03 Thread Jacek Laskowski
[3] https://github.com/apache/spark/blob/54cca7f82ecf23e062bb4f6d68697abec2dbcc5b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L638 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.

[SQL] s.s.a.coalescePartitions.parallelismFirst true but recommends false

2021-09-03 Thread Jacek Laskowski
he/spark/blob/54cca7f82ecf23e062bb4f6d68697abec2dbcc5b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L519-L530 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitt

Re: [SQL] s.s.a.coalescePartitions.parallelismFirst true but recommends false

2021-09-07 Thread Jacek Laskowski
Thanks Wenchen. If it's ever asked on SO I'm simply gonna quote you :) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/ja

[SQL][AQE] Advice needed: a trivial code change with a huge reading impact?

2021-09-08 Thread Jacek Laskowski
seem to think straight. Would a PR with such a change be acceptable? (Sean I'm looking at you :D) [1] https://github.com/apache/spark/blob/8d817dcf3084d56da22b909d578a644143f775d5/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeShuffleWithLocalRead.scala#L89-L9

Re: [SQL][AQE] Advice needed: a trivial code change with a huge reading impact?

2021-09-08 Thread Jacek Laskowski
Hi Sean! Thank you very much, my friend Sean! That's exactly the answer I hoped for. Thank you! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski

v3.2.0-rc6 and org.postgresql.Driver was not found in the CLASSPATH

2021-09-30 Thread Jacek Laskowski
Hi, Just ran a freshly-built 3.2.0 RC6 and faced an issue (that seems to be reported earlier on SO): > The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH More details in https://issues.apache.org/jira/browse/SPARK-36904 Pozdrawiam, Ja

Re: [VOTE] Release Spark 3.2.0 (RC6)

2021-09-30 Thread Jacek Laskowski
Hi, I don't want to hijack the voting thread but given I faced https://issues.apache.org/jira/browse/SPARK-36904 in RC6 I wonder if it's -1. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Jacek Laskowski
Yoohoo! Thanks Yuming for driving this release. A tiny step for Spark, a huge one for my clients (who are still on 3.2.1 or even older :)) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on htt

Re: starter tasks for new contributors

2023-03-17 Thread Jacek Laskowski
Hey Maxim, Very great kudos for the idea! Pozdrawiam, Jacek Laskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Fri, Mar 17, 2023 at 2:18 PM Maxim Gekk wrote:

Re: [VOTE] Release Apache Spark 3.4.0 (RC5)

2023-04-03 Thread Jacek Laskowski
+1 Compiled on Java 17 with Scala 2.13 on macOS and ran some basic code. Pozdrawiam, Jacek Laskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Thu, Mar 30, 20

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-10 Thread Jacek Laskowski
+1 * Built fine with Scala 2.13 and -Pkubernetes,hadoop-cloud,hive,hive-thriftserver,scala-2.13,volcano * Ran some demos on Java 17 * Mac mini / Apple M2 Pro / Ventura 13.3.1 Pozdrawiam, Jacek Laskowski "The Internals Of" Online Books <https://books.japila.pl/> Fo

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-19 Thread Jacek Laskowski
at Python devs would like to work on new data sources but support their wishes wholeheartedly :) Pozdrawiam, Jacek Laskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Jacek Laskowski
+0 Pozdrawiam, Jacek Laskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Wed, Jun 21, 2023 at 5:11 PM Amanda Liu wrote: > Hi all, > > I'd like

Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-22 Thread Jacek Laskowski
... Tests passed in 28 seconds Pozdrawiam, Jacek Laskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Tue, Jun 20, 2023 at 4:41 AM Dongjoon Hyun wrote: > Ple

Re: Sort order in bucketing in a custom datasource

2019-04-16 Thread Jacek Laskowski
Hi, I don't think so. I can't think of an interface (trait) that would give that information to the Catalyst optimizer. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bi

Re: Is there a way to read a Parquet File as ColumnarBatch?

2019-04-22 Thread Jacek Laskowski
I'm exploring parquet data source in more detail as we speak). Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-

FileSourceScanExec.doExecute - when is this executed if ever?

2019-04-26 Thread Jacek Laskowski
Hi, I may have asked this question before, but seems I forgot/can't find the answer. When is FileSourceScanExec.doExecute executed if ever? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming

[SS] ContinuousExecution.commit and excessive JSON serialization?

2019-06-03 Thread Jacek Laskowski
/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala#L341 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of Spark Structured Streaming https://bit.ly/spark-structured

[SS] Why EventTimeStatsAccum for event-time watermark not a named accumulator?

2019-06-10 Thread Jacek Laskowski
d send a pull request for review? Please guide as I found it very helpful (and surprisingly easy to implement so I'm worried I'm missing something important). Thanks. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internal

Re: [SS] Why EventTimeStatsAccum for event-time watermark not a named accumulator?

2019-06-11 Thread Jacek Laskowski
UI is meant for). With that being said, I'm wondering why EventTimeStatsAccum is not a SQL metric then? With that, it'd be in the web UI, but just in the physical plan of a streaming query. WDYT? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-11 Thread Jacek Laskowski
Hi, Thanks Dongjoon Hyun for stepping up as a release manager! Much appreciated. If there's a volunteer to cut a release, I'm always happy to support it. In addition, the more frequent the releases, the better for end users, so they have a choice to upgrade and have all the latest fixes or wait. It's their

[SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-08-26 Thread Jacek Laskowski
github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala#L102 [2] https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala#L281 Pozdrawiam, Jacek

Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?

2019-09-03 Thread Jacek Laskowski
Hi Devs, Thanks all for a very prompt response! That was insanely quick. Thank you very much! :) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of Spark Structured Streaming https://bit.ly/spark-structured

Why two netty libs?

2019-09-03 Thread Jacek Laskowski
Hi, Just noticed that Spark 2.4.x uses two netty deps of different versions. Why? jars/netty-all-4.1.17.Final.jar jars/netty-3.9.9.Final.jar Shouldn't one be excluded or perhaps shaded? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://b

Re: Why two netty libs?

2019-09-05 Thread Jacek Laskowski
Hi, Thanks much for the answers. Learning Spark every day! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of Spark Structured Streaming https://bit.ly/spark-structured-streaming The Internals of Apache

Re: Welcoming some new committers and PMC members

2019-09-12 Thread Jacek Laskowski
Hi, What great news! Congrats to all awarded and the community for voting them in! p.s. I think it should go to the user mailing list too. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of Spark

[SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-01 Thread Jacek Laskowski
pache/spark/sql/SQLContext.scala#L422-L428 [3] https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L62-L81 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals

Re: [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-02 Thread Jacek Laskowski
at SparkAISummit in two weeks!) Gonna be challenging! Hope I won't spread a wrong word. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of Spark Structured Streaming https://bit.ly/spark-structured-st

[SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

2019-10-08 Thread Jacek Laskowski
t, but am hoping to get some more info before. Thanks! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of Spark Structured Streaming https://bit.ly/spark-structured-streaming The Internals of Apache Kafka htt

Re: [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-10 Thread Jacek Laskowski
Hi, Thanks much for such a thorough conversation. Enjoyed it very much. > Source/Sink traits are in org.apache.spark.sql.execution and thus they are private. That would explain why I couldn't find scaladocs. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals

[SS] number of output rows metric for streaming aggregation (StateStoreSaveExec) in Append output mode not measured?

2019-10-12 Thread Jacek Laskowski
r the other modes - Complete and Update. See https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L329-L365 Is this intentional? Why? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of

Re: [SS] number of output rows metric for streaming aggregation (StateStoreSaveExec) in Append output mode not measured?

2019-10-13 Thread Jacek Laskowski
on why we don't deal with >> it. I'll file and submit a patch. >> >> Btw, there's a metric bug with empty batch as well - see SPARK-29314 [1] >> which I've submitted a patch recently. >> >> Thanks, >> Jungtaek Lim (HeartSaVioR) >>

Re: [SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

2019-10-14 Thread Jacek Laskowski
ner and get > more information about batches. > > Thanks, > Jungtaek Lim (HeartSaVioR) > > On Tue, Oct 8, 2019 at 6:12 PM Jacek Laskowski wrote: > >> Hi, >> >> I haven't spent much time on it, but the following DEBUG message >> from WatermarkTracker spa

Does StreamingSymmetricHashJoinExec work with watermark? I don't think so

2019-11-11 Thread Jacek Laskowski
b.com/apache/spark/blob/v3.0.0-preview/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala#L156-L164 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski The Internals of Spark SQL https://bit.ly/spark-sql-internals The Internals of

[DOCS] Spark SQL Upgrading Guide

2020-02-15 Thread Jacek Laskowski
/github.com/apache/spark/blob/master/docs/sql-migration-guide.md#upgrading-from-spark-sql-244-to-245 [5] http://spark.apache.org/releases/spark-release-2-4-5.html Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl

Re: [DOCS] Spark SQL Upgrading Guide

2020-02-16 Thread Jacek Laskowski
nal/SQLConf.scala#L1306-L1307 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Sat, Feb 15, 2020 at 7:44 PM Jacek Laskowsk

InferFiltersFromConstraints logical optimization rule and Optimizer.defaultBatches?

2020-04-12 Thread Jacek Laskowski
blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L115 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: InferFiltersFromConstraints logical optimization rule and Optimizer.defaultBatches?

2020-04-16 Thread Jacek Laskowski
Hi Jungtaek, Thanks a lot for your answer. What you're saying reflects my understanding perfectly. It's a small change, but it makes understanding where rules are used much simpler (= less confusing). I'll propose a PR and see where it goes from there. Thanks! Pozdrawiam,

BlockManager and ShuffleManager = can getLocalBytes be ever used for shuffle blocks?

2020-04-16 Thread Jacek Laskowski
spark/blob/31734399d57f3c128e66b0f97ef83eb4c9165978/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L382 [2] https://github.com/apache/spark/blob/31734399d57f3c128e66b0f97ef83eb4c9165978/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L637 Pozdrawiam, Jacek Laskowski https://about.me/JacekLask

ShuffleMapStage and pendingPartitions vs isAvailable or findMissingPartitions?

2020-04-26 Thread Jacek Laskowski
since isAvailable or findMissingPartitions (using MapOutputTrackerMaster) know it already and I think are even more up-to-date. Why is there this extra registry? [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala#L60 Pozdraw

Re: BlockManager and ShuffleManager = can getLocalBytes be ever used for shuffle blocks?

2020-04-26 Thread Jacek Laskowski
Thanks Yi Wu! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Sat, Apr 18, 2020 at 12:17 PM wuyi wrote:

Why time difference while registering a new BlockManager (using BlockManagerMasterEndpoint)?

2020-06-12 Thread Jacek Laskowski
/core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala#L481 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Why is V2SessionCatalog not a CatalogExtension?

2020-08-08 Thread Jacek Laskowski
, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Incorrect Scala version for Spark 2.4.x releases in the docs?

2020-09-17 Thread Jacek Laskowski
uld be compiled with Scala 2.12, but that requires scala-2.12 profile [2] to be enabled) [1] https://github.com/apache/spark/blob/v2.4.6/pom.xml#L158 [2] https://github.com/apache/spark/blob/v2.4.6/pom.xml#L2830 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of"

Re: Incorrect Scala version for Spark 2.4.x releases in the docs?

2020-09-17 Thread Jacek Laskowski
Thanks Sean for such a quick response! Let me propose a fix for the docs. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklasko

[3.0.1] ExecutorMonitor.onJobStart and StageInfo.shuffleDepId that's never used?

2020-12-30 Thread Jacek Laskowski
2caec8c94c31e5c9ddc30ed8acb424084181/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L179 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: [3.0.1] ExecutorMonitor.onJobStart and StageInfo.shuffleDepId that's never used?

2020-12-30 Thread Jacek Laskowski
ache/spark/blob/094563384478a402c36415edf04ee7b884a34fc9/core/src/main/scala/org/apache/spark/scheduler/StageInfo.scala#L108 [2] https://github.com/apache/spark/blob/78df2caec8c94c31e5c9ddc30ed8acb424084181/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L179 Pozdrawiam, Jacek Lasko

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-06 Thread Jacek Laskowski
Hi, I'm curious why Spark 3.1.0 is already available in repo1.maven.org? https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.12/3.1.0/ Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/>

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-07 Thread Jacek Laskowski
Hi, I'm just reading this now. I'm for 3.1.1 with no 3.1.0 but the news that we're skipping that particular release. Gonna be more fun! :) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> F

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-07 Thread Jacek Laskowski
Hi, BTW, wondering aloud. Since it was agreed to skip 3.1.0 and go ahead with 3.1.1, what's gonna happen with v3.1.0 tag [1]? Is it going away and we'll see 3.1.1-rc1? [1] https://github.com/apache/spark/tree/v3.1.0-rc1 Pozdrawiam, Jacek Laskowski https://about.me/JacekLask

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-07 Thread Jacek Laskowski
Hi Sean, +1 to leave it. Makes so much more sense (as that's what really happened and the history of Apache Spark is...irreversible). Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me o

[K8S] KUBERNETES_EXECUTOR_REQUEST_CORES

2021-01-12 Thread Jacek Laskowski
utorFeatureStep.scala#L72 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-20 Thread Jacek Laskowski
orPodsAllocator: ResourceProfile Id: 0 pod allocation status: 2 running, 0 pending. 0 unacknowledged. 21/01/19 12:23:29 DEBUG ExecutorPodsAllocator: ResourceProfile Id: 0 pod allocation status: 2 running, 0 pending. 0 unacknowledged. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski

Should 3.1.0 config props be 3.1.1 (as s.k.executor.missingPodDetectDelta)?

2021-01-23 Thread Jacek Laskowski
(We could leave it as is as an "easter egg"-like thing too) Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: Should 3.1.0 config props be 3.1.1 (as s.k.executor.missingPodDetectDelta)?

2021-01-23 Thread Jacek Laskowski
Hi Hyukjin, Agreed. I asked to see if I'm not missing anything. Thank you. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/

[K8S] ExecutorPodsWatchSnapshotSource with no spark-exec-inactive label in 3.1?

2021-01-23 Thread Jacek Laskowski
/ExecutorPodsPollingSnapshotSource.scala#L62 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski>

Re: How to contribute the code

2021-02-01 Thread Jacek Laskowski
Hi, http://spark.apache.org/contributing.html ? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <https://twitter.com/jaceklaskowski> On Sat, J

Re: [DISCUSS] Add RocksDB StateStore

2021-02-08 Thread Jacek Laskowski
Hi, I'm "okay to add RocksDB StateStore as external module". See no reason not to. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twitter.com/jaceklaskowski <

Re: Welcoming Tejas Patil as a Spark committer

2017-09-30 Thread Jacek Laskowski
Hi, Oh, yeah. Seen Tejas here and there in the commits. Well deserved. Jacek On 29 Sep 2017 9:58 pm, "Matei Zaharia" wrote: Hi all, The Spark PMC recently added Tejas Patil as a committer on the project. Tejas has been contributing across several areas of Spark for a while, focusing especiall

Re: Structured Streaming and Hive

2017-09-30 Thread Jacek Laskowski
Hi, Guessing it's a timing issue. When you started the query, batch 0 either had no rows to save or hadn't started yet (it runs in a separate thread), and so spark.sql ran once and saved nothing. You should rather use a foreach writer to save results to Hive. Jacek On 29 Sep 2017 11:36 am, "HanPan" wro

[SS] Why does StreamingQueryManager.notifyQueryTermination use id and runId (not just id)?

2017-10-27 Thread Jacek Laskowski
k/sql/streaming/StreamingQueryManager.scala#L335 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski

[SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?

2017-11-15 Thread Jacek Laskowski
--+ http://localhost:4040/SQL/execution/?id=0 shows no metrics for LocalTableScan. Is this intended? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski

Re: [SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?

2017-11-16 Thread Jacek Laskowski
TableScanExec. Could anyone explain it in more detail? I'd appreciate it. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at https://tw

Re: [SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?

2017-11-17 Thread Jacek Laskowski
ScanExec does (and so does BroadcastExchangeExec, but that's not a data source so may have different reasons). [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L31-L32 Pozdrawiam, Jacek Laskowski https://about

Re: private methods in mllib

2017-12-01 Thread Jacek Laskowski
Hi Sahm, Unless I'm mistaken [1], org.apache.spark.mllib is on hold and is considered @deprecated these days. That'd explain why "so many things made private". [1] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/package.scala#L2

Deprecating UserDefinedGenerator logical operator?

2017-12-08 Thread Jacek Laskowski
m/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2092 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski

BUILD FAILURE due to...not found: value AnalysisBarrier in spark-catalyst_2.11?

2017-12-08 Thread Jacek Laskowski
r(child) => child [error] ^ [error] 8 errors found [error] Compile failed at Dec 8, 2017 5:58:10 PM [8.170s] [INFO] -------- Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming https://bit.ly

Re: BUILD FAILURE due to...not found: value AnalysisBarrier in spark-catalyst_2.11?

2017-12-09 Thread Jacek Laskowski
rc/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala?utf8=%E2%9C%93#L890 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering-ap

Re: RDD[internalRow] -> DataSet

2017-12-09 Thread Jacek Laskowski
Hi Satyajit, That's exactly what Dataset.rdd does --> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala?utf8=%E2%9C%93#L2916-L2921 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming https://bit.

GenerateExec, CodegenSupport and supportCodegen flag off?!

2017-12-10 Thread Jacek Laskowski
e/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala#L58-L64 [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala#L125 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming http

Re: GenerateExec, CodegenSupport and supportCodegen flag off?!

2017-12-11 Thread Jacek Laskowski
in whole-stage codegen it can extend CodegenSupport trait and enable accessing GenericInternalRow by turning supportCodegen flag off. I can understand how badly that can read, but without help from Spark SQL devs that's all I can figure out myself. Any help appreciated. Pozdrawiam, Jacek Lask

Re: GenerateExec, CodegenSupport and supportCodegen flag off?!

2017-12-12 Thread Jacek Laskowski
t off because --> "Disable generate codegen since it fails my workload." - Wished he included the workload to showcase the issue :( Looks like there are a bunch of wise people already on it so I'll just listen... Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Str

Re: [01/51] [partial] spark-website git commit: 2.2.1 generated doc

2017-12-17 Thread Jacek Laskowski
t not http://spark.apache.org/docs/latest :( Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Thu, D

Re: [01/51] [partial] spark-website git commit: 2.2.1 generated doc

2017-12-17 Thread Jacek Laskowski
Hi Sean, What does "Not all the pieces are released yet" mean if you don't mind me asking? 2.2.1 has already been announced, hasn't it? [1] [1] http://spark.apache.org/news/spark-2-2-1-released.html Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Spark

Whole-stage codegen and SparkPlan.newPredicate

2017-12-30 Thread Jacek Laskowski
fun$mapPartitionsWithIndexInternal$1$$anonfun$apply$24.apply(RDD.scala:816) ... Is this a bug or does it work as intended? Why? [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala?utf8=%E2%9C%93#L386 Pozdrawiam, Jacek Laskowski ht

FileSystem.getContentSummary for total size stats in DetermineTableStats VS CommandUtils?

2018-01-02 Thread Jacek Laskowski
master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala#L66-L73 [2] https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala?utf8=%E2%9C%93#L126 Pozdrawiam, Jacek Laskowski https://about.me/JacekLask

Why some queries use logical.stats while others analyzed.stats?

2018-01-04 Thread Jacek Laskowski
ache.spark.sql.catalyst.plans.logical.statsEstimation.SizeInBytesOnlyStatsPlanVisitor$.default(SizeInBytesOnlyStatsPlanVisitor.scala:27) // analyzed logical plan works fine scala> names.queryExecution.analyzed.stats res23: org.apache.spark.sql.catalyst.plans.logical.Statistics = Statistics(sizeInBytes=48.0 B, hints=none) Po

Re: Why some queries use logical.stats while others analyzed.stats?

2018-01-04 Thread Jacek Laskowski
Thanks Wenchen. That makes a lot of sense now (after you made the point about AnalysisBarrier, which I've been seeing here and there but haven't spent much time exploring yet, and which turned out to be important). Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark

Re: Why some queries use logical.stats while others analyzed.stats?

2018-01-06 Thread Jacek Laskowski
main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L895 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams htt

Remove or rename? What does ResolvedDataSourceSuite test?

2018-01-13 Thread Jacek Laskowski
/ResolvedDataSourceSuite.scala Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow me at https://twitter.com

Re: Inner join with the table itself

2018-01-15 Thread Jacek Laskowski
Hi Michael, -dev +user What's the query? How do you "fool spark"? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams

How to resolve UnresolvedRelations (to explore FindDataSourceTable)?

2018-01-16 Thread Jacek Laskowski
" [1]? [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala#L483-L488 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bi

Re: Whole-stage codegen and SparkPlan.newPredicate

2018-01-16 Thread Jacek Laskowski
Thanks for looking into it, Kazuaki! Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow

DDLUtils.isDatasourceTable vs HiveExternalCatalog.isDatasourceTable

2018-01-17 Thread Jacek Laskowski
E2%9C%93#L1393 Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow me at https://twitte

Nightly Builds in the docs (in spark-nightly/spark-master-bin/latest? Can't seem to find it)

2018-01-21 Thread Jacek Laskowski
Hi, http://spark.apache.org/developer-tools.html#nightly-builds reads: > Spark nightly packages are available at: > Latest master build: https://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest but the URL gives 404. Is this intended? Pozdrawiam, Jacek Laskowski

Why Dataset.hint uses logicalPlan (= analyzed not planWithBarrier)?

2018-01-25 Thread Jacek Laskowski
Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow me at https://twitter.com/jaceklaskowski

Re: Why Dataset.hint uses logicalPlan (= analyzed not planWithBarrier)?

2018-01-26 Thread Jacek Laskowski
he table identifier is resolvable). That would help with understanding that part of Spark SQL a little better (i.e. writing a unit test with logical rules and such). Should I file an issue in JIRA for this? Any suggestions on how to do it the right way? Pozdrawiam, Jacek Laskowski https://about.me/JacekL

Nondeterministic Catalyst expressions -- trait and property?!

2018-01-29 Thread Jacek Laskowski
the trait)? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow me at https://twitter.com/jaceklaskowski

[SQL] Tests for ExtractFiltersAndInnerJoins.flattenJoin

2018-01-30 Thread Jacek Laskowski
reate the plans. I'm wondering if I should file a task in JIRA for this or just send a pull request? I'd appreciate some guidance. [1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala#L167 Pozdrawiam,

Re: data source v2 online meetup

2018-02-02 Thread Jacek Laskowski
Hi Reynold, In general, that is a very good way to get the community engaged (even if most people would just listen / hide in the dark like myself). I know of no other open source project, at the ASF or elsewhere, where such an initiative has even been tried. Kudos for the idea! Pozdrawiam, Jacek Laskowski
