HeartSaVioR closed pull request #50721: [SPARK-51922] [SS] Fix
UTFDataFormatException thrown from StateStoreChangelogReaderFactory for v1
URL: https://github.com/apache/spark/pull/50721
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
anishshri-db commented on PR #50719:
URL: https://github.com/apache/spark/pull/50719#issuecomment-2831892176
cc - @HeartSaVioR - PTAL, thanks!
HeartSaVioR closed pull request #50700: [SPARK-51904][SS] Removing async
metadata purging for StateSchemaV3 and ignoring non-batch files when listing
OperatorMetadata files
URL: https://github.com/apache/spark/pull/50700
nija-at commented on PR #50604:
URL: https://github.com/apache/spark/pull/50604#issuecomment-2830397277
@cloud-fan thanks. I've fixed these now. Waiting for CI to confirm.
mihailoale-db commented on PR #50699:
URL: https://github.com/apache/spark/pull/50699#issuecomment-2830576716
IIUC we changed the API/approach for functions that explicitly add
`Sort`/`Aggregate`, but there are other functions/rules that do that implicitly
(e.g. `randomSplit`)? @cloud-fan ar
mihailoale-db commented on code in PR #50590:
URL: https://github.com/apache/spark/pull/50590#discussion_r2060432450
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala:
##
@@ -314,6 +315,30 @@ case class GetArrayItem(
})
m8719-github closed pull request #50715: [SPARK-51918][CORE] Executor exit wait
for out/err appenders to stop + flush remaining data
URL: https://github.com/apache/spark/pull/50715
mridulm commented on code in PR #50594:
URL: https://github.com/apache/spark/pull/50594#discussion_r2060457816
##
core/src/main/scala/org/apache/spark/util/UninterruptibleThread.scala:
##
@@ -69,10 +75,22 @@ private[spark] class UninterruptibleThread(
}
uninterruptib
mihailoale-db commented on code in PR #50590:
URL: https://github.com/apache/spark/pull/50590#discussion_r2060463599
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala:
##
@@ -314,6 +315,30 @@ case class GetArrayItem(
})
HeartSaVioR commented on PR #50704:
URL: https://github.com/apache/spark/pull/50704#issuecomment-2830351699
@hvanhovell @HyukjinKwon @jingz-db Please take a look, thanks!
cloud-fan commented on code in PR #50590:
URL: https://github.com/apache/spark/pull/50590#discussion_r2060321814
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala:
##
@@ -314,6 +315,30 @@ case class GetArrayItem(
})
}
joke1196 opened a new pull request, #50714:
URL: https://github.com/apache/spark/pull/50714
### What changes were proposed in this pull request?
This PR aligns the behavior of `DataFrame.dropDuplicates([])` to be the same
as `DataFrame.dropDuplicates()`.
### Why are
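
To illustrate the alignment this PR describes, here is a hypothetical pure-Python sketch (not Spark's implementation; `drop_duplicates` and its rows-as-dicts model are invented for illustration) of the intended semantics: an empty column subset deduplicates on all columns, the same as passing no subset at all.

```python
# Hypothetical model (not the PR's code): an empty subset is treated the same
# as no subset, i.e. deduplicate on every column.
def drop_duplicates(rows, subset=None):
    keys = subset if subset else None  # [] falls through to "all columns"
    seen, out = set(), []
    for row in rows:
        key = tuple(row[k] for k in keys) if keys else tuple(row.items())
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [{"a": 1, "b": 2}, {"a": 1, "b": 2}, {"a": 1, "b": 3}]
assert drop_duplicates(rows, subset=[]) == drop_duplicates(rows)
```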
m8719-github opened a new pull request, #50715:
URL: https://github.com/apache/spark/pull/50715
### What changes were proposed in this pull request?
Fix executor exit routine to wait for stdout/stderr appenders to stop and
flush remaining data.
### Why are the chang
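
A minimal Python sketch (not Spark's code; `AsyncAppender` and its queue-based design are invented here) of the exit behavior the PR describes: stop an asynchronous appender and wait for it to drain, so buffered stdout/stderr output is not lost when the process exits.

```python
import queue
import threading

# Hypothetical sketch: drain and join an async appender before exit so that
# buffered output is flushed rather than dropped.
class AsyncAppender:
    def __init__(self, sink):
        self.q = queue.Queue()
        self.sink = sink
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            line = self.q.get()
            if line is None:          # sentinel: stop after draining the queue
                return
            self.sink.append(line)

    def append(self, line):
        self.q.put(line)

    def stop(self):
        self.q.put(None)              # enqueue sentinel behind pending lines
        self.worker.join()            # wait for remaining data to flush

sink = []
app = AsyncAppender(sink)
for i in range(100):
    app.append(i)
app.stop()                            # without this join, tail output could be lost
assert sink == list(range(100))
```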
mridulm commented on code in PR #50594:
URL: https://github.com/apache/spark/pull/50594#discussion_r2060409987
##
core/src/main/scala/org/apache/spark/util/UninterruptibleThread.scala:
##
@@ -69,10 +75,22 @@ private[spark] class UninterruptibleThread(
}
uninterruptib
mridulm commented on code in PR #50594:
URL: https://github.com/apache/spark/pull/50594#discussion_r2060420903
##
core/src/main/scala/org/apache/spark/util/UninterruptibleThread.scala:
##
@@ -69,10 +75,22 @@ private[spark] class UninterruptibleThread(
}
uninterruptib
mridulm commented on code in PR #50594:
URL: https://github.com/apache/spark/pull/50594#discussion_r2060475882
##
core/src/test/scala/org/apache/spark/util/UninterruptibleThreadSuite.scala:
##
@@ -115,6 +116,45 @@ class UninterruptibleThreadSuite extends SparkFunSuite {
ass
mridulm commented on code in PR #50594:
URL: https://github.com/apache/spark/pull/50594#discussion_r2060471211
##
core/src/main/scala/org/apache/spark/util/UninterruptibleThread.scala:
##
@@ -92,11 +110,17 @@ private[spark] class UninterruptibleThread(
* interrupted until it
Gschiavon commented on PR #50536:
URL: https://github.com/apache/spark/pull/50536#issuecomment-2830770986
@HyukjinKwon any thoughts on this?
wengh opened a new pull request, #50716:
URL: https://github.com/apache/spark/pull/50716
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was t
Pajaraja opened a new pull request, #50717:
URL: https://github.com/apache/spark/pull/50717
### What changes were proposed in this pull request?
Modify ColumnPruning optimizer rule to successfully prune UnionLoops.
For this, ColumnPruning had to be migrated inside the abstract Opti
mridulm commented on PR #50594:
URL: https://github.com/apache/spark/pull/50594#issuecomment-2829589932
Ok, I see the problem.
The expectation is for begin and end to be within try/finally - but the usual
coding pattern would result in the try being used to catch InterruptedException
and h
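
The begin/end pairing under discussion can be sketched as follows (a hypothetical Python analogue, not the `UninterruptibleThread` code itself; `Uninterruptible` is invented, and `KeyboardInterrupt` stands in for `InterruptedException`): wrapping the region in the equivalent of try/finally guarantees that "end" runs even when the body is interrupted.

```python
# Hypothetical analogue of the begin/end pairing: a context manager makes the
# "end" side unconditional, like a finally block.
class Uninterruptible:
    def __init__(self):
        self.depth = 0            # tracks nesting of uninterruptible regions

    def __enter__(self):
        self.depth += 1           # "begin"
        return self

    def __exit__(self, exc_type, exc, tb):
        self.depth -= 1           # "end" always runs, even on interruption
        return False              # do not swallow the exception

guard = Uninterruptible()
try:
    with guard:                   # begin/end effectively inside try/finally
        raise KeyboardInterrupt   # stand-in for InterruptedException
except KeyboardInterrupt:
    pass
assert guard.depth == 0           # balanced despite the interrupt
```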
dongjoon-hyun closed pull request #88: [SPARK-51911] Support `lateralJoin` in
`DataFrame`
URL: https://github.com/apache/spark-connect-swift/pull/88
mridulm commented on PR #50594:
URL: https://github.com/apache/spark/pull/50594#issuecomment-2829599847
Actually, I think my formulation above will still have deadlock - though for
a more involved reason, sigh.
dongjoon-hyun commented on PR #88:
URL:
https://github.com/apache/spark-connect-swift/pull/88#issuecomment-2829594978
Let me merge this as a foundation of the future work.
zhengruifeng opened a new pull request, #50708:
URL: https://github.com/apache/spark/pull/50708
### What changes were proposed in this pull request?
`toJSON` and `rdd` throw `PySparkNotImplementedError` in connect mode, but
`PySparkAttributeError` in connect-only mode
zhengruifeng opened a new pull request, #50709:
URL: https://github.com/apache/spark/pull/50709
### What changes were proposed in this pull request?
Add 4 missing functions to API references
### Why are the changes needed?
for docs
### Does this PR introduc
mridulm commented on code in PR #50594:
URL: https://github.com/apache/spark/pull/50594#discussion_r2059722561
##
core/src/main/scala/org/apache/spark/util/UninterruptibleThread.scala:
##
@@ -69,10 +75,22 @@ private[spark] class UninterruptibleThread(
}
uninterruptib
zhengruifeng opened a new pull request, #50710:
URL: https://github.com/apache/spark/pull/50710
### What changes were proposed in this pull request?
Enable SparkConnectDataFrameDebug in connect-only mode
### Why are the changes needed?
to improve test coverage
dongjoon-hyun opened a new pull request, #90:
URL: https://github.com/apache/spark-connect-swift/pull/90
…
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
dongjoon-hyun commented on PR #90:
URL:
https://github.com/apache/spark-connect-swift/pull/90#issuecomment-2829694817
Thank you, @viirya ! Merged to main.
dongjoon-hyun closed pull request #90: [SPARK-51916] Add
`create_(scala|table)_function` and `drop_function` test scripts
URL: https://github.com/apache/spark-connect-swift/pull/90
heyihong commented on code in PR #50696:
URL: https://github.com/apache/spark/pull/50696#discussion_r2059778984
##
sql/core/src/main/scala/org/apache/spark/sql/classic/Catalog.scala:
##
@@ -184,25 +185,19 @@ class Catalog(sparkSession: SparkSession) extends
catalog.Catalog with
mridulm commented on PR #50594:
URL: https://github.com/apache/spark/pull/50594#issuecomment-2829532707
> @mridulm Do you refer to
"[SPARK-51821](https://issues.apache.org/jira/browse/SPARK-51821)
uninterruptibleLock deadlock" test? No, it will not fail. Why do you think it
would fail?
cloud-fan closed pull request #50590: [SPARK-51805] [SQL] Get function with
improper argument should throw proper exception instead of an internal one
URL: https://github.com/apache/spark/pull/50590
cloud-fan commented on PR #50590:
URL: https://github.com/apache/spark/pull/50590#issuecomment-2830878038
thanks, merging to master/4.0!
liviazhu-db closed pull request #50391: [SPARK-51596][SS] Fix concurrent
StateStoreProvider maintenance and closing
URL: https://github.com/apache/spark/pull/50391
liviazhu-db commented on PR #50391:
URL: https://github.com/apache/spark/pull/50391#issuecomment-2831035233
Replicated by https://github.com/apache/spark/pull/50595
anishshri-db commented on code in PR #50595:
URL: https://github.com/apache/spark/pull/50595#discussion_r2060769106
##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala:
##
@@ -1095,6 +1131,14 @@ object StateStore extends Logging {
}
bogao007 opened a new pull request, #50719:
URL: https://github.com/apache/spark/pull/50719
### What changes were proposed in this pull request?
Use long type (int64 for protobuf) for TTL duration in milliseconds
### Why are the changes needed?
Allow users to set l
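
A quick arithmetic sanity check on why the wider type is needed (illustrative Python, not the PR's code): a month-scale TTL expressed in milliseconds already exceeds the signed 32-bit range, while int64 has ample headroom.

```python
# Illustrative only: month-scale TTLs in milliseconds overflow int32.
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

def days_to_ms(days):
    return days * 24 * 60 * 60 * 1000

thirty_days_ms = days_to_ms(30)
assert thirty_days_ms == 2_592_000_000
assert thirty_days_ms > INT32_MAX    # does not fit in a signed 32-bit field
assert thirty_days_ms <= INT64_MAX   # fits comfortably in int64
```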
bogao007 commented on code in PR #50719:
URL: https://github.com/apache/spark/pull/50719#discussion_r2060945740
##
python/pyspark/sql/tests/pandas/helper/helper_pandas_transform_with_state.py:
##
@@ -1159,7 +1159,8 @@ class
PandasMapStateLargeTTLProcessor(PandasMapStateProcesso
AveryQi115 opened a new pull request, #50720:
URL: https://github.com/apache/spark/pull/50720
# I use this PR to test the handling altogether and modify/add test cases; do
not merge
### What changes were proposed in this pull request?
### Why are the changes needed?
anishshri-db commented on code in PR #50719:
URL: https://github.com/apache/spark/pull/50719#discussion_r2060942288
##
python/pyspark/sql/streaming/proto/StateMessage_pb2.py:
##
@@ -40,7 +40,7 @@
DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(
-
b'\n;org/apac
liviazhu-db opened a new pull request, #50721:
URL: https://github.com/apache/spark/pull/50721
### What changes were proposed in this pull request?
Catch the UTFDataFormatException thrown for v1 in the
StateStoreChangelogReaderFactory and assign the version to 1.
##
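
The fallback described can be modeled with a hypothetical Python sketch (the real code is Scala reading a `DataInputStream`; `read_changelog_version` and the header layout here are invented for illustration): if the header fails to parse as a version string, assume the legacy v1 format instead of propagating the format exception.

```python
# Hypothetical model of the fallback: a changelog whose header does not parse
# as a "v<N>" version string is assumed to be the legacy v1 format.
def read_changelog_version(header: bytes) -> int:
    try:
        text = header.decode("utf-8")        # newer files start with "v<N>"
        if text.startswith("v"):
            return int(text[1:])
    except (UnicodeDecodeError, ValueError):
        pass                                 # malformed header -> legacy format
    return 1                                 # v1 files carry no version header

assert read_changelog_version(b"v2") == 2
assert read_changelog_version(b"\xff\xfe") == 1  # undecodable bytes fall back to v1
```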
anishshri-db commented on code in PR #50719:
URL: https://github.com/apache/spark/pull/50719#discussion_r2060942687
##
python/pyspark/sql/tests/pandas/helper/helper_pandas_transform_with_state.py:
##
@@ -1159,7 +1159,8 @@ class
PandasMapStateLargeTTLProcessor(PandasMapStateProc
bogao007 commented on code in PR #50719:
URL: https://github.com/apache/spark/pull/50719#discussion_r2060945937
##
python/pyspark/sql/streaming/proto/StateMessage_pb2.py:
##
@@ -40,7 +40,7 @@
DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(
-
b'\n;org/apache/s
AveryQi115 closed pull request #49660: [WIP][SPARK-50983][SQL] Support Nested
Correlated Subqueries for Analyzer
URL: https://github.com/apache/spark/pull/49660
wengh opened a new pull request, #50722:
URL: https://github.com/apache/spark/pull/50722
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was t
bogao007 commented on code in PR #50719:
URL: https://github.com/apache/spark/pull/50719#discussion_r2060950968
##
python/pyspark/sql/tests/pandas/helper/helper_pandas_transform_with_state.py:
##
@@ -1159,7 +1159,8 @@ class
PandasMapStateLargeTTLProcessor(PandasMapStateProcesso
wengh commented on PR #50531:
URL: https://github.com/apache/spark/pull/50531#issuecomment-2831521899
> Thanks for the fix! But this is a breaking change. Can we document this in
the migration guide?
@allisonwang-db https://github.com/apache/spark/pull/50722
zhengruifeng closed pull request #50709: [MINOR][PYTHON][DOCS] Add 4 missing
functions to API references
URL: https://github.com/apache/spark/pull/50709
klu2300030052 opened a new pull request, #50712:
URL: https://github.com/apache/spark/pull/50712
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### H
zhengruifeng commented on PR #50709:
URL: https://github.com/apache/spark/pull/50709#issuecomment-2829976924
merged to master
zhengruifeng commented on PR #50710:
URL: https://github.com/apache/spark/pull/50710#issuecomment-2829980550
merged to master
zhengruifeng closed pull request #50710: [SPARK-51915][PYTHON][CONNECT][TESTS]
Enable SparkConnectDataFrameDebug in connect-only mode
URL: https://github.com/apache/spark/pull/50710
HyukjinKwon commented on code in PR #50708:
URL: https://github.com/apache/spark/pull/50708#discussion_r2059904698
##
python/pyspark/sql/connect/dataframe.py:
##
@@ -2190,20 +2190,18 @@ def localCheckpoint(
assert isinstance(checkpointed._plan, plan.CachedRemoteRelation
zhengruifeng commented on code in PR #50708:
URL: https://github.com/apache/spark/pull/50708#discussion_r2059968399
##
python/pyspark/sql/connect/dataframe.py:
##
@@ -2190,20 +2190,18 @@ def localCheckpoint(
assert isinstance(checkpointed._plan, plan.CachedRemoteRelatio
dongjoon-hyun opened a new pull request, #91:
URL: https://github.com/apache/spark-connect-swift/pull/91
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
klu2300030052 opened a new pull request, #50713:
URL: https://github.com/apache/spark/pull/50713
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### H
yaooqinn opened a new pull request, #50711:
URL: https://github.com/apache/spark/pull/50711
### What changes were proposed in this pull request?
This PR adds `com.mysql.cj` to `spark.sql.hive.metastore.sharedPrefixes`
### Why are the changes needed?
Following upst
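
An illustrative Python sketch (not Spark's code) of how a shared-prefix list is matched against class names; the actual config key is `spark.sql.hive.metastore.sharedPrefixes`, the `com.mysql.cj` entry covers the MySQL Connector/J 8.x driver classes, and the other prefixes shown are assumed examples.

```python
# Illustrative only: prefix matching as used by a sharedPrefixes-style config.
# The prefix list below is an assumed example, not Spark's actual default.
shared_prefixes = ["com.mysql.jdbc", "org.postgresql", "com.mysql.cj"]

def is_shared(class_name: str) -> bool:
    # A class is loaded by the shared classloader if any prefix matches.
    return any(class_name.startswith(p) for p in shared_prefixes)

assert is_shared("com.mysql.cj.jdbc.Driver")       # Connector/J 8.x driver
assert not is_shared("com.example.SomeOtherClass")
```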
HyukjinKwon commented on code in PR #50696:
URL: https://github.com/apache/spark/pull/50696#discussion_r2059902276
##
sql/core/src/main/scala/org/apache/spark/sql/classic/Catalog.scala:
##
@@ -184,25 +185,19 @@ class Catalog(sparkSession: SparkSession) extends
catalog.Catalog w
wengh commented on code in PR #50716:
URL: https://github.com/apache/spark/pull/50716#discussion_r2060518095
##
python/docs/source/user_guide/sql/python_data_source.rst:
##
@@ -520,4 +520,6 @@ The following example demonstrates how to implement a basic
Data Source using Ar
Usa
wengh commented on PR #50716:
URL: https://github.com/apache/spark/pull/50716#issuecomment-2830832282
@allisonwang-db @HyukjinKwon please take a look
anishshri-db commented on code in PR #50595:
URL: https://github.com/apache/spark/pull/50595#discussion_r2060583685
##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala:
##
@@ -1009,14 +1013,46 @@ object StateStore extends Logging {
jingz-db opened a new pull request, #50718:
URL: https://github.com/apache/spark/pull/50718
### What changes were proposed in this pull request?
Fix a bug in TransformWithStateInPandas in Python.
### Why are the changes needed?
Currently, all user provided state
MaxGekk closed pull request #50697: [SPARK-51900][SQL] Properly throw datatype
mismatch in single-pass Analyzer
URL: https://github.com/apache/spark/pull/50697
MaxGekk commented on PR #50697:
URL: https://github.com/apache/spark/pull/50697#issuecomment-2831140760
+1, LGTM. Merging to master.
Thank you, @vladimirg-db.
allisonwang-db commented on PR #50531:
URL: https://github.com/apache/spark/pull/50531#issuecomment-2831044013
Thanks for the fix! But this is a breaking change. Can we document this in
the migration guide?
ericm-db commented on code in PR #50595:
URL: https://github.com/apache/spark/pull/50595#discussion_r2060768228
##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala:
##
@@ -1095,6 +1131,14 @@ object StateStore extends Logging {
}
}
HeartSaVioR commented on PR #50700:
URL: https://github.com/apache/spark/pull/50700#issuecomment-2831641630
Thanks! Merging to master/4.0 (since this causes trouble with schema
evolution).
anishshri-db commented on code in PR #50721:
URL: https://github.com/apache/spark/pull/50721#discussion_r2060954104
##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreChangelog.scala:
##
@@ -368,7 +368,10 @@ class StateStoreChangelogReaderFactory
liviazhu-db commented on code in PR #50721:
URL: https://github.com/apache/spark/pull/50721#discussion_r2060955056
##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreChangelog.scala:
##
@@ -368,7 +368,10 @@ class StateStoreChangelogReaderFactory(
github-actions[bot] closed pull request #48407: [SPARK-49919][SQL] Add special
limits support for return content as JSON dataset
URL: https://github.com/apache/spark/pull/48407
github-actions[bot] commented on PR #49467:
URL: https://github.com/apache/spark/pull/49467#issuecomment-2831655569
We're closing this PR because it hasn't been updated in a while. This isn't
a judgement on the merit of the PR in any way. It's just a way of keeping the
PR queue manageable.
github-actions[bot] commented on PR #49464:
URL: https://github.com/apache/spark/pull/49464#issuecomment-2831655578
We're closing this PR because it hasn't been updated in a while. This isn't
a judgement on the merit of the PR in any way. It's just a way of keeping the
PR queue manageable.
github-actions[bot] closed pull request #49318: [SPARK-48665][PYTHON][CONNECT]
Support providing a dict in pyspark lit to create a map.
URL: https://github.com/apache/spark/pull/49318
github-actions[bot] closed pull request #49470: [SPARK-50802][SQL] Remove
ApplyCharTypePadding rule
URL: https://github.com/apache/spark/pull/49470
HeartSaVioR commented on PR #50721:
URL: https://github.com/apache/spark/pull/50721#issuecomment-2831874969
Thanks! Merging to master/4.0 (The fix is straightforward and low risk.)