Hey all
I have an interesting problem in hand. We have cases where we want to pass
multiple (20 to 30) data frames to the cogroup.applyInPandas function.
RDD currently supports cogroup with up to 4 dataframes (ZippedPartitionsRDD4),
whereas cogroup with pandas can handle only 2 dataframes (with Zip
Created a PR: https://github.com/apache/spark/pull/39902
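For context, here is a minimal sketch of the existing two-DataFrame API that the PR generalizes; the column names and merge logic below are just placeholders:

    import pandas as pd
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Two hypothetical inputs keyed by "id". Today cogroup() accepts exactly
    # one other grouped DataFrame, so combining 20-30 inputs means chaining
    # joins or nesting cogroups instead of making a single call.
    df1 = spark.createDataFrame([(1, 10.0), (2, 20.0)], ("id", "v1"))
    df2 = spark.createDataFrame([(1, "a"), (2, "b")], ("id", "v2"))

    def merge(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
        # Placeholder merge logic for the sketch.
        return pd.merge(left, right, on="id")

    result = (
        df1.groupby("id")
        .cogroup(df2.groupby("id"))
        .applyInPandas(merge, schema="id long, v1 double, v2 string")
    )
    result.show()

With 20 to 30 inputs, the merge function would instead receive one pandas DataFrame per grouped input, which is what the proposal is about.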
Very interesting and user-focused discussion, thanks for the proposal.
Would it be better if we rather let users set a preference for the
language they want to see first in the code examples? This preference can
easily be stored on the browser side and used to decide the ordering. This is
in line w
default (before users take an action).
Hey all
It would be useful to support Volcano in the Spark distribution itself, just like YuniKorn.
So I am wondering what the reason is behind the decision not to package it
already.
https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-volcano-as-customized-scheduler-for-spark-on-kubernet
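For reference, a minimal sketch of what enabling Volcano looks like from PySpark, assuming the configuration keys shown on the page above; the API server address and template path are placeholders, and in practice these options are usually passed to spark-submit rather than set in the builder:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("k8s://https://<k8s-apiserver>:6443")  # placeholder API server
        # Keys below are from the Volcano section of the docs linked above;
        # the pod group template path is hypothetical.
        .config("spark.kubernetes.scheduler.name", "volcano")
        .config("spark.kubernetes.scheduler.volcano.podGroupTemplateFile",
                "/path/to/podgroup-template.yaml")
        .config("spark.kubernetes.driver.pod.featureSteps",
                "org.apache.spark.deploy.k8s.features.VolcanoFeatureStep")
        .config("spark.kubernetes.executor.pod.featureSteps",
                "org.apache.spark.deploy.k8s.features.VolcanoFeatureStep")
        .getOrCreate()
    )

As far as I can tell, the VolcanoFeatureStep is only available when Spark is built with the -Pvolcano profile, which is exactly the packaging question above.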
> In any way, I'd like to say that the root cause of the difference is
those scheduler designs instead of Apache Spark itself. For example, Apache
YuniKorn doesn't force us to add a new dependency at all while Volcano did.
This makes sense!
> These days, I prefer and invest more in Apache YuniKorn
Hey community
The Spark UI with its plan visualisation is an excellent resource for finding
out crucial information about how your application is doing and which parts
of the execution can still be optimized to meet time/resource
constraints.
The graph in its current form is sufficient for simpler
Makes sense given the nature of those commits.
On Mon, Nov 6, 2023, 7:52 PM Dongjoon Hyun wrote:
> Hi, All.
>
> Apache Spark 3.4.1 tag was created on Jun 19th and `branch-3.4` has 103
> commits including important security and correctness patches like
> SPARK-44251, SPARK-44805, and SPARK-44940.
Hey there
The Spark 3.5 branch has accumulated 199 commits, with quite a few bug
fixes related to correctness. Are there any plans for releasing 3.5.1?
Kind regards
Santosh
+1
On Sun, Feb 4, 2024, 8:18 PM Xiao Li wrote:
> +1
>
> On Sun, Feb 4, 2024 at 6:07 AM beliefer wrote:
>
>> +1
>>
>>
>>
>> On 2024-02-04 15:26:13, "Dongjoon Hyun" wrote:
>>
>> +1
>>
>> On Sat, Feb 3, 2024 at 9:18 PM yangjie01
>> wrote:
>>
>>> +1
>>>
>>> On 2024/2/4 13:13, "Kent Yao" mailto:y...@apac
+1
On Thu, Apr 25, 2024, 5:41 PM Dongjoon Hyun wrote:
> FYI, there is a proposal to drop Python 3.8 because its EOL is October
> 2024.
>
> https://github.com/apache/spark/pull/46228
> [SPARK-47993][PYTHON] Drop Python 3.8
>
> Since it's still alive and there will be an overlap between the lifecy
Thanks Wenchen for starting this!
How do we define "the user" for Spark?
1. End users: there are some users that use Spark as a service from a
provider.
2. Providers/Operators: there are some users that provide Spark as a
service for their internal (on-prem setup with YARN/K8s)/external (Something
li
One nit: the community is voting on the release of version 3.5.2, and there won't
be any preview for this version; an RC is sufficient.
On Sun, Aug 11, 2024, 2:02 AM Matei Zaharia wrote:
> Hi all,
>
> It’s time to submit our board report soon on August 14th. Here’s a draft,
> feel free to leave any comment
> Maybe we should do it at runtime: if Spark retries a shuffle stage but
the data becomes different (e.g. use checksum to check it), then Spark
should retry all the partitions of this stage.
Having been bitten hard recently by this behavior in Spark, and after having gone
down the rabbit hole to investigate it.
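To make the failure mode concrete, here is a minimal, hypothetical PySpark pattern where a retried shuffle map stage can produce different data than the original attempt, which is exactly the situation a runtime checksum check would have to detect:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # rand() is nondeterministic, so if the stage producing "bucket" is
    # re-executed after a fetch failure, the retried tasks can emit rows
    # that land in different shuffle partitions than in the first attempt.
    # Reducer partitions that were NOT retried still hold data from the old
    # attempt, so rows can be silently lost or duplicated downstream.
    df = (
        spark.range(0, 1_000_000)
        .withColumn("bucket", (F.rand() * 100).cast("int"))  # nondeterministic
        .repartition("bucket")                                # shuffle on it
        .groupBy("bucket")
        .count()
    )
    df.write.mode("overwrite").parquet("/tmp/bucket_counts")  # placeholder path

Retrying all partitions of the stage (or failing the job) when the recomputed data differs is what prevents silently mixing rows from different attempts.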
>>>>> The patch modifies the source code by using inline code interception
>>>>> and tweaks to reproduce the issue of data corruption using a unit test.
>>>>>
>>>>> And I am also attaching the BugTest which I used.
>>>>>
>