| From | Gourav Sengupta |
| Date | 12/25/2021 03:46 |
| To | Sean Owen |
| Cc | Andrew Davidson, Nicholas Gustafson, User |
| Subject | Re: OOM Joining thousands of dataframes Was: AnalysisException: Trouble using select() to append multiple columns |
Hi,
maybe I am getting confused as always :), but the requirement looked
pretty simple to me to implement in SQL, or maybe it is just the euphoria of
Christmas Eve.
Anyway, in case the above can be implemented in SQL, I can have a
look at it.
Yes, indeed there are bespoke scenarios where
This is simply not generally true, no, and not in this case. The
programmatic and SQL APIs overlap a lot, and where they do, they're
essentially aliases. Use whatever is more natural.
What I wouldn't recommend doing is emulating SQL-like behavior in custom
code, UDFs, etc. The native operators will
union them all together. Each “part” will still
> need to iterate 16000 times
>
> In general I assume we want to avoid for loops. I assume Spark is unable
> to optimize them. It would be nice if Spark provided some sort of "join all"
> function, even if it used a for loop to hide this
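A minimal sketch of the kind of "join all" helper asked for above, assuming every input shares one key column. The names `join_all`, `join_two`, and `join_dicts` are hypothetical and not part of Spark. `functools.reduce` still performs one join per input, so this only hides the loop; it does not by itself make the work cheaper. The demo uses plain dicts with made-up values so it runs without a cluster.

```python
from functools import reduce


def join_all(frames, join_two):
    """Fold a sequence of table-like objects into one by repeated pairwise joins.

    Hypothetical helper: join_two(left, right) must return the joined table.
    This still runs len(frames) - 1 joins; it hides the loop, nothing more.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("join_all needs at least one input")
    return reduce(join_two, frames)


# With PySpark DataFrames the callback could be (assuming a shared "Name" column):
#     join_two = lambda left, right: left.join(right, on=["Name"])

# Stand-in demo: each "table" is a dict mapping a row name to a tuple of values.
def join_dicts(left, right):
    # Inner join: keep only keys present on both sides, concatenate the values.
    return {k: left[k] + right[k] for k in left if k in right}


parts = [
    {"txA": (10,), "txB": (20,)},
    {"txA": (11,), "txB": (21,)},
    {"txA": (12,), "txB": (22,)},
]
combined = join_all(parts, join_dicts)
print(combined)
```

Passing the pairwise join in as a callback keeps the helper independent of how two tables are actually joined, which is why the same function can be exercised here without Spark.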
holidays
Andy
From: Sean Owen
Date: Friday, December 24, 2021 at 8:30 AM
To: Gourav Sengupta
Cc: Andrew Davidson, Nicholas Gustafson, User
Subject: Re: AnalysisException: Trouble using select() to append multiple columns
(that's not the situation below we are commenting on)
On Fri
Thanks Nicholas
Andy
From: Nicholas Gustafson
Date: Friday, December 17, 2021 at 6:12 PM
To: Andrew Davidson
Cc: "user@spark.apache.org"
Subject: Re: AnalysisException: Trouble using select() to append multiple
columns
Since df1 and df2 are different DataFrames, you will need to use a join. For
example: df1.join(df2.selectExpr("Name", "NumReads as ctrl_2"), on=["Name"])
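To make the shape of that result concrete, here is a small plain-Python picture of what such a join is expected to produce: rows matched on Name, with the second table's NumReads carried over as ctrl_2. It does not use Spark. The sample values, the `ctrl_1` column, and the helper name `join_on_name` are made up for illustration.

```python
def join_on_name(left_rows, right_rows, right_col, new_name):
    """Inner-join two lists of row dicts on "Name", copying right_col as new_name."""
    # Index the right-hand rows by key so matching does not depend on row order.
    right_by_name = {row["Name"]: row[right_col] for row in right_rows}
    return [
        {**row, new_name: right_by_name[row["Name"]]}
        for row in left_rows
        if row["Name"] in right_by_name
    ]


# Made-up sample data standing in for df1 and df2 (note the different row order).
df1_rows = [{"Name": "txA", "ctrl_1": 10}, {"Name": "txB", "ctrl_1": 20}]
df2_rows = [{"Name": "txB", "NumReads": 21}, {"Name": "txA", "NumReads": 11}]

joined = join_on_name(df1_rows, df2_rows, right_col="NumReads", new_name="ctrl_2")
for row in joined:
    print(row)
```

The point of the picture is that a join pairs rows by key rather than by position, so the two inputs do not have to be in the same order.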
> On Dec 17, 2021, at 16:25, Andrew Davidson wrote:
Hi, I am a newbie.
I have 16,000 data files; all files have the same number of rows and columns.
The row IDs are identical and are in the same order. I want to create a new
data frame that contains the 3rd column from each data file.
I wrote a test program that uses a for loop and Join. It works w