Unless I am reading this wrong, this can be achieved with aws s3 sync?
aws s3 sync \
  s3://my-bucket/ingestion/source1/y=2019/m=12/d=12 \
  s3://my-bucket/ingestion/processed/*src_category=other*/y=2019/m=12/d=12
Thanks,
-Shraddha
On Thu, Jan 9, 2020 at 7:05 AM Gourav Sengupta
wrote:
> why s3a?
>
>
After digging in a bit more, it looks like maxRecordsPerFile does not
provide full parallelism as expected. Any thoughts on this would be really
helpful.
On Sat, Nov 23, 2019 at 11:36 PM Rishi Shah
wrote:
> Hi All,
>
> Version 2.2 introduced the maxRecordsPerFile option for writing data; could
> s
Any suggestions?
On Wed, May 22, 2019 at 6:32 AM Rishi Shah wrote:
> Hi All,
>
> If the dataframe is repartitioned in memory by the (date, id) columns, and if
> I then use multiple window functions whose partition-by clause uses the same
> (date, id) columns --> we can avoid the shuffle/sort again, I believe.. Ca
Also, the same question for a groupby agg operation: how can we use one
aggregated result (say min(amount)) to derive another aggregated column?
On Sun, Apr 21, 2019 at 11:24 PM Rishi Shah
wrote:
> Hello All,
>
> How can we use a derived column (column1) to derive another column in the same
> dataframe oper