This sounds like a perfect example of using windowing functions. Have you
tried something like the following:
select ACCT_ID, CR_RVKD_STAT_CD, ACCT_SFX_NUM, SCURT_FRD_STAT_CD,
CLSD_REAS_CD from (select *, max(instnc_id) over () as max_inst_id FROM
Stat_hist) t where instnc_id = max_inst_id
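To illustrate what that query does, here is the same window-function pattern sketched with Python's built-in sqlite3 module (the table and data are made-up stand-ins for Stat_hist; sqlite3 supports window functions on SQLite 3.25+):

```python
import sqlite3

# Toy stand-in for the Stat_hist table; column names follow the thread,
# the data itself is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stat_hist (acct_id TEXT, instnc_id INTEGER)")
conn.executemany("INSERT INTO Stat_hist VALUES (?, ?)",
                 [("A", 1), ("A", 2), ("B", 1), ("B", 2)])

# Window function: attach the global max(instnc_id) to every row,
# then keep only the rows that carry that maximum.
rows = conn.execute("""
    SELECT acct_id, instnc_id
    FROM (SELECT *, max(instnc_id) OVER () AS max_inst_id FROM Stat_hist) t
    WHERE instnc_id = max_inst_id
""").fetchall()
print(rows)
```

The empty `over ()` makes the window span the whole table, so every row sees the single global maximum and the outer filter keeps only the latest snapshot.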
However, there is no way to solve this within Spark alone.
One option is to break your application up into multiple applications.
The first application can filter and write the filtered results into a Kafka queue.
The second application can read from the queue and sum. The third application can
read from the queue and d
I have a table, and I want to find the latest records in it. The table
has a column called instnc_id that is incremented every day. So, I want to find
the records that have the max instnc_id.
I am trying to do this using subqueries, but it gives me an error. For example,
when I try this
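The intent of the subquery, sketched in plain Python over a toy list of rows (the data is invented for illustration):

```python
# Toy rows standing in for the table; instnc_id grows every day.
rows = [
    {"acct_id": "A", "instnc_id": 1},
    {"acct_id": "A", "instnc_id": 2},
    {"acct_id": "B", "instnc_id": 2},
]

# Equivalent of "where instnc_id = (select max(instnc_id) from ...)":
# find the global maximum first, then keep only the rows that carry it.
max_id = max(r["instnc_id"] for r in rows)
latest = [r for r in rows if r["instnc_id"] == max_id]
print(latest)
```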
You may have to recreate your cluster with the configuration below at EMR
creation:
"Configurations": [
  {
    "Properties": {
      "maximizeResourceAllocation": "false"
    },
    "Classification": "spark"
  }
]
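One way to pass such a configuration when recreating the cluster is via the AWS CLI; the cluster name, instance settings, and file path below are placeholders, not recommendations:

```shell
# Hypothetical invocation; adjust the release label, instance settings,
# and the path to the JSON file holding the "Configurations" array above.
aws emr create-cluster \
  --name "spark-cluster" \
  --release-label emr-5.11.0 \
  --applications Name=Spark \
  --instance-type m4.xlarge \
  --instance-count 3 \
  --configurations file://spark-config.json
```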
On 28 Dec 2017, at 19:25, Patrick Alwell wrote:
> Dynamic allocation is great; but sometimes I’ve found explicitly setting the
> num executors, cores per executor, and memory per executor to be a better
> alternative.
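The quoted suggestion, as a sketch of explicit sizing at submit time (the numbers are placeholders, not recommendations):

```shell
spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  your_app.jar
```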
No difference with spark.dynamicAllocation.enabled set to false.
JM
--
Hello,
Just a quick update, as I have not made much progress yet.
On 28 Dec 2017, at 21:09, Gourav Sengupta wrote:
> can you then try EMR version 5.10 or EMR version 5.11
> instead?
Same issue with EMR 5.11.0. Task 0 in one stage never finishes.
> can you please try selectin
Hi,
Do we have an option to write a CSV or text file with a custom record/line
separator through Spark?
I could not find any reference in the API. I have an issue while loading data
into a warehouse, as one of the columns in the CSV has a newline character and
the warehouse is not letting me escape that newline.
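I'm not sure of a Spark-side option for this in older releases. As a workaround sketch using plain Python's csv module (the column name and the space replacement are assumptions for illustration), the embedded newline can be cleaned from the offending column before the file reaches the warehouse, and the record separator can be chosen explicitly:

```python
import csv
import io

row = {"id": "1", "note": "line one\nline two"}

# Replace the embedded newline in the offending column before writing.
row["note"] = row["note"].replace("\n", " ")

buf = io.StringIO()
# lineterminator controls the record separator used between output rows.
writer = csv.DictWriter(buf, fieldnames=["id", "note"], lineterminator="\r\n")
writer.writeheader()
writer.writerow(row)
print(repr(buf.getvalue()))
```

The same cleanup could be applied to the DataFrame column before writing, so the CSV never contains an unescaped newline in the first place.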