[...]right?
Thanks,
-Mike
From: Nicholas Hakobian [mailto:nicholas.hakob...@rallyhealth.com]
Sent: Friday, December 30, 2016 5:50 PM
To: Sesterhenn, Mike
Cc: ayan guha; user@spark.apache.org
Subject: Re: Best way to process lookup ETL with Dataframes
Yep, sequential joins are what I have done in the past. [...]
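A minimal sketch of that sequential-join pattern (every name below, app, tables, and columns alike, is hypothetical, not from this thread), chaining left outer joins and using coalesce so the fallback lookup only fills in where the first one missed:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.coalesce

    object LookupEtlSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("lookup-etl").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc) // Spark 1.5-era entry point
        import sqlContext.implicits._

        // Hypothetical data: an input frame and two lookup tables.
        val input   = Seq((1, "a"), (2, "b"), (3, "z")).toDF("id", "key")
        val lookup1 = Seq(("a", "first")).toDF("k1", "v1")
        val lookup2 = Seq(("b", "second"), ("z", "last")).toDF("k2", "v2")

        val resolved = input
          .join(lookup1, input("key") === lookup1("k1"), "left_outer")
          .join(lookup2, input("key") === lookup2("k2"), "left_outer")
          .select(
            input("id"),
            input("key"),
            // First non-null wins: primary lookup value, then the fallback's.
            coalesce(lookup1("v1"), lookup2("v2")).as("value"))

        resolved.show()
        sc.stop()
      }
    }

Adding more fallback tables is just another join plus another argument to coalesce.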
[...]row because bad data will result.
Any other thoughts?
From: Nicholas Hakobian
Sent: Friday, December 30, 2016 2:12:40 PM
To: Sesterhenn, Mike
Cc: ayan guha; user@spark.apache.org
Subject: Re: Best way to process lookup ETL with Dataframes
It looks like Spark [...]
[...]what I need is to join after the first join fails.
From: ayan guha
Sent: Thursday, December 29, 2016 11:06 PM
To: Sesterhenn, Mike
Cc: user@spark.apache.org
Subject: Re: Best way to process lookup ETL with Dataframes
How about this -
select a.*, nvl(b.col, nvl(c.col, ...)) from a left outer join b ... left outer join c ...
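A runnable version of that idea, reusing the hypothetical frames from the sketch above. COALESCE is the portable built-in with the same first-non-null semantics as the nested nvl (which on 1.x typically resolves through HiveContext):

    // Register the hypothetical frames so they are visible to SQL.
    input.registerTempTable("a")
    lookup1.registerTempTable("b")
    lookup2.registerTempTable("c")

    // COALESCE(x, y) returns the first non-null argument, matching
    // the fallback intent of nvl(b.col, nvl(c.col, ...)).
    val resolvedSql = sqlContext.sql("""
      SELECT a.*, COALESCE(b.v1, c.v2) AS value
      FROM a
      LEFT OUTER JOIN b ON a.key = b.k1
      LEFT OUTER JOIN c ON a.key = c.k2
    """)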
Hi all,
I'm writing an ETL process with Spark 1.5, and I was wondering about the best way to
do something.
A lot of the fields I am processing require an algorithm similar to this:
Join input dataframe to a lookup table.
if (that lookup fails (the joined fields are null)) {
    Lookup into some other (fallback) table. [...]
}
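A literal translation of that pseudocode (same hypothetical frames as above) splits the rows on lookup success, retries only the misses, and unions the halves back together; the coalesce sketch earlier gets the same result in one pass:

    val joined = input.join(lookup1, input("key") === lookup1("k1"), "left_outer")

    // Rows where the first lookup hit:
    val hits = joined.filter(joined("v1").isNotNull)
      .select(joined("id"), joined("key"), joined("v1").as("value"))

    // Rows where it missed, stripped back to the input columns:
    val misses = joined.filter(joined("v1").isNull)
      .select(joined("id"), joined("key"))

    // Retry the misses against the fallback table, then recombine.
    val rescued = misses.join(lookup2, misses("key") === lookup2("k2"), "left_outer")
      .select(misses("id"), misses("key"), lookup2("v2").as("value"))

    val result = hits.unionAll(rescued)

This matches the if-branch literally but scans the joined data twice, which is why the replies above lean toward chaining the joins up front.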
It only exists in the latest docs, not in versions <= 1.6.
From: Sean Owen
Sent: Tuesday, October 4, 2016 1:51:49 PM
To: Sesterhenn, Mike; user@spark.apache.org
Subject: Re: Time-unit of RDD.countApprox timeout parameter
The API docs already say: "maximum time to wait for the job, in milliseconds".
Never mind. Through testing, it seems it is MILLISECONDS. This should be added
to the docs.
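For reference, a minimal sketch of the call with a millisecond budget (names and numbers below are made up):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc  = new SparkContext(new SparkConf().setAppName("count-approx").setMaster("local[*]"))
    val rdd = sc.parallelize(1 to 1000000)

    // timeout is in milliseconds: wait at most half a second for a result.
    val partial = rdd.countApprox(timeout = 500L, confidence = 0.95)
    println(partial.initialValue) // current estimate as a BoundedDouble [low, high]
    sc.stop()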
From: Sesterhenn, Mike
Sent: Tuesday, October 4, 2016 1:02:25 PM
To: user@spark.apache.org
Subject: Time-unit of RDD.countApprox timeout parameter
Hi all,
Does anyone know what the unit is on the 'timeout' parameter to the
RDD.countApprox() function?
(i.e., is that seconds, milliseconds, nanoseconds, ...?)
I was searching through the source but it got hairy pretty quickly.
Thanks