ransformation and thus is not
> actually applied until some action (like 'foreach') is called on the
> resulting RDD.
> You can find more information in the Spark Programming Guide
> http://spark.apache.org/docs/latest/programming-guide.html#rdd-operations.
>
> best,
> --
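
A minimal sketch of the lazy behaviour described in the reply above, assuming a local Spark setup. The object name LazyMapSketch, the helper doubled, and the sample numbers are made up for illustration and do not come from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LazyMapSketch {
  // Hypothetical helper: declares a map transformation and returns the new RDD.
  // Nothing is computed here; Spark only records the step in the RDD lineage.
  def doubled(sc: SparkContext): RDD[Int] =
    sc.parallelize(Seq(1, 2, 3)).map { x =>
      println(s"mapping $x") // runs only once an action is called
      x * 2
    }

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("lazy-map-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val result = doubled(sc)
      println("map declared, no element has been mapped yet")
      // Action: this triggers the job, so the function passed to map runs now.
      result.foreach(x => println(s"result $x"))
    } finally {
      sc.stop()
    }
  }
}
```

In local mode the "mapping" lines should only show up after the foreach call. On a cluster they would end up in the executor logs rather than on the driver console.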
I'm currently developing a Spark Streaming application.
I have a function that receives an RDD and an object instance as
parameters, and returns an RDD:
def doTheThing(a: RDD[A], b: B): RDD[C]
Within the function, I do some processing within a map of the RDD.
Like this:
def doTheThing(a: RD
names.
>
> On Mon, Oct 5, 2015 at 12:59 AM, Hemminger Jeff wrote:
>
>> I have a rather odd use case. I have a DataFrame column name with a +
>> value in it.
>> The app performs some processing steps before determining the column
>> name, and it would be
The spark-ec2 script generates spark config files from templates. Those are
located here:
https://github.com/amplab/spark-ec2/tree/branch-1.5/templates/root/spark/conf
Note the link is referring to the 1.5 branch.
Is this what you are looking for?
Jeff
On Mon, Oct 5, 2015 at 8:56 AM, Renato Perini
I have a rather odd use case. I have a DataFrame column name with a + value
in it.
The app performs some processing steps before determining the column name,
and it would be much easier to code if I could use the DataFrame filter
operations with a String.
This demonstrates the issue I am having:
I am trying to understand the process of caching, and specifically what the
behavior is when the cache is full. Please excuse me if this question is a
little vague; I am trying to build my understanding of this process.
I have an RDD that I perform several computations with. I persist it with
IN_ME
>>>
>>> On Fri, Aug 28, 2015 at 12:44 PM, Jason wrote:
>>>
>>>> You could try using an external key value store (like HBase, Redis) and
>>>> perform lookups/updates inside of your mappers (you'd need to create the
>>>> connection wit
Hi,
I am working on a Spark application that uses a large (~3G)
broadcast variable as a lookup table. The application refines the data in
this lookup table in an iterative manner. So this large variable is
broadcast many times during the lifetime of the application process.
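
To make the pattern concrete, here is a rough sketch of such an iterative re-broadcast loop. It is not the actual application: the object name RebroadcastSketch, the tiny Map standing in for the large lookup table, and the refinement rule (bump a key's value each time it is looked up) are all placeholders. Calling unpersist on the old broadcast before creating the next one is an assumption about what housekeeping may help here, not something established in this message:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RebroadcastSketch {
  // Hypothetical refinement loop: broadcast the table, use it in a job,
  // refine it on the driver, then broadcast the refined version again.
  def run(sc: SparkContext, iterations: Int): Map[String, Int] = {
    var table = Map("a" -> 0, "b" -> 10) // stand-in for the ~3G lookup table
    val keys = sc.parallelize(Seq("a", "b", "a"))
    var i = 0
    while (i < iterations) {
      val bc = sc.broadcast(table) // a fresh broadcast on every iteration
      // The job reads the table through bc.value on the executors.
      val matched = keys.filter(k => bc.value.contains(k)).collect()
      // Placeholder refinement: add 1 to a key's value per lookup hit.
      table = matched.foldLeft(table) { (t, k) => t.updated(k, t(k) + 1) }
      // Assumption: the stale copy is no longer needed, so ask Spark to
      // drop it from the executors before the next broadcast.
      bc.unpersist()
      i += 1
    }
    table
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rebroadcast-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(run(sc, iterations = 2)) finally sc.stop()
  }
}
```
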
From what I ha