Hi,
I'm using Spark 1.5.0. I wrote a custom Hadoop InputFormat to load data
from a third-party data source; the data type mapping has been taken care of in
my code, but when I issued the query below,
SELECT * FROM ( SELECT count(*) as failures from test WHERE state !=
'success' ) as tmp WHERE ( COALESCE(f
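A minimal sketch of the surrounding setup, assuming the custom InputFormat yields LongWritable/Text pairs and the query above is run against a temp table built from it. MyInputFormat, the one-column schema, and the tail of the COALESCE predicate are hypothetical, since the original query is truncated:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.*;
import org.apache.spark.sql.*;
import org.apache.spark.sql.types.*;

// Load the third-party data through the custom InputFormat and expose it to SQL.
JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("custom-if-sql"));
SQLContext sqlContext = new SQLContext(sc);

JavaPairRDD<LongWritable, Text> raw = sc.newAPIHadoopRDD(
    new Configuration(), MyInputFormat.class, LongWritable.class, Text.class);

// Hypothetical single-column mapping; the real code handles the type mapping itself.
JavaRDD<Row> rows = raw.map(t -> RowFactory.create(t._2().toString()));
StructType schema = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("state", DataTypes.StringType, false) });
sqlContext.createDataFrame(rows, schema).registerTempTable("test");

// Same shape as the query above; the COALESCE predicate is completed here only as a guess.
DataFrame result = sqlContext.sql(
    "SELECT * FROM (SELECT count(*) AS failures FROM test WHERE state != 'success') AS tmp "
    + "WHERE COALESCE(failures, 0) > 0");
result.show();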
Hi,
Here's the problem I'm facing: I have a standalone Java application which
periodically submits Spark jobs to my YARN cluster; by the way, I'm not using
'spark-submit' or 'org.apache.spark.launcher' to submit my jobs. These jobs
are successful and I can see them on the YARN RM web UI, but when I want to
f
Sorry, I have to re-send it again as I did not get an answer.
Here's the problem I'm facing: I have a standalone Java application which
periodically submits Spark jobs to my YARN cluster; by the way, I'm not using
'spark-submit' or 'org.apache.spark.launcher' to submit my jobs. These jobs
are successfu
Sorry, I have to re-send it again as I did not get an answer.
Here's the problem I'm facing: I'm using the Spark 1.5.0 release. I have a
standalone Java application which periodically submits Spark jobs to my
YARN cluster; by the way, I'm not using 'spark-submit' or
'org.apache.spark.launcher' to submit my
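A minimal sketch of one way such an application can submit to YARN without spark-submit, assuming the driver is embedded in the server JVM and HADOOP_CONF_DIR/YARN_CONF_DIR are visible to it; the app name and job body are hypothetical:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Embedded driver: yarn-client mode asks the ResourceManager for executors directly,
// so the job shows up on the YARN RM web UI without going through spark-submit.
SparkConf conf = new SparkConf()
    .setAppName("periodic-backend-job")   // hypothetical name
    .setMaster("yarn-client");            // yarn-cluster cannot be started from inside a JVM this way in 1.5
JavaSparkContext sc = new JavaSparkContext(conf);
try {
    long n = sc.textFile("hdfs:///tmp/input").count();   // hypothetical job body
    System.out.println("records: " + n);
} finally {
    sc.stop();   // release the YARN containers when the job finishes
}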
Hi,
Here's my situation: I have some kind of offline dataset, but I want to
form a virtual data stream feeding into Spark Streaming. My code looks like
this:
// sort offline data by time
1) JavaRDD sortedByTime = offlineDataRDD.sortBy( );
// compute a list of JavaRDD, each element JavaRDD
Hi,
Here's my situation: I have some kind of offline dataset that I have
loaded into Spark as an RDD, but I want to form a virtual data stream
feeding into Spark Streaming. My code looks like this:
// sort offline data by time, the dataset spans 2 hours
1) JavaRDD sortedByTime = offlineData
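A minimal sketch of one way to build such a virtual stream, assuming an existing JavaSparkContext sc and that the time-sorted data has already been split into one JavaRDD per batch interval (perIntervalRdds and the 10-second interval are hypothetical); the replay itself uses queueStream:

import java.util.LinkedList;
import java.util.Queue;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Each queued RDD is served as one micro-batch, so the 2-hour offline dataset
// is replayed as if it were arriving live.
Queue<JavaRDD<String>> queue = new LinkedList<>(perIntervalRdds);  // hypothetical List<JavaRDD<String>>

JavaStreamingContext jssc = new JavaStreamingContext(sc, Durations.seconds(10));
JavaDStream<String> virtualStream = jssc.queueStream(queue);

virtualStream.print();   // or any other streaming transformation

jssc.start();
jssc.awaitTermination();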
Hi,
I just want to understand the cost of DataFrame.registerTempTable(String):
is it just a trivial operation (like creating an object reference) in the
master (Driver) JVM? And can I have multiple tables with different names
referencing the same DataFrame?
Thanks
--
--Anfernee
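On the second part of the question, a minimal sketch assuming a DataFrame df and an SQLContext already exist (the table names are hypothetical). registerTempTable only records a name-to-logical-plan entry in the driver-side catalog, so it is cheap and both names resolve to the same DataFrame:

// Two temp-table names pointing at the same DataFrame; neither call copies
// or materializes any data.
df.registerTempTable("metrics_a");   // hypothetical names
df.registerTempTable("metrics_b");

DataFrame fromA = sqlContext.sql("SELECT count(*) FROM metrics_a");
DataFrame fromB = sqlContext.sql("SELECT count(*) FROM metrics_b");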
Hi,
I have a pretty large data set (2M entities) in my RDD. The data has already
been partitioned by a specific key, and the key has a range (its type is long).
Now I want to create a bunch of key buckets; for example, if the key has the range
1 -> 100,
I will break the whole range into the buckets below
1
Thanks Yong for your response.
Let me see if I understand what you're suggesting: for the whole data set,
when I load it into Spark (I'm using a custom Hadoop InputFormat), I will
add an extra field to each element in the RDD, like bucket_id.
For example
Key:
1 - 10, bucket_id=1
11-20, bucket_id=
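A minimal sketch of that bucket_id idea, assuming records is a JavaRDD<MyRecord>, each record exposes a long key, and the bucket width is fixed at 10 (MyRecord and getKey are hypothetical):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

// Tag each element with its bucket_id (keys 1-10 -> bucket 1, 11-20 -> bucket 2, ...),
// then group by that bucket.
final long bucketWidth = 10L;
JavaPairRDD<Long, MyRecord> tagged = records.mapToPair(rec ->
    new Tuple2<>((rec.getKey() - 1) / bucketWidth + 1, rec));

JavaPairRDD<Long, Iterable<MyRecord>> byBucket = tagged.groupByKey();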
Hi Spark experts,
First of all, happy Thanksgiving!
Now to my question: I have implemented a custom Hadoop InputFormat to load
millions of entities from my data source into Spark (as a JavaRDD, then
transformed to a DataFrame). The approach I took in implementing the custom
Hadoop RDD is loading all ID'
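A minimal skeleton of such a custom InputFormat, assuming splits are built from ID ranges fetched up front; the class name is hypothetical and the method bodies are elided:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical skeleton: one split per ID range; each split becomes one
// partition of the RDD created with newAPIHadoopRDD.
public class MyInputFormat extends InputFormat<LongWritable, Text> {

    @Override
    public List<InputSplit> getSplits(JobContext context)
            throws IOException, InterruptedException {
        List<InputSplit> splits = new ArrayList<>();
        // e.g. query the data source for its ID ranges and add one split per range
        return splits;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // return a RecordReader that fetches only the entities in this split's ID range
        throw new UnsupportedOperationException("sketch only");
    }
}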
Hi,
I have a doubt regarding yarn-cluster mode and
spark.driver.allowMultipleContexts for the use cases below.
I have a long-running backend server where I will create a short-lived
Spark job in response to each user request, based on the fact that by
default multiple Spark Contexts cannot be created
ultipleContexts", "true")
> .set("spark.driver.allowMultipleContexts", "true"))
> ./core/src/test/scala/org/apache/spark/SparkContextSuite.scala
>
> FYI
>
> On Tue, Dec 1, 2015 at 3:32 PM, Anfernee Xu wrote:
>
>> Hi,
>>
>> I have
PM, Anfernee Xu wrote:
> > I have a long running backend server where I will create a short-lived Spark
> > job in response to each user request, based on the fact that by default
> > multiple Spark Contexts cannot be created in the same JVM, looks like I have
> > 2 choic
If multiple users are looking at the same data set, then it's a good choice
to share the SparkContext.
But my use cases are different: users are looking at different data (I use a
custom Hadoop InputFormat to load data from my data source based on the
user input), and the data might not have any overlap. F
Hi Spark experts,
I'm coming across these terminologies and have some confusion; could you
please help me understand them better?
For instance, I have implemented a Hadoop InputFormat to load my external
data into Spark; in turn my custom InputFormat will create a bunch of
InputSplits. My questi
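As a point of reference while sorting out the terminology, a minimal sketch assuming an existing JavaSparkContext sc and that the custom InputFormat is consumed through newAPIHadoopRDD (MyInputFormat is the hypothetical class from earlier): each InputSplit returned by getSplits() becomes exactly one partition of the resulting RDD.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.api.java.JavaPairRDD;

// One InputSplit -> one RDD partition -> (usually) one task when an action runs.
JavaPairRDD<LongWritable, Text> rdd = sc.newAPIHadoopRDD(
    new Configuration(), MyInputFormat.class, LongWritable.class, Text.class);

System.out.println("partitions = " + rdd.partitions().size());  // == number of InputSplits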