Thanks Raghavendra :)
Will look into Analyzer as well.
Kapil Malik
*Sr. Principal Engineer | Data Platform, Technology*
M: +91 8800836581 | T: 0124-433 | EXT: 20910
ASF Centre A | 1st Floor | Udyog Vihar Phase IV |
Gurgaon | Haryana | India
So apparently it looks like I need to extend SessionCatalog only.
However, I just wanted to get feedback on whether there's a better / recommended
approach to achieve this.
Thanks and regards,
Kapil Malik
*Sr. Principal Engineer | Data Platform, Technology*
M: +91 8800836581 | T: 0124
Hi,
We have an analytics use case where we are collecting user click logs. The
data can be considered hierarchical, with 3 types of logs (sketched below):
User (attributes like userId, emailId)
- Session (attributes like sessionId, device, OS, browser, city etc.)
- - PageView (attributes like url, referrer, page-
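The sketch, with hypothetical case class and field names (illustrative only, not our actual schema):

// Hypothetical case classes for the 3-level click-log hierarchy
case class PageView(url: String, referrer: String)
case class Session(sessionId: String, device: String, os: String,
                   browser: String, city: String, pageViews: Seq[PageView])
case class User(userId: String, emailId: String, sessions: Seq[Session])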
A very interesting and relevant thread for production-level usage of Spark.
@Arun, can you kindly confirm if Daniel's suggestion helped your use case?
Thanks,
Kapil Malik | kma...@adobe.com | 33430 / 8800836581
From: Daniel Mahler [mailto:dmah...@gmail.com]
Sent: 13
Replace
val sqlContext = new SQLContext(sparkContext)
with
@transient
val sqlContext = new SQLContext(sparkContext)
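For context, a minimal sketch of how that can look in a driver-side class (the class and method names here are hypothetical):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// @transient keeps the non-serializable contexts out of any closure
// that happens to capture the enclosing object.
class JobHelper(@transient val sc: SparkContext) extends Serializable {
  @transient lazy val sqlContext = new SQLContext(sc)

  def lineCount(path: String): Long =
    sc.textFile(path).count() // no user closure here captures sqlContext
}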
-Original Message-
From: kpeng1 [mailto:kpe...@gmail.com]
Sent: 04 March 2015 23:39
To: user@spark.apache.org
Subject: Passing around SparkContext with in the Driver
Hi
Hi Pengcheng YIN,
RDD cache / persist calls do not trigger evaluation.
The unpersist call is blocking (it does have an async flavor, but I am not sure
what the SLAs on its behavior are).
val rdd = sc.textFile(...).map(...)
rdd.persist() // this does not trigger actual storage
while (true) {
  val count = rdd.filter(...).count() // the first action materializes the cache
}
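To make it concrete, a self-contained sketch (the path, filter logic and app name are made up):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("persist-sketch").setMaster("local[*]"))
val rdd = sc.textFile("/tmp/clicks.log").map(_.toLowerCase) // hypothetical input
rdd.persist(StorageLevel.MEMORY_ONLY)   // lazy: nothing is computed or stored yet
val total = rdd.count()                 // first action: the RDD is computed and cached here
val errors = rdd.filter(_.contains("error")).count() // served from the cache
rdd.unpersist(blocking = true)          // blocks until the cached blocks are removed; blocking = false is the async flavor
sc.stop()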
Hi Naveen,
Quoting
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext
SparkContext is the main entry point for Spark functionality. A SparkContext
represents the connection to a Spark cluster, and can be used to create RDDs,
accumulators and broadcast variables on that cluster.
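A tiny sketch of those three uses (Spark 1.x-era API, made-up data):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("context-sketch").setMaster("local[*]"))
val rdd = sc.parallelize(1 to 100)  // create an RDD
val acc = sc.accumulator(0)         // accumulator (Spark 1.x API)
val factor = sc.broadcast(10)       // broadcast variable
rdd.foreach(x => acc += x * factor.value)
println(acc.value)                  // 50500
sc.stop()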
Hi Sanjay,
Oh yes .. flatMapValues is defined in PairRDDFunctions, and you need to
import org.apache.spark.SparkContext._ to use it
(http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
)
@Sean, yes indeed flatMap / flatMapValues both can b
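For reference, a quick flatMapValues sketch with made-up data (assumes an existing SparkContext sc, e.g. in the Spark shell; on older Spark versions the import below pulls in the implicit conversion to PairRDDFunctions):

import org.apache.spark.SparkContext._

val pairs = sc.parallelize(Seq(("a", "1,2,3"), ("b", "4,5")))
val exploded = pairs.flatMapValues(_.split(",")) // one (key, element) pair per split value
exploded.collect().foreach(println)              // (a,1) (a,2) (a,3) (b,4) (b,5)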
Hi Sanjay,
I tried running your code in the Spark shell, piece by piece:
// Setup
val line1 = "025126,Chills,8.10,Injection site oedema,8.10,Injection site reaction,8.10,Malaise,8.10,Myalgia,8.10"
val line2 = "025127,Chills,8.10,Injection site oedema,8.10,Injection site reaction,8.10,Malaise,8.10,M
Regarding: Can we create such an array and then parallelize it?
Parallelizing an array of RDDs, i.e. creating an RDD[RDD[X]], is not possible;
an RDD is not serializable, so it cannot be distributed as the elements of another RDD.
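A common alternative (assuming sc is your SparkContext, the RDDs share an element type, and the paths below are hypothetical) is to keep the RDDs in a driver-side collection and union them:

import org.apache.spark.rdd.RDD

val parts: Array[RDD[String]] = Array(
  sc.textFile("/data/part-a"), // hypothetical paths
  sc.textFile("/data/part-b")
)
val combined: RDD[String] = sc.union(parts) // one flat RDD over all the parts
println(combined.count())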
From: Deep Pradhan [mailto:pradhandeep1...@gmail.com]
Sent: 04 December 2014 15:39
To: user@spark.apache.org
Subject: Determination of numbe
/hadoop-0.20-mapreduce/lib/native/Linux-amd64-64
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar
Point it to the equivalent snappy / MapReduce directory on your box.
Thanks,
Kapil Malik
From: Naveen Kumar Pokala [mailto:npok...@spcapitaliq.com]
Sent: 12 November
Hi,
How is the 78g distributed across the driver, daemon, and executors?
Can you please paste the logs regarding "I don't have enough memory to hold the
data in memory"?
Are you collecting any data in the driver?
Lastly, did you try a repartition to create smaller and evenly distributed
partitions?
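On the last point, a minimal sketch (assumes an existing SparkContext sc; the path and partition count are made up):

val raw = sc.textFile("/data/events") // hypothetical input
// More, smaller, evenly distributed partitions so that no single task
// has to hold an oversized partition in memory.
val even = raw.repartition(400)       // 400 is an assumed count
println(even.count())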
Ohh !
I thought you were unsubscribing :)
Kapil Malik | kma...@adobe.com | 33430 / 8800836581
-Original Message-
From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
Sent: 12 March 2014 00:51
To: user@spark.apache.org
Subject: Re: unsubscribe
To unsubscribe from this list, p