Hi guys
This is possibly going to sound like a vague, stupid question but I have a
problem to solve and I need help. So any which way I go is only up :-)
I have a bunch of R scripts (I am not an R expert) and we are currently
evaluating how to translate these R scripts to SparkR data frame syntax
will report back if this works !
thanks
sanjay
From: shenLiu
To: Sanjay Subramanian ; User
Sent: Monday, November 9, 2015 10:23 PM
Subject: RE: Is it possible Running SparkR on 2 nodes without HDFS
hey guys
I have a 2-node SparkR cluster (1 master, 1 slave) on AWS using
spark-1.5.1-bin-without-hadoop.tgz
Running the SparkR job on the master node
/opt/spark-1.5.1-bin-hadoop2.6/bin/sparkR --master
spark://ip-xx-ppp-vv-ddd:7077 --packages com.databricks:spark-csv_2.10:1.2.0
--executor-cores
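Since there is no HDFS in this setup, any input file has to exist at the same local path on every node and be referenced with a file:// URI (the same applies from SparkR). A minimal sketch using the spark-csv package from the command above, in Scala from spark-shell; the path is hypothetical:

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("file:///data/shared/input.csv") // must be present on master and slave
df.count()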
know if this was Hive on Tez.
- Steve
From: Sanjay Subramanian
Reply-To: Sanjay Subramanian
Date: Thursday, June 18, 2015 at 11:08
To: "user@spark.apache.org"
Subject: Spark-sql versus Impala versus Hive
I just published results of my findings here:
https://bigdatalatte.wordpress.com/2015/06/18/spark-sql-versus-impala-versus-hive/
aers; create table
unique_aers_demo as select distinct isr,event_dt,age,age_cod,sex,year,quarter
from aers.aers_demo_view " --driver-memory 4G --total-executor-cores 12
--executor-memory 4G
thanks
From: Sanjay Subramanian
To: "user@spark.apache.org"
Sent: Thursday, J
hey guys
I have CDH 5.3.3 with Spark 1.2.0 (on Yarn)
This does not work: /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql
--deploy-mode client --master yarn --driver-memory 1g -e "select j.person_id,
p.first_name, p.last_name, count(*) from (select person_id from
cdr.cdr_mjp_joborder where pers
hey guys
After day one at the Spark Summit SFO, I realized sadly that (indeed) HDFS is
not supported by Databricks Cloud. My speed bottleneck is to transfer ~1TB of
snapshot HDFS data (250+ external Hive tables) to S3 :-(
I want to use Databricks Cloud but this to me is a starting disabler. The ha
ing Spark 1.4.0 with SQL code generation turned on; this should make a
> huge difference.
>
>> On Sat, Jun 13, 2015 at 5:08 PM, Sanjay Subramanian
>> wrote:
>> hey guys
>>
>> I tried the following settings as well. No luck
>>
>> --total-executor
:-) to my questions on all CDH groups, Spark, Hive
best regards
sanjay
From: Josh Rosen
To: Sanjay Subramanian
Cc: "user@spark.apache.org"
Sent: Friday, June 12, 2015 7:15 AM
Subject: Re: spark-sql from CLI --->EXCEPTION: java.lang.OutOfMemoryError:
Java heap space
hey guys
Using Hive and Impala daily and intensively. Want to transition to spark-sql in
CLI mode.
Currently in my sandbox I am using Spark (standalone mode) in the CDH
distribution (starving developer version 5.3.3)
3-datanode hadoop cluster, 32GB RAM per node, 8 cores per node
| spark | 1.2.0+cdh5.3
Can't figure out the spark-sql errors - switching to Hive and Impala for now -
sorry guys, no hard feelings
From: Sanjay Subramanian
To: Sanjay Subramanian ; user
Sent: Saturday, May 30, 2015 1:52 PM
Subject: Re: spark-sql errors
any ideas guys? how to solve this?
From: Sanjay Subramanian
To: user
Sent: Friday, May 29, 2015 5:29 PM
Subject: spark-sql errors
https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/6SqGuYemnbc
I use spark on EC2 but it's a CDH 5.3.3 distribution (starving developer
version) installed thru Cloudera Manager. Spark is configured to run on Yarn.
Regards
Sanjay
Sent from my iPhone
> On May 29, 2015, at 6:16 PM, roni wrote:
>
> Hi ,
> Any update on this?
> I am not sure if the issue I
https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/6SqGuYemnbc
"SQL File" mode
- /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -f get_names.hql
From: Andrew Otto
To: Sanjay Subramanian
Cc: user
Sent: Thursday, May 28, 2015 7:26 AM
Subject: Re: Pointing SparkSQL to existing Hive Metadata with data file
locations in HDFS
hey guys
On the Hive/Hadoop ecosystem we are using the Cloudera distribution CDH 5.2.x;
there are about 300+ Hive tables. The data is stored as text (moving slowly to
Parquet) on HDFS. I want to use SparkSQL to point to the existing Hive metadata
and be able to define JOINs etc. using a programming structure.
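A minimal sketch of what that can look like, assuming a Spark build with Hive support and hive-site.xml on the classpath so the existing metastore is picked up (the table names below are hypothetical):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// mydb.table_a / mydb.table_b stand in for the existing Hive tables
val joined = hiveContext.sql(
  "SELECT a.id, b.name FROM mydb.table_a a JOIN mydb.table_b b ON a.id = b.id")
joined.collect().foreach(println)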
Thanks Sean, that works, and I started the join of this mapped RDD to another
one I have. I have to internalize the use of map versus flatMap. Thinking in
Map Reduce Java Hadoop code often blinds me :-)
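A tiny illustration of the map/flatMap difference, runnable in spark-shell:

val lines = sc.parallelize(Seq("a b", "c"))
lines.map(_.split(" ")).collect()     // Array(Array(a, b), Array(c)) - one output per input
lines.flatMap(_.split(" ")).collect() // Array(a, b, c) - results are flattened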
From: Sean Owen
To: Sanjay Subramanian
Cc: Cheng Lian ; Jorge Lopez-Malla
; "
hey guys
I am not following why this happens
DATASET === tab-separated values (164 columns)
Spark command 1:
val mjpJobOrderRDD = sc.textFile("/data/cdr/cdr_mjp_joborder_raw")
val mjpJobOrderColsPairedRDD = mjpJobOrderRDD.map(line => {
  val tokens = line.split("\t")
  (tokens(23), to
cool let me adapt that. thanks a ton
regards
sanjay
From: Sean Owen
To: Sanjay Subramanian
Cc: "user@spark.apache.org"
Sent: Monday, January 5, 2015 3:19 AM
Subject: Re: FlatMapValues
For the record, the solution I was suggesting was about like this:
inputRDD.flatM
import org.apache.spark.{SparkConf, SparkContext}

val sconf = new SparkConf().setMaster("local")
  .setAppName("MedicalSideFx-CassandraLogsMessageTypeCount")
val sc = new SparkContext(sconf)
val inputDir = "/path/to/cassandralogs.txt"
sc.textFile(inputDir).map(line => line.replace("\"", "")).map(line =>
  (line.split(' ')(0) + " " + line.split(' ')(2
s hopefully answered :-)
(2,List(1001,1000,1002,1003, 1004,1001,1006,1007))
(3,List(1011,1012,1013,1010, 1007,1009,1005,1008))
(1,List(1001,1000,1002,1003, 1011,1012,1013,1010, 1004,1001,1006,1007, 1007,1009,1005,1008))
From: Shixiong Zhu
To: Sanjay Subramanian
Cc: dcmovva ; "user@
6,1007))
(3,CompactBuffer(1011,1012,1013,1010, 1007,1009,1005,1008))
(1,CompactBuffer(1001,1000,1002,1003, 1011,1012,1013,1010, 1004,1001,1006,1007, 1007,1009,1005,1008))
From: Sanjay Subramanian
To: dcmovva ; "user@spark.apache.org"
Sent: Saturday, January 3, 2015 12:19 PM
Subject: Re:
This is my design. Now let me try and code it in Spark.
rdd1.txt =
1~4,5,6,7
2~4,5
3~6,7
rdd2.txt =
4~1001,1000,1002,1003
5~1004,1001,1006,1007
6~1007,1009,1005,1008
7~1011,1012,1013,1010
TRANSFORM 1 === map each value to key (like an inverted index)
4~1
5~1
6~1
7~1
5~2
4~2
6~3
7~3
TRANSFOR
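A sketch of that design in Spark (paths hypothetical; groupByKey collects the final lists):

val inverted = sc.textFile("rdd1.txt").flatMap { line =>
  val Array(key, values) = line.split('~')
  values.split(',').map(v => (v, key)) // TRANSFORM 1: value -> key
}
val lookups = sc.textFile("rdd2.txt").map { line =>
  val Array(key, values) = line.split('~')
  (key, values)
}
// join on the shared id, then regroup by the original rdd1 key
val result = inverted.join(lookups)
  .map { case (_, (k1, vals)) => (k1, vals) }
  .groupByKey()
result.collect().foreach(println)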
@laila Based on the error you mentioned in the nabble link below, it seems like
there are no permissions to write to HDFS, so this is possibly why
saveAsTextFile is failing.
From: Pankaj Narang
To: user@spark.apache.org
Sent: Saturday, January 3, 2015 4:07 AM
Subject: Re: saveAsTextFile
else {
  ("")
}
}).flatMap(str => str.split('\t'))
  .filter(line => line.toString.length() > 0)
  .saveAsTextFile("/data/vaers/msfx/reac/" + outFile)
From: Sanjay Subramanian
To: Hitesh Khamesra
Cc:
thanks let me try that out
From: Hitesh Khamesra
To: Sanjay Subramanian
Cc: Kapil Malik ; Sean Owen ;
"user@spark.apache.org"
Sent: Thursday, January 1, 2015 9:46 AM
Subject: Re: FlatMapValues
How about this: apply flatMap on each line, and in that function parse each
,Injection site oedema
025005,Injection site reaction
thanks
sanjay
From: Kapil Malik
To: Sean Owen ; Sanjay Subramanian
Cc: "user@spark.apache.org"
Sent: Wednesday, December 31, 2014 9:35 AM
Subject: RE: FlatMapValues
Hi Sanjay,
Oh yes .. on flatMapValues, it
else {
  ("","")
}
}).filter(pair => pair._1.length() > 0)
  .flatMapValues(skus => skus.split('\t'))
  .saveAsTextFile("/data/vaers/msfx/reac/" + outFile)
Please note that this too saves lines like (025126,Chills), i.e. with opening
and closing brackets.
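If plain "id,value" lines are wanted instead of tuple output, one small change does it (a sketch; pairRdd and outFile stand in for the names above):

pairRdd.filter(pair => pair._1.length() > 0)
  .flatMapValues(skus => skus.split('\t'))
  .map { case (k, v) => k + "," + v } // format explicitly, so no Tuple2 parentheses
  .saveAsTextFile("/data/vaers/msfx/reac/" + outFile)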
hey guys
My dataset is like this:
025126,Chills,8.10,Injection site oedema,8.10,Injection site
reaction,8.10,Malaise,8.10,Myalgia,8.10
Intended output is ==
025126,Chills
025126,Injection site oedema
025126,Injection site reaction
025126,Malaise
025126,Myalgia
My code is as follo
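For reference, a minimal sketch of the flatMap approach suggested in the replies (the path and the exact column layout are assumptions):

val lines = sc.textFile("/path/to/input.csv")
val out = lines.flatMap { line =>
  val tokens = line.split(',')
  val id = tokens(0)
  // reactions sit at odd indices (1, 3, 5, ...); the 8.10 versions at even ones
  tokens.zipWithIndex.collect { case (t, i) if i % 2 == 1 => id + "," + t }
}
out.saveAsTextFile("/path/to/output")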
at I do at www.medicalsidefx.org. Primarily an iPhone app, but underlying it is
Lucene, Hadoop and hopefully soon in 2015 - Spark :-)
From: Sean Owen
To: Sanjay Subramanian
Cc: "user@spark.apache.org"
Sent: Wednesday, December 24, 2014 8:56 AM
Subject: Re: How to identify er
lter.map(line => {
  if (line.split('$').length >= 13) {
    line.split('$')(0) + "~" + line.split('$')(5) + "~" +
    line.split('$')(11) + "~" + line.split('$')(12)
  }
})
From: Sanjay Subramanian
To: "use
hey guys
One of my input records has a problem that makes the code fail.
var demoRddFilter = demoRdd.filter(line =>
  !line.contains("ISR$CASE$I_F_COD$FOLL_SEQ") ||
  !line.contains("primaryid$caseid$caseversion"))
var demoRddFilterMap = demoRddFilter.map(line => line.split('$')(0) + "~" +
line.s
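A defensive variant (a sketch: split each line once and silently drop records with too few fields, instead of throwing):

val demoRddFilterMap = demoRddFilter.flatMap { line =>
  val t = line.split('$')
  if (t.length >= 13) Some(t(0) + "~" + t(5) + "~" + t(11) + "~" + t(12))
  else None // short/bad records are skipped
}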
Thanks a ton Ashish
sanjay
From: Ashish Rangole
To: Sanjay Subramanian
Cc: Krishna Sankar ; Sean Owen ;
Guillermo Ortiz ; user
Sent: Sunday, November 23, 2014 11:03 AM
Subject: Re: Spark or MR, Scala or Java?
This being a very broad topic, a discussion can quickly get subjective
I am a newbie as well to Spark. I had been doing Hadoop/Hive/Oozie programming
extensively before this. I use Hadoop (Java MR code)/Hive/Impala/Presto on a
daily basis. To get me jumpstarted into Spark I started this GitHub where there
is "IntelliJ-ready-to-run" code (simple examples of join, SparkSQL etc.) and
",")(0),
x.split(",")(1))).reduceByKey((v1,v2) => v1+"|"+v2)
file1Rdd.collect().foreach(println)
file2Rdd.collect().foreach(println)
file1Rdd.join(file2Rdd).collect().foreach( e =>
println(e.toString.replace("(","").replace(")","
Thanks Jey
regards
sanjay
From: Jey Kottalam
To: Sanjay Subramanian
Cc: Arun Ahuja ; Andrew Ash ; user
Sent: Friday, November 21, 2014 10:07 PM
Subject: Extracting values from a Collecion
Hi Sanjay,
These are instances of the standard Scala collection type "Set"
(4,(ringo,Set(With a Little Help From My Friends, Octopus's Garden)))
(2,(john,Set(Julia, Nowhere Man)))
(3,(george,Set(While My Guitar Gently Weeps, Norwegian Wood)))
(1,(paul,Set(Yesterday, Michelle)))
Again the question is how do I extract values from the Set ?
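Since these are ordinary Scala Sets (as Jey says), one way to pull the values out, sketched with an assumed RDD name:

joined.map { case (id, (name, songs)) => id + "," + name + "," + songs.mkString("|") }
  .collect()
  .foreach(println)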
thanks
sanjay
From
hey guys
names.txt =
1,paul
2,john
3,george
4,ringo
songs.txt =
1,Yesterday
2,Julia
3,While My Guitar Gently Weeps
4,With a Little Help From My Friends
1,Michelle
2,Nowhere Man
3,Norwegian Wood
4,Octopus's Garden
What I want to do is real simple
Desired Output ==
(4,(With a Litt
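A sketch of one way to build that output (groupByKey collects each id's songs into a set):

val namesRdd = sc.textFile("names.txt").map { line =>
  val t = line.split(','); (t(0), t(1))
}
val songsRdd = sc.textFile("songs.txt").map { line =>
  val t = line.split(','); (t(0), t(1))
}
val joined = namesRdd.join(songsRdd.groupByKey().mapValues(_.toSet))
joined.collect().foreach(println) // e.g. (4,(ringo,Set(...)))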
s" to quickly
test, experiment and debug code.
From: Jay Vyas
To: Sanjay Subramanian
Cc: "user@spark.apache.org"
Sent: Thursday, November 20, 2014 4:53 PM
Subject: Re: Code works in Spark-Shell but Fails inside IntelliJ
This seems pretty standard: your IntelliJ classp
Subramanian
Cc: "user@spark.apache.org"
Sent: Thursday, November 20, 2014 4:49 PM
Subject: Re: Code works in Spark-Shell but Fails inside IntelliJ
Looks like IntelliJ might be trying to load the wrong version of Spark?
On Thu, Nov 20, 2014 at 4:35 PM, Sanjay Subramanian
wrote:
hey guys
I am at AmpCamp 2014 at UCB right now :-)
Funny Issue...
This code works in Spark-Shell but throws a funny exception in IntelliJ
CODE
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
val wikiData = sqlContext.parq
hey guys
Anyone using CDH Spark Standalone? I installed Spark standalone thru Cloudera
Manager
$ spark-shell --total-executor-cores 8
/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/bin/../lib/spark/bin/spark-shell:
line 44:
/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark/bin/utils.sh:
adClass(Launcher.java:308) at
java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 12 more
I am gonna keep working to solve this. Meanwhile if you can provide some guidance
that would be cool
sanjay
From: Daniel Siegmann
To: Ashish Jain
Cc: Sanjay Subramanian ; "user@spark.apache.o
cool thanks will set this up and report back how things went
regards
sanjay
From: Daniel Siegmann
To: Ashish Jain
Cc: Sanjay Subramanian ; "user@spark.apache.org"
Sent: Thursday, October 2, 2014 6:52 AM
Subject: Re: Spark inside Eclipse
You don't need to do anything
: Error was:
Failure(java.net.BindException: Address already in use)
14/10/01 17:34:38 INFO SparkUI: Started SparkUI at http://hadoop02:4041
sanjay
From: Matei Zaharia
To: Sanjay Subramanian
Cc: "user@spark.apache.org"
Sent: Wednesday, October 1, 2014 5:19 PM
Subject: Re: Mult
hey guys
Is there a way to run Spark in local mode from within Eclipse? I am running
Eclipse Kepler on a MacBook Pro with Mavericks. Like how one can run Hadoop
map/reduce applications from within Eclipse and debug and learn.
thanks
sanjay
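As the replies quoted above note, pointing the master at local mode is all that is needed; a minimal sketch (object and app name are made up):

import org.apache.spark.{SparkConf, SparkContext}

object LocalModeExample {
  def main(args: Array[String]): Unit = {
    // "local[2]" runs Spark inside the JVM, so breakpoints work in Eclipse
    val conf = new SparkConf().setMaster("local[2]").setAppName("EclipseLocalTest")
    val sc = new SparkContext(conf)
    sc.parallelize(1 to 10).map(_ * 2).collect().foreach(println)
    sc.stop()
  }
}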
hey guys
I am using spark 1.0.0+cdh5.1.0+41
When two users try to run "spark-shell", the first user's spark-shell shows
ACTIVE in the 18080 Web UI but the second user's shows WAITING; the shell
prints a bunch of errors but does get to the spark-shell prompt, and "sc.master"
seems to point to the correct master.
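For what it's worth, in standalone mode the first application claims all available cores by default, which would explain the second shell sitting in WAITING; capping cores per application is the usual workaround (a sketch, using the standard spark.cores.max setting):

val conf = new org.apache.spark.SparkConf().set("spark.cores.max", "4")
// or start each shell with: spark-shell --total-executor-cores 4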