Read HDFS file from an executor (closure)

2016-01-12 Thread Udit Mehta
Hi, Is there a way to read a text file from inside a Spark executor? I need to do this for a streaming application where we need to read a file (whose contents would change) from a closure. I cannot use the "sc.textFile" method since the Spark context is not serializable. I also cannot read a file us…
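A common workaround (a sketch, not from this thread; the path and function names are hypothetical) is to open the file with the Hadoop FileSystem API inside the closure. Unlike the SparkContext, a FileSystem handle can be created on the executor, so the file is re-read each time the partition is processed:

    import java.io.{BufferedReader, InputStreamReader}
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.rdd.RDD

    def filterWithHdfsLookup(rdd: RDD[String], lookupPath: String): RDD[String] =
      rdd.mapPartitions { iter =>
        // Runs on the executor: no SparkContext enters the closure.
        val fs = FileSystem.get(new Configuration())
        val in = new BufferedReader(new InputStreamReader(fs.open(new Path(lookupPath))))
        val lookup = Iterator.continually(in.readLine()).takeWhile(_ != null).toSet
        in.close()
        iter.filter(lookup.contains)
      }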

Kafka Direct Stream

2015-09-30 Thread Udit Mehta
Hi, I am using the Spark direct stream to consume from multiple topics in Kafka. I am able to consume fine but I am stuck at how to separate the data for each topic, since I need to process data differently depending on the topic. I basically want to split the RDD consisting of N topics into N RDDs, ea…
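One pattern for this with the direct-stream API of that era (a sketch; the topic names are hypothetical, and ssc and kafkaParams are assumed to exist) is to tag each record with its topic via HasOffsetRanges, then filter per topic:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("topicA", "topicB"))

    stream.transform { rdd =>
      // Partition i of a direct-stream RDD corresponds to offset range i.
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.mapPartitionsWithIndex { (i, iter) =>
        iter.map { case (_, value) => (ranges(i).topic, value) }
      }
    }.foreachRDD { rdd =>
      Seq("topicA", "topicB").foreach { topic =>
        val perTopic = rdd.filter(_._1 == topic).values
        // topic-specific processing goes here
      }
    }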

Json Serde used by Spark Sql

2015-08-18 Thread Udit Mehta
Hi, I was wondering which JSON serde Spark SQL uses. I created a JsonRDD out of a JSON file and then registered it as a temp table to query. I can then query the table using dot notation for nested structs/arrays. I was wondering how Spark SQL deserializes the JSON data based on the query.
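For reference, the pattern being described looks roughly like this (a sketch; the file path and field names are hypothetical). Internally, the JSON support of that era parsed records with Jackson while inferring a schema, rather than going through a Hive serde:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)          // sc: an existing SparkContext
    val df = sqlContext.jsonFile("hdfs:///data/events.json")  // schema inferred
    df.registerTempTable("events")
    // Nested structs are addressed with dot notation:
    sqlContext.sql("SELECT user.address.city FROM events").show()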

Spark thrift server on yarn

2015-08-25 Thread Udit Mehta
Hi, I am trying to start a Spark thrift server using the following command on Spark 1.3.1 running on YARN: ./sbin/start-thriftserver.sh --master yarn://resourcemanager.snc1:8032 --executor-memory 512m --hiveconf hive.server2.thrift.bind.host=test-host.sn1 --hiveconf hive.server2.thrift.port=1…

Re: Spark thrift server on yarn

2015-08-25 Thread Udit Mehta
…, 2015 at 5:32 PM, Cheng, Hao wrote: Did you register the temp table via beeline or in a new Spark SQL CLI? As I know, the temp table cannot cross the HiveContext. Hao. From: Udit Mehta [mailto:ume...@groupon.com] Sent: Wedne…

Provide sampling ratio while loading json in spark version > 1.4.0

2015-09-23 Thread Udit Mehta
Hi, In earlier versions of Spark (< 1.4.0), we were able to specify the sampling ratio while using sqlContext.jsonFile or sqlContext.jsonRDD so that we don't inspect each and every element while inferring the schema. I see that the use of these methods is deprecated in the newer Spark versions an…
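For what it's worth, in 1.4+ the DataFrameReader path accepts a samplingRatio option for the JSON source (a sketch; the path is hypothetical and the option should be checked against your exact release):

    // sqlContext: an existing SQLContext (Spark 1.4+)
    val df = sqlContext.read
      .option("samplingRatio", "0.1")   // inspect ~10% of records for schema inference
      .json("hdfs:///data/big.json")
    df.printSchema()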

Spark sql issue

2015-02-23 Thread Udit Mehta
Hi, I am using Spark SQL to create/alter Hive tables. I have a highly nested JSON and I am using the SchemaRDD to infer the schema. The JSON has 6 columns and one of the columns (which is a struct) has around 60 fields (key-value pairs). When I run the Spark SQL query for the above table, it just hang…

Spark per app logging

2015-03-20 Thread Udit Mehta
Hi, We have Spark set up such that there are various users running multiple jobs at the same time. Currently all the logs go to one file specified in the log4j.properties. Is it possible to configure log4j in Spark for per-app/user logging instead of sending all logs to the one file mentioned in the log4j.…

Re: Spark per app logging

2015-03-23 Thread Udit Mehta
…latter, can each application use its own log4j.properties? Cheers. On Fri, Mar 20, 2015 at 1:43 PM, Udit Mehta wrote: Hi, We have Spark set up such that there are various users running multiple jobs at t…
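One approach that fits the "own log4j.properties per application" idea (a sketch, not from the thread; the config path is hypothetical, and it only covers the driver side, since executors need the file shipped to them separately):

    import org.apache.log4j.{Level, Logger, PropertyConfigurator}

    object MyApp {
      def main(args: Array[String]): Unit = {
        // Load an app-specific config instead of Spark's global log4j.properties.
        PropertyConfigurator.configure("/etc/myapp/log4j-myapp.properties")
        Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
        // ... build the SparkContext and run the job ...
      }
    }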

Hive context datanucleus error

2015-03-23 Thread Udit Mehta
I am trying to run a simple query to view tables in my Hive metastore using HiveContext. I am getting this error: Persistence process has been specified to use a ClassLoaderResolver of name "datanucleus" yet this has not been found by the DataNucleus plugin mechanism. Please check your CLA…

Re: Hive context datanucleus error

2015-03-23 Thread Udit Mehta
Has this issue been fixed in Spark 1.2: https://issues.apache.org/jira/browse/SPARK-2624 On Mon, Mar 23, 2015 at 9:19 PM, Udit Mehta wrote: I am trying to run a simple query to view tables in my Hive metastore using HiveContext. I am getting this error: Persistence…

Re: Does HiveContext connect to HiveServer2?

2015-03-24 Thread Udit Mehta
Another question related to this: how can we propagate the hive-site.xml to all workers when running in YARN cluster mode? On Tue, Mar 24, 2015 at 10:09 AM, Marcelo Vanzin wrote: It does neither. If you provide a Hive configuration to Spark, HiveContext will connect to your metastore ser…

log4j.properties in jar

2015-03-30 Thread Udit Mehta
Hi, Is it possible to put the log4j.properties in the application jar such that the driver and the executors use this log4j file? Do I need to specify anything while submitting my app so that this file is used? Thanks, Udit

Re: why "Shuffle Write" is not zero when everything is cached and there is enough memory?

2015-03-31 Thread Udit Mehta
I have noticed a similar issue when using Spark Streaming. The Spark shuffle write size increases to a large size (in GB) and then the app crashes saying: java.io.FileNotFoundException: /data/vol0/nodemanager/usercache/$user/appcache/application_1427480955913_0339/spark-local-20150330231234-db1a/0b/…

Re: why "Shuffle Write" is not zero when everything is cached and there is enough memory?

2015-03-31 Thread Udit Mehta
…itten to disk. You can set spark.shuffle.spill to false if you don't want to spill to disk, assuming you have enough heap memory. On Tue, Mar 31, 2015 at 12:35 PM, Udit Mehta wrote: I have noticed a similar issue when using Spark Streaming. The Spark shuf…
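For reference, the setting from the reply goes on the SparkConf (a sketch; note the flag is honored in Spark 1.x but ignored by later releases, which always allow spilling):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("shuffle-spill-demo")
      .set("spark.shuffle.spill", "false")  // only safe with enough executor heap
    val sc = new SparkContext(conf)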

How to use the --files arg

2015-04-10 Thread Udit Mehta
Hi, Suppose I have a command and I pass the --files arg as below: bin/spark-submit --class com.test.HelloWorld --master yarn-cluster --num-executors 8 --driver-memory 512m --executor-memory 2048m --executor-cores 4 --queue public --files $HOME/myfile.txt --name test_1 ~/test_code-1.0-SNAPSHOT…
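Files shipped with --files are localized into each container's working directory; one common way to resolve them from application code is SparkFiles (a sketch matching the file name above):

    import org.apache.spark.SparkFiles
    import scala.io.Source

    // Works on executors (and on the driver in yarn-cluster mode);
    // "myfile.txt" is the bare name of the file passed to --files.
    val path = SparkFiles.get("myfile.txt")
    val lines = Source.fromFile(path).getLines().toList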

Metrics Servlet on spark 1.2

2015-04-17 Thread Udit Mehta
Hi, I am unable to access the metrics servlet on Spark 1.2. I tried to access it from the app master UI on port 4040 but I don't see any metrics there. Is it a known issue with Spark 1.2 or am I doing something wrong? Also, how do I publish my own metrics and view them on this servlet? Thanks, Udit

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-04-17 Thread Udit Mehta
I followed the steps described above and I still get this error: Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher. I am trying to build Spark 1.3 on HDP 2.2. I built Spark from source using: build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-…

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-04-17 Thread Udit Mehta
…anything wrong in your mvn command. Can you check whether the ExecutorLauncher class is in your jar file or not? BTW: for Spark 1.3, you can use the binary distribution from Apache. Thanks. Zhan Zhang. On Apr 17, 2015, at 2:01 PM, Udit…

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-04-17 Thread Udit Mehta
…I am using HDP 2.2 with Hadoop 2.6. On Fri, Apr 17, 2015 at 2:21 PM, Udit Mehta wrote: Thanks. Would that distribution work for HDP 2.2? On Fri, Apr 17, 2015 at 2:19 PM, Zhan Zhang wrote: You don't need to put any YARN assembly in HDFS. The Spark assembl…

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-04-17 Thread Udit Mehta
-Dhdp.version=2.2.0.0-2041. Is there anything wrong in what I am trying to do? Thanks again! On Fri, Apr 17, 2015 at 2:56 PM, Zhan Zhang wrote: Hi Udit, by the way, do you mind sharing the whole log trace? Thanks. Zhan Zhang. On Apr 17, 2015, at…

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-04-17 Thread Udit Mehta
…park-env.sh hive-site.xml log4j.properties metrics.properties slaves.template spark-defaults.conf.template spark-env.sh.template [root@c6402 conf]# more java-opts -Dhdp.version=2.2.0.0-2041 [root@c6402 conf]# Thanks. Zhan…

Spark metrics source

2015-04-20 Thread Udit Mehta
Hi, I am running Spark 1.3 on YARN and am trying to publish some metrics from my app. I see that we need to use the Codahale library to create a source and then specify the source in the metrics.properties. Does somebody have a sample metrics source which I can use in my app to forward the metrics…
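A minimal sketch of such a source (everything here is hypothetical, not an official template; in Spark 1.x the Source trait is private[spark], which is why custom sources are commonly placed under an org.apache.spark package):

    package org.apache.spark.metrics.source

    import com.codahale.metrics.{Gauge, MetricRegistry}

    class MyAppSource extends Source {
      override val sourceName: String = "MyApp"
      override val metricRegistry: MetricRegistry = new MetricRegistry()

      private val startMs = System.currentTimeMillis()

      // Counter incremented from application code.
      val recordsProcessed = metricRegistry.counter(MetricRegistry.name("records", "processed"))

      // Gauge evaluated whenever a sink (e.g. the metrics servlet) polls it.
      metricRegistry.register(MetricRegistry.name("uptime", "seconds"), new Gauge[Long] {
        override def getValue: Long = (System.currentTimeMillis() - startMs) / 1000
      })
    }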