Unable To access Hive From Spark

2016-04-15 Thread Amit Singh Hora
Hi All, I am trying to access Hive from Spark but am getting an exception: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-. Code :- String logFile = "hdfs://hdp23ha/logs"; // Should be some file on
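The exception in this thread is a known Hive scratch-directory issue: /tmp/hive must be writable (and in practice world-writable) before Hive-on-Spark will start. A minimal sketch of the usual fix, demonstrated on a local stand-in directory (the `_demo` name is hypothetical; the real target is /tmp/hive on both HDFS and the local filesystem):

```shell
# Demo on a local stand-in directory; the real target is /tmp/hive.
mkdir -p /tmp/hive_demo
chmod -R 777 /tmp/hive_demo
stat -c '%a' /tmp/hive_demo
# On the actual cluster the equivalent commands would be:
#   hdfs dfs -chmod -R 777 /tmp/hive   # scratch dir on HDFS
#   chmod -R 777 /tmp/hive             # and locally, if Spark runs in local mode
```

After the permissions read rwxrwxrwx (777) rather than rw-rw-rw-, the "should be writable" check passes.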

Spark MLlib LDA Example

2016-04-14 Thread Amit Singh Hora
Hi All, I am very new to Spark MLlib. I am trying to understand and implement Spark MLlib's LDA algorithm. The goal is to get the topics present in the given documents, and the terms within those topics. I followed the link below: https://gist.github.com/jkbradley/ab8ae22a8282b2c8ce33

RE: Unable to Access files in Hadoop HA enabled from using Spark

2016-04-12 Thread Amit Singh Hora
This property already exists. -Original Message- From: "ashesh_28 [via Apache Spark User List]" Sent: 4/13/2016 11:02 AM To: "Amit Singh Hora" Subject: Re: Unable to Access files in Hadoop HA enabled from using Spark Try adding the following propert

Unable to Access files in Hadoop HA enabled from using Spark

2016-04-12 Thread Amit Singh Hora
I am trying to access a directory in Hadoop from my Spark code on a local machine. Hadoop is HA enabled. val conf = new SparkConf().setAppName("LDA Sample").setMaster("local[2]") val sc = new SparkContext(conf) val distFile = sc.textFile("hdfs://hdpha/mini_newsgroups/") println(distFile.count()) but ge
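When the hdfs://hdpha nameservice cannot be resolved, the usual cause is that the client is missing the HA resolution properties normally supplied by hdfs-site.xml. A sketch of setting them on the SparkContext's Hadoop configuration directly, assuming the nameservice is hdpha with namenodes nn1/nn2 (the hostnames below are placeholders, not from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("LDA Sample").setMaster("local[2]")
val sc = new SparkContext(conf)

// HA client config, normally picked up from hdfs-site.xml on the classpath.
// All hostnames are placeholders.
val hc = sc.hadoopConfiguration
hc.set("fs.defaultFS", "hdfs://hdpha")
hc.set("dfs.nameservices", "hdpha")
hc.set("dfs.ha.namenodes.hdpha", "nn1,nn2")
hc.set("dfs.namenode.rpc-address.hdpha.nn1", "namenode1.example.com:8020")
hc.set("dfs.namenode.rpc-address.hdpha.nn2", "namenode2.example.com:8020")
hc.set("dfs.client.failover.proxy.provider.hdpha",
  "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")

val distFile = sc.textFile("hdfs://hdpha/mini_newsgroups/")
println(distFile.count())
```

The cleaner alternative is to put the cluster's hdfs-site.xml and core-site.xml on the driver's classpath so these keys never have to be set in code.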

SPARKONHBase checkpointing issue

2015-10-27 Thread Amit Singh Hora
Hi all, I am using Cloudera's SparkOnHBase to bulk insert into HBase. Please find the code below: object test { def main(args: Array[String]): Unit = { val conf = ConfigFactory.load("connection.conf").getConfig("connection") val checkpointDirectory=conf.getString("spark.chec

Unable to use saveAsSequenceFile

2015-10-24 Thread Amit Singh Hora
Hi All, I am trying to write an RDD as a Sequence file into my Hadoop cluster but am getting connection timeouts again and again. I can ping the Hadoop cluster, and the directory also gets created with the file name I specify. I believe I am missing some configuration. Kindly help me. object WriteSequenceF

Spark opening too many connections with ZooKeeper

2015-10-20 Thread Amit Singh Hora
Hi All, My Spark job started reporting ZooKeeper errors. After looking at the zkdumps from the HBase master I realized that N number of connections are being made from the nodes where the Spark workers are running. I believe the connections are somehow not getting closed, which is leading to the error ple
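A common cause of ZooKeeper connection exhaustion in this setup is opening a new HBase connection per record instead of per partition. A sketch of the per-partition pattern for the HBase 0.98-era API used elsewhere in these threads (`rdd` is assumed to be an `RDD[(String, String)]`, and the table/column names are placeholders):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HConnectionManager, Put}
import org.apache.hadoop.hbase.util.Bytes

// One connection (and thus one ZooKeeper session) per partition, closed in a
// finally block — instead of a new connection per record, which leaks sessions.
rdd.foreachPartition { rows =>
  val conf = HBaseConfiguration.create()
  val connection = HConnectionManager.createConnection(conf)
  val table = connection.getTable("test_table")
  try {
    rows.foreach { case (rowKey, value) =>
      val put = new Put(Bytes.toBytes(rowKey))
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
      table.put(put)
    }
  } finally {
    table.close()
    connection.close()
  }
}
```

Bounding connections to one per partition keeps the ZooKeeper session count proportional to the number of concurrent tasks rather than the number of records.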

HBase Spark Streaming giving error after restore

2015-10-16 Thread Amit Singh Hora
Hi All, I am using the below code to stream data from Kafka to HBase. Everything works fine until I restart the job so that it can restore the state from the checkpoint directory, but while trying to restore the state it gives me the below error: ge 0.0 (TID 0, localhost): java.lang.ClassCastException: scala.r
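Checkpoint-restore failures like this often come from constructing DStreams outside the factory passed to StreamingContext.getOrCreate: on restart, Spark deserializes the checkpointed graph and any externally created stream no longer lines up. A sketch of the required structure (app name, checkpoint path, and batch interval are illustrative):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "/tmp/checkpoint" // placeholder path

// ALL stream construction must happen inside this factory; it only runs when
// no checkpoint exists. On restart, the graph is restored from the checkpoint.
def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("KafkaToHBase").setMaster("local[2]")
  val ssc = new StreamingContext(conf, Seconds(2))
  ssc.checkpoint(checkpointDir)
  // build the Kafka input stream and the HBase writes here
  ssc
}

val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```

If any transformation is added after getOrCreate returns a restored context, the restored and rebuilt graphs diverge, which is one way to end up with cast errors during recovery.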

Spark retrying task indefinitely

2015-10-11 Thread Amit Singh Hora
I am running Spark locally to understand how countByValueAndWindow works. val Array(brokers, topics) = Array("192.XX.X.XX:9092", "test1") // Create context with 2 second batch interval val sparkConf = new SparkConf().setAppName("ReduceByWindowExample").setMaster("local[1,1
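countByValueAndWindow is a stateful windowed operation, so the context needs a checkpoint directory, the window and slide durations must be multiples of the batch interval, and local mode needs more than one thread (one for the receiver, one for processing). A sketch with illustrative durations, using a socket source in place of the Kafka stream from the thread:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf()
  .setAppName("ReduceByWindowExample")
  .setMaster("local[2]") // at least 2 threads: receiver + processing

val ssc = new StreamingContext(sparkConf, Seconds(2)) // 2s batch interval
ssc.checkpoint("/tmp/window-checkpoint") // required for stateful window ops

val lines = ssc.socketTextStream("localhost", 9999)
// Window of 30s, sliding every 10s — both multiples of the 2s batch interval.
val counts = lines.countByValueAndWindow(Seconds(30), Seconds(10))
counts.print()
```

With local[1,1] (one core, maxFailures=1) the receiver occupies the only core, so batches never process — which can look like a task being retried forever.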

Apache Spark 1.1.1 with Hbase 0.98.8-hadoop2 and hadoop 2.3.0

2014-12-17 Thread Amit Singh Hora
Hi All, I have downloaded the pre-built Spark 1.1.1 for Hadoop 2.3.0, then ran mvn install for the jar spark-assembly-1.1.1-hadoop2.3.0.jar available in the lib folder of the Spark download, and added its dependency as follows in my Java program: org.apache.spark spark-core_2.10

Re: Spark Hbase job taking long time

2014-08-12 Thread Amit Singh Hora
wrote: > >> Can you try specifying some value (100, e.g.) for >> "hbase.mapreduce.scan.cachedrows" in your conf? >> >> bq. table contains 10 lakh rows >> >> How many rows are there in the table? >> >> nit: Example uses classOf[TableInputFormat] inste

Spark Hbase job taking long time

2014-08-06 Thread Amit Singh Hora
Hi All, I am trying to run a SQL query on HBase using a Spark job. Till now I am able to get the desired results, but as the data set size increases the Spark job takes a long time. I believe I am doing something wrong, as after going through documentation and videos discussing Spark performance
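The reply quoted earlier in the archive suggests raising scan caching so each HBase RPC fetches a batch of rows instead of one. A sketch of wiring that into a Spark-over-HBase read (the table name is a placeholder, and `sc` is assumed to be an existing SparkContext):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table") // placeholder table
// Fetch 100 rows per RPC instead of the default 1 — the tuning suggested
// in the reply; the right value depends on row size and region-server memory.
hbaseConf.set("hbase.mapreduce.scan.cachedrows", "100")

val rdd = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])
println(rdd.count())
```

Larger caching values trade client memory for fewer round trips, which is usually the first lever to pull when a full-table scan through Spark is slow.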