Re: PySpark 2.1 Not instantiating properly

2017-10-20 Thread Jagat Singh
Do you have the winutils binary relevant for your system? This SO post has related information: https://stackoverflow.com/questions/34196302/the-root-scratch-dir-tmp-hive-on-hdfs-should-be-writable-current-permissions On 21 October 2017 at 03:16, Marco Mistroni wrote: > Did u build spark or
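
For reference, the one-time Windows setup the linked SO post describes looks roughly like this (the C:\hadoop path is an example; adjust to wherever winutils.exe is unpacked):

    REM point Spark/Hadoop at the winutils location
    set HADOOP_HOME=C:\hadoop
    REM make the Hive scratch directory writable, as the SO post suggests
    C:\hadoop\bin\winutils.exe chmod -R 777 \tmp\hive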

Re: Spark Job trigger in production

2016-07-18 Thread Jagat Singh
You can use the following options: * spark-submit from a shell * some kind of job server (see spark-jobserver for details) * some notebook environment (see Zeppelin, for example) On 18 July 2016 at 17:13, manish jaiswal wrote: > Hi, > > > What is the best approach to trigger spark job in production cl
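
As a sketch of driving the first option from code rather than a shell, Spark ships a SparkLauncher API; the jar path and class name below are hypothetical:

    import org.apache.spark.launcher.SparkLauncher

    // builds and launches the equivalent of a spark-submit invocation
    val proc = new SparkLauncher()
      .setAppResource("/jobs/etl-assembly.jar")   // hypothetical application jar
      .setMainClass("com.example.EtlJob")         // hypothetical main class
      .setMaster("yarn-client")
      .launch()
    proc.waitFor()                                // block until the job finishes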

Re: Broadcast hash join implementation in Spark

2016-07-08 Thread Jagat Singh
Hi, Please see the property spark.sql.autoBroadcastJoinThreshold here http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options Thanks, Jagat Singh On Sat, Jul 9, 2016 at 9:50 AM, Lalitha MV wrote: > Hi, > > 1. What implementation is used for the
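
For example, the threshold can be adjusted at runtime; the 100 MB value here is arbitrary, and setting it to -1 disables broadcast joins entirely:

    // tables smaller than this size (in bytes) are broadcast to all executors for joins
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (100 * 1024 * 1024).toString)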

Re: spark 1.6.0 connect to hive metastore

2016-02-09 Thread Jagat Singh
Hi, We do this by telling Spark which Hive version we are using. This is done by setting the following properties: spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars. Thanks On Wed, Feb 10, 2016 at 7:39 AM, Koert Kuipers wrote: > hey thanks. hive-site is on classpath in conf directory > > i curr
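
A sketch of how these might look in spark-defaults.conf; the version and classpath are illustrative ("builtin" and "maven" are also accepted values for the jars property):

    spark.sql.hive.metastore.version  0.14.0
    spark.sql.hive.metastore.jars     /opt/hive/lib/*:/opt/hadoop/client/*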

Stop Spark yarn-client job

2015-11-26 Thread Jagat Singh
Hi, What is the correct way to fully stop a Spark job that is running as yarn-client via spark-submit? We are using sc.stop in the code and can see the job still running (in the YARN resource manager) after the final Hive insert is complete. The code flow is: start context, do some work, insert to hiv
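
Not from the thread, but a common pattern is to make the stop unconditional and, as a last resort, force the JVM down if non-daemon threads keep the driver alive; a minimal sketch:

    try {
      // ... do work, final Hive insert ...
    } finally {
      sc.stop()      // shuts down the SparkContext and deregisters the app from YARN
    }
    // if the driver process still lingers after sc.stop(), an explicit exit:
    // sys.exit(0)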

Re: Spark and Spring Integrations

2015-11-15 Thread Jagat Singh
Not a direct answer to your question, but it might be useful for you to check the Spring XD Spark integration. https://github.com/spring-projects/spring-xd-samples/tree/master/spark-streaming-wordcount-java-processor On Mon, Nov 16, 2015 at 6:14 AM, Muthu Jayakumar wrote: > I have only written Akk

Re: Spark thrift service and Hive impersonation.

2015-10-05 Thread Jagat Singh
Hello Steve, Thanks for the confirmation. Is there any work planned on this? Thanks, Jagat Singh On Wed, Sep 30, 2015 at 9:37 PM, Vinay Shukla wrote: > Steve is right, > The Spark thrift server does not propagate end user identity downstream > yet. > > > > On We

Re: HDFS small file generation problem

2015-10-03 Thread Jagat Singh
Hello Nicolas, The Hive solution just concatenates the files; it does not alter or change records. On 3 Oct 2015 6:42 pm, wrote: > Hello, > Finally Hive is not a solution as I cannot update the data. > And for archive file I think it would be the same issue. > Any other solutions ? > > Nicolas
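
For illustration, the Hive statement in question (run in Hive, not Spark; the table and partition names are hypothetical, and the table must use a file format that supports concatenation, such as RCFile or ORC):

    ALTER TABLE events PARTITION (dt='2015-10-03') CONCATENATE;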

Re: Spark thrift service and Hive impersonation.

2015-09-29 Thread Jagat Singh
trying to read as the spark user, which we used to start the thrift server. Since the spark user does not have actual read access, we get the error. However, beeline is used by the end user, not the spark user, and throws the error. Thanks, Jagat Singh On Wed, Sep 30, 2015 at 11:24 AM, Mohammed Guller wrote: > D

Spark thrift service and Hive impersonation.

2015-09-29 Thread Jagat Singh
Hi, I have started the Spark thrift service as the spark user. Does each user need to start their own thrift server to use it? Using beeline I am able to connect to the server and execute show tables; However, when we try to execute some real query, it runs as the spark user and HDFS permissions do not allow
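
For context, a sketch of the setup being described (paths, host, port, and user are illustrative):

    # one shared server, started by the spark user
    ./sbin/start-thriftserver.sh --master yarn-client
    # any end user then connects over JDBC, e.g. with beeline
    beeline -u jdbc:hive2://localhost:10000 -n enduser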

Re: Spark 1.5.0 java.lang.OutOfMemoryError: PermGen space

2015-09-12 Thread Jagat Singh
Sorry, to answer your question fully: the job starts tasks; a few of them fail and some are successful. The failed ones have that PermGen error in the logs. But ultimately the full job is marked failed and the session quits. On Sun, Sep 13, 2015 at 10:48 AM, Jagat Singh wrote: > Hi Davies, > >

Re: Spark 1.5.0 java.lang.OutOfMemoryError: PermGen space

2015-09-12 Thread Jagat Singh
queries? > > Is this in local mode or cluster mode? > > On Fri, Sep 11, 2015 at 3:00 AM, Jagat Singh wrote: > > Hi, > > > > We have queries which were running fine on 1.4.1 system. > > > > We are testing upgrade and even simple query like > > > >

Spark 1.5.0 java.lang.OutOfMemoryError: PermGen space

2015-09-11 Thread Jagat Singh
Hi, We have queries which were running fine on a 1.4.1 system. We are testing an upgrade, and even a simple query like val t1 = sqlContext.sql("select count(*) from table") t1.show This works perfectly fine on 1.4.1 but throws an OOM error in 1.5.0. Are there any changes in default memory settings from 1.
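
One commonly suggested mitigation for PermGen errors on Spark 1.5 (not from this thread) is to raise the PermGen size explicitly, e.g. in spark-defaults.conf; 256m is an arbitrary starting point:

    spark.driver.extraJavaOptions    -XX:MaxPermSize=256m
    spark.executor.extraJavaOptions  -XX:MaxPermSize=256m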

Re: insert Hive table with RDD

2015-03-03 Thread Jagat Singh
Will this recognize Hive partitions as well, for example inserting into a specific Hive partition? On Tue, Mar 3, 2015 at 11:42 PM, Cheng, Hao wrote: > Using the SchemaRDD / DataFrame API via HiveContext > > Assume you're using the latest code, something probably like: > > val hc = new HiveCont
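
A static-partition insert through HiveContext does work along these lines; a minimal sketch assuming an existing SparkContext sc (table and column names are made up):

    import org.apache.spark.sql.hive.HiveContext

    val hc = new HiveContext(sc)
    // static partition insert; dynamic partitions need the usual Hive settings enabled
    hc.sql("INSERT OVERWRITE TABLE sales PARTITION (dt='2015-03-03') " +
           "SELECT id, amount FROM staging")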

Spark based ETL pipelines

2015-02-11 Thread Jagat Singh
Hi, I want to work on some use case something like below. Just want to know if something similar has been already done which can be reused. Idea is to use Spark for ETL / Data Science / Streaming pipeline. So when data comes inside the cluster front door we will do following steps 1) Upload

Re: Why RDD is not cached?

2014-10-28 Thread Jagat Singh
Which setting are you using for persist() or cache()? http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence On Tue, Oct 28, 2014 at 6:18 PM, shahab wrote: > Hi, > > I have a standalone Spark, where the executor is set to have 6.3 G memory, as I am using two workers so in
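
The storage level is the setting in question; a minimal sketch (the input path is hypothetical):

    import org.apache.spark.storage.StorageLevel

    val rdd = sc.textFile("hdfs:///data/input")
    rdd.persist(StorageLevel.MEMORY_ONLY)   // cache() is shorthand for exactly this level
    rdd.count()                             // persistence is lazy: the first action fills the cache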

Re: Multi master Spark

2014-04-09 Thread Jagat Singh
pass in a single one. For example, you might start your SparkContext pointing to spark://host1:port1,host2:port2. This would cause your SparkContext to try registering with both Masters - if host1 goes down, this configuration would still be correct as we'd find the new leader, host2. Thanks,
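
In code, that looks something like this (host names and port are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    // list every master; the context registers with all of them and follows the elected leader
    val conf = new SparkConf()
      .setMaster("spark://host1:7077,host2:7077")
      .setAppName("ha-example")
    val sc = new SparkContext(conf)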

Re: Problem with running LogisticRegression in spark cluster mode

2014-04-09 Thread Jagat Singh
Hi Jenny, How are you packaging your jar? Can you please confirm that you have included the MLlib jar inside the fat jar you have created for your code: libraryDependencies += "org.apache.spark" % "spark-mllib_2.9.3" % "0.8.1-incubating" Thanks, Jagat Singh O
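
A build.sbt sketch for the fat-jar approach (versions follow the thread; the sbt-assembly plugin is assumed to be configured in project/plugins.sbt):

    // bundle MLlib into the assembly so executors can resolve its classes;
    // spark-core is provided by the cluster at runtime
    libraryDependencies ++= Seq(
      "org.apache.spark" % "spark-core_2.9.3"  % "0.8.1-incubating" % "provided",
      "org.apache.spark" % "spark-mllib_2.9.3" % "0.8.1-incubating"
    )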