Collecting large dataset

2019-09-05 Thread Rishikesh Gawade
data without failure? Thanks, Rishikesh

How to combine all rows into a single row in DataFrame

2019-08-19 Thread Rishikesh Gawade
operation on the dataframe, however, I am unaware of how to go about it. Any suggestions/approaches would be much appreciated. Thanks, Rishikesh
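The snippet above is cut off, but collapsing all rows of a DataFrame into one row is typically done with an aggregation. A minimal PySpark sketch, assuming a string column named `text` (the column name and separator are illustrative, not from the thread):

```python
# PySpark sketch (assumes a DataFrame `df` with a string column "text"):
#
#   from pyspark.sql import functions as F
#   combined = df.agg(F.concat_ws(" ", F.collect_list("text")).alias("all_text"))
#
# The aggregation amounts to this plain-Python fold over the column's values:
rows = ["first row", "second row", "third row"]  # stand-in for the column values
all_text = " ".join(rows)                        # concat_ws(" ", collect_list("text"))
```

Note that `collect_list` gathers every value into a single aggregation buffer, so this only works when the combined row fits in memory.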

Re: Hive external table not working in sparkSQL when subdirectories are present

2019-08-07 Thread Rishikesh Gawade
Hi, I did not explicitly create a HiveContext. I have been using the spark.sqlContext that gets created upon launching the spark-shell. Isn't this sqlContext the same as the hiveContext? Thanks, Rishikesh On Wed, Aug 7, 2019 at 12:43 PM Jörn Franke wrote: > Do you use the HiveContext in S

Re: Hive external table not working in sparkSQL when subdirectories are present

2019-08-06 Thread Rishikesh Gawade
Hi. I am using Spark 2.3.2 and Hive 3.1.0. Even if I use parquet files the result would be the same, because after all sparkSQL isn't able to descend into the subdirectories over which the table is created. Could there be any other way? Thanks, Rishikesh On Tue, Aug 6, 2019, 1:03 PM Mich Taleb

Hive external table not working in sparkSQL when subdirectories are present

2019-08-05 Thread Rishikesh Gawade
o be set on the spark side so that this works as it does via hive cli? I am using Spark on YARN. Thanks, Rishikesh Tags: subdirectories, subdirectory, recursive, recursion, hive external table, orc, sparksql, yarn
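A workaround often suggested for this symptom, hedged because it depends on the Spark and Hive versions in play, is to pass the Hive/MapReduce recursion settings through Spark's Hadoop configuration:

```shell
# Commonly suggested settings so sparkSQL descends into subdirectories of a
# Hive external table; version-dependent, so treat this as a starting point.
spark-shell \
  --conf spark.hadoop.hive.mapred.supports.subdirectories=true \
  --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true
```

The `spark.hadoop.` prefix forwards each key to the underlying Hadoop configuration, which is the Spark-side counterpart of issuing the equivalent `SET` commands in the Hive CLI.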

Re: Connecting to Spark cluster remotely

2019-04-22 Thread Rishikesh Gawade
To put it simply, what are the configurations that need to be done on the client machine so that it can run the driver on itself and executors on the Spark-on-YARN cluster nodes? On Mon, Apr 22, 2019, 8:22 PM Rishikesh Gawade wrote: > Hi. > I have been experiencing trouble while trying to connec
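In broad strokes, yarn-client mode needs the cluster's Hadoop configuration on the client machine plus network connectivity back to the driver. A sketch, with paths as placeholders:

```shell
# Client-side setup for running the driver locally against a YARN cluster.
# The directory must contain core-site.xml, hdfs-site.xml and yarn-site.xml
# copied from a cluster node (the path is a placeholder).
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR="$HADOOP_CONF_DIR"
spark-submit --master yarn --deploy-mode client my_app.py
```

In client mode the executors open connections back to the driver, so a firewall or NAT between the client and the cluster nodes is a common failure mode.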

Connecting to Spark cluster remotely

2019-04-22 Thread Rishikesh Gawade
d practice? Thanks & Regards, Rishikesh

How to use same SparkSession in another app?

2019-04-16 Thread Rishikesh Gawade
Hi. I wish to use a SparkSession created by one app in another app so that I can use the dataframes belonging to that session. Is it possible to use the same SparkSession in another app? Thanks, Rishikesh
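A SparkSession is tied to a single driver JVM, so it cannot be handed to a second application directly; the usual workaround is to persist the DataFrames somewhere both apps can reach, for example a shared Hive metastore. A sketch with illustrative database/table names:

```python
# App 1 saves its DataFrame as a table in the shared metastore:
#
#   df.write.saveAsTable("shared_db.results")
#
# App 2, started with Hive support against the same metastore, reads it back:
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.enableHiveSupport().getOrCreate()
#   df = spark.table("shared_db.results")
```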

Error: NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT while running a Spark-Hive Job

2018-04-16 Thread Rishikesh Gawade
it out and suggest the required changes. Also, if it's the case that I might have misconfigured Spark and Hive, please suggest the changes in configuration; a link guiding through all necessary configs would also be appreciated. Thank you in anticipation. Regards, Rishikesh Gawade
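For context, `NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT` typically indicates a Spark/Hive version mismatch, since the field no longer exists in newer Hive releases. One commonly suggested direction, hedged because the right values depend on the versions actually installed, is to point Spark at metastore jars that match the Hive install:

```shell
# The version number is a placeholder; match it to the Hive metastore you run.
# The value "maven" tells Spark to download matching metastore jars itself.
spark-submit \
  --conf spark.sql.hive.metastore.version=2.3.3 \
  --conf spark.sql.hive.metastore.jars=maven \
  my-spark-hive-job.jar
```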

ERROR: Hive on Spark

2018-04-15 Thread Rishikesh Gawade
thing is wrong then please suggest an ideal way to read Hive tables on Hadoop in Spark using Java. A link to a webpage having relevant info would also be appreciated. Thank you in anticipation. Regards, Rishikesh Gawade

Accessing Hive Database (On Hadoop) using Spark

2018-04-15 Thread Rishikesh Gawade
t org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) I request you to please check this and if anything is wrong then please suggest an ideal way to read Hive tables on Hadoop in Spark using Java. A link to a webpage having relevant info would also be appreciated. Thank you in anticipation. Regards, Rishikesh Gawade
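For reference, the minimal pattern for reading Hive tables from Spark 2.x is a Hive-enabled SparkSession, shown here as a PySpark sketch (the thread asked about Java, where `SparkSession.builder().enableHiveSupport()` is the direct analogue; database and table names are illustrative):

```python
# Requires hive-site.xml on the classpath so Spark can locate the metastore.
#
#   from pyspark.sql import SparkSession
#   spark = (SparkSession.builder
#            .appName("hive-reader")
#            .enableHiveSupport()
#            .getOrCreate())
#   df = spark.sql("SELECT * FROM mydb.mytable")
```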

Executor unable to pick postgres driver in Spark standalone cluster

2017-04-03 Thread Rishikesh Teke
Hi all, I was submitting a Play application to a Spark 2.1 standalone cluster. In the Play application the postgres dependency is also added, and the application works with local Spark libraries. But at run time on the standalone cluster it gives me the error: o.a.s.s.TaskSetManager - Lost task 0.0 in stage 0.0 (TI
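The usual cause of "works locally, fails on executors" for a JDBC driver is that the jar never reaches the executor classpath. A hedged sketch (master URL, jar path, version, and application jar are all placeholders):

```shell
# --jars ships the driver jar to every executor as well as the driver process;
# having the dependency only on the application's local classpath is not enough.
spark-submit \
  --master spark://master-host:7077 \
  --jars /path/to/postgresql-42.2.5.jar \
  my-play-app.jar
```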

Re: About transformations

2016-12-12 Thread Rishikesh Teke
Hi, Spark is very efficient in SQL because of https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html you can see all the metrics of all your transforma

Re: took more time to get data from spark dataset to driver program

2016-11-14 Thread Rishikesh Teke
Again, if you run the Spark cluster in standalone mode with an optimum number of executors and a balanced cores and memory configuration, it will run faster as more parallel operations take place. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/took-more-time-to-ge

Re: Newbie question - Best way to bootstrap with Spark

2016-11-14 Thread Rishikesh Teke
Integrate Spark with Apache Zeppelin https://zeppelin.apache.org/; it's a very handy way to bootstrap with Spark.

Error while Partitioning

2015-07-19 Thread rishikesh
Hi, I am executing a simple flow as shown below: data = sc.wholeTextFiles(...); tokens = data.flatMap(<>); counts = tokens.map(lambda token: (token, 1)); counters = counts.reduceByKey(lambda a, b: a + b); counters.sortBy(lambda x: x[1], False).saveAsTextFile(...). There are some problems that I am facing
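The flow in that message is an ordinary word count; minus the distribution, its logic is the following plain-Python equivalent (input data and tokenizer are illustrative):

```python
# Stand-in for sc.wholeTextFiles(...): a mapping of file name -> contents.
files = {"doc1": "a b a a", "doc2": "b c"}

tokens = [t for text in files.values() for t in text.split()]  # flatMap
counts = [(t, 1) for t in tokens]                              # map(lambda token: (token, 1))
counters = {}
for token, n in counts:                                        # reduceByKey(lambda a, b: a + b)
    counters[token] = counters.get(token, 0) + n
ranked = sorted(counters.items(), key=lambda x: x[1], reverse=True)  # sortBy(..., False)
```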

Re: Random Forest Error

2015-07-15 Thread rishikesh
Thanks, that fixed the problem. Cheers Rishi

Random Forest Error

2015-07-15 Thread rishikesh
Hi, I am trying to train a Random Forest over my dataset. I have a binary classification problem. When I call the train method as below model = RandomForest.trainClassifier(data, numClasses=2, categoricalFeaturesInfo={}, numTrees=3, featureSubsetStrategy="auto", impurity='gini', maxDepth=4, maxBins=3

RE: Feature Generation On Spark

2015-07-04 Thread rishikesh thakur
...@gmail.com CC: rishikeshtha...@hotmail.com; user@spark.apache.org Do you have one document per file or multiple documents per file? On 4 Jul 2015 23:38, "Michal Čizmazia" wrote: Spark Context has a method wholeTextFiles. Is that what you need? On 4 July 2015 at 07:04, rishikesh wrote: &g

RE: Feature Generation On Spark

2015-07-04 Thread rishikesh thakur
09:37:52 -0400 > Subject: Re: Feature Generation On Spark > From: mici...@gmail.com > To: rishikeshtha...@hotmail.com > CC: user@spark.apache.org > > Spark Context has a method wholeTextFiles. Is that what you need? > > On 4 July 2015 at 07:04, rishikesh wrote: > &g

Feature Generation On Spark

2015-07-04 Thread rishikesh
Hi, I am new to Spark and am working on document classification. Before model fitting I need to do feature generation: each document is to be converted to a feature vector. However, I am not sure how to do that. While testing locally I have a static list of tokens and when I parse a file I do a look
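The lookup described in that message amounts to a bag-of-words featurizer over a fixed vocabulary; a minimal sketch (vocabulary and document are illustrative):

```python
# Map a document onto a fixed vocabulary: one count per vocabulary term.
vocabulary = ["spark", "hive", "yarn"]

def featurize(text, vocab):
    tokens = text.lower().split()
    return [tokens.count(term) for term in vocab]

vec = featurize("Spark on YARN reads Hive tables and Spark writes", vocabulary)
```

On a cluster the same function can be applied per document, e.g. with an RDD `map` over the `(filename, contents)` pairs that `wholeTextFiles` yields.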