Hi Satish,
Take a look at the smvTopNRecs() function in the SMV package. It does
exactly what you are looking for. It might be overkill to bring in all of SMV
for just one function, but you will also get a lot more than just DF helper
functions (modular views, higher-level graphs, dynamic loading, and more); see
the SMV docs (https://github.com/TresAmigosSD/SMV) for help.
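Assuming what you are after is "top N records per group" (which is what the name
smvTopNRecs suggests), the same thing can also be done with plain DataFrame
window functions if pulling in SMV is too heavy. A rough sketch; the column
names "key"/"value" and N=3 are made up, and the function is called rowNumber()
before Spark 1.6:

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions.{row_number, col}

  // df is your input DataFrame with columns "key" and "value";
  // keep the 3 largest "value" rows within each "key" group
  val w = Window.partitionBy("key").orderBy(col("value").desc)
  val top3 = df.withColumn("rn", row_number().over(w))
    .where(col("rn") <= 3)
    .drop("rn")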
>
> Thanks,
> Divya
>
> On 3 February 2016 at 11:42, Ali Tajeldin EDU wrote:
> While you can construct the SQL string dynamically in scala/java/python, it
> would be best to use the Dataframe API for creating dynamic SQL queries. See
> http://spark.apache.org/docs/1.5.2/sql-programming-guide.html for details.
While you can construct the SQL string dynamically in scala/java/python, it
would be best to use the Dataframe API for creating dynamic SQL queries. See
http://spark.apache.org/docs/1.5.2/sql-programming-guide.html for details.
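For example, instead of splicing values into a SQL string, the conditions can be
built up as Column expressions. A sketch; df and the column names are made up:

  import org.apache.spark.sql.functions.col

  // runtime parameters that would otherwise be concatenated into a SQL string
  val minAge = 21
  val wantedCountries = Seq("SG", "IN")

  val result = df
    .filter(col("age") >= minAge && col("country").isin(wantedCountries: _*))
    .select("name", "age", "country")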
On Feb 2, 2016, at 6:49 PM, Divya Gehlot wrote:
> Hi,
> Does Spark …
Hi Alexander,
We developed SMV to address the exact issue you mentioned. While it is not a
workflow engine per se, it does allow for the creation of modules with
dependencies and automates the execution of these modules. See
https://github.com/TresAmigosSD/SMV/blob/master/docs/user/smv_intro.md
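To give a feel for it, a module with a dependency looks roughly like the sketch
below. This is written from memory of the SMV user docs, so treat the method
names as approximate and check the link above; input.Employment stands in for
whatever upstream dataset you have:

  import org.tresamigos.smv._
  import org.apache.spark.sql.functions._

  object EmploymentByState extends SmvModule("employment rolled up by state") {
    // declare upstream modules; SMV resolves and runs them first
    override def requiresDS() = Seq(input.Employment)

    // i(...) hands you the DataFrame produced by each dependency
    override def run(i: runParams) = {
      val emp = i(input.Employment)
      emp.groupBy("ST").agg(sum("EMP") as "EMP")
    }
  }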
Check out the Sameer Farooqui video on YouTube for Spark internals
(https://www.youtube.com/watch?v=7ooZ4S7Ay6Y&list=PLIxzgeMkSrQ-2Uizm4l0HjNSSy2NxgqjX)
Starting at 2:15:00, he describes YARN mode.
BTW, I highly recommend the entire video. Very detailed and concise.
--
Ali
On Dec 7, 2015, at 8:3…
You can try to run "jstack" a couple of times while the app is hung to look for
patterns for where the app is hung.
--
Ali
On Dec 3, 2015, at 8:27 AM, Richard Marscher wrote:
> I should add that the pauses are not from GC, and in tracing the CPU call
> tree in the JVM it seems like nothing …
I'm not 100% sure, but I don't think a jar within a jar will work without a
custom class loader. You can perhaps try to use "maven-assembly-plugin" or
"maven-shade-plugin" to build your uber/fat jar. Both of these will build a
flattened single jar.
--
Ali
On Nov 26, 2015, at 2:49 AM, Marc de…
You can try to use an Accumulator
(http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulator)
to keep count in map1. Note that the final count may be higher than the
number of records if there were some retries along the way.
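A minimal sketch with the Spark 1.x API; rdd and transform() are placeholders
for your data and your map1 logic, and newer Spark versions use
sc.longAccumulator instead:

  // count how many records flow through the map stage
  val processed = sc.accumulator(0L, "records seen in map1")

  val mapped = rdd.map { rec =>
    processed += 1
    transform(rec)          // placeholder for the real map1 work
  }

  mapped.count()            // accumulators only update once an action runs
  println(processed.value)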
--
Ali
On Nov 20, 2015, at 3:38 PM, jlua…
Make sure
"/mnt/local/1024gbxvdf1/all_adleads_cleaned_commas_in_quotes_good_file.csv" is
accessible on your slave node.
--
Ali
On Nov 9, 2015, at 6:06 PM, Sanjay Subramanian wrote:
> hey guys
>
> I have a 2-node SparkR (1 master, 1 slave) cluster on AWS using
> spark-1.5.1-bin-without-hadoop.
You can take a look at the smvPivot function in the SMV library
(https://github.com/TresAmigosSD/SMV). Look for the "smvPivot" method in
SmvDFHelper
(http://tresamigossd.github.io/SMV/scaladocs/index.html#org.tresamigos.smv.SmvDFHelper).
You can also perform the pivot on a group-by-group basis …
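For comparison, if upgrading is an option, Spark 1.6+ also ships a built-in
pivot on grouped data. A small sketch with made-up columns:

  // one row per (id, month) reshaped into one row per id with a column per month
  val sales = sqlContext.createDataFrame(Seq(
    ("X", "Jan", 10), ("X", "Feb", 20), ("Y", "Jan", 5)
  )).toDF("id", "month", "amount")

  val wide = sales.groupBy("id").pivot("month").sum("amount")
  wide.show()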
The Spark job-server project may help
(https://github.com/spark-jobserver/spark-jobserver).
--
Ali
On Oct 21, 2015, at 11:43 PM, ?? wrote:
> Hi developers, I've encountered some problem with Spark, and before opening
> an issue, I'd like to hear your thoughts.
>
> Currently, if you want to …
Which version of Spark are you using? I just tried the example below on 1.5.1
and it seems to work as expected:
scala> val res = df.groupBy("key").count.agg(min("count"), avg("count"))
res: org.apache.spark.sql.DataFrame = [min(count): bigint, avg(count): double]
scala> res.show
+----------+----------+
|min(count)|avg(count)|
+----------+----------+
…
Furthermore, even adding aliasing as suggested by the warning doesn't seem to
help either. A slight modification to the example below:
> scala> val largeValues = df.filter('value >= 10).as("lv")
And just looking at the join results:
> scala> val j = smallValues
> .join(largeValues, smallValues("key…
I could be misreading the code, but looking at the code for toLocalIterator
(copied below), it should lazily call runJob on each partition in your input.
It shouldn't be parsing the entire RDD before returning from the first "next"
call. If it is taking a long time on the first "next" call, it …
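The code that was pasted here got cut off in the archive; for reference, the
Spark 1.x implementation is roughly the following (paraphrased from RDD.scala,
so check the source of your version for the exact text):

  // one runJob per partition, requested only as the iterator advances
  def toLocalIterator: Iterator[T] = {
    def collectPartition(p: Int): Array[T] =
      sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
    (0 until partitions.length).iterator.flatMap(i => collectPartition(i))
  }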
Since DF2 only has the userID, I'm assuming you are using DF2 to filter for the
desired userIDs.
You can just use the join() and groupBy operations on DataFrame to do what you
desire. For example:
scala> val df1=app.createDF("id:String; v:Integer", "X,1;X,2;Y,3;Y,4;Z,10")
df1: org.apache.spark.sql.DataFrame = [id: string, v: int]
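The example is cut off in the archive; a sketch of how the join/groupBy approach
might continue, with df2 holding only the wanted userIDs (the data values and
the aggregate are made up):

scala> import org.apache.spark.sql.functions.sum
scala> val df2 = app.createDF("id:String", "X;Z")     // only the ids to keep
scala> val filtered = df1.join(df2, "id")             // inner join acts as the filter
scala> val result = filtered.groupBy("id").agg(sum("v") as "total_v")
scala> result.show()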