Hi Satish,
Take a look at the smvTopNRecs() function in the SMV package. It does
exactly what you are looking for. It might be overkill to bring in all of SMV
for just one function, but you will also get a lot more than just DF helper
functions (modular views, higher-level graphs, dynamic loading, and more); see
the SMV docs (https://github.com/TresAmigosSD/SMV) for help.
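Assuming what you are after is "top N records per group" (which is what the name
smvTopNRecs suggests), the same thing can also be done with plain DataFrame
window functions if pulling in SMV is too heavy. A rough sketch; the column
names "key"/"value" and N=3 are made up, and the function is called rowNumber()
before Spark 1.6:

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions.{row_number, col}

  // df is your input DataFrame with columns "key" and "value";
  // keep the 3 largest "value" rows within each "key" group
  val w = Window.partitionBy("key").orderBy(col("value").desc)
  val top3 = df.withColumn("rn", row_number().over(w))
    .where(col("rn") <= 3)
    .drop("rn")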
>
> Thanks,
> Divya
>
> On 3 February 2016 at 11:42, Ali Tajeldin EDU wrote:
> While you can construct the SQL string dynamically in scala/java/python, it
> would be best to use the Dataframe API for creating dynamic SQL queries. See
> http://spark.apache.org/docs/1.5.2/sql-programming-guide.html for details.
While you can construct the SQL string dynamically in scala/java/python, it
would be best to use the Dataframe API for creating dynamic SQL queries. See
http://spark.apache.org/docs/1.5.2/sql-programming-guide.html for details.
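For example, instead of splicing values into a SQL string, the conditions can be
built up as Column expressions. A sketch; df and the column names are made up:

  import org.apache.spark.sql.functions.col

  // runtime parameters that would otherwise be concatenated into a SQL string
  val minAge = 21
  val wantedCountries = Seq("SG", "IN")

  val result = df
    .filter(col("age") >= minAge && col("country").isin(wantedCountries: _*))
    .select("name", "age", "country")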
On Feb 2, 2016, at 6:49 PM, Divya Gehlot wrote:
> Hi,
> Does Spark …
Hi Alexander,
We developed SMV to address the exact issue you mentioned. While it is not a
workflow engine per se, it does allow for the creation of modules with
dependencies and automates the execution of these modules. See
https://github.com/TresAmigosSD/SMV/blob/master/docs/user/smv_intro.md
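To give a feel for it, a module with a dependency looks roughly like the sketch
below. This is written from memory of the SMV user docs, so treat the method
names as approximate and check the link above; input.Employment stands in for
whatever upstream dataset you have:

  import org.tresamigos.smv._
  import org.apache.spark.sql.functions._

  object EmploymentByState extends SmvModule("employment rolled up by state") {
    // declare upstream modules; SMV resolves and runs them first
    override def requiresDS() = Seq(input.Employment)

    // i(...) hands you the DataFrame produced by each dependency
    override def run(i: runParams) = {
      val emp = i(input.Employment)
      emp.groupBy("ST").agg(sum("EMP") as "EMP")
    }
  }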
Check out the Sameer Farooqui video on YouTube for Spark internals
(https://www.youtube.com/watch?v=7ooZ4S7Ay6Y&list=PLIxzgeMkSrQ-2Uizm4l0HjNSSy2NxgqjX)
Starting at 2:15:00, he describes YARN mode.
BTW, I highly recommend the entire video. Very detailed and concise.
--
Ali
On Dec 7, 2015, at 8:3…
You can try to run "jstack" a couple of times while the app is hung to look for
patterns for where the app is hung.
--
Ali
On Dec 3, 2015, at 8:27 AM, Richard Marscher wrote:
> I should add that the pauses are not from GC, and in tracing the CPU call
> tree in the JVM it seems like nothing …
I'm not 100% sure, but I don't think a jar within a jar will work without a
custom class loader. You can perhaps try to use "maven-assembly-plugin" or
"maven-shade-plugin" to build your uber/fat jar. Both of these will build a
flattened single jar.
--
Ali
On Nov 26, 2015, at 2:49 AM, Marc de…
You can try to use an Accumulator
(http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulator)
to keep count in map1. Note that the final count may be higher than the
number of records if there were some retries along the way.
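A minimal sketch with the Spark 1.x API; rdd and transform() are placeholders
for your data and your map1 logic, and newer Spark versions use
sc.longAccumulator instead:

  // count how many records flow through the map stage
  val processed = sc.accumulator(0L, "records seen in map1")

  val mapped = rdd.map { rec =>
    processed += 1
    transform(rec)          // placeholder for the real map1 work
  }

  mapped.count()            // accumulators only update once an action runs
  println(processed.value)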
--
Ali
On Nov 20, 2015, at 3:38 PM, jlua…
Make sure
"/mnt/local/1024gbxvdf1/all_adleads_cleaned_commas_in_quotes_good_file.csv" is
accessible on your slave node.
--
Ali
On Nov 9, 2015, at 6:06 PM, Sanjay Subramanian wrote:
> hey guys
>
> I have a 2-node SparkR (1 master, 1 slave) cluster on AWS using
> spark-1.5.1-bin-without-hadoop.
You can take a look at the smvPivot function in the SMV library
(https://github.com/TresAmigosSD/SMV). Look for the "smvPivot" method in
SmvDFHelper
(http://tresamigossd.github.io/SMV/scaladocs/index.html#org.tresamigos.smv.SmvDFHelper).
You can also perform the pivot on a group-by-group basis …
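For comparison, if upgrading is an option, Spark 1.6+ also ships a built-in
pivot on grouped data. A small sketch with made-up columns:

  // one row per (id, month) reshaped into one row per id with a column per month
  val sales = sqlContext.createDataFrame(Seq(
    ("X", "Jan", 10), ("X", "Feb", 20), ("Y", "Jan", 5)
  )).toDF("id", "month", "amount")

  val wide = sales.groupBy("id").pivot("month").sum("amount")
  wide.show()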
The Spark job-server project may help
(https://github.com/spark-jobserver/spark-jobserver).
--
Ali
On Oct 21, 2015, at 11:43 PM, ?? wrote:
> Hi developers, I've encountered some problem with Spark, and before opening
> an issue, I'd like to hear your thoughts.
>
> Currently, if you want to …
Which version of Spark are you using? I just tried the example below on 1.5.1
and it seems to work as expected:
scala> val res = df.groupBy("key").count.agg(min("count"), avg("count"))
res: org.apache.spark.sql.DataFrame = [min(count): bigint, avg(count): double]
scala> res.show
+----------+----------+
|min(count)|avg(count)|
+----------+----------+
…
Furthermore, even adding aliasing as suggested by the warning doesn't seem to
help either. A slight modification to the example below:
> scala> val largeValues = df.filter('value >= 10).as("lv")
And just looking at the join results:
> scala> val j = smallValues
> .join(largeValues, smallValues("key…
I could be misreading the code, but looking at the code for toLocalIterator
(copied below), it should lazily call runJob on each partition in your input.
It shouldn't be parsing the entire RDD before returning from the first "next"
call. If it is taking a long time on the first "next" call, it …
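The code that was pasted here got cut off in the archive; for reference, the
Spark 1.x implementation is roughly the following (paraphrased from RDD.scala,
so check the source of your version for the exact text):

  // one runJob per partition, requested only as the iterator advances
  def toLocalIterator: Iterator[T] = {
    def collectPartition(p: Int): Array[T] =
      sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
    (0 until partitions.length).iterator.flatMap(i => collectPartition(i))
  }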
Since DF2 only has the userID, I'm assuming you are using DF2 to filter for the
desired userIDs.
You can just use the join() and groupBy operations on DataFrame to do what you
desire. For example:
scala> val df1=app.createDF("id:String; v:Integer", "X,1;X,2;Y,3;Y,4;Z,10")
df1: org.apache.spark.sql.DataFrame = [id: string, v: int]
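The example is cut off in the archive; a sketch of how the join/groupBy approach
might continue, with df2 holding only the wanted userIDs (the data values and
the aggregate are made up):

scala> import org.apache.spark.sql.functions.sum
scala> val df2 = app.createDF("id:String", "X;Z")     // only the ids to keep
scala> val filtered = df1.join(df2, "id")             // inner join acts as the filter
scala> val result = filtered.groupBy("id").agg(sum("v") as "total_v")
scala> result.show()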