You can invoke exactly the same functions on the Scala side as well, i.e.
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
Have you tried them?
On Thu, Mar 24, 2016 at 10:29 PM, Mich Talebzadeh wrote:
>
> Hi,
>
> Read a CSV in with the following schema
>
> sca
Extending breaks chaining and is not nice. I think it is much better to write an
implicit class with extra methods. This way you add new methods without
touching the hierarchy at all, i.e.
object RddFunctions {
  implicit class RddFunctionsImplicit[T](rdd: RDD[T]) {
    /***
     * Cache RDD and name it in o
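Python has no Scala-style implicit classes, but the same goal of adding methods without touching the class hierarchy can be sketched by attaching a function to a class at runtime. This is a rough, hypothetical analogue only: the `Rdd` class and `named_cache` method below are made-up stand-ins, not Spark's API.

```python
# Hypothetical stand-in for Spark's RDD; not the real class.
class Rdd:
    def __init__(self, data):
        self.data = data

# A method defined after the fact, without subclassing.
def named_cache(self, name):
    self.name = name      # pretend to cache and name the RDD
    return self           # return self so calls still chain

Rdd.named_cache = named_cache  # attach it; the hierarchy is untouched

r = Rdd([1, 2, 3]).named_cache("myRDD")
print(r.name)  # myRDD
```

Because `named_cache` returns `self`, further method calls still chain, which is the property lost when subclassing and the reason the implicit-class pattern is preferred in Scala.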
e custom rdd, there are some fields I have defined so that both the custom
> method and the compute method can see and operate on them; can the method in
> the implicit class implement that?
>
>> On Mon, Mar 28, 2016 at 1:09 AM, Alexander Krasnukhin
>> wrote:
>> Extending breaks chainin
So, why not make a fake key and aggregate on it?
On Mon, Mar 28, 2016 at 6:21 PM, sujeet jog wrote:
> Hi,
>
> I have an RDD like this .
>
> [ 12, 45 ]
> [ 14, 50 ]
> [ 10, 35 ]
> [ 11, 50 ]
>
> I want to aggregate the values of the first two rows into one row and
> subsequently the next two rows into another
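The fake-key suggestion can be sketched in plain Python: assign every row a synthetic key of `index // 2` so consecutive pairs share a key, then sum within each key. With a real RDD the same idea would map onto `rdd.zipWithIndex()` followed by `reduceByKey`; the pure-Python version below is a stand-in only.

```python
from collections import defaultdict

rows = [[12, 45], [14, 50], [10, 35], [11, 50]]

# Synthetic ("fake") key: consecutive pairs of rows share index // 2.
keyed = [(i // 2, row) for i, row in enumerate(rows)]

# Sum element-wise within each key, as reduceByKey would.
sums = defaultdict(lambda: [0, 0])
for key, (a, b) in keyed:
    sums[key][0] += a
    sums[key][1] += b

result = [sums[k] for k in sorted(sums)]
print(result)  # [[26, 95], [21, 85]]
```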
You drop the label column and later try to select it. It won't be found, indeed.
--
Alexander
aka Six-Hat-Thinker
> On 28 Mar 2016, at 23:34, Jerry Lam wrote:
>
> Hi spark users and developers,
>
> I'm using spark 1.5.1 (I have no choice because this is what we used). I ran
> into some very un
e.g. select the max value of column "foo":
from pyspark.sql.functions import max, col
df.select(max(col("foo"))).show()
On Tue, Mar 29, 2016 at 2:15 AM, Andy Davidson <
a...@santacruzintegration.com> wrote:
> I am using pyspark 1.6.1 and python3.
>
>
> *Given:*
>
> idDF2 = idDF.select(idDF.id, idDF
> You still need to write more code than I
> would expect. I wonder if there is a easier way to work with Rows?
>
> In [19]:
>
> from pyspark.sql.functions import max
>
> maxRow = idDF2.select(max("col[id]")).collect()
>
> max = maxRow[0].asDict()['max(col[id
You can even use the fact that pyspark Row objects expose columns as dynamic properties:
from pyspark.sql.functions import max

rows = idDF2.select(max("col[id]").alias("max")).collect()
firstRow = rows[0]
max = firstRow.max
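That attribute-style access can be illustrated without a Spark session by using a namedtuple as a rough stand-in for pyspark's Row (a sketch only; the field name mirrors the alias above and the value 14 is made up):

```python
from collections import namedtuple

# Rough stand-in for pyspark.sql.Row: field names become attributes.
Row = namedtuple("Row", ["max"])
rows = [Row(max=14)]          # analogous to the result of .collect()

first_row = rows[0]
print(first_row.max)  # 14
```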
On Tue, Mar 29, 2016 at 7:14 PM, Alexander Krasnukhin wrote:
> You should be able to index columns directly ei