Re: Converting String to Datetime using map

2016-03-24 Thread Alexander Krasnukhin
You can invoke exactly the same functions on the Scala side as well, i.e. http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$ Have you tried them?

On Thu, Mar 24, 2016 at 10:29 PM, Mich Talebzadeh wrote:
> Hi,
>
> Read a CSV in with the following schema
>
> sca
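
For reference, a minimal Scala sketch of such a conversion with those built-in functions; the DataFrame df, the column name "date_str" and the pattern "yyyy-MM-dd HH:mm:ss" are assumptions for illustration, not taken from the original thread:

    import org.apache.spark.sql.functions.{col, unix_timestamp}

    // unix_timestamp parses the string column with the given pattern into
    // epoch seconds; casting to "timestamp" gives a proper datetime column.
    val withTs = df.withColumn(
      "event_time",
      unix_timestamp(col("date_str"), "yyyy-MM-dd HH:mm:ss").cast("timestamp")
    )

    withTs.printSchema()   // event_time: timestamp, plus the original columns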

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Alexander Krasnukhin
Extending breaks chaining and is not nice. I think it is much better to write an implicit class with extra methods. This way you add new methods without touching the hierarchy at all, i.e.

object RddFunctions {
  implicit class RddFunctionsImplicit[T](rdd: RDD[T]) {
    /***
     * Cache RDD and name it in o
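
For completeness, a minimal, self-contained sketch of the implicit-class pattern described above; the cacheAndName method is illustrative (only the object and class names come from the truncated snippet), not the exact code from the original message:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.storage.StorageLevel

    object RddFunctions {
      // Bringing RddFunctions._ into scope adds these methods to any RDD[T]
      // without touching the RDD class hierarchy, so normal chaining still works.
      implicit class RddFunctionsImplicit[T](rdd: RDD[T]) {
        /** Cache the RDD and name it so it is easy to spot in the Spark UI. */
        def cacheAndName(name: String): RDD[T] =
          rdd.setName(name).persist(StorageLevel.MEMORY_ONLY)
      }
    }

    // usage, e.g. in spark-shell:
    //   import RddFunctions._
    //   val named = sc.parallelize(1 to 100).cacheAndName("numbers")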

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Alexander Krasnukhin
e custom rdd, there are some fields I have defined so that both custom
> method and compute method can see and operate them, can the method in
> implicit class implement that?
>
>> On Mon, Mar 28, 2016 at 1:09 AM, Alexander Krasnukhin
>> wrote:
>> Extending breaks chainin

Re: Aggregate subsequent x row values together.

2016-03-28 Thread Alexander Krasnukhin
So, why not make a fake key and aggregate on it?

On Mon, Mar 28, 2016 at 6:21 PM, sujeet jog wrote:
> Hi,
>
> I have an RDD like this:
>
> [ 12, 45 ]
> [ 14, 50 ]
> [ 10, 35 ]
> [ 11, 50 ]
>
> I want to aggregate the values of the first two rows into one row and subsequently the
> next two rows into anothe
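
For reference, a minimal sketch of the fake-key idea, assuming the rows are (Int, Int) pairs, that they should be summed element-wise two at a time, and that a SparkContext sc is available (e.g. in spark-shell):

    val rdd = sc.parallelize(Seq((12, 45), (14, 50), (10, 35), (11, 50)))

    // zipWithIndex gives each row its position; index / 2 is the fake key,
    // so every two consecutive rows share a key and can be summed together.
    val aggregated = rdd
      .zipWithIndex()
      .map { case (row, idx) => (idx / 2, row) }
      .reduceByKey { case ((a1, b1), (a2, b2)) => (a1 + a2, b1 + b2) }
      .values

    aggregated.collect()   // (26, 95) and (21, 85); ordering may vary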

Re: [Spark SQL] Unexpected Behaviour

2016-03-28 Thread Alexander Krasnukhin
You drop the label column and later you try to select it. It won't find it, indeed.

-- Alexander aka Six-Hat-Thinker

> On 28 Mar 2016, at 23:34, Jerry Lam wrote:
>
> Hi spark users and developers,
>
> I'm using Spark 1.5.1 (I have no choice because this is what we used). I ran
> into some very un
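
To illustrate the point, a minimal sketch; the column names "label" and "feature" are made up for the example:

    // assuming df is a DataFrame with columns "label" and "feature"
    val noLabel = df.drop("label")     // "label" is no longer part of the schema

    noLabel.select("feature")          // fine
    noLabel.select("label")            // fails to analyze: the column cannot be resolved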

Re: looking for an easy way to find the max value of a column in a data frame

2016-03-28 Thread Alexander Krasnukhin
e.g. select the max value for column "foo":

from pyspark.sql.functions import max, col
df.select(max(col("foo"))).show()

On Tue, Mar 29, 2016 at 2:15 AM, Andy Davidson <a...@santacruzintegration.com> wrote:
> I am using pyspark 1.6.1 and python3.
>
> *Given:*
>
> idDF2 = idDF.select(idDF.id, idDF

Re: looking for an easy way to find the max value of a column in a data frame

2016-03-29 Thread Alexander Krasnukhin
ou still need to write more code than I
> would expect. I wonder if there is an easier way to work with Rows?
>
> In [19]:
>
> from pyspark.sql.functions import max
>
> maxRow = idDF2.select(max("col[id]")).collect()
>
> max = maxRow[0].asDict()['max(col[id

Re: looking for an easy way to find the max value of a column in a data frame

2016-03-29 Thread Alexander Krasnukhin
You can even use the fact that pyspark has dynamic properties:

rows = idDF2.select(max("col[id]").alias("max")).collect()
firstRow = rows[0]
max = firstRow.max

On Tue, Mar 29, 2016 at 7:14 PM, Alexander Krasnukhin wrote:
> You should be able to index columns directly ei