> ... some time and identify which of these 112 factors are actually
> causative. Some domain knowledge of the data may be required. Then, you
> can start off with PCA.
>
> HTH,
>
> Regards,
>
> Sivakumaran S
>
> On 08-Aug-2016, at 3:01 PM, Tony Lane wrote:
>
> Great question Rohit. ...
Great question Rohit. I am in my early days of ML as well, and it would be
great if we could get some ideas on this from the other experts in this
group.
I know we can reduce dimensions by using PCA, but I think that does not
let us see which of the original factors we end up using.
-
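A minimal Java sketch of the PCA step being discussed (assuming Spark
2.0's ml API and a Dataset<Row> named df with a vector column named
"features" -- both names are placeholders here):

import org.apache.spark.ml.feature.PCA;
import org.apache.spark.ml.feature.PCAModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Project the features onto the top 10 principal components.
PCAModel pca = new PCA()
    .setInputCol("features")
    .setOutputCol("pcaFeatures")
    .setK(10)
    .fit(df);

Dataset<Row> reduced = pca.transform(df).select("pcaFeatures");

// The loadings matrix shows how strongly each original factor
// contributes to each component, which partly addresses the question
// of which original inputs matter.
System.out.println(pca.pc());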
Can anyone suggest how I can initialize the KMeans input directly from a
Dataset of Rows?
On Sat, Aug 6, 2016 at 1:03 AM, Tony Lane wrote:
> I have all the data required for KMeans in a dataset in memory
>
> Standard approach to load this data from a file is
> spark.read().format("libsvm").load(filename) ...
I have all the data required for KMeans in a dataset in memory.
The standard approach to load this data from a file is
spark.read().format("libsvm").load(filename)
where the file has data in the format
0 1:0.0 2:0.0 3:0.0
How do I do this from an in-memory dataset that is already present?
Any suggestions?
-T
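A minimal Java sketch of one way to do this (assuming Spark 2.0's ml API
and an existing SparkSession named spark -- an assumption here): build a
Dataset<Row> with a "features" vector column directly from the in-memory
values instead of going through a libsvm file.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.ml.linalg.VectorUDT;
import org.apache.spark.ml.linalg.Vectors;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Wrap each in-memory point in a Row holding a Vector.
List<Row> rows = Arrays.asList(
    RowFactory.create(Vectors.dense(0.0, 0.0, 0.0)),
    RowFactory.create(Vectors.dense(1.0, 1.0, 1.0)));

StructType schema = new StructType(new StructField[] {
    new StructField("features", new VectorUDT(), false, Metadata.empty())
});

Dataset<Row> data = spark.createDataFrame(rows, schema);

// KMeans reads the "features" column directly -- no file involved.
KMeansModel model = new KMeans().setK(2).setFeaturesCol("features").fit(data);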
Mike,
I have figured out how to do this. Thanks for the suggestion. It works
great. I am now trying to figure out the performance impact of this.
Thanks again.
On Fri, Aug 5, 2016 at 9:25 PM, Tony Lane wrote:
> @mike - this looks great. How can I do this in Java? What is the
> performance impact? ...
> ... guaranteed unique (but not necessarily consecutive) IDs. Calling
> something like:
>
> df.withColumn("id", monotonically_increasing_id())
>
> You don't mention which language you're using but you'll need to pull in
> the sql.functions library.
>
> Mike
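A minimal Java version of Mike's suggestion (assuming a Dataset<Row>
named df -- a placeholder name):

import static org.apache.spark.sql.functions.monotonically_increasing_id;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Adds a long column of unique (not necessarily consecutive) ids.
Dataset<Row> withId = df.withColumn("id", monotonically_increasing_id());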
On Fri, Aug 5, 2016 at 6:35 PM, ayan guha wrote:
> Hi
>
> Can you explain a little further?
>
> best
> Ayan
>
> On Fri, Aug 5, 2016 at 10:14 PM, Tony Lane wrote:
>
>> I have a row with structure like
>>
>> identifier: String
>> value: int
>>
>> ...
I have a row with structure like
identifier: String
value: int
All identifiers are unique, and I want to generate a unique long id for
the data and get a Row object back for further processing.
I understand using the zipWithUniqueId function on RDD, but that would
mean first converting to RDD and ...
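For comparison, the RDD round-trip being described might look roughly
like this in Java (assuming a Dataset<Row> named df; the column accessors
follow the (identifier: String, value: int) structure above):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// zipWithUniqueId pairs every row with a unique long ...
JavaPairRDD<Row, Long> zipped = df.toJavaRDD().zipWithUniqueId();

// ... which then has to be folded back into a Row, and the result
// converted back to a Dataset with an extended schema.
JavaRDD<Row> withId = zipped.map(t ->
    RowFactory.create(t._1().getString(0), t._1().getInt(1), t._2()));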
... Sean Owen wrote:
> You mean "new int[] {0,1,2}" because vectors are 0-indexed.
>
> On Wed, Aug 3, 2016 at 11:52 AM, Tony Lane wrote:
> > Hi Sean,
> >
> > I did not understand,
> > I created a KMeansModel with 3 dimensions and then I am calling the
> > predict method ...
> ... that the vector has 3 dimensions, but then refer to its
> 4th dimension (at index 3). That is the error.
>
> On Wed, Aug 3, 2016 at 10:43 AM, Tony Lane wrote:
> > I am using the following vector definition in java
> >
> > Vectors.sparse(3, new int[] { 1, 2, 3 } ...
I am using the following vector definition in Java:
Vectors.sparse(3, new int[] { 1, 2, 3 }, new double[] { 1.1, 1.1, 1.1 }))
However, when I run the predict method on this vector it leads to:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3
at org.apache.spark.mllib.linalg.BLAS ...
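For what it's worth, the fix that follows from Sean's point is to use
0-based indices, e.g.:

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

// A 3-dimensional sparse vector may only use indices 0, 1 and 2;
// index 3 is what triggers the ArrayIndexOutOfBoundsException above.
Vector v = Vectors.sparse(3, new int[] { 0, 1, 2 },
    new double[] { 1.1, 1.1, 1.1 });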
Use the factory methods in Vectors.
On Wed, Aug 3, 2016 at 9:54 PM, Rohit Chaddha wrote:
> The predict method takes a Vector object.
> I am unable to figure out how to make this Spark vector object for
> getting predictions from my model.
>
> Does anyone have some code in Java for this?
>
> Thanks
> R
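A minimal Java sketch of the factory-method route (assuming an mllib
KMeansModel named model -- the name is a placeholder):

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

// Build the point with the Vectors factory, then ask the model
// which cluster it falls into.
Vector point = Vectors.dense(1.0, 2.0, 3.0);
int cluster = model.predict(point);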
SparkSession exposes a stop() method.
On Wed, Aug 3, 2016 at 8:53 AM, Pradeep wrote:
> Thanks Park. I am doing the same. Was trying to understand if there are
> other ways.
>
> Thanks,
> Pradeep
>
> > On Aug 2, 2016, at 10:25 PM, Park Kyeong Hee wrote:
> >
> > So sorry. Your name was Pradeep !!
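A minimal sketch of the stop() usage in Java (the appName is a
placeholder):

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().appName("app").getOrCreate();
try {
    // ... job logic ...
} finally {
    // Shuts down the session and the underlying SparkContext.
    spark.stop();
}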
Compiling without running tests ... and this is going fine.
On Wed, Aug 3, 2016 at 8:00 PM, Tony Lane wrote:
> I am trying to build spark in windows, and getting the following test
> failures and consequent build failures.
>
> [INFO] --- maven-surefire-plugin:2.19.1:test (default-test) ...
I am trying to build spark in windows, and getting the following test
failures and consequent build failures.
[INFO] --- maven-surefire-plugin:2.19.1:test (default-test) @
spark-core_2.11 ---
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
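For reference, the usual way to skip the test phase when building Spark
with Maven is a flag along these lines:

mvn -DskipTests clean package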
Can someone help me understand this error, which occurs while running a
filter on a DataFrame:
2016-07-31 21:01:57 ERROR CodeGenerator:91 - failed to compile:
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line
117, Column 58: Expression "mapelements_isNull" is not an rvalue
Is there any built-in function in Java with Spark to convert a string to
a date more efficiently, or do we just use the standard Java techniques?
-Tony
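One way to keep the parsing inside Spark SQL rather than in per-row Java
code is a column expression like the following (df, the column name "s",
and the "yyyy-MM-dd" pattern are placeholders):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.to_date;
import static org.apache.spark.sql.functions.unix_timestamp;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Parse the string with the given pattern, then cast to a date column.
Dataset<Row> parsed = df.withColumn("d",
    to_date(unix_timestamp(col("s"), "yyyy-MM-dd").cast("timestamp")));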
I am developing my analysis application using Spark (with Eclipse as the
IDE).
What is a good way to visualize the data, taking into consideration that
I have multiple files which make up my Spark application?
I have seen some notebook demos, but I am not sure how to use my
application with such notebooks.
Just to clarify, I am trying to do this in Java.
ts.groupBy("b").count().orderBy("count");
On Sun, Jul 31, 2016 at 12:00 AM, Tony Lane wrote:
> ts.groupBy("b").count().orderBy("count");
>
> how can I order this data in descending order of count?
> Any suggestions?
>
> -Tony
>
ts.groupBy("b").count().orderBy("count");
how can I order this data in descending order of count?
Any suggestions?
-Tony
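For the record, the descending sort can be expressed with a Column
expression in Java:

import static org.apache.spark.sql.functions.col;

// Sort by the generated "count" column, largest first.
ts.groupBy("b").count().orderBy(col("count").desc());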
Caused by: java.net.URISyntaxException: Relative path in absolute URI:
file:C:/ibm/spark-warehouse
Does anybody know a solution to this?
cheers
tony
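A workaround often suggested for this Windows-specific issue (the path
below is taken from the error above) is to set the warehouse dir to an
explicit file:/// URI when building the session:

import org.apache.spark.sql.SparkSession;

// Use a well-formed URI so Hadoop's Path parsing accepts it.
SparkSession spark = SparkSession.builder()
    .appName("app")
    .config("spark.sql.warehouse.dir", "file:///C:/ibm/spark-warehouse")
    .getOrCreate();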
I am facing the same issue and completely blocked here.
Sean, can you please help with this issue?
Migrating to 2.0.0 has really stalled our development effort.
-Tony
> -- Forwarded message --
> From: Sean Owen
> Date: Fri, Jul 29, 2016 at 12:47 AM
> Subject: Re: Spark 2.0 - ...