Yes, that's exactly what PCA is for, as Sivakumaran noted. Do you really want to select features, or just obtain a lower-dimensional representation of them with less redundancy?
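To make the distinction concrete, here is a rough sketch in plain Python (not Spark code). The toy data and the helper names `select_top_variance` and `first_principal_component` are made up for illustration, and ranking by variance is just one simple selection criterion, assumed here only to show the contrast. Selection returns indices of original columns, while a principal component is a weighted mix of all columns.

```python
import math

# Made-up toy data: 6 points, 4 features. Features 0 and 1 are strongly
# correlated; features 2 and 3 barely vary.
rows = [
    [1.0, 2.1, 5.0, 0.3],
    [2.0, 3.9, 5.0, 0.1],
    [3.0, 6.2, 5.1, 0.4],
    [4.0, 7.8, 5.0, 0.2],
    [5.0, 10.1, 4.9, 0.3],
    [6.0, 12.0, 5.0, 0.2],
]

def column_means(data):
    n, d = len(data), len(data[0])
    return [sum(r[j] for r in data) / n for j in range(d)]

def select_top_variance(data, k):
    """Feature selection: indices of the k highest-variance original columns."""
    n, d = len(data), len(data[0])
    means = column_means(data)
    var = [sum((r[j] - means[j]) ** 2 for r in data) / (n - 1) for j in range(d)]
    ranked = sorted(range(d), key=lambda j: var[j], reverse=True)
    return sorted(ranked[:k])

def first_principal_component(data, iters=200):
    """First PCA direction, via power iteration on the covariance matrix."""
    n, d = len(data), len(data[0])
    means = column_means(data)
    centered = [[r[j] - means[j] for j in range(d)] for r in data]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # arbitrary starting vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

kept = select_top_variance(rows, 2)
pc1 = first_principal_component(rows)
print("selected original columns:", kept)
print("loadings of first component on each original column:", pc1)
```

The selection result names which original factors survive. The PCA result is a new axis with a loading on every original factor, which is why PCA alone does not tell you which factors were "removed".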

On Mon, Aug 8, 2016 at 4:10 PM, Tony Lane <tonylane....@gmail.com> wrote:
> There must be an algorithmic way to figure out which of these factors
> contribute the least and remove them in the analysis.
> I am hoping someone can throw some insight on this.
>
> On Mon, Aug 8, 2016 at 7:41 PM, Sivakumaran S <siva.kuma...@me.com> wrote:
>>
>> Not an expert here, but the first step would be to devote some time and
>> identify which of these 112 factors are actually causative. Some domain
>> knowledge of the data may be required. Then, you can start off with PCA.
>>
>> HTH,
>>
>> Regards,
>>
>> Sivakumaran S
>>
>> On 08-Aug-2016, at 3:01 PM, Tony Lane <tonylane....@gmail.com> wrote:
>>
>> Great question Rohit. I am in my early days of ML as well and it would be
>> great if we get some idea on this from other experts on this group.
>>
>> I know we can reduce dimensions by using PCA, but I think that does not
>> allow us to understand which factors from the original we are using in the
>> end.
>>
>> - Tony L.
>>
>> On Mon, Aug 8, 2016 at 5:12 PM, Rohit Chaddha <rohitchaddha1...@gmail.com>
>> wrote:
>>>
>>> I have a data-set where each data-point has 112 factors.
>>>
>>> I want to remove the factors which are not relevant, and say reduce to 20
>>> factors out of these 112 and then do clustering of data-points using these
>>> 20 factors.
>>>
>>> How do I do this, and how do I figure out which of the 20 factors are
>>> useful for analysis?
>>>
>>> I see SVD and PCA implementations, but I am not sure if these give which
>>> elements are removed and which are remaining.
>>>
>>> Can someone please help me understand what to do here?
>>>
>>> thanks,
>>> -Rohit