Re: Machine learning question (using Spark) - removing redundant factors while doing clustering

Sean Owen Mon, 08 Aug 2016 11:08:43 -0700

Yes, that's exactly what PCA is for as Sivakumaran noted. Do you
really want to select features or just obtain a lower-dimensional
representation of them, with less redundancy?
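
A rough sketch of that distinction, in plain Python rather than Spark's
API: PCA returns components that are weighted mixes of all the original
factors, so the weights (loadings) show how much each factor contributes,
but no factor is literally "removed". The function names and the toy data
below are hypothetical, and power iteration is used only to keep the
example dependency-free; it is not how Spark computes PCA.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(cov, iterations=200):
    """Approximate the first principal component by power iteration."""
    d = len(cov)
    # Non-uniform start vector (an arbitrary choice); the iteration would
    # stall only if it happened to be orthogonal to the leading eigenvector.
    v = [float(i + 1) for i in range(d)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all-zero covariance: no direction of variance to find
        v = [x / norm for x in w]
    return v


def rank_by_loading(component, names):
    """Sort factor names by the absolute weight they carry in the component."""
    return sorted(zip(names, component), key=lambda p: abs(p[1]), reverse=True)


# Toy data (hypothetical): "b" is exactly 2 * "a", and "c" is small noise.
rows = [
    [1.0, 2.0, 0.1],
    [2.0, 4.0, -0.1],
    [3.0, 6.0, 0.1],
    [4.0, 8.0, -0.1],
    [5.0, 10.0, 0.0],
]
loadings = top_component(covariance_matrix(rows))
for name, weight in rank_by_loading(loadings, ["a", "b", "c"]):
    print(f"{name}: {weight:+.3f}")
```

Here the redundant pair "a" and "b" is folded into a single direction and
"c" gets almost no weight. Ranking factors by loading is only a heuristic
for picking original columns, and it ignores differences in scale unless
the data is standardized first. In Spark itself, pyspark.ml.feature.PCA
fits a model whose pc attribute holds the component matrix and whose
explainedVariance attribute holds the variance captured by each component.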


On Mon, Aug 8, 2016 at 4:10 PM, Tony Lane <tonylane....@gmail.com> wrote:
> There must be an algorithmic way to figure out which of these factors
> contribute the least and remove them from the analysis.
> I am hoping someone can shed some light on this.
>
> On Mon, Aug 8, 2016 at 7:41 PM, Sivakumaran S <siva.kuma...@me.com> wrote:
>>
>> Not an expert here, but the first step would be to devote some time to
>> identifying which of these 112 factors are actually causative. Some domain
>> knowledge of the data may be required. Then you can start off with PCA.
>>
>> HTH,
>>
>> Regards,
>>
>> Sivakumaran S
>>
>> On 08-Aug-2016, at 3:01 PM, Tony Lane <tonylane....@gmail.com> wrote:
>>
>> Great question, Rohit. I am in my early days of ML as well, and it would be
>> great to get some ideas on this from other experts in this group.
>>
>> I know we can reduce dimensions by using PCA, but I think that does not
>> let us understand which of the original factors we are using in the
>> end.
>>
>> - Tony L.
>>
>> On Mon, Aug 8, 2016 at 5:12 PM, Rohit Chaddha <rohitchaddha1...@gmail.com>
>> wrote:
>>>
>>>
>>> I have a data-set where each data-point has 112 factors.
>>>
>>> I want to remove the factors which are not relevant, and say reduce to 20
>>> factors out of these 112 and then do clustering of data-points using these
>>> 20 factors.
>>>
>>> How do I do this, and how do I figure out which 20 factors are
>>> useful for the analysis?
>>>
>>> I see SVD and PCA implementations, but I am not sure whether these tell
>>> me which elements are removed and which remain.
>>>
>>> Can someone please help me understand what to do here?
>>>
>>> thanks,
>>> -Rohit
>>>
>>
>>
>

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
