Is it necessary to convert categorical data into integers?

Any tips would be greatly appreciated!

-Rex

On Sun, Jun 14, 2015 at 10:05 AM, Rex X <dnsr...@gmail.com> wrote:

> For clustering analysis, we need a way to measure distances.
>
> This is difficult when the data mixes different levels of measurement:
> *binary / categorical (nominal), counts (ordinal), and ratio (scale)*
>
> To be concrete, consider a dataset with the attributes
> *city, zip, satisfaction_level, price*
>
> In addition, real data usually also contains string attributes, such as
> book titles. The distance between two strings can be measured by the
> minimum edit distance.
>
>
> SPSS provides Two-Step Cluster, which can handle both ratio-scale and
> ordinal numbers.
>
>
> What is the right algorithm for hierarchical clustering analysis over all
> four kinds of attributes above with *MLlib*?
>
>
> If we cannot find a suitable metric to measure the distance, an alternative
> is to do a topological data analysis (e.g. linkage, etc.).
> Can we do this kind of analysis with *GraphX*?
>
>
> -Rex
>
>
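
To make the distance question above concrete, here is a minimal sketch of one possible mixed-type distance, in the spirit of a Gower-style average of per-attribute dissimilarities. It is plain Python, not an MLlib or GraphX API. The function names, the `title` field, and the `price_range` and `max_level` parameters are assumptions made for illustration. Nominal attributes are compared by simple match/mismatch, so they do not have to be converted into integers for this particular scheme.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def mixed_distance(x: dict, y: dict, price_range: float, max_level: int) -> float:
    """Average of per-attribute dissimilarities, each scaled to [0, 1].

    Assumes satisfaction_level is an integer rank in 1..max_level and
    price_range is the (max - min) price observed in the data set.
    """
    longest = max(len(x["title"]), len(y["title"]), 1)
    parts = [
        0.0 if x["city"] == y["city"] else 1.0,    # nominal: match / mismatch
        0.0 if x["zip"] == y["zip"] else 1.0,      # nominal: match / mismatch
        abs(x["satisfaction_level"] - y["satisfaction_level"]) / (max_level - 1),  # ordinal
        abs(x["price"] - y["price"]) / price_range,       # ratio
        levenshtein(x["title"], y["title"]) / longest,    # string
    ]
    return sum(parts) / len(parts)


a = {"city": "Austin", "zip": "78701", "satisfaction_level": 4,
     "price": 20.0, "title": "Spark"}
b = {"city": "Boston", "zip": "02108", "satisfaction_level": 2,
     "price": 30.0, "title": "Spork"}
d = mixed_distance(a, b, price_range=40.0, max_level=5)
```

A pairwise matrix of such distances could then be passed to any hierarchical clustering routine that accepts precomputed dissimilarities. Equal weighting of the five attributes is an arbitrary choice here, and whether it suits a given data set is a separate question.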
