Interesting problem. Eric, as you said earlier, K-means requires a way to measure the distance between objects -- so that those with smaller distances can be grouped together. A problem is that there are a number of features, which may not be correlated. For example, there is an income trajectory, a change of company trajectory, a change of level-of-responsibility trajectory, a change of subject-matter-focus trajectory, and probably more. You might build separate trajectories for each person and then see if you can group the trajectories. For example, a "company man" may or may not have an increasing responsibility trajectory. You would then have a multi-dimensional space into which to put people.
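A rough sketch of the "multi-dimensional space" idea, in plain Python: each career is reduced to a few trajectory summaries and the distance between people is measured in that space. The event names, the three features chosen, and the `end_year` default are all assumptions made up for illustration; none of them come from Eric's data.

```python
from math import dist

# Hypothetical event logs: one list of (year, event) pairs per person.
# The event names are illustrative assumptions, not a real schema.
careers = {
    "company_man": [(2004, "hired"), (2010, "promoted"), (2016, "promoted")],
    "rotator": [(2004, "hired"), (2007, "left"), (2010, "returned"),
                (2013, "left"), (2016, "returned")],
    "steady": [(2004, "hired"), (2011, "promoted"), (2017, "promoted")],
}

def trajectory_features(events, end_year=2022):
    """Summarize one career as (company changes, promotions, longest quiet spell)."""
    company_changes = sum(1 for _, e in events if e in ("left", "returned"))
    promotions = sum(1 for _, e in events if e == "promoted")
    # Years between consecutive events, including the stretch up to end_year.
    years = sorted(y for y, _ in events) + [end_year]
    longest_gap = max(b - a for a, b in zip(years, years[1:]))
    return (company_changes, promotions, longest_gap)

features = {name: trajectory_features(ev) for name, ev in careers.items()}

# Euclidean distance in the resulting 3-dimensional space.
d_similar = dist(features["company_man"], features["steady"])
d_different = dist(features["company_man"], features["rotator"])
print(features)
print(d_similar, d_different)
```

In this toy data the two stay-in-place careers end up closer to each other than either is to the rotator. With real data the features would sit on very different scales (counts versus years versus income), so they would probably need to be standardized before any distance-based grouping.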
-- Russ

On Mon, Jan 9, 2023 at 10:11 AM Nicholas Thompson <[email protected]> wrote:

> To my uneducated eye, this seemed like one of Jon’s problems.
>
> Sent from my Dumb Phone
>
> On Jan 7, 2023, at 6:23 AM, Frank Wimberly <[email protected]> wrote:
>
> This answer seems reasonable to me. I worked on Project Talent during 1967 which had some similar goals and data. See
>
> https://en.m.wikipedia.org/wiki/Project_Talent
>
> Our data was for thousands of high school students and our software was all written in Fortran.
>
> ---
> Frank C. Wimberly
> 140 Calle Ojo Feliz,
> Santa Fe, NM 87505
>
> 505 670-9918
> Santa Fe, NM
>
> On Fri, Jan 6, 2023, 11:32 PM Pieter Steenekamp <[email protected]> wrote:
>
>> I asked https://chat.openai.com/chat and here is the conversation:
>>
>> *Pieter Steenekamp*
>> can you suggest a solution for the following problem "I'm hoping someone here could help out. Let's imagine I had some data where each row was a person's career. We could list major events every year. For example: 2004 they were hired, 2007 they get a promotion, 2010 they leave for a different company, 2012 they come back at a higher level, 2015 get a promotion, then no change until 2022. Let's say I had data like this for roughly 2 million people, and that there are around 10 different types of changes that could happen during any time period (could be yearly, quarterly, monthly, I can make it how I want). I was hoping we could ask a computer to tell us if there were "types of careers" that people had. We could say "put all these careers into 4 buckets" or "7 buckets" based on similarity. Then we could look at the piles the computer made and try to make sense of them. One type might be "company man" for people who tend to stay in place for 20 or more years, another type could be a "rotator", who leaves and returns every 3 years or so. Etc.
>> The point is, I want a computer to make the piles for me, rather than trying to come up with potential piles a priori. Are there methods for doing this? I know it's a problem we've *talked* about a lot, but I don't know if there are solutions."
>>
>> *https://chat.openai.com/chat*
>> There are several approaches that you could use to group the careers of these 2 million people into clusters based on similarity. One approach is to use unsupervised machine learning techniques, specifically clustering algorithms, to automatically group the data into clusters. Some popular clustering algorithms that you could use for this problem include k-means, hierarchical clustering, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise). You could also use dimensionality reduction techniques, such as PCA (Principal Component Analysis) or t-SNE (t-Distributed Stochastic Neighbor Embedding), to reduce the dimensionality of the data and then use a clustering algorithm to group the data into clusters. Another approach is to use supervised machine learning techniques, where you would need to manually label a subset of the data with the cluster labels, and then use this labeled data to train a classifier to predict the cluster labels for the remaining data. It's worth noting that these approaches will require some feature engineering, where you will need to extract relevant features from the data that can be used to represent the careers of the people in the dataset. Once you have identified and extracted these features, you can then use them as input to the machine learning algorithms to group the data into clusters.
>>
>> *Pieter Steenekamp*
>> would it then be possible to get for example if there were "types of careers" that people had?
>>
>> *https://chat.openai.com/chat*
>> Yes, once you have grouped the careers of the 2 million people into clusters using one of the approaches I mentioned above, you can then analyze the characteristics of each cluster to identify common patterns and trends within each cluster. This will allow you to identify "types of careers" that people have, based on the similarity of their career events. For example, if one of the clusters is characterized by people who have had long tenures at a single company, you could label this cluster as "company man" careers. Similarly, if another cluster is characterized by people who frequently switch between companies, you could label this cluster as "rotator" careers. You can then use this information to gain insights into the types of careers that are represented in the dataset, and potentially draw conclusions about the factors that may influence career paths and success.
>>
>> *Pieter Steenekamp*
>> can you propose python code to do something like this please
>>
>> *https://chat.openai.com/chat*
>> Sure, here is an example of how you could use the scikit-learn library in Python to group the careers of the 2 million people into clusters using the k-means algorithm:
>>
>> import numpy as np
>> from sklearn.cluster import KMeans
>>
>> # Load the data into a NumPy array
>> data = np.loadtxt('career_data.txt')
>>
>> # Create the k-means model
>> model = KMeans(n_clusters=4)
>>
>> # Fit the model to the data
>> model.fit(data)
>>
>> # Predict the cluster labels for each data point
>> labels = model.predict(data)
>>
>> # Print the cluster labels
>> print(labels)
>>
>> This code will group the data into 4 clusters, as specified by the n_clusters parameter. You can adjust this parameter to change the number of clusters that the data is grouped into.
>> To extract features from the data that can be used as input to the k-means model, you will need to preprocess the data and select relevant features. This will likely involve some feature engineering, which will depend on the specific characteristics of the data and the goals of your analysis.
>> I hope this helps! Let me know if you have any questions.
>>
>> On Fri, 6 Jan 2023 at 19:34, Eric Charles <[email protected]> wrote:
>>
>>> Greetings all,
>>> I'm hoping someone here could help out. Let's imagine I had some data where each row was a person's career. We could list major events every year.
>>>
>>> For example: 2004 they were hired, 2007 they get a promotion, 2010 they leave for a different company, 2012 they come back at a higher level, 2015 get a promotion, then no change until 2022.
>>>
>>> Let's say I had data like this for roughly 2 million people, and that there are around 10 different types of changes that could happen during any time period (could be yearly, quarterly, monthly, I can make it how I want).
>>>
>>> I was hoping we could ask a computer to tell us if there were "types of careers" that people had. We could say "put all these careers into 4 buckets" or "7 buckets" based on similarity. Then we could look at the piles the computer made and try to make sense of them.
>>>
>>> One type might be "company man" for people who tend to stay in place for 20 or more years, another type could be a "rotator", who leaves and returns every 3 years or so. Etc. The point is, I want a computer to make the piles for me, rather than trying to come up with potential piles a priori.
>>>
>>> Are there methods for doing this? I know it's a problem we've *talked* about a lot, but I don't know if there are solutions.
>>>
>>> Any help would be appreciated.
>>>
>>> Best,
>>> Eric
>>>
>>> <[email protected]>
>>> -. --- - / ...- .- .-.. .. -.. / -- --- .-. ... . / -.-. --- -.. .
>>> FRIAM Applied Complexity Group listserv
>>> Fridays 9a-12p Friday St. Johns Cafe / Thursdays 9a-12p Zoom https://bit.ly/virtualfriam
>>> to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
>>> FRIAM-COMIC http://friam-comic.blogspot.com/
>>> archives: 5/2017 thru present https://redfish.com/pipermail/friam_redfish.com/
>>> 1/2003 thru 6/2021 http://friam.383.s1.nabble.com/
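For anyone curious what the k-means step quoted in this thread actually does once each person is a row of numbers, here is a minimal sketch of the basic algorithm (Lloyd's iteration) in plain Python, so it runs without scikit-learn. The feature rows (company changes, promotions, longest stretch without change) and the farthest-point initialisation are assumptions chosen for illustration; they are not taken from Eric's data or from scikit-learn's internals.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on tuples of numbers; returns (labels, centers)."""
    # Deterministic farthest-point initialisation (an illustrative choice):
    # start from the first point, then repeatedly add the point that is
    # farthest from every center picked so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers

# Hypothetical rows: (company changes, promotions, longest stretch without change)
people = [(0, 2, 18), (0, 3, 20), (1, 2, 17), (6, 0, 3), (5, 1, 3), (7, 0, 2)]
labels, centers = kmeans(people, k=2)
print(labels)
```

On this toy input the first three rows (long quiet stretches, few moves) land in one pile and the last three (frequent moves) in the other. Changing `k` corresponds to asking for "4 buckets" or "7 buckets"; at the scale of 2 million rows a library implementation such as scikit-learn's `KMeans` would be the practical choice rather than this sketch.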
