https://gist.github.com/hoyeunglee/3d340ab4e9a3e2b7ad7307322055b550
I have updated it again to do better, because some words are stored in different files.

On Thursday, August 3, 2017 at 10:02:01 AM UTC+8, Ho Yeung Lee wrote:
> https://gist.github.com/hoyeunglee/f371f66d55f90dda043f7e7fea38ffa2
>
> I have nearly succeeded another way; please run the code above.
>
> When there are many black words, it is very slow,
> so I just open Notepad, maximize it with no content,
> then capture the screen and save it as roster.png.
>
> When I run it, I find that it cannot circle all the words with red rectangles,
> only some of the words.
>
>
> On Wednesday, August 2, 2017 at 3:06:40 PM UTC+8, Peter Otten wrote:
> > Glenn Linderman wrote:
> >
> > > On 8/1/2017 2:10 PM, Piet van Oostrum wrote:
> > >> Ho Yeung Lee <jobmatt...@gmail.com> writes:
> > >>
> > >>> def isneighborlocation(lo1, lo2):
> > >>>     if abs(lo1[0] - lo2[0]) < 7 and abs(lo1[1] - lo2[1]) < 7:
> > >>>         return 1
> > >>>     elif abs(lo1[0] - lo2[0]) == 1 and lo1[1] == lo2[1]:
> > >>>         return 1
> > >>>     elif abs(lo1[1] - lo2[1]) == 1 and lo1[0] == lo2[0]:
> > >>>         return 1
> > >>>     else:
> > >>>         return 0
> > >>>
> > >>>
> > >>> sorted(testing1, key=lambda x: (isneighborlocation.get(x[0]), x[1]))
> > >>>
> > >>> return something like
> > >>> [(1,2),(3,3),(2,5)]
> > >
> > >> I think you are trying to sort a list of two-dimensional points into a
> > >> one-dimensional list in such a way that points that are close together
> > >> in the two-dimensional sense will also be close together in the
> > >> one-dimensional list. But that is impossible.
> > >
> > > It's not impossible, it just requires an appropriate distance function
> > > used in the sort.
> >
> > That's a grossly misleading addition.
> >
> > Once you have an appropriate clustering algorithm
> >
> >     clusters = split_into_clusters(items) # needs access to all items
> >
> > you can devise a key function
> >
> >     def get_cluster(item, clusters=split_into_clusters(items)):
> >         return next(
> >             index for index, cluster in enumerate(clusters) if item in cluster
> >         )
> >
> > such that
> >
> >     grouped_items = sorted(items, key=get_cluster)
> >
> > but that's a roundabout way to write
> >
> >     grouped_items = sum(split_into_clusters(items), [])
> >
> > In other words: sorting is useless, what you really need is a suitable
> > approach to split the data into groups.
> >
> > One well-known algorithm is k-means clustering:
> >
> > https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.vq.kmeans.html
> >
> > Here is an example with pictures:
> >
> > https://dzone.com/articles/k-means-clustering-scipy
--
https://mail.python.org/mailman/listinfo/python-list
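
For illustration, here is a minimal sketch of the "split into groups first" idea from the quoted reply. split_into_clusters is left undefined in that reply, so the version below is only an assumption: a simple flood fill that puts two points in the same group when they are linked by a chain of neighbours. The neighbour test is the first condition of the quoted isneighborlocation(); the helper name is_neighbor, the limit parameter and the sample points are made up for this example.

```python
from itertools import chain


def is_neighbor(p, q, limit=7):
    """True if p and q differ by less than `limit` on both axes.

    This is the first test in the quoted isneighborlocation(); the two
    elif branches there are special cases of it, so they are left out.
    """
    return abs(p[0] - q[0]) < limit and abs(p[1] - q[1]) < limit


def split_into_clusters(items, limit=7):
    """Group points into clusters of transitively neighbouring points.

    Hypothetical implementation (not from the thread): a plain flood
    fill that needs O(n**2) comparisons in the worst case.
    """
    remaining = list(items)
    clusters = []
    while remaining:
        cluster = [remaining.pop(0)]
        # The loop also visits points appended during iteration, so
        # neighbours of neighbours end up in the same cluster.
        for point in cluster:
            near = [q for q in remaining if is_neighbor(point, q, limit)]
            remaining = [q for q in remaining if q not in near]
            cluster.extend(near)
        clusters.append(cluster)
    return clusters


# Made-up sample data: three groups of nearby points, interleaved.
points = [(1, 2), (40, 40), (3, 3), (44, 45), (2, 5), (100, 3)]

clusters = split_into_clusters(points)

# Flattening the clusters already gives the grouped list.
grouped = list(chain.from_iterable(clusters))

# Sorting by cluster index gives the same groups, but only because the
# clusters were computed beforehand; the sort itself adds nothing.
cluster_index = {p: i for i, c in enumerate(clusters) for p in c}
via_sort = sorted(points, key=cluster_index.get)

print(clusters)
print(grouped)
```

With this sample data the first cluster comes out as [(1, 2), (3, 3), (2, 5)]. For real screen coordinates a library routine such as the k-means function linked above may be a better fit; the flood fill is only meant to show that the grouping step does the work, not the sort.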