You can quickly play with HNSW in R to see if it does anything close to what you want.
https://cran.r-project.org/web/packages/RcppHNSW/index.html

Note that you might consider examining the small-world links that it
constructs, as they might be a close-enough proxy for your cliques in the
first place, although the results are unlikely to match your current
process exactly.

The HNSW theory paper is https://arxiv.org/abs/1603.09320

Faiss links:
https://medium.com/@DataPlayer/scalable-approximate-nearest-neighbour-search-using-googles-scann-and-facebook-s-faiss-3e84df25ba
https://ai.meta.com/tools/faiss/

Note that Faiss has GPU support, which might give you a speedup too.

On Sunday, June 22, 2025 at 6:32:27 PM UTC+2 William Gilmore wrote:

> Jason,
>
> Thank you for your response. I will take a look at greenpack and the
> other algorithms that you mention.
>
> Thank you!
>
> lbe
>
> On Sunday, June 22, 2025 at 11:02:25 AM UTC-5 Jason E. Aten wrote:
>
>> Hi Ibe,
>>
>> gob is unsupported and not optimal in many ways.
>>
>> I would invite you to try my serialization package.
>> https://github.com/glycerine/greenpack
>> You can serialize unexported fields if you wish to with the -unexported
>> flag, though I generally don't.
>>
>> For the slowdown, you might look at using non-quadratic approximate
>> nearest neighbors methods
>> like HNSW or faiss from facebook.
>>
>> Best,
>> Jason
>>
>> On Sunday, June 22, 2025 at 1:21:52 AM UTC+2 Learned Byerror wrote:
>>
>>> All,
>>>
>>> I have an application that uses gonum/vptree and gonum/graph/simple.
>>> Currently every run of the application instantiates the vptree from a
>>> database, queries the vptree to find nearest neighbors, writes the data to
>>> database table A, creates a weighted undirected graph from table A,
>>> extracts k-clique communities from the graph and then generates SVGs for
>>> each community.
>>>
>>> All of this works as expected and runs in about 7 minutes. The initial
>>> version of the program ran in about 2 hours. I used pprof heavily to
>>> identify optimization opportunities. I think the code is optimized as much
>>> as is possible using the gonum packages.
>>>
>>> This program is run against an ever-growing set of data. The current
>>> data set has almost 3 million entities and grows by approximately 25K for
>>> each new data set. As such, I expect the run time to continue to increase.
>>> There are two steps that take over 80% of the run time: the vptree
>>> nearest-neighbor query and the generation of the weighted undirected
>>> graph. I would like to be able to save the state of the vptree and graph
>>> at the end of each run and use that as input in the next run.
>>>
>>> Both gonum/vptree and gonum/graph/simple contain private fields. Neither
>>> implements the GobEncoder or GobDecoder interface. Consequently, I cannot
>>> use encoding/gob.
>>>
>>> Please note, this is about saving and retrieving state in the same
>>> version of the same program. It is not about transferring data from one
>>> program to a different program. Neither of these packages has a dependency
>>> on external state at run time.
>>>
>>> Are there alternative methods that I should consider? Approaches using
>>> unsafe are acceptable to me. If something fails, I can always recover by
>>> making a full run as I am currently doing.
>>>
>>> The only alternative that I have come up with at this point is to make a
>>> pull request and add the GobEncode/GobDecode functionality myself. While
>>> this is an option, the effort required to do so is likely significant.
>>>
>>> Thank you in advance for your guidance!
>>>
>>> lbe
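
On the GobEncode/GobDecode option raised in the original post, here is a minimal sketch of what that pattern looks like for a type with private fields. The `index`, `snapshot`, `point`, and `roundTrip` names are hypothetical stand-ins invented for illustration; this is not gonum's vptree or graph type, and the real internal state of those packages is more involved.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// point is a hypothetical 2-D entity; the real program's type will differ.
type point struct{ X, Y float64 }

// index is a hypothetical structure with private fields.
// It is NOT gonum's vptree.Tree.
type index struct {
	pts []point // unexported, so gob cannot see it on its own
	k   int     // unexported
}

// snapshot mirrors index with exported fields that gob can encode.
type snapshot struct {
	Pts []point
	K   int
}

// GobEncode implements gob.GobEncoder by encoding a snapshot of the
// private state.
func (ix *index) GobEncode() ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(snapshot{Pts: ix.pts, K: ix.k})
	return buf.Bytes(), err
}

// GobDecode implements gob.GobDecoder by restoring the private state
// from a snapshot.
func (ix *index) GobDecode(b []byte) error {
	var s snapshot
	if err := gob.NewDecoder(bytes.NewReader(b)).Decode(&s); err != nil {
		return err
	}
	ix.pts, ix.k = s.Pts, s.K
	return nil
}

// roundTrip encodes ix to memory and decodes it into a fresh value.
// A real program would write to and read from a file instead.
func roundTrip(ix *index) (*index, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(ix); err != nil {
		return nil, err
	}
	out := new(index)
	if err := gob.NewDecoder(&buf).Decode(out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	in := &index{pts: []point{{1, 2}, {3, 4}}, k: 2}
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(out.pts), out.k) // prints "2 2"
}
```

Go does not allow methods to be defined on types from another package, so this only works on a type you own: either a wrapper that holds enough data to rebuild the structure, or a patch to the package itself, as the original post suggests.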