I don't know about MySQL specifics, but I did a similar job on top of PostgreSQL and relied on its built-in temporary tables, indexes, and spill-to-disk functionality; the dataset/computation in question was already blowing up a Ruby process. There's no need to reinvent that in Clojure if the database can do the work and it's already deployed and operationalized.
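To make the "let the database do the work" idea concrete, here's a minimal sketch using SQLite (self-contained stand-in; the same pattern applies to PostgreSQL temp tables, which can also spill to disk). The table and column names are made up for illustration:

```python
# Denormalize inside the database via a temporary table plus an index,
# instead of holding the whole intermediate result in application memory.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);

    -- Materialize the denormalized shape (one row per customer with an
    -- aggregated order total) into a TEMP table, then index it.
    CREATE TEMP TABLE denorm AS
        SELECT c.id AS customer_id, c.name, SUM(o.total) AS order_total
        FROM customers c JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name;
    CREATE INDEX denorm_customer ON denorm (customer_id);
""")

rows = conn.execute(
    "SELECT name, order_total FROM denorm ORDER BY customer_id"
).fetchall()
print(rows)  # [('alice', 15.0), ('bob', 7.5)]
```

From the application side you then only stream rows out of `denorm`, rather than building the joined structure in process memory.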
We also made heavy use of JS and plv8 on that project for complex map ops. The tests were written in Clojure :).

On Nov 12, 2017 8:18 AM, <lawrence.krub...@gmail.com> wrote:

> I recently worked on a minor project that nevertheless needed to use 10
> gigs of RAM. It ran on a reasonably powerful server, yet it taxed that
> server. And I wondered: how are people scaling up such processes? If my
> approach was naive, what does the less naive approach look like?
>
> I wrote a simple app that pulled data from a MySQL database, denormalized
> it, and then stored it in ElasticSearch. It pulled about 4 million
> documents from MySQL. Parts of the data needed to be built up into complex
> structures (maps, vectors) before being put into ElasticSearch. In the end,
> the 4 million rows from MySQL became 1.5 million documents in ElasticSearch.
>
> I was wondering: what if, instead of 4 million documents, I needed to
> process 400 million documents? I assume I would have to distribute the work
> over several machines? I'm curious what are some of the most common routes
> for doing so. Would this be the situation where people would start to use
> something like Onyx or Storm or Hadoop? I looked at Spark, but it seems to
> be for a different use case, more about querying than denormalizing.
> Likewise, dumping everything onto S3 and then using something like Athena
> seems to be more for querying than denormalizing.
>
> For unrelated reasons, I am moving toward an architecture where all data
> is stored in Kafka. I suppose I could write a denormalizing app that reads
> from Kafka, builds up the data, and then inserts it into ElasticSearch,
> though I suppose, on the narrow issue of memory usage, using Kafka is no
> different from using MySQL.
>
> So, I'm asking about common patterns here. When folks have an app that
> needs more RAM than a typical server has, what are the first and most
> common steps they take?
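On the memory question in the quoted message, the usual first step (before reaching for Onyx/Storm/Hadoop) is to stream the source in batches and flush each batch downstream, so memory scales with the batch size rather than the dataset. A language-agnostic sketch in Python; `fetch_batches`, `denormalize`, and `index_batch` are hypothetical stand-ins for the real MySQL cursor and ElasticSearch bulk calls:

```python
# Bounded-memory denormalization: pull rows in fixed-size batches, build
# documents per batch, and index each batch before fetching the next.

def fetch_batches(rows, batch_size):
    """Yield successive fixed-size batches from an iterable of rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def denormalize(row):
    # Stand-in for building the complex map/vector structure per document.
    return {"id": row[0], "doc": row[1].upper()}

indexed = []
def index_batch(docs):
    indexed.extend(docs)  # stand-in for an ElasticSearch bulk insert

rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
for batch in fetch_batches(rows, batch_size=2):
    index_batch([denormalize(r) for r in batch])

print(len(indexed))  # 5
```

In Clojure the same shape falls out of `partition-all` over a lazy row seq; the point is just that nothing ever holds more than one batch of documents at a time.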
--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en