That's honestly closer to what I was originally envisioning--I've never really looked into graph databases before, but I'll check out Neo4j tonight. Do you know whether you can model multiple edges between the same pair of nodes? I'd love to have part-of-speech (POS) wildcarding as a feature, so you could search for e.g. "the ADJ goose", but that's a whole other layer of stuff, so it might go in the "eventually, maybe" pile.
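To make the idea concrete, here is a rough in-memory sketch of the word-graph model with POS wildcards. It is plain Python, not Neo4j or Clojure, and everything in it (the `WordGraph` class, the `Edge` record, the tiny `POS_TAGS` set, the sample sentences) is made up for illustration. Each adjacent pair of tokens gets its own edge, so the same two words can be linked many times; property-graph databases such as Neo4j likewise allow more than one relationship between the same pair of nodes.

```python
from collections import Counter, defaultdict, namedtuple

# One edge per adjacent token pair; a repeated pair gets a repeated edge (multigraph).
Edge = namedtuple("Edge", "src dst dst_tag sent pos")

POS_TAGS = {"DET", "ADJ", "NOUN", "VERB"}  # assumed tag set, for illustration only


class WordGraph:
    def __init__(self):
        self.by_word = defaultdict(list)  # index: word -> edges leaving that word
        self.by_slot = {}                 # (sentence id, position) -> edge leaving that token

    def add_sentence(self, sent_id, tagged):
        """tagged is a list of (word, tag) pairs in sentence order."""
        for i in range(len(tagged) - 1):
            (src, _), (dst, dst_tag) = tagged[i], tagged[i + 1]
            edge = Edge(src, dst, dst_tag, sent_id, i)
            self.by_word[src].append(edge)
            self.by_slot[(sent_id, i)] = edge

    def count(self, pattern):
        """Count matches of a pattern such as ["the", "ADJ", "goose"].

        Assumes two or more terms. The first term must be a literal word
        (that is what the index covers); later terms are either literal
        words or POS tags acting as wildcards.
        """
        first, rest = pattern[0], pattern[1:]
        matches = Counter()
        for edge in self.by_word.get(first, []):
            words = [first]
            for term in rest:
                if edge is None:  # ran off the end of the sentence
                    break
                ok = edge.dst_tag == term if term in POS_TAGS else edge.dst == term
                if not ok:
                    break
                words.append(edge.dst)
                # Follow the edge leaving the word we just matched.
                edge = self.by_slot.get((edge.sent, edge.pos + 1))
            else:
                matches[" ".join(words)] += 1
        return matches


# Hypothetical sample data.
g = WordGraph()
g.add_sentence(0, [("the", "DET"), ("grey", "ADJ"), ("goose", "NOUN"), ("honked", "VERB")])
g.add_sentence(1, [("the", "DET"), ("angry", "ADJ"), ("goose", "NOUN"), ("hissed", "VERB")])
g.add_sentence(2, [("the", "DET"), ("grey", "ADJ"), ("goose", "NOUN"), ("slept", "VERB")])
g.add_sentence(3, [("the", "DET"), ("goose", "NOUN"), ("slept", "VERB")])

print(g.count(["the", "ADJ", "goose"]))
```

Nothing here caps the n-gram length: a pattern can be as long as the sentence it walks. Storing the tag on the edge (rather than on the word node) is one possible choice; it lets the same word carry different tags in different sentences.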
On Tuesday, March 10, 2015 at 3:47:37 PM UTC-4, Ray Miller wrote:
>
> On 10 March 2015 at 17:58, Sam Raker <sam....@gmail.com> wrote:
> > I more meant deciding on a maximum size and storing them qua ngrams--it
> > seems limiting. On the other hand, after a certain size, they stop being
> > ngrams and start being something else--"texts," possibly.
>
> Exactly. When I first read your post, I almost suggested you model
> this in a graph database like Neo4j or Titan. Each word would be a
> node in the graph with an edge linking it to the next word in the
> sentence. You could define an index on the words (so retrieving all
> nodes for a given word would be fast), then follow edges to find and
> count particular n-grams. This is more complicated than the relational
> model I proposed, and will be a bit slower to query. But if you don't
> want to put an upper-bound on the length of the n-gram when you index
> the data, it might be the way to go.
>
> Ray.
>

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en