Re. Tim's points below: *i)* The seqs have to be ordered, or one of them has to be loaded fully into memory; I don't think there's any way around that.
*ii)* Frank's solution does *not* require the seqs to be the same length, and it gives you the complete 'diff' of the seqs (aka outer join), which could be handy. The one snag I see is that it is eager, not lazy, so it's going to put the answer completely in memory. So unless you are projecting out a small subset of the fields from each record, you will probably end up using as much memory as the other solutions. I wrote a lazy version using 'iterate', but I'm not sure it doesn't keep both entire seqs in memory, too. My two cents: 1. If you have enough memory, go with Moritz' suggestion to read the smaller seq into a map. Then you can do a simple for comprehension and arrange it so that the second, larger seq will never be completely in memory. 2. Another possible solution is to load the textfile into a temp table in your database. Then the solution is one simple SQL query, backed by hyper-optimized code designed to deal with this exact problem. 3. You may want to try the naive approach: 400k records sounds like it could very well fit into memory, as long as each record doesn't have a huge amount of data. 4. A library that has tools to deal with big files: https://github.com/kyleburton/clj-etl-utils --Leif On Monday, March 10, 2014 11:01:07 PM UTC-4, frye wrote: > > Hey Frank, > > Right. So I tried this loop / recur, and it runs, giving a result of *([4 > nil] [3 3] [2 nil] [1 1])*. But I'm not sure how that's going to help you > (although not discounting the possibility). > > You can simultaneously iterate through pairs of lists, to compare values. > However you cannot guarantee that those lists will be *i)* ordered, and > *ii)* the same length. Both those conditions are required for your > algorithm to work. Plus, what you suggest still means that you'll have to > scan through the entire space of both results. So we're not going to avoid > that. > > Based on your requirements, I still see my original *for* comprehension > as the most straightforward way to solve the problem. My second suggested > algorithm could also work. But I could be wrong and am always learning too. > So trying different solutions is a good habit to keep. > > > Hth > > Tim Washington > Interruptsoftware.com <http://interruptsoftware.com> > > > On Mon, Mar 10, 2014 at 4:53 AM, Frank Behrens <fbeh...@gmail.com<javascript:> > > wrote: > >> Hey, just to share, I came up with this code, which seem quite ok to me, >> Feels like I already understand something, do i, >> Have a nice day, Frank >> >> (loop >> [a '(1 2 3 4) >> b '(1 3) >> out ()] >> (cond >> (and (empty? a)(empty? b)) out >> (empty? a) (recur a (rest b) (conj out [nil (first >> b)])) >> (empty? b) (recur (rest a) b (conj out [(first a) >> nil])) >> :else (let >> [fa (first a) >> fb (first b) >> cmp (compare fa fb)] >> (cond >> (= 0 cmp) (recur (rest a) (rest b) (conj out [fa fb])) >> (> 0 cmp) (recur (rest a) b (conj out [fa nil])) >> :else (recur a (rest b) (conj out [nil >> fb])))))) >> >> >> Am Montag, 10. März 2014 09:26:14 UTC+1 schrieb Frank Behrens: >> >>> Thanks for your suggestions. >>> a for loop has to do 100.000 * 300.000 compares >>> Storing the database table into a 300.000 element hash, would be a >>> memory penalty I want to avoid. >>> >>> I'm quite shure that assential part of the solution is a function to >>> iterate through both list at once, >>> spitting out pairs of values according to compare >>> >>> (merge-sortedlists >>> '(1 2 3) >>> '( 2 4)) >>> => ([1 nil] [2 2] [3 nil] [nil 4]) >>> >>> Seems quite doable. >>> Try to implement now. >>> >>> Frank >>> >>> -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.