The question marks are actual question marks. I'm not sure how to find the "duplicate" keys in the map in memory. As far as I can tell, there is only one "? 5" key in the in-memory map.
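Below is a minimal sketch for finding the in-memory keys that turn into the same text on disk. It depends on an assumption that nobody in the thread has confirmed: that the extra "? 5" entries are keys containing characters the UTF-8 writer cannot encode, such as unpaired surrogate code units, which Java's encoder replaces with a literal question mark. (spit goes through clojure.java.io/writer, which defaults to UTF-8.) The namespace and the function names utf8-round-trip and on-disk-collisions are made up for illustration.

```clojure
(ns read-notes.collisions
  (:import (java.nio.charset StandardCharsets)))

;; Hypothetical helper: encode a string to UTF-8 bytes and decode it back.
;; String.getBytes replaces malformed input (for example an unpaired
;; surrogate) with the charset's replacement byte, which for UTF-8 is a
;; question mark. The result approximates how the string reads after being
;; written to a UTF-8 file.
(defn utf8-round-trip [^String s]
  (String. (.getBytes s StandardCharsets/UTF_8) StandardCharsets/UTF_8))

;; Hypothetical helper: group the map's keys by their round-tripped form
;; and keep only the groups where several distinct keys collapse into one.
(defn on-disk-collisions [m]
  (->> (keys m)
       (group-by utf8-round-trip)
       (filter (fn [[_ ks]] (> (count ks) 1)))
       (into {})))
```

With phrases loaded, (get (on-disk-collisions phrases) "? 5") should return the distinct keys that are all written as "? 5" if the assumption holds. Calling (map int k) on each of them, as Ghadi suggested, then shows which character codes differ. An empty result would point away from the encoding explanation.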
I thought maybe computing the frequencies of the hash values of the keys and looking for any with more than one would find them, but this code:

read-notes> (def dupes (filter #(> (second %) 1) (frequencies (map hash (keys phrases)))))
#'read-notes/dupes
read-notes> (count dupes)
8911

seems to indicate 8,911 hash values that are shared by more than one key.

On Wednesday, November 25, 2015 at 10:27:29 PM UTC-6, Ghadi Shayban wrote:
>
> While in memory before writing, are the hash codes for the "duplicate"
> keys the same? You can call (hash) on the keys. I'm thinking there is
> perhaps an issue with Unicode string serialization... Are the question
> marks a particular character?
>
> If you can find the similar strings in memory, before they are written,
> call:
> (map int the-string)
> to see the actual Unicode characters for the question marks.
>
> On Wednesday, November 25, 2015 at 11:07:34 PM UTC-5, Dave Kincaid wrote:
>>
>> The number of keys in the map is 8,054,160.
>>
>> On Wednesday, November 25, 2015 at 10:04:11 PM UTC-6, Dave Kincaid wrote:
>>>
>>> I have something very strange going on when I try to write a map out to
>>> a file and read it back in. It's a perfectly fine hash-map with ?????
>>> key/value pairs (so it's pretty big). When I write the map out to a file using
>>>
>>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (pr-str phrases))
>>>
>>> and then read it back in with
>>>
>>> (edn/read (PushbackReader. (io/reader "/tmp/mednotes6153968756847768349/repl-write.edn")))
>>>
>>> I am getting a duplicate key exception indicating that "? 5" is
>>> duplicated. phrases is a clojure.lang.PersistentHashMap. The keys of the
>>> map are strings and the values are numbers. When I get the value for "? 5"
>>> from the map, it returns 352.
>>>
>>> I tried to grep the file to find the occurrences of the key "? 5" (and
>>> the 30 characters before and after it) and it seems to return 4 of them.
>>> The second one is the right one from the map, but I have no idea where the
>>> other 3 are coming from.
>>>
>>> [/tmp/mednotes6153968756847768349]> egrep -o ".{30}\"\? 5\" .{30}" repl-write.edn
>>> hasing a toothbrush for" 160, "? 5" 32, ". ) during his /" 32, "to
>>> "is intact with sutures" 32, "? 5" 352, "4.81 pounds" 128, "ceren
>>> udden" 32, "being up all" 32, "? 5" 32, "limited financial means"
>>> , "count , everytime she" 32, "? 5" 32, "had a partial mandibulect
>>>
>>> Does anyone have an idea what might be happening when the map is written
>>> out to the file? How is that key getting duplicated?
>>>
>>> I have tried a few slightly different ways of writing to the file
>>> including
>>>
>>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (binding [*print-dup* true] (pr-str phrases)))
>>>
>>> and
>>>
>>> (spit "/tmp/mednotes6153968756847768349/repl-write.edn" (.toString phrases))
>>>
>>> based on some StackOverflow answers I found. They all seem to do the
>>> same thing.
>>>
>>> Here is the exception stack trace.
>>>
>>> 1. Caused by java.lang.IllegalArgumentException
>>>    Duplicate key: ? 5
>>>
>>>  PersistentHashMap.java:   67  clojure.lang.PersistentHashMap/createWithCheck
>>>                 RT.java: 1538  clojure.lang.RT/map
>>>          EdnReader.java:  631  clojure.lang.EdnReader$MapReader/invoke
>>>          EdnReader.java:  142  clojure.lang.EdnReader/read
>>>          EdnReader.java:  108  clojure.lang.EdnReader/read
>>>                 edn.clj:   35  clojure.edn/read
>>>                 edn.clj:   33  clojure.edn/read
>>>                AFn.java:  154  clojure.lang.AFn/applyToHelper
>>>                AFn.java:  144  clojure.lang.AFn/applyTo
>>>           Compiler.java: 3623  clojure.lang.Compiler$InvokeExpr/eval
>>>           Compiler.java:  439  clojure.lang.Compiler$DefExpr/eval
>>>           Compiler.java: 6787  clojure.lang.Compiler/eval
>>>           Compiler.java: 6745  clojure.lang.Compiler/eval
>>>                core.clj: 3081  clojure.core/eval
>>>                main.clj:  240  clojure.main/repl/read-eval-print/fn
>>>                main.clj:  240  clojure.main/repl/read-eval-print
>>>                main.clj:  258  clojure.main/repl/fn
>>>                main.clj:  258  clojure.main/repl
>>>             RestFn.java: 1523  clojure.lang.RestFn/invoke
>>>  interruptible_eval.clj:   58  clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn
>>>                AFn.java:  152  clojure.lang.AFn/applyToHelper
>>>                AFn.java:  144  clojure.lang.AFn/applyTo
>>>                core.clj:  630  clojure.core/apply
>>>                core.clj: 1868  clojure.core/with-bindings*
>>>             RestFn.java:  425  clojure.lang.RestFn/invoke
>>>  interruptible_eval.clj:   56  clojure.tools.nrepl.middleware.interruptible-eval/evaluate
>>>  interruptible_eval.clj:  191  clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
>>>  interruptible_eval.clj:  159  clojure.tools.nrepl.middleware.interruptible-eval/run-next/fn
>>>                AFn.java:   22  clojure.lang.AFn/run
>>> ThreadPoolExecutor.java: 1142  java.util.concurrent.ThreadPoolExecutor/runWorker
>>> ThreadPoolExecutor.java:  617  java.util.concurrent.ThreadPoolExecutor$Worker/run
>>>             Thread.java:  745  java.lang.Thread/run

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
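A note on the hash-frequency check in the reply at the top of the thread: (count dupes) counts hash values shared by two or more keys. With about 8 million distinct strings hashed to 32 bits, thousands of shared values are expected even when no two keys are equal. The sketch below gives a birthday-style estimate under the simplifying assumption that the hashes behave like independent, uniformly random 32-bit integers, which string hashes only approximate. The function name is hypothetical.

```clojure
;; Hypothetical helper: expected number of key pairs that share a hash
;; value when n distinct keys get independent, uniformly random 32-bit
;; hashes. Each of the n(n-1)/2 pairs collides with probability 1/2^32.
(defn expected-colliding-pairs [n]
  (/ (* n (dec n)) 2.0 (Math/pow 2 32)))

;; 8,054,160 is the key count reported in the thread.
(expected-colliding-pairs 8054160)
;; => about 7551.8
```

That estimate is the same order of magnitude as the 8,911 reported, so the hash counts alone do not indicate duplicate keys. Comparing the keys themselves in the form they take in the file is a more direct test.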