Hi Rusty,

On Thu, Apr 14, 2011 at 8:00 PM, Rusty Klophaus <ru...@basho.com> wrote:
> Hi Morten,
> Thanks for sending the log files. I was able to figure out, at least
> partially, what's going on here.
Fantastic - thanks!

> The "Failed to compact" message is a result of trying to index a token
> that's greater than 32kb in size. (The index storage engine, called
> merge_index, assumes token sizes smaller than 32kb.) I was able to decode
> part of the term in question by pulling data from the log file, and it
> looks like you may be indexing HTML with base64 encoded inline images, ie:
> <img src="data:image/jpeg;base64,iVBORw0KG..."> The inline image is being
> treated as a single token, and it's greater than 32kb.

That's odd - in the search schema, I asked it to ignore everything besides
a few specific fields:

{ schema,
  [
    {version, "0.1"},
    {default_field, "_owner"},
    {n_val, 1}
  ],
  [
    %% Don't parse _id and _owner, just treat them as single tokens
    {field, [
      {name, "id"},
      {required, true},
      {analyzer_factory, {erlang, text_analyzers, noop_analyzer_factory}}
    ]},
    {field, [
      {name, "_owner"},
      {required, true},
      {analyzer_factory, {erlang, text_analyzers, noop_analyzer_factory}}
    ]},

    %% Parse name fields for full-text indexing
    {field, [
      {name, "displayName"},
      {aliases, ["nickname", "preferredUsername", "name_formatted", "name_displayName"]},
      {analyzer_factory, {erlang, text_analyzers, standard_analyzer_factory}}
    ]},
    {field, [
      {name, "emails_value"},
      {analyzer_factory, {erlang, text_analyzers, standard_analyzer_factory}}
    ]},

    %% Add modification dates
    {field, [
      {name, "published"},
      {aliases, ["updated"]},
      {type, date}
    ]},

    %% Skip all else...
    {dynamic_field, [
      {name, "*"},
      {skip, true}
    ]}
  ]
}.

(We're indexing Portable Contacts, where the user images reside in an
'image' field.)

> The short term workaround is to either:
> 1) Preprocess your data to avoid this situation.
> 2) Or, create a custom analyzer that limits the size of terms.
> (See http://wiki.basho.com/Riak-Search---Schema.html for more information
> about analyzers and custom analyzers.)
> The long term solution is for us to increase the maximum token size in
> merge_index. I've filed a bugzilla issue for this, trackable here:
> https://issues.basho.com/show_bug.cgi?id=1069

> Still investigating the "Too many db tables" error. This is being caused
> by the system opening too many ETS tables. It *may* be related to the
> compaction error described above, but I'm not sure.
> Search (specifically merge_index) uses ETS tables heavily, and the number
> of tables is affected by a few different factors. Can you send me some
> more information to help debug, specifically:
>
> How many partitions (vnodes) are in your cluster? (If you haven't changed
> any settings, then the default is 64.)

It's 64 (no defaults changed at all).

> How many machines are in your cluster?

Four.

> How many segments are on the node where you are seeing these errors?
> (Run: "find DATAPATH/merge_index/*/*.data | wc -l", replacing DATAPATH
> with the path to your Riak data directory for that node.)

foreach srv ( nosql1 nosql2 nosql4 nosql5 )
  echo -n "$srv "; ssh $srv sh -c 'find /var/lib/riaksearch/merge_index/*/*.data | wc -l'
end
nosql1 32434
nosql2 14170
nosql4 15480
nosql5 13501

(nosql1 is the one the error log is lifted from - but the errors seemed to
come from all of the servers.)

> Approximately how much data are you loading (# Docs and # MB), and how
> quickly are you trying to load it?

~17m records, weighing in just shy of four GB. While I didn't do the
loading, I believe we did it with 25 concurrent threads, using the four
machines in round-robin fashion.
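For workaround 2 (the custom analyzer), here's a rough sketch of what I had
in mind - a factory that wraps the standard analyzer and just drops any
token over the 32kb limit before it reaches merge_index. The module and
function names are made up, and I'm assuming the factory signature and
binary tokens match what text_analyzers uses, so please correct me if this
is off:

-module(capped_analyzers).
-export([capped_analyzer_factory/2]).

%% Maximum token size merge_index will accept, per the explanation above.
-define(MAX_TOKEN_BYTES, 32768).

%% Sketch: run the standard analyzer, then filter out any oversized binary
%% token (e.g. an inline base64 image that slips through the schema).
%% Non-binary entries, if the analyzer returns any, are passed through.
capped_analyzer_factory(Text, Options) ->
    {ok, Tokens} = text_analyzers:standard_analyzer_factory(Text, Options),
    Capped = [T || T <- Tokens,
                   not (is_binary(T) andalso byte_size(T) > ?MAX_TOKEN_BYTES)],
    {ok, Capped}.

If that looks roughly right, I'd just point the analyzer_factory entries in
the schema above at {erlang, capped_analyzers, capped_analyzer_factory}
instead of the standard one.

(And a shot in the dark on the "Too many db tables" front: if it turns out
to simply be the VM's ETS table limit, I assume we could buy some headroom
by raising it in vm.args, e.g. "-env ERL_MAX_ETS_TABLES 8192" - but I'll
hold off on touching that until you've finished investigating.)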
/Siebuhr