Forget about the quoted comment at the bottom. It is not true: both the fast/efficient and the slow/memory-consuming query follow the getTermCounts path.

But I have identified another place where they take different paths in the code. In SimpleFacets.getTermCounts you will find the code below. I have pointed out where the two queries go.
    if (params.getFieldBool(field, GroupParams.GROUP_FACET, false)) {
      counts = getGroupedCounts(searcher, docs, field, multiToken, offset, limit, mincount, missing, sort, prefix);
    } else {
      assert method != null;
      switch (method) {
        case ENUM:
          assert TrieField.getMainValuePrefix(ft) == null;
          counts = getFacetTermEnumCounts(searcher, docs, field, offset, limit, mincount, missing, sort, prefix);
          break;
        case FCS:
          assert !multiToken;
          if (ft.getNumericType() != null && !sf.multiValued()) {
*** ---> The fast/efficient query (facet.field=a_dlng_doc_sto) goes here
            // force numeric faceting
            if (prefix != null && !prefix.isEmpty()) {
              throw new SolrException(ErrorCode.BAD_REQUEST, FacetParams.FACET_PREFIX + " is not supported on numeric types");
            }
            counts = NumericFacets.getCounts(searcher, docs, field, offset, limit, mincount, missing, sort);
          } else {
            PerSegmentSingleValuedFaceting ps = new PerSegmentSingleValuedFaceting(searcher, docs, field, offset, limit, mincount, missing, sort, prefix);
            Executor executor = threads == 0 ? directExecutor : facetExecutor;
            ps.setNumThreads(threads);
            counts = ps.getFacetCounts(executor);
          }
          break;
        case FC:
          if (sf.hasDocValues()) {
*** ---> The slow/memory-consuming query (facet.field=c_dstr_doc_sto) goes here
            counts = DocValuesFacets.getCounts(searcher, docs, field, offset, limit, mincount, missing, sort, prefix);
          } else if (multiToken || TrieField.getMainValuePrefix(ft) != null) {
            UnInvertedField uif = UnInvertedField.getUnInvertedField(field, searcher);
            counts = uif.getCounts(searcher, docs, offset, limit, mincount, missing, sort, prefix);
          } else {
            counts = getFieldCacheCounts(searcher, docs, field, offset, limit, mincount, missing, sort, prefix);
          }
          break;
        default:
          throw new AssertionError();
      }
    }
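
To make the branching above easier to follow, here is a minimal stand-alone sketch of the same decision. It is not Solr code: the class FacetPathSketch, its FacetMethod enum and the choosePath method are hypothetical names of mine, the grouped-facet branch is left out, and the field properties passed in main are assumptions that merely match where the two queries were seen to go.

```java
final class FacetPathSketch {
    // Hypothetical stand-in for the facet method selected for the field.
    enum FacetMethod { ENUM, FCS, FC }

    /** Returns the name of the counting routine the excerpt above would reach. */
    static String choosePath(FacetMethod method, boolean numeric, boolean multiValued,
                             boolean hasDocValues, boolean multiToken, boolean triePrefix) {
        switch (method) {
            case ENUM:
                return "getFacetTermEnumCounts";
            case FCS:
                // Single-valued numeric fields are forced onto numeric faceting.
                return (numeric && !multiValued)
                        ? "NumericFacets.getCounts"
                        : "PerSegmentSingleValuedFaceting.getFacetCounts";
            case FC:
                if (hasDocValues) {
                    return "DocValuesFacets.getCounts";
                }
                if (multiToken || triePrefix) {
                    return "UnInvertedField.getCounts";
                }
                return "getFieldCacheCounts";
            default:
                throw new AssertionError();
        }
    }

    public static void main(String[] args) {
        // Assumption: a_dlng_doc_sto behaves like a single-valued numeric field under FCS.
        System.out.println("a_dlng_doc_sto -> "
                + choosePath(FacetMethod.FCS, true, false, true, false, false));
        // Assumption: c_dstr_doc_sto behaves like a string field with doc values under FC.
        System.out.println("c_dstr_doc_sto -> "
                + choosePath(FacetMethod.FC, false, false, true, false, false));
    }
}
```

The point of the sketch is only that the two fields end up in different counting routines because of their type and the facet method chosen for them, not because of anything in the query itself.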

I also believe I have found where the huge memory allocation is done. I took a memory dump while the slow/memory-consuming c_dstr_doc_sto query was running (plenty of time to do that: 100+ secs). A lot of memory appears to be allocated under SlowCompositeReaderWrapper.cachedOrdMaps, which holds HashMaps with MultiDocValues$OrdinalMaps as values. Those MultiDocValues$OrdinalMaps have an ordDeltas field, an array of MonotonicAppendingLongBuffers, which (a few levels further down) contain Packed64 instances holding long arrays. See https://dl.dropboxusercontent.com/u/25718039/mem-dump-while-searching-on-facet.field-c_dstr_doc_sto.png
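
To illustrate why a structure like this can get large, here is a rough sketch of the general idea of a segment-to-global ordinal map. It is an assumption-laden toy, not the Lucene implementation: OrdinalMapSketch, buildSegmentToGlobal and entryCount are hypothetical names, and it stores plain long arrays where the dump shows packed delta buffers. What it is meant to show is that such a map needs one entry per unique term per segment, so a string field with many distinct values spread over many segments gets expensive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class OrdinalMapSketch {
    /**
     * For each segment, maps the segment-local ordinal of a term to its
     * ordinal in the merged (global) term dictionary.
     * Each String[] must hold that segment's unique terms in sorted order.
     */
    static long[][] buildSegmentToGlobal(List<String[]> segmentTerms) {
        // Merge all per-segment terms into one sorted, de-duplicated dictionary.
        TreeSet<String> merged = new TreeSet<>();
        for (String[] terms : segmentTerms) {
            merged.addAll(Arrays.asList(terms));
        }
        String[] global = merged.toArray(new String[0]);

        long[][] segToGlobal = new long[segmentTerms.size()][];
        for (int s = 0; s < segmentTerms.size(); s++) {
            String[] terms = segmentTerms.get(s);
            segToGlobal[s] = new long[terms.length];
            for (int ord = 0; ord < terms.length; ord++) {
                // Every segment term is in the global dictionary, so the index is >= 0.
                segToGlobal[s][ord] = Arrays.binarySearch(global, terms[ord]);
            }
        }
        return segToGlobal;
    }

    /** Total number of long entries held: one per (segment, unique term) pair. */
    static long entryCount(long[][] segToGlobal) {
        long total = 0;
        for (long[] perSegment : segToGlobal) {
            total += perSegment.length;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String[]> segments = Arrays.asList(
                new String[] {"a", "c"},
                new String[] {"b", "c", "d"});
        long[][] map = buildSegmentToGlobal(segments);
        System.out.println(Arrays.deepToString(map));
        System.out.println("entries: " + entryCount(map));
    }
}
```

A single-valued numeric field counted per value would not need a mapping like this, which would be consistent with the numeric query not showing the same allocation.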

SlowCompositeReaderWrapper and all this memory-allocation does not seem to be part of the fast a_dlng_doc_sto-query.

Does this information provide any leads on how to fix the response-time/memory-consumption issue? Maybe it helps tell whether upgrading to 4.5 will fix the issue?

Regards, Per Steffensen

On 11/5/13 1:47 PM, Per Steffensen wrote:
Looking at thread dumps

It seems like one of the major differences in what is done for c_dstr_doc_sto vs a_dlng_doc_sto is in SimpleFacets.getFacetFieldCounts, where c_dstr_doc_sto takes the "getTermCounts" path and a_dlng_doc_sto takes the "getListedTermCounts" path.

            String termList = localParams == null ? null : localParams.get(CommonParams.TERMS);
            if (termList != null) {
              res.add(key, getListedTermCounts(facetValue, termList));
            } else {
              res.add(key, getTermCounts(facetValue));
            }

getTermCounts seems to do a lot more, and to be a lot more complex, than getListedTermCounts.
