We're thinking of writing a custom request handler to do that, although the handler would still have to query every collection on the backend.
Would this give the user a faster response?

Regards,
Edwin

On 8 June 2015 at 00:06, Erick Erickson <[email protected]> wrote:
> bq: we still need that information to be stored in a separate collection for security reasons.
>
> Not necessarily. I've seen lots of installations where "auth tokens" are embedded in the document (say, the groups that can see this doc). Then the front end simply attaches &fq=auth_field:(groups each user belongs to) to every query to restrict access.
>
> That said, some organizations aren't comfortable with this and demand separate collections, in which case you're stuck.
>
> You've defined an architecture, though, and one of the consequences of that is that if you have many collections, you'll have to fire off many queries (perhaps in parallel, but still). There's no magic to get around that. And it really doesn't matter, because in what you've described, one query has to be fired at each collection. Whether Solr does that for you or you spawn a bunch of threads on the client, the same work has to happen somewhere.
>
> You also have to figure out how to present the results to the user. If it's a simple count, you're OK. But scores will _not_ be comparable across the various collections, so the presentation will be challenging.
>
> Best,
> Erick
>
> On Sun, Jun 7, 2015 at 6:29 AM, Zheng Lin Edwin Yeo <[email protected]> wrote:
> > The reason we want to have different collections is that each collection has different fields, and some collections will contain information that is more sensitive than others.
> >
> > As such, we may need to restrict access to certain collections for some users. Although the restriction will be done on the front-end client side, we still need that information to be stored in a separate collection for security reasons.
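A minimal sketch of the per-document filter Erick describes, assuming a multi-valued field named auth_field that lists the groups allowed to see each document (the field name comes from his example; the base URL and collection name are placeholders). It builds the fq clause from a user's groups and a /select URL with rows=0, so only numFound comes back. Only JDK classes are used (Java 10+ for the Charset overload of URLEncoder.encode).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

final class AuthFilterQuery {

    // Builds a filter such as auth_field:("sales" OR "hr").
    // Each group is quoted, with backslashes and quotes escaped, so characters
    // like ':' or spaces in a group name are not treated as query syntax.
    static String authFilter(String field, List<String> groups) {
        if (groups.isEmpty()) {
            // A user with no groups should see nothing; refuse to build an unrestricted query.
            throw new IllegalArgumentException("user belongs to no groups");
        }
        String terms = groups.stream()
                .map(g -> "\"" + g.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(" OR "));
        return field + ":(" + terms + ")";
    }

    // Builds a /select URL that returns only the hit count (rows=0) for one collection.
    // baseUrl is a placeholder such as http://localhost:8983/solr.
    static String countUrl(String baseUrl, String collection, String q, String fq) {
        return baseUrl + "/" + collection + "/select"
                + "?q=" + encode(q)
                + "&fq=" + encode(fq)
                + "&rows=0&wt=json";
    }

    private static String encode(String s) {
        // Form-encodes the value (spaces become '+', which Solr decodes as a space).
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fq = authFilter("auth_field", List.of("sales", "hr"));
        System.out.println(fq);
        System.out.println(countUrl("http://localhost:8983/solr", "collection1", "*:*", fq));
    }
}
```

Because the front end attaches the filter, this only protects the data if end users cannot send requests to Solr directly.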
> >
> > Regards,
> > Edwin
> >
> >
> > On 7 June 2015 at 12:23, Erick Erickson <[email protected]> wrote:
> >
> >> bq: Yup, this information will need to be collected each time the user searches, as we want to show the number of records that match the search query in each of the collections.
> >>
> >> You're looking at something akin to "federated search". About all you can do is send out parallel queries to each collection.
> >>
> >> This is an "interesting" requirement, and I really question whether it's a wise thing to insist on. I'd really think about going back to the design. For instance, could you consolidate all these collections into a single one, perhaps with a collection_id field? Then the problem is relatively simple: use field collapsing (aka "grouping").
> >>
> >> Best,
> >> Erick
> >>
> >> On Sat, Jun 6, 2015 at 6:40 PM, Zheng Lin Edwin Yeo <[email protected]> wrote:
> >> > Yup, this information will need to be collected each time the user searches, as we want to show the number of records that match the search query in each of the collections.
> >> >
> >> > Currently I only have 6 collections, but it could increase to hundreds of collections in the future. So I'm worried that it could slow down the system a lot if we have to send hundreds of queries for each search request.
> >> >
> >> > Regards,
> >> > Edwin
> >> >
> >> >
> >> > On 5 June 2015 at 21:00, Upayavira <[email protected]> wrote:
> >> >
> >> >> I'm not so sure this is as bad as it sounds. When your collection is sharded, no single node knows about the documents in other shards/nodes, so to find the total number, a query will need to go to every node.
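If the collections were merged as Erick suggests, a single request can return per-collection counts. A minimal sketch of the request parameters, assuming a hypothetical single-valued string field named collection_id that holds the former collection name: with group=true and group.field, each group in the response carries its own doclist.numFound. With grouping, rows caps the number of groups returned (default 10), so it must be at least the number of distinct collection_id values; maxGroups below is an illustrative parameter. Faceting on the same field (facet.field with facet.limit=-1, since the default limit is 100 values) is shown as an alternative that returns only counts.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class PerCollectionCounts {

    // Grouping: one group per collection_id value; each group's doclist.numFound
    // is the number of matches that came from that former collection.
    static Map<String, String> groupingParams(String userQuery, String groupField, int maxGroups) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", userQuery);
        p.put("group", "true");
        p.put("group.field", groupField);
        p.put("group.limit", "1");                  // docs returned per group (also Solr's default)
        p.put("rows", Integer.toString(maxGroups)); // with grouping, rows = max groups returned
        return p;
    }

    // Faceting alternative: counts only, no documents returned.
    static Map<String, String> facetParams(String userQuery, String field) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", userQuery);
        p.put("rows", "0");
        p.put("facet", "true");
        p.put("facet.field", field);
        p.put("facet.limit", "-1");                 // default is 100 values; -1 means no limit
        return p;
    }

    // Joins parameters into a form-encoded query string, preserving insertion order.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(groupingParams("title:solr", "collection_id", 500)));
        System.out.println(toQueryString(facetParams("title:solr", "collection_id")));
    }
}
```

Grouping only returns groups that have at least one match, so collections with zero hits would have to be filled in by the client; field faceting with the default facet.mincount of 0 also lists values whose count is zero.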
> >> >>
> >> >> Trying to work out something that sends a single request to every node, combines their collection statistics and aggregates them into a single result sounds very complicated, and likely overkill.
> >> >>
> >> >> Do you need to collect this information often? Do you have a lot of collections?
> >> >>
> >> >> Upayavira
> >> >>
> >> >>
> >> >> On Fri, Jun 5, 2015, at 06:29 AM, Zheng Lin Edwin Yeo wrote:
> >> >> > I'm trying to write a SolrJ program in Java to read and consolidate all the information into a JSON file. The client will just need to call this SolrJ program and read the JSON file to get the details. But the problem is that we are still querying Solr once for each collection; it's just that this time it is done in the SolrJ program in a for-loop, whereas previously it was done on the client side. I'm not sure whether this will lead to a performance improvement.
> >> >> >
> >> >> > For your suggestion on spawning a bunch of threads, does it mean the same thing as what I did?
> >> >> >
> >> >> > Regards,
> >> >> > Edwin
> >> >> >
> >> >> >
> >> >> > On 5 June 2015 at 12:03, Erick Erickson <[email protected]> wrote:
> >> >> >
> >> >> > > Have you considered spawning a bunch of threads, one per collection, and having them all run in parallel?
> >> >> > >
> >> >> > > Best,
> >> >> > > Erick
> >> >> > >
> >> >> > > On Thu, Jun 4, 2015 at 4:52 PM, Zheng Lin Edwin Yeo <[email protected]> wrote:
> >> >> > > > The reason we wanted to do a single call is to improve performance, as our application needs to list the total number of records in each of the collections, and the number of records that match the query in each of the collections.
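On the question of whether threads mean the same thing as the for-loop: a loop that waits for each collection's response before sending the next request is sequential, whereas one task per collection on a thread pool lets the requests overlap, so wall-clock time can approach that of the slowest single query rather than the sum of all of them (the total work on the Solr side is unchanged, as Erick notes). A minimal sketch using only java.util.concurrent; CountFetcher is a hypothetical hook that, in a SolrJ program, would run a rows=0 query against one collection and return its numFound (for example via a shared client's query(collection, params) and getResults().getNumFound(), assumed here rather than shown).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelCounts {

    // Hypothetical hook: returns numFound for one collection.
    @FunctionalInterface
    interface CountFetcher {
        long count(String collection) throws Exception;
    }

    // Runs one count task per collection on a bounded pool and returns the
    // results in the same order as the input list.
    static Map<String, Long> countAll(List<String> collections, CountFetcher fetcher, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (String c : collections) {
                tasks.add(() -> fetcher.count(c));
            }
            // invokeAll blocks until every task has finished; futures come back in task order.
            List<Future<Long>> futures = pool.invokeAll(tasks);
            Map<String, Long> counts = new LinkedHashMap<>();
            for (int i = 0; i < collections.size(); i++) {
                counts.put(collections.get(i), futures.get(i).get());
            }
            return counts;
        } catch (ExecutionException e) {
            throw new IllegalStateException("a collection query failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for counts", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of("c1", "c2", "c3", "c4", "c5", "c6");
        // Fake fetcher standing in for a Solr round trip of about 200 ms.
        CountFetcher slow = c -> { Thread.sleep(200); return c.length(); };
        long start = System.nanoTime();
        Map<String, Long> counts = countAll(names, slow, names.size());
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(counts + " in ~" + ms + " ms");
    }
}
```

The pool size bounds how many requests hit Solr at once; with hundreds of collections it is usually better to cap it than to start one thread per collection.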
> >> >> > > >
> >> >> > > > Currently we are querying each collection one by one to retrieve the numFound value and display it, but this can slow down the system significantly as the number of collections grows. So we are thinking of ways to improve the speed in this area.
> >> >> > > >
> >> >> > > > Are there any other methods you can suggest to overcome this speed problem?
> >> >> > > >
> >> >> > > > Regards,
> >> >> > > > Edwin
> >> >> > > > On 5 Jun 2015 00:16, "Erick Erickson" <[email protected]> wrote:
> >> >> > > >
> >> >> > > >> Not in a single call that I know of. These are really orthogonal concepts. Getting the cluster status merely involves reading the ZooKeeper clusterstate, whereas getting the total number of docs for each would involve querying each collection, i.e. going to the Solr nodes themselves. I'd guess it's unlikely to be combined.
> >> >> > > >>
> >> >> > > >> Best,
> >> >> > > >> Erick
> >> >> > > >>
> >> >> > > >> On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo <[email protected]> wrote:
> >> >> > > >> > Hi,
> >> >> > > >> >
> >> >> > > >> > I would like to check: can we use the Collections API or any other method to list all the collections in the cluster, together with the number of records in each collection, in one output?
> >> >> > > >> >
> >> >> > > >> > Currently, I only know of List Collections, /admin/collections?action=LIST. However, this only lists the names of the collections in the cluster, not the number of records.
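The application needs two numbers per collection: the total number of records and the number matching the user's query. One way to get both from a single request per collection (a swapped-in technique, not something proposed in this thread) is q=*:*&rows=0, whose numFound is the total, plus facet.query set to the user's query, whose count appears under facet_counts/facet_queries in the response. This still sends one request per collection, as Erick says it must, but halves the number of requests. Hedged caveat: the facet.query may not be parsed the same way as the main query (a handler's defType applies to q), so local params such as {!edismax} may be needed to make the parser explicit. The base URL and collection names below are placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class TotalAndMatchRequests {

    // One request per collection: numFound of q=*:* is the total document count,
    // and the facet.query count is the number of documents matching userQuery.
    static String totalAndMatchUrl(String baseUrl, String collection, String userQuery) {
        return baseUrl + "/" + collection + "/select"
                + "?q=" + encode("*:*")
                + "&rows=0"
                + "&facet=true"
                + "&facet.query=" + encode(userQuery)
                + "&wt=json";
    }

    // Builds the request URLs for every collection, in input order.
    static List<String> urlsFor(String baseUrl, List<String> collections, String userQuery) {
        List<String> urls = new ArrayList<>();
        for (String c : collections) {
            urls.add(totalAndMatchUrl(baseUrl, c, userQuery));
        }
        return urls;
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        urlsFor("http://localhost:8983/solr", List.of("collection1", "collection2"), "title:solr")
                .forEach(System.out::println);
    }
}
```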
> >> >> > > >> >
> >> >> > > >> > Is there a way to show the number of records in each of the collections as well?
> >> >> > > >> >
> >> >> > > >> > Regards,
> >> >> > > >> > Edwin
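For the original question, LIST only returns collection names, so a combined "name plus record count" output has to be assembled on the client, for example by a SolrJ program like the one described above that writes a JSON file. A minimal, dependency-free sketch of that last step, assuming the counts have already been collected (for instance with one rows=0 query per collection); the output shape {"collections":[{"name":...,"numFound":...}]} and the sample values in main are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CountsJson {

    // Serializes collection -> count pairs, in iteration order, to a small JSON document.
    static String toJson(Map<String, Long> counts) {
        StringJoiner items = new StringJoiner(",", "{\"collections\":[", "]}");
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            items.add("{\"name\":" + quote(e.getKey()) + ",\"numFound\":" + e.getValue() + "}");
        }
        return items.toString();
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (ch < 0x20) {
                        sb.append(String.format("\\u%04x", (int) ch));
                    } else {
                        sb.append(ch);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>(); // sample values only
        counts.put("collection1", 1200L);
        counts.put("collection2", 35L);
        System.out.println(toJson(counts));
    }
}
```

In a real program a JSON library would normally handle the escaping; the hand-written version only keeps the sketch self-contained.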
