Issues with rename command of the Collections API

2022-09-29 Thread Jesús Roca
Hi,

I tried to use the RENAME command of the Collections API to rename a
SolrCloud collection, but I couldn't get it to work properly.
I'm using Solr 8.11.2, and when I try to rename a collection called "test"
to "test-new" with the following command:

http://localhost:8983/solr/admin/collections?action=RENAME&name=test&target=test-new
it creates an alias called "test" pointing to a collection "test-new"
(which doesn't exist).
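
For reference, a minimal SolrJ sketch to inspect the alias state after a
RENAME (the URL assumes a local single-node SolrCloud; adjust as needed):

import java.util.Collections;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.CollectionAdminResponse;

public class ShowAliases {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("http://localhost:8983/solr")).build()) {
      // LISTALIASES returns an "aliases" map of alias name -> collection(s)
      CollectionAdminResponse rsp =
          CollectionAdminRequest.listAliases().process(client);
      System.out.println(rsp.getResponse().get("aliases"));
    }
  }
}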

Then, if I run, for example, the following queries:
http://localhost:8983/solr/test/select?q=*%3A*
http://localhost:8983/solr/test-new/select?q=*%3A*
I receive the message "HTTP ERROR 404 Not Found" in both cases.

And then, if I delete the alias "test" created by the previous rename
command, I can query the collection under its original name again with:
http://localhost:8983/solr/test/select?q=*%3A*

Has anyone used this command?

Thanks in advance.


Fastest way to index data to solr

2022-09-29 Thread Shankar R
Hi,
 We have nearly 70-80 million records that need to be indexed into
Solr 8.6.1.
 We want to choose between the Java binary (JavaBin) format and direct JSON.
 Our source is a DBMS, so the data is structured.

Regards
Ravi


Re: Fastest way to index data to solr

2022-09-29 Thread Jan Høydahl
Hi,

If you want to index fast you should
* Make sure you have enough hardware on the Solr side to handle the bulk load
* Index with multiple threads on the client; experiment to find a good number
based on the number of CPUs on the receiving side
* If using Java on the client, use CloudSolrClient, which is smart enough to send
docs to the correct shard
* Do NOT commit during the bulk load; wait until the end
* Experiment with batch size, e.g. try sending 500 docs in each update request,
then 1000, etc., until you find the best compromise
* Use JavaBin if you can; it should be slightly faster than JSON, but probably
not by much
* Remember that your RDBMS may be the bottleneck at the end of the day: how
many rows can it deliver? You may need to partition the data set with SELECT
... WHERE clauses for each client to read in parallel (see the sketch after
this list).
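
A minimal SolrJ sketch pulling these points together (collection name, URL,
thread count and batch size are illustrative assumptions, and the DB-reading
part is a placeholder, not a real API):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  static final int THREADS = 8;      // tune to the CPUs on the receiving side
  static final int BATCH_SIZE = 500; // experiment: 500, 1000, ...

  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("http://localhost:8983/solr")).build()) {
      client.setDefaultCollection("mycollection");
      ExecutorService pool = Executors.newFixedThreadPool(THREADS);
      for (int t = 0; t < THREADS; t++) {
        final int partition = t;
        pool.submit(() -> {
          List<SolrInputDocument> batch = new ArrayList<>(BATCH_SIZE);
          // each thread reads its own slice of the source table,
          // e.g. SELECT ... WHERE MOD(id, THREADS) = partition
          for (SolrInputDocument doc : fetchPartition(partition)) {
            batch.add(doc);
            if (batch.size() >= BATCH_SIZE) {
              client.add(batch); // no commit during the bulk load
              batch.clear();
            }
          }
          if (!batch.isEmpty()) {
            client.add(batch);
          }
          return null;
        });
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.DAYS);
      client.commit(); // single commit at the very end
    }
  }

  // placeholder for the DBMS read; replace with real JDBC code
  static Iterable<SolrInputDocument> fetchPartition(int partition) {
    return new ArrayList<>();
  }
}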

Jan

> On 29 Sep 2022, at 10:06, Shankar R wrote:
> 
> Hi,
> We have nearly 70-80 million records that need to be indexed into
> Solr 8.6.1.
> We want to choose between the Java binary (JavaBin) format and direct JSON.
> Our source is a DBMS, so the data is structured.
> 
> Regards
> Ravi



NullPointer Exception when using Cross Collection Join

2022-09-29 Thread Sean Wu
Hi, Solr team.

I'm using Solr 9.0.0, and when I query with a Cross Collection Join on
our own data, I get a NullPointerException.

Error line: 
solr-9.0.0/solr/core/src/java/org/apache/solr/handler/export/ExportWriter.java#803

DocIdSetIterator it = new BitSetIterator(bits, 0);

It seems that 'bits' is null and the constructor of BitSetIterator
throws a NullPointerException.


With deeper debugging, I found that the code assumes that the lengths
of 'sets' (bits) and 'leaves' are equal (as shown by the code below).
However, in my test the last few elements of 'sets' could be null,
which caused a NullPointerException.

sets = (FixedBitSet[]) req.getContext().get("export");

List<LeafReaderContext> leaves =
req.getSearcher().getTopReaderContext().leaves();

SegmentIterator[] segmentIterators = new SegmentIterator[leaves.size()];
for (int i = 0; i < segmentIterators.length; i++) {
  SortQueue sortQueue = new SortQueue(sizes[i], sortDoc.copy());
  segmentIterators[i] =
  new SegmentIterator(bits[i], leaves.get(i), sortQueue, sortDoc.copy());
}


Then I tried skipping the creation of the BitSetIterator when bits == null,
and it worked as expected: the query results came back with no data missing.
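
For reference, the shape of that workaround (a sketch only; whether silently
skipping a segment is the right fix is exactly my open question):

// guard before the failing line in ExportWriter (sketch, not the official fix)
if (bits != null) {
  DocIdSetIterator it = new BitSetIterator(bits, 0);
  // ... iterate and export docs from this segment as before ...
}
// else: this segment has no entry in 'sets'; skip it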

But I still don't know whether the exception is expected behaviour or a bug.
Hope to get your response, thanks a lot!


Conditional Joins in Solr

2022-09-29 Thread Kahler, Jason J (US)
Is it possible to have a Solr join query apply only under certain conditions?
We have a Solr document store that performs access control following various
rules related to the data stored in Solr. Consider the following scenario:


{
Id:"doc1"
linkedIDs:"doc2"
Desc:"desc 1"
Group:"1"
}
{
Id:"doc2"
Desc:"desc 2"
Group:"2"
}
{
Id:"doc3"
Desc:"desc 3"
Group:"3"
}

Suppose internally, for a given user, we have a rule that says the user cannot
see anything with Group = "2". Our system therefore augments the user-specified
query q=Id:* and translates it to q=Id:* AND !Group:2

This results in the search response containing:
{
Id:"doc1"
linkedIDs:"doc2"
Desc:"desc 1"
Group:"1"
}
{
Id:"doc3"
Desc:"desc 3"
Group:"3"
}

However, I'd like to somehow leverage join so that the user also won't
get back results that link to/reference products they can't see, such that
they'd only get back:

{
Id:"doc3"
Desc:"desc 3"
Group:"3"
}


Is it possible to formulate a query like this?



NPE in collapse

2022-09-29 Thread 南拓弥
Hello all,

NPE in collapse with hint=top_fc:
When there are no segments, using hint=top_fc in collapse results in an NPE.

* query
http://localhost:8983/solr/bukken/select?fq={!collapse
field=str_field&hint=top_fc}&indent=true&q.op=OR&q=*:*&useParams=

* response
"error":{
"msg":"Cannot invoke
\"org.apache.lucene.index.SortedDocValues.getValueCount()\" because
\"this.collapseValues\" is null",
"trace":"java.lang.NullPointerException: Cannot invoke
\"org.apache.lucene.index.SortedDocValues.getValueCount()\" because
\"this.collapseValues\" is null\n\tat
org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.<init>(CollapsingQParserPlugin.java:621)\n\tat
org.apache.solr.search.CollapsingQParserPlugin$CollectorFactory.getCollector(CollapsingQParserPlugin.java:2125)
.

Has this issue already been discussed?

Minami Takuya


Hadoop vulnerability in Solr 8.11.2 from scan

2022-09-29 Thread Richard Li
Hi,

Our vulnerability scanning tool found a Hadoop vulnerability in Solr
8.11.2. More specifically, it is introduced through
org.apache.solr:solr-core@8.11.2 › org.apache.hadoop:hadoop-common@3.2.2. The 
published vulnerability is listed as CVE-2022-25168: 
https://lists.apache.org/thread/mxqnb39jfrwgs3j6phwvlrfq4mlox130

This vulnerability is not listed on Solr Security News, but it is also not
listed under the false positives on the SolrSecurity Confluence page.

We were wondering whether this is a real vulnerability for Solr, and in
particular whether Solr 8.11.2 is affected.

Thanks in advance.

Kind regards,

Richard


Re: Fastest way to index data to solr

2022-09-29 Thread Andy Lester



> On Sep 29, 2022, at 4:17 AM, Jan Høydahl  wrote:
> 
> * Index with multiple threads on the client; experiment to find a good number
> based on the number of CPUs on the receiving side

That may also mean having multiple clients. We went from taking about 8 hours 
to index our entire 42M rows to about 1.5 hours because we ran 10 indexer 
clients at once. Each indexer takes roughly 1/10th of the data and churns away. 
We don't have any of the clients do a commit. After the indexers are done, we 
run one more time through the queue with a commit at the end.

As Jan says, make sure it's not your database that is the bottleneck, and 
experiment with how many clients you want to have going at once.

Andy

Re: Fastest way to index data to solr

2022-09-29 Thread Dave
Another way to handle this is to have your indexing code fork out to as many
cores as the Solr indexing server has. It's far less work to have the code run
itself that many times in parallel, and as long as your SQL queries and the
tables involved are properly indexed, the database shouldn't be a bottleneck.
You just need to make sure the indexing server has the resources it needs,
since obviously you never index on a query server: the query server is just a
copy of the indexer, tuned for fast reads rather than writes.

> On Sep 29, 2022, at 2:21 PM, Andy Lester  wrote:
> 
> 
> 
>> On Sep 29, 2022, at 4:17 AM, Jan Høydahl  wrote:
>> 
>> * Index with multiple threads on the client; experiment to find a good
>> number based on the number of CPUs on the receiving side
> 
> That may also mean having multiple clients. We went from taking about 8 hours 
> to index our entire 42M rows to about 1.5 hours because we ran 10 indexer 
> clients at once. Each indexer takes roughly 1/10th of the data and churns 
> away. We don't have any of the clients do a commit. After the indexers are 
> done, we run one more time through the queue with a commit at the end.
> 
> As Jan says, make sure it's not your database that is the bottleneck, and 
> experiment with how many clients you want to have going at once.
> 
> Andy


Re: Conditional Joins in Solr

2022-09-29 Thread Mikhail Khludnev
Hi, Jason.
Could it be something like
q=Id:* -Group:2 -{!join from=Id to=linkedIDs}Group:2
?
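
The same idea as a SolrJ sketch, with the exclusions expressed as filter
queries (the field names Id/Group/linkedIDs are taken from your example and
untested against the real schema):

import org.apache.solr.client.solrj.SolrQuery;

public class AclJoinQuery {
  public static SolrQuery build() {
    SolrQuery q = new SolrQuery("Id:*");
    // hide docs the user can't see
    q.addFilterQuery("-Group:2");
    // hide docs that link to/reference docs the user can't see
    q.addFilterQuery("-{!join from=Id to=linkedIDs}Group:2");
    return q;
  }
}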

On Thu, Sep 29, 2022 at 7:47 PM Kahler, Jason J (US)
 wrote:

> Is it possible to have a Solr join query apply only under certain
> conditions? We have a Solr document store that performs access control
> following various rules related to the data stored in Solr. Consider the
> following scenario:
>
>
> {
> Id:"doc1"
> linkedIDs:"doc2"
> Desc:"desc 1"
> Group:"1"
> }
> {
> Id:"doc2"
> Desc:"desc 2"
> Group:"2"
> }
> {
> Id:"doc3"
> Desc:"desc 3"
> Group:"3"
> }
>
> Suppose internally, for a given user, we have a rule that says the user
> cannot see anything with Group = "2". Our system therefore augments the
> user-specified query q=Id:* and translates it to q=Id:* AND !Group:2
>
> This results in the search response containing:
> {
> Id:"doc1"
> linkedIDs:"doc2"
> Desc:"desc 1"
> Group:"1"
> }
> {
> Id:"doc3"
> Desc:"desc 3"
> Group:"3"
> }
>
> However, I'd like to somehow leverage join so that the user also won't
> get back results that link to/reference products they can't see, such that
> they'd only get back:
>
> {
> Id:"doc3"
> Desc:"desc 3"
> Group:"3"
> }
>
>
> Is it possible to formulate a query like this?
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: NullPointer Exception when using Cross Collection Join

2022-09-29 Thread Mikhail Khludnev
Hi, Sean.
It's not clear whether it can be reproduced with a bare Solr distro install,
indexing a few docs and querying them, or whether it's something about
hacking/customising Solr as a library?

On Thu, Sep 29, 2022 at 7:42 PM Sean Wu  wrote:

> Hi, Solr team.
>
> I'm using Solr 9.0.0, and when I query with a Cross Collection Join on
> our own data, I get a NullPointerException.
>
> Error line:
> solr-9.0.0/solr/core/src/java/org/apache/solr/handler/export/ExportWriter.java#803
>
> DocIdSetIterator it = new BitSetIterator(bits, 0);
>
> It seems that 'bits' is null and the constructor of BitSetIterator
> throws a NullPointerException.
>
>
> With deeper debugging, I found that the code assumes that the lengths
> of 'sets' (bits) and 'leaves' are equal (as shown by the code below).
> However, in my test the last few elements of 'sets' could be null,
> which caused a NullPointerException.
>
> sets = (FixedBitSet[]) req.getContext().get("export");
>
> List<LeafReaderContext> leaves =
> req.getSearcher().getTopReaderContext().leaves();
>
> SegmentIterator[] segmentIterators = new SegmentIterator[leaves.size()];
> for (int i = 0; i < segmentIterators.length; i++) {
>   SortQueue sortQueue = new SortQueue(sizes[i], sortDoc.copy());
>   segmentIterators[i] =
>   new SegmentIterator(bits[i], leaves.get(i), sortQueue,
> sortDoc.copy());
> }
>
>
> Then I tried skipping the creation of the BitSetIterator when bits == null,
> and it worked as expected: the query results came back with no data missing.
>
> But I still don't know whether the exception is expected behaviour or a bug.
> Hope to get your response, thanks a lot!
>


-- 
Sincerely yours
Mikhail Khludnev


Re: NPE in collapse

2022-09-29 Thread Joel Bernstein
What version of Solr are you using?

Try removing the top_fc hint; does the error still occur?


Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Sep 29, 2022 at 12:47 PM 南拓弥  wrote:

> Hello all,
>
> NPE in collapse with hint=top_fc:
> When there are no segments, using hint=top_fc in collapse results in an NPE.
>
> * query
> http://localhost:8983/solr/bukken/select?fq={!collapse
> field=str_field&hint=top_fc}&indent=true&q.op=OR&q=*:*&useParams=
>
> * response
> "error":{
> "msg":"Cannot invoke
> \"org.apache.lucene.index.SortedDocValues.getValueCount()\" because
> \"this.collapseValues\" is null",
> "trace":"java.lang.NullPointerException: Cannot invoke
> \"org.apache.lucene.index.SortedDocValues.getValueCount()\" because
> \"this.collapseValues\" is null\n\tat
>
> org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.<init>(CollapsingQParserPlugin.java:621)\n\tat
>
> org.apache.solr.search.CollapsingQParserPlugin$CollectorFactory.getCollector(CollapsingQParserPlugin.java:2125)
> .
>
> Has this issue already been discussed?
>
> Minami Takuya
>


Re: NPE in collapse

2022-09-29 Thread Joel Bernstein
Oh, there are no segments...

If this error still occurs in the latest Solr version without the top_fc
hint, then it's a bug.

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Sep 29, 2022 at 3:27 PM Joel Bernstein  wrote:

> What version of Solr are you using?
>
> Try removing the top_fc hint; does the error still occur?
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Sep 29, 2022 at 12:47 PM 南拓弥  wrote:
>
>> Hello all,
>>
>> NPE in collapse with hint=top_fc:
>> When there are no segments, using hint=top_fc in collapse results in an NPE.
>>
>> * query
>> http://localhost:8983/solr/bukken/select?fq={!collapse
>> field=str_field&hint=top_fc}&indent=true&q.op=OR&q=*:*&useParams=
>>
>> * response
>> "error":{
>> "msg":"Cannot invoke
>> \"org.apache.lucene.index.SortedDocValues.getValueCount()\" because
>> \"this.collapseValues\" is null",
>> "trace":"java.lang.NullPointerException: Cannot invoke
>> \"org.apache.lucene.index.SortedDocValues.getValueCount()\" because
>> \"this.collapseValues\" is null\n\tat
>>
>> org.apache.solr.search.CollapsingQParserPlugin$OrdScoreCollector.<init>(CollapsingQParserPlugin.java:621)\n\tat
>>
>> org.apache.solr.search.CollapsingQParserPlugin$CollectorFactory.getCollector(CollapsingQParserPlugin.java:2125)
>> .
>>
>> Has this issue already been discussed?
>>
>> Minami Takuya
>>
>


Re: NullPointer Exception when using Cross Collection Join

2022-09-29 Thread Joel Bernstein
Can you share the stack trace?

Also in the Solr log there will be a call to the /export handler. Can you
get that from the log?

Then we can isolate the call to the export handler and see if we can
reproduce it.

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Sep 29, 2022 at 3:01 PM Mikhail Khludnev  wrote:

> Hi, Sean.
> It's not clear whether it can be reproduced with a bare Solr distro install,
> indexing a few docs and querying them, or whether it's something about
> hacking/customising Solr as a library?
>
> On Thu, Sep 29, 2022 at 7:42 PM Sean Wu  wrote:
>
> > Hi, Solr team.
> >
> > I'm using Solr 9.0.0, and when I query with a Cross Collection Join on
> > our own data, I get a NullPointerException.
> >
> > Error line:
> >
> solr-9.0.0/solr/core/src/java/org/apache/solr/handler/export/ExportWriter.java#803
> >
> > DocIdSetIterator it = new BitSetIterator(bits, 0);
> >
> > It seems that 'bits' is null and the constructor of BitSetIterator
> > throws a NullPointerException.
> >
> >
> > With deeper debugging, I found that the code assumes that the lengths
> > of 'sets' (bits) and 'leaves' are equal (as shown by the code below).
> > However, in my test the last few elements of 'sets' could be null,
> > which caused a NullPointerException.
> >
> > sets = (FixedBitSet[]) req.getContext().get("export");
> >
> > List<LeafReaderContext> leaves =
> > req.getSearcher().getTopReaderContext().leaves();
> >
> > SegmentIterator[] segmentIterators = new SegmentIterator[leaves.size()];
> > for (int i = 0; i < segmentIterators.length; i++) {
> >   SortQueue sortQueue = new SortQueue(sizes[i], sortDoc.copy());
> >   segmentIterators[i] =
> >   new SegmentIterator(bits[i], leaves.get(i), sortQueue,
> > sortDoc.copy());
> > }
> >
> >
> > Then I tried skipping the creation of the BitSetIterator when bits == null,
> > and it worked as expected: the query results came back with no data missing.
> >
> > But I still don't know whether the exception is expected behaviour or a bug.
> > Hope to get your response, thanks a lot!
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Fastest way to index data to solr

2022-09-29 Thread Gus Heck
>
> * Do NOT commit during the bulk load; wait until the end
>

Unless something has changed, this is slightly risky. It can lead to very large
transaction logs and very long playback of the tx log on startup. If Solr
goes down during indexing due to something like an OOM, it could take a very
long time for it to restart, likely leading to people restarting it because
they think it's stuck... and then it has to start at the beginning
again...  (Ref:
https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/)
... Infrequent commits might be better than none (but you definitely do not
want frequent commits, certainly not after every batch, or even worse
after every doc).

-Gus


Re: Fastest way to index data to solr

2022-09-29 Thread Shawn Heisey

On 9/29/22 22:28, Gus Heck wrote:

* Do NOT commit during the bulk load; wait until the end

Unless something has changed, this is slightly risky. It can lead to very large
transaction logs and very long playback of the tx log on startup.


It is always good practice to have autoCommit configured with
openSearcher set to false and a relatively low maxTime value.  I believe
the configs that Solr ships with set this to 15 seconds (an actual value
of 15000 milliseconds), but I prefer making it 60 seconds just so
there is less overall stress on the system.  That setting will eliminate
the problem of huge transaction logs.  I believe this is discussed in
the Lucidworks article you linked.
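
For reference, that kind of autoCommit block might look like this in
solrconfig.xml (60 seconds here, matching my preference above):

<autoCommit>
  <maxTime>60000</maxTime>            <!-- 60 seconds -->
  <openSearcher>false</openSearcher>  <!-- don't open a searcher on hard commit -->
</autoCommit>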


A commit that opens a new searcher should be done at the end of the 
major indexing job.  I would do this as a soft commit, but there's 
nothing wrong with making it a hard commit that has openSearcher set to 
true.  On large indexing jobs there is likely to be little difference in 
performance between the two types of commit.


Thanks,
Shawn


Re: Fastest way to index data to solr

2022-09-29 Thread Gus Heck
70 million can be a lot or a little. Doc count is not even half the story.
How much storage space do these documents occupy in the database? Is the
text tweet-sized, or multi-megabyte CLOBs, or links to files on a file
store that need to be fetched and parsed (or OCR'd or converted from
audio/video to transcripts)? IoT-type docs with very minimal text can be
indexed much faster than 50-page PDF documents. With very large clusters
and indexing systems distributing work across a Spark cluster I've seen as
high as 1.3M docs/sec... and 70M would be trivial for that system (they had
hundreds of billions). But text documents are typically much, much slower
than that, especially if the text must be extracted from dirty formats such
as PDF or Word data, or complex custom analysis is involved, or additional
fetching of files or data to merge into the doc is required.

As for the two formats: if you are indexing with Java code, choose JavaBin.
If you are using a non-Java language, use JSON. The rare case of JSON from
Java would be if your data were already in JSON format... then it depends on
whether Solr is the limiting factor (do the work on the indexers and use
JavaBin so Solr has less parsing to do) or your indexing machines are (use
JSON so your indexers don't have to do the conversion). Like many things in
search, "it depends" :)

On Thu, Sep 29, 2022 at 4:07 AM Shankar R  wrote:

> Hi,
>  We have nearly 70-80 million records that need to be indexed into
> Solr 8.6.1.
>  We want to choose between the Java binary (JavaBin) format and direct JSON.
>  Our source is a DBMS, so the data is structured.
>
> Regards
> Ravi
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)