Re: Solr 8.10.1 performance degradation vs Solr 6.6.1

2022-04-08 Thread Sergio García Maroto
After testing different options, it seems the scale function has a
performance issue. Other functions are fine; I had to replace it with
log() to get similar functionality.
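A likely explanation (based on how scale() is documented to behave; worth verifying for your version): scale(field,min,max) has to find the current min and max of the field across all documents before it can score a single one, an extra pass per query, while log() is a pure per-document computation. A rough Python sketch of that difference:

```python
import math

def scale_scores(values, target_min=1.0, target_max=2.0):
    # scale(field,1,2) must first scan ALL values to find min/max,
    # an extra O(n) pass over the field for every query.
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [target_min + (v - lo) * (target_max - target_min) / span
            for v in values]

def log_score(value):
    # log(field) is computed per document; no global pass is needed.
    return math.log10(value)
```

The per-query min/max pass is the cost that swapping in log() avoids, at the price of an unbounded (rather than [1,2]) boost range.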

Regards,
Sergio Maroto

On Wed, 6 Apr 2022 at 17:18, Sergio García Maroto 
wrote:

> Thanks Mike.
> It seems like Solr 8 is using different query parsing:
>
> *Solr6*
> rawquerystring: {!boost
> b=sum(scale(PeopleTotalSD,1,2),scale(AssignmentsTotalSD,1,2))}((CompanyTypeSFD:("Private
> Company") OR ((CompNameFreeTextS:(BASF ))^0.5 OR (CompAliasFreeTextS:(BASF
> ))^0.5) OR NationalitySFD:(Algeria) OR
> CompWebS:(http\:\/\/newCompanyWebsite)) AND ((*:*
> -CompanyStatusSFD:("\*\*\*System Delete\*\*\*")) AND type_level:(parent)))
> querystring: {!boost
> b=sum(scale(PeopleTotalSD,1,2),scale(AssignmentsTotalSD,1,2))}((CompanyTypeSFD:("Private
> Company") OR ((CompNameFreeTextS:(BASF ))^0.5 OR (CompAliasFreeTextS:(BASF
> ))^0.5) OR NationalitySFD:(Algeria) OR
> CompWebS:(http\:\/\/newCompanyWebsite)) AND ((*:*
> -CompanyStatusSFD:("\*\*\*System Delete\*\*\*")) AND type_level:(parent)))
> parsedquery: BoostedQuery(boost(+(CompanyTypeSFD:Private
> Company ((CompNameFreeTextS:basf)^0.5 (CompAliasFreeTextS:basf)^0.5)
> NationalitySFD:Algeria CompWebS:http://newcompanywebsite) +(+(*:*
> -CompanyStatusSFD:***System Delete***)
> +type_level:parent),sum(scale(int(PeopleTotalSD),1.0,2.0),scale(int(AssignmentsTotalSD),1.0,2.0
> parsedquery_toString: boost(+(CompanyTypeSFD:Private
> Company ((CompNameFreeTextS:basf)^0.5 (CompAliasFreeTextS:basf)^0.5)
> NationalitySFD:Algeria CompWebS:http://newcompanywebsite) +(+(*:*
> -CompanyStatusSFD:***System Delete***)
> +type_level:parent),sum(scale(int(PeopleTotalSD),1.0,2.0),scale(int(AssignmentsTotalSD),1.0,2.0)))
> 
>
> *Solr 8*
> rawquerystring: {!boost
> b=sum(scale(PeopleTotalSD,1,2),scale(AssignmentsTotalSD,1,2))}((CompanyTypeSFD:("Private
> Company") OR ((CompNameFreeTextS:(BASF ))^0.5 OR (CompAliasFreeTextS:(BASF
> ))^0.5) OR NationalitySFD:(Algeria) OR
> CompWebS:(http\:\/\/newCompanyWebsite)) AND ((*:*
> -CompanyStatusSFD:("\*\*\*System Delete\*\*\*")) AND type_level:(parent)))
> querystring: {!boost
> b=sum(scale(PeopleTotalSD,1,2),scale(AssignmentsTotalSD,1,2))}((CompanyTypeSFD:("Private
> Company") OR ((CompNameFreeTextS:(BASF ))^0.5 OR (CompAliasFreeTextS:(BASF
> ))^0.5) OR NationalitySFD:(Algeria) OR
> CompWebS:(http\:\/\/newCompanyWebsite)) AND ((*:*
> -CompanyStatusSFD:("\*\*\*System Delete\*\*\*")) AND type_level:(parent)))
> parsedquery: FunctionScoreQuery(FunctionScoreQuery(+(CompanyTypeSFD:Private
> Company ((CompNameFreeTextS:basf)^0.5 (CompAliasFreeTextS:basf)^0.5)
> NationalitySFD:Algeria CompWebS:http://newcompanywebsite) +(+(*:*
> -CompanyStatusSFD:***System Delete***) +type_level:parent), scored by
> boost(sum(scale(int(PeopleTotalSD),1.0,2.0),scale(int(AssignmentsTotalSD),1.0,2.0)
> parsedquery_toString: FunctionScoreQuery(+(CompanyTypeSFD:Private
> Company ((CompNameFreeTextS:basf)^0.5 (CompAliasFreeTextS:basf)^0.5)
> NationalitySFD:Algeria CompWebS:http://newcompanywebsite) +(+(*:*
> -CompanyStatusSFD:***System Delete***) +type_level:parent), scored by
> boost(sum(scale(int(PeopleTotalSD),1.0,2.0),scale(int(AssignmentsTotalSD),1.0,2.0
> 
>
> On Wed, 6 Apr 2022 at 16:31, Mike Drob  wrote:
>
>> Can you try running with debug=query to see if the two are getting parsed
>> differently?
>>
>> On Wed, Apr 6, 2022 at 8:26 AM Sergio García Maroto 
>> wrote:
>>
>> > Forgot to mention.
>> > Solr 8 = 5 seconds
>> > Solr 6 = 1 second
>> >
>> > On Wed, 6 Apr 2022 at 14:58, Sergio García Maroto 
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > I am in the process of upgrading Solr 6.6.1 to Solr 8.10.1.
>> > > In general, performance is almost the same or even a bit better when
>> > > running performance load testing.
>> > >
>> > > There is a particular scenario where I see an important degradation:
>> > > when I boost results based on a function. I boost results based on
>> > > two fields.
>> > > If I take this part out, both Solr 8 and Solr 6 show the same
>> > > performance. *{!boost
>> > > b=sum(scale(PeopleTotalSD,1,2),scale(AssignmentsTotalSD,1,2))}*
>> > > Both servers have identical machines and data; the results coming
>> > > back are actually the same in number.
>> > >
>> > > q={!boost
>> > >
>> >
>> b=sum(scale(PeopleTotalSD,1,2),scale(AssignmentsTotalSD,1,2))}((CompanyTypeSFD:("Private
>> > > Company") OR ((CompNameFreeTextS:(kaiku))^0.5 OR
>> > > (CompAliasFreeTextS:(kaiku))^0.5) OR NationalitySFD:(Algeria) OR
>> > > CompWebS:(http\:\/\/newCompanyWebsite)) AND ((*:*
>> > > -CompanyStatusSFD:("\*\*\*System Delete\*\*\*")) AND
>> > > type_level:(parent)))&start=0&rows=7&fl=CompanyID&sort=score desc
>> > >
>> > > Any idea what could be causing this?
>> > >
>> > > Regards,
>> > > Sergio
>> > >
>> >
>>
>


Re: Solr as a dedicated data store?

2022-04-08 Thread dmitri maziuk

On 2022-04-07 11:51 PM, Shawn Heisey wrote:
...
As I understand it, ES offers reindex capability by storing the entire 
input document into a field in the index, which means that the index 
will be a lot bigger than it needs to be, which is going to affect 
performance.  If the field is not indexed, then the performance impact 
may not be huge, but it will not be zero.  And it wouldn't really 
improve the speed of a full reindex, it just makes it possible to do a 
reindex without an external data source.


The same thing can be done with Solr, and it is something I would 
definitely say needs to be part of any index design where Solr will be a 
primary data store.  That capability should be available in Solr, but I 
do not think it should be enabled by default.


What would be the advantage over dumping the documents into a text file 
(xml, json) and doing a full re-import? In principle you could dump 
everything Solr needs into the file and only check if it's all there 
during the import; that plus the protocol overhead would be the only 
downside. And deleting the existing index will take a little extra time.


The upside is that we can stick the files into git and have versions; they 
should compress really well, and we can clone them to off-site storage, etc.
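For reference, the stored-copy approach Shawn describes boils down to one schema field along these lines (a sketch; the field name "_src_json_" is invented, not a Solr built-in):

```xml
<!-- Stored-but-not-indexed field holding the raw source document.
     Costs index size but allows re-indexing without the source system. -->
<field name="_src_json_" type="string" indexed="false" stored="true"
       docValues="false" multiValued="false"/>
```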


Dima


Verifying the replica.type parameter behavior

2022-04-08 Thread Olivia Crusoe
Hello,

I've been trying out the shards.preference=replica.type:PULL as a parameter 
appended onto queries, as well as trying out including it in the search request 
handler. For context, we have a collection that is 2 shards, 2 TLOGs per shard, 
and n number of PULLs (can change depending on if we wish to add more replicas 
during higher periods of traffic). This is being tested in Solr 8.8.2.

In an effort to verify that the queries were being handled by only the PULL 
replicas, I've been looking at our Solr request logs, expecting to see only 
pull replica types handling our queries. Yet, I am seeing a number of 
"replica": "x:collectionname_shardx_replica_tx" included in the request logs, 
which seems to indicate that the TLOG replicas are still serving queries.

I have two questions: 1.) am I right in assuming that setting the replica.type 
should be exclusively sending requests to PULL replicas? 2.) If that is true, 
why would I still be seeing TLOG types on the Solr request logs? Is there some 
type of routing done behind-the-scenes that is not visible in the request logs?

Thank you in advance for any guidance you can provide.

Olivia Crusoe
Software Engineer Lead - Search



launched solr 8.10 fails to "get system information"/create core, error: 'Cannot invoke "String.contains(java.lang.CharSequence)" because the return value of "org.apache.http.client.ClientProtocolExce

2022-04-08 Thread PGNet Dev

i'm re-installing an instance of solr 8.10.1 on a dedicated box

it's launched,

ps ax | grep solr
  85664 ?Sl 1:26 /usr/lib/jvm/java-18-openjdk/bin/java 
-server -Xms512m -Xmx512m -XX:+UseG1GC -XX:+PerfDisableSharedMem 
-XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250 -XX:+AlwaysPreTouch 
-XX:+ExplicitGCInvokesConcurrent 
-Xlog:gc*:file=/var/log/solr/solr_gc.log:time,uptime:filecount=9,filesize=20M 
-Dsolr.jetty.inetaccess.includes=10.1.1.50, 127.0.0.1, 
-Dsolr.jetty.inetaccess.excludes= -Dsolr.log.level=DEBUG 
-Dsolr.log.dir=/var/log/solr -Djetty.port=8984 -DSTOP.PORT=7984 
-DSTOP.KEY=solrrocks -Dhost=solr.example.com -Duser.timezone=America/New_York 
-XX:-OmitStackTraceInFastThrow -XX:OnOutOfMemoryError=/srv/solr/bin/oom_solr.sh 
8984 /var/log/solr -Djetty.home=/srv/solr/server 
-Dsolr.solr.home=/data/solr/data -Dsolr.data.home= 
-Dsolr.install.dir=/srv/solr/solr 
-Dsolr.default.confdir=/srv/solr/server/solr/configsets/_default/conf 
-Dlog4j.configurationFile=/data/solr/log4j2.xml -Djetty.host=solr.example.com 
-Xss256k -Dsolr.jetty.keystore=/srv/ssl/solr/solr.example.com.server.EC.pfx 
-Dsolr.jetty.keystore.type=PKCS12 
-Dsolr.jetty.truststore=/srv/ssl/solr/solr.example.com.server.EC.pfx 
-Dsolr.jetty.truststore.type=PKCS12 -Dsolr.jetty.ssl.needClientAuth=false 
-Dsolr.jetty.ssl.wantClientAuth=false 
-Djavax.net.ssl.keyStore=/srv/ssl/solr/solr.example.com.server.EC.pfx 
-Djavax.net.ssl.keyStoreType=PKCS12 -Dsolr.ssl.checkPeerName=false 
-Djavax.net.ssl.trustStore=/srv/ssl/solr/solr.example.com.server.EC.pfx 
-Djavax.net.ssl.trustStoreType=PKCS12 -Dsolr.jetty.https.port=8984 
-Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory
 -Dbasicauth=solradm:solrRocks -Dsolr.log.muteconsole -jar start.jar 
--module=https --lib=/srv/solr/server/solr-webapp/webapp/WEB-INF/lib/* 
--module=gzip

and responds, as usual/expected, in browser @

https://solr.example.com:8984/solr

where

host solr.example.com
solr.example.com has address 10.1.1.50

as well as at shell

sudo -u solr /srv/solr/bin/solr  version
8.10.1

but, checking status, FAILs,

sudo -u solr /srv/solr/bin/solr  status

Found 1 Solr nodes:

Solr process 85664 running on port 8984

ERROR: Failed to get system information from http://localhost:8984/solr due to: 
java.lang.NullPointerException: Cannot invoke "String.contains(java.lang.CharSequence)" 
because the return value of "org.apache.http.client.ClientProtocolException.getMessage()" 
is null

it's looking for

http://localhost:8984/solr

which, @ browser, is non responsive, as in

cat /etc/default/solr.in.sh
...
SOLR_HOST="solr.example.com"
SOLR_OPTS="$SOLR_OPTS -Djetty.host=solr.example.com"
...

and checking @

https://solr.example.com:8984/solr/#/
...
-Dhost=solr.example.com
...
-Djetty.host=solr.example.com
-Djetty.port=8984
...

also, and more problematic, creating a new core fails

sudo -u solr /srv/solr/bin/solr create \
 -c test \
 -p 8984

WARNING: Using _default configset with data driven schema 
functionality. NOT RECOMMENDED for production use.
 To turn off: bin/solr config -c dovecot -p 8984 
-action set-user-property -property update.autoCreateFields -value false

ERROR: Cannot invoke "String.contains(java.lang.CharSequence)" because 
the return value of "org.apache.http.client.ClientProtocolException.getMessage()" is null

java.lang.NullPointerException: Cannot invoke 
"String.contains(java.lang.CharSequence)" because the return value of 
"org.apache.http.client.ClientProtocolException.getMessage()" is null
at 
org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:761)
at 
org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:673)
at 
org.apache.solr.util.SolrCLI$CreateTool.runImpl(SolrCLI.java:2170)
at 
org.apache.solr.util.SolrCLI$ToolBase.runTool(SolrCLI.java:196)
at org.apache.solr.util.SolrCLI.main(SolrCLI.java:304)

i suspect i need to tell the instance config to "get system information" NOT 
from

http://localhost:8984/solr

but instead

https://solr.example.com:8984/solr

where's that set/configured ?
or is there a different cause at work here?
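One thing worth checking (an educated guess, not a confirmed diagnosis): the bin/solr CLI tools build their base URL from solr.in.sh, and without SOLR_SSL_ENABLED=true there, status/create default to http://localhost:<port>/solr. A sketch of the relevant entries, with paths copied from the ps output above:

```shell
# /etc/default/solr.in.sh -- sketch; variable names are the stock
# solr.in.sh SSL settings, keystore paths taken from the ps output
SOLR_HOST="solr.example.com"
SOLR_SSL_ENABLED=true
SOLR_SSL_KEY_STORE=/srv/ssl/solr/solr.example.com.server.EC.pfx
SOLR_SSL_KEY_STORE_TYPE=PKCS12
SOLR_SSL_TRUST_STORE=/srv/ssl/solr/solr.example.com.server.EC.pfx
SOLR_SSL_TRUST_STORE_TYPE=PKCS12
```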


Re: Verifying the replica.type parameter behavior

2022-04-08 Thread Michael Gibney
`shards.preference` only affects the backend routing of requests to
individual cores/shards. These backend requests should have an additional
`distrib=false` param, and are the requests that are generally the most
resource-intensive, in that they do the initial per-shard domain-narrowing.

I'm fairly certain that "top-level" requests are logged as being associated
with some arbitrary shard (of the associated collection) on whatever node
the external request happens to hit. I suspect that the requests you're
seeing that appear to be associated with an unexpected shard are top-level
requests (without a `distrib=false` param). If so, then `shards.preference`
is likely working as intended. I'm curious whether you're able to confirm
that all `distrib=false` requests are indeed associated with PULL
replicas?
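One way to check that from the logs, sketched in Python (the log line format here is an assumption and varies with Solr version and log4j config, but the default core-naming scheme does encode replica type as _n/_t/_p):

```python
import re

# Hypothetical request-log lines; core=... names the replica that
# actually executed the request (default naming: _n=NRT, _t=TLOG, _p=PULL).
log_lines = [
    "path=/select params={q=*:*&distrib=false} core=coll_shard1_replica_p3 status=0",
    "path=/select params={q=*:*&shards.preference=replica.type:PULL} core=coll_shard2_replica_t1 status=0",
]

def per_shard_replica_types(lines):
    # Only distrib=false lines are the real per-shard work; top-level
    # requests get logged against an arbitrary core on the receiving node.
    shard_lines = [l for l in lines if "distrib=false" in l]
    return [re.search(r"replica_([ntp])\d+", l).group(1) for l in shard_lines]
```

In this sample, only the PULL replica ('p') shows up as doing shard work; the TLOG line is a top-level request, which is exactly the pattern described above.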

On Fri, Apr 8, 2022 at 2:22 PM Olivia Crusoe  wrote:

> Hello,
>
>
>
> I’ve been trying out the shards.preference=replica.type:PULL  as a
> parameter appended onto queries, as well as trying out including it in the
> search request handler. For context, we have a collection that is 2 shards,
> 2 TLOGs per shard, and n number of PULLs (can change depending on if we
> wish to add more replicas during higher periods of traffic). This is being
> tested in Solr 8.8.2.
>
>
>
> In an effort to verify that the queries were being handled by only the
> PULL replicas, I’ve been looking at our Solr request logs, expecting to see
> only pull replica types handling our queries. Yet, I am seeing a number of 
> *"replica":
> "x:collectionname_shardx_replica_tx"* included in the request logs, which
> seems to insinuate that the TLOG replicas are still serving queries.
>
>
>
> I have two questions: 1.) am I right in assuming that setting the
> replica.type should be exclusively sending requests to PULL replicas? 2.)
> If that is true, why would I still be seeing TLOG types on the Solr request
> logs? Is there some type of routing done behind-the-scenes that is not
> visible in the request logs?
>
>
>
> Thank you in advance for any guidance you can provide.
>
>
>
> Olivia Crusoe
>
> Software Engineer Lead – Search
>
>
>


Re: Need help with DIH plugin SOLR

2022-04-08 Thread Neha Gupta

Thanks Dominique...will look into it.

On 06/04/2022 22:56, Dominique Bejean wrote:

Hi,

I suggest taking a look at the Apache NiFi ETL tool as a replacement for DIH. It can
read from and write to Solr.

Dominique

Le mer. 6 avr. 2022 à 12:44, Jan Høydahl  a écrit :


Hi,

The upcoming 9.0 release does not have DIH. And it is unclear whether the
plugin on github will be updated to work with 9.0, if it does, you may of
course use it.
But the common recommendation is to replace DIH with some other DB
indexing tool outside of Solr. Either find some tool that supports Solr
OOTB, like Apache ManifoldCF, or perhaps better, roll your own client code
reading from DB and pushing documents to Solr.

Jan


6. apr. 2022 kl. 11:29 skrev Neha Gupta :

Ok Thanks a lot.

I have one more question: even with Solr 8.11, the Data Import Handler is
still included, but there is a message in the GUI that it is deprecated and
will be removed in a future version.

Can we use that for indexing a relational database? I am able to index a
part of my data with the default DIH that comes with Solr 8.11, without
any extra DIH package installed.


PS: Our data is static and won't change in the future. We need to re-index
only if the indexed data in Solr becomes corrupt.


Thanks and Regards

Neha Gupta


On 05/04/2022 15:27, James Greene wrote:

Standalone mode does not use ZooKeeper; you do not need to upload configs
using zkcli.sh.



On Tue, Apr 5, 2022, 6:55 AM Neha Gupta  wrote:


Dear Solr Community,

Need your help.
I am running Solr (8.11) standalone (on Windows) and want to index
from a relational database (Postgres), so I tried to install the
DIH plugin by following the instructions given at
https://github.com/rohitbemax/dataimporthandler

I am stuck at step "Add the configurations and reload the collection"

sh zkcli.sh -z localhost:9983 -cmd putfile
"/configs/products.AUTOCREATED/data-config.xml" data-config.xml

I am getting this error:
*Error: Could not find or load main class org.apache.solr.cloud.ZkCLI*



Request you to please help me with this.


Thanks and Regards
Neha Gupta





Re: Need help with DIH plugin SOLR

2022-04-08 Thread Mike Drob
Hi Dominique,

Are there any guides available on using Nifi ETL with Solr? What do you
consider to be good references for it?

Thanks,
Mike

On Wed, Apr 6, 2022 at 3:56 PM Dominique Bejean 
wrote:

> Hi,
>
> I suggest to take a look at Apache Nifi ETL in order to replace DIH. It can
> read and write into Solr,
>
> Dominique
>
> Le mer. 6 avr. 2022 à 12:44, Jan Høydahl  a écrit :
>
> > Hi,
> >
> > The upcoming 9.0 release does not have DIH. And it is unclear whether the
> > plugin on github will be updated to work with 9.0, if it does, you may of
> > course use it.
> > But the common recommendation is to replace DIH with some other DB
> > indexing tool outside of Solr. Either find some tool that supports Solr
> > OOTB, like Apache ManifoldCF, or perhaps better, roll your own client
> code
> > reading from DB and pushing documents to Solr.
> >
> > Jan
> >
> > > 6. apr. 2022 kl. 11:29 skrev Neha Gupta :
> > >
> > > Ok Thanks a lot.
> > >
> > > I have one more question even with SOLR 8.11, the data import handler
> is
> > coming but yes there is a message on GUI that it is deprecated and will
> be
> > removed in the future version.
> > >
> > > Can we use that for indexing relational database as i am able to index
> a
> > part of my data with the default one which is coming with SOLR 8.11
> without
> > any DIH package installed?
> > >
> > >
> > > PS: - Our data is static and won't change in future. Need to re-index
> > only when indexed data in SOLR goes corrupt.
> > >
> > >
> > > Thanks and Regards
> > >
> > > Neha Gupta
> > >
> > >
> > > On 05/04/2022 15:27, James Greene wrote:
> > >> Stand alone mode does not use zookeeper, you do not need to upload
> > configs
> > >> using zkcli.sh.
> > >>
> > >>
> > >>
> > >> On Tue, Apr 5, 2022, 6:55 AM Neha Gupta 
> wrote:
> > >>
> > >>> Dear Solr Community,
> > >>>
> > >>> Need your help.
> > >>> I am running SOLR(8.11) as a standalone (on Windows) and want to
> index
> > >>> from the relational database(Postgres) and as such i tried to install
> > >>> DIH plugin by following the instructions given at: -
> > >>> https://github.com/rohitbemax/dataimporthandler
> > >>>
> > >>> I am stuck at step "Add the configurations and reload the collection"
> > >>>
> > >>> sh zkcli.sh -z localhost:9983 -cmd putfile
> > >>> "/configs/products.AUTOCREATED/data-config.xml" data-config.xml
> > >>>
> > >>> I am getting error : -
> > >>> *Error: Could not find or load main class
> org.apache.solr.cloud.ZkCLI*
> > >>>
> > >>>
> > >>>
> > >>> Request you to please help me with this.
> > >>>
> > >>>
> > >>> Thanks and Regards
> > >>> Neha Gupta
> > >>>
> >
> >
>


Re: Need help with DIH plugin SOLR

2022-04-08 Thread dmitri maziuk

On 2022-04-08 4:45 PM, Mike Drob wrote:

Hi Dominique,

Are there any guides available on using Nifi ETL with Solr? What do you
consider to be good references for it?


A full ETL tool is likely overkill as a DIH replacement; about the only 
reason you'd use one is if you want a scripting language.


Dima


Re: Solr as a dedicated data store?

2022-04-08 Thread James Greene
I think you are speaking to the point that having all your data
rebuildable from source isn't a hard requirement, as there are ways to
re-index without access to the original source (you still need the
full docs stored in Solr, just not indexed). Looking at Solr from that
POV, it becomes more approachable as a primary data store.

On Fri, Apr 8, 2022, 1:53 PM dmitri maziuk  wrote:

> On 2022-04-07 11:51 PM, Shawn Heisey wrote:
> ...
> > As I understand it, ES offers reindex capability by storing the entire
> > input document into a field in the index.  Which means that the index
> > will be lot bigger than it needs to be, which is going to affect
> > performance.  If the field is not indexed, then the performance impact
> > may not be huge, but it will not be zero.  And it wouldn't really
> > improve the speed of a full reindex, it just makes it possible to do a
> > reindex without an external data source.
> >
> > The same thing can be done with Solr, and it is something I would
> > definitely say needs to be part of any index design where Solr will be a
> > primary data store.  That capability should be available in Solr, but I
> > do not think it should be enabled by default.
> >
> What would be the advantage over dumping the documents into a text file
> (xml, json) and doing a full re-import? In principle you could dump
> everything Solr needs into the file and only check if it's all there
> during the import; that plus the protocol overhead would be the only
> downside. And deleting the existing index will take a little extra time.
>
> The upside if we can stick the files into git and have versions, it
> should compress really well, we can clone it to off-site storage etc. etc.
>
> Dima
>


Re: Solr as a dedicated data store?

2022-04-08 Thread David Hastings
As long as your documents are simple in structure (a key/value or an array
for any given field), you're good to go. Anything multi-level, you're out of
luck. Not sure how relevant this link still is, but:
https://stackoverflow.com/questions/22192904/is-solr-support-complex-types-like-structure-for-multivalued-fields


It’s from 2017, but I believe it still holds true; however, there are
possibilities with nested documents:
https://solr.apache.org/guide/8_1/indexing-nested-documents.html

Admittedly I have not gotten too in-depth myself with child documents for
more complex data structures. And yeah, you could just store the complex
data structure as JSON in a single large stored, non-indexed text field
and only index what you will be searching on.
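That last approach can be sketched like so (the field names and record shape are invented for illustration):

```python
import json

def to_solr_doc(record):
    # Index only the fields we actually search on; keep the full nested
    # record as one stored, non-indexed JSON blob for retrieval/re-indexing.
    return {
        "id": record["id"],
        "name_s": record["name"],            # searchable copy
        "raw_json_s": json.dumps(record),    # opaque blob, never queried
    }
```

A query matches on name_s; the application then parses raw_json_s back into the original nested structure.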

Another option I’ve experimented with is two completely different cores, or
even completely different Solr servers (I use standalone a lot): use one
for searching, and use the result to pull the raw data from the other
“storage server” by an identifier. This is actually surprisingly fast.
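The two-store pattern described above, sketched with plain dicts standing in for the two cores/servers (all names invented):

```python
# "Search core": slim index mapping query terms to document IDs only.
search_core = {"solar": ["d1", "d3"], "panel": ["d1"]}

# "Storage core": full raw records, fetched strictly by ID.
storage_core = {
    "d1": {"id": "d1", "body": "solar panel specs", "meta": {"pages": 12}},
    "d3": {"id": "d3", "body": "solar farm report", "meta": {"pages": 40}},
}

def search_then_fetch(term):
    # Step 1: hit the search core for IDs; step 2: pull raw docs by ID.
    return [storage_core[doc_id] for doc_id in search_core.get(term, [])]
```

ID lookups are cheap, which is presumably why the second hop is "surprisingly fast" in practice.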

It’s a hack, you’re using the wrong tool for the job, but it can be done if
you REALLY want to and get creative.

Good luck. Curious to hear what you come up with
-dave


On Fri, Apr 8, 2022 at 8:36 PM James Greene 
wrote:

> I think you are speaking to the point that the requirement to have all your
> data rebuildable from source isn't a hard requirement as their are ways to
> re-index without having access to the original source (you still need the
> full docs stored in solr just not indexed). By looking at solr from that
> pov it becomes more approachable as a primary data store.
>
> On Fri, Apr 8, 2022, 1:53 PM dmitri maziuk 
> wrote:
>
> > On 2022-04-07 11:51 PM, Shawn Heisey wrote:
> > ...
> > > As I understand it, ES offers reindex capability by storing the entire
> > > input document into a field in the index.  Which means that the index
> > > will be lot bigger than it needs to be, which is going to affect
> > > performance.  If the field is not indexed, then the performance impact
> > > may not be huge, but it will not be zero.  And it wouldn't really
> > > improve the speed of a full reindex, it just makes it possible to do a
> > > reindex without an external data source.
> > >
> > > The same thing can be done with Solr, and it is something I would
> > > definitely say needs to be part of any index design where Solr will be
> a
> > > primary data store.  That capability should be available in Solr, but I
> > > do not think it should be enabled by default.
> > >
> > What would be the advantage over dumping the documents into a text file
> > (xml, json) and doing a full re-import? In principle you could dump
> > everything Solr needs into the file and only check if it's all there
> > during the import; that plus the protocol overhead would be the only
> > downside. And deleting the existing index will take a little extra time.
> >
> > The upside if we can stick the files into git and have versions, it
> > should compress really well, we can clone it to off-site storage etc.
> etc.
> >
> > Dima
> >
>


Re: Solr as a dedicated data store?

2022-04-08 Thread dmitri maziuk

On 2022-04-08 7:36 PM, James Greene wrote:

I think you are speaking to the point that the requirement to have all your
data rebuildable from source isn't a hard requirement as their are ways to
re-index without having access to the original source (you still need the
full docs stored in solr just not indexed). By looking at solr from that
pov it becomes more approachable as a primary data store.


I may have a different definition of primary data store, one in which 
it's a store for primary data.


Dima