[ 
https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837400#comment-15837400
 ] 

Ishan Chattopadhyaya edited comment on SOLR-5944 at 1/25/17 9:23 AM:
---------------------------------------------------------------------

I did some multithreaded benchmarks on the jira/solr-5944 branch. Here are the 
two main experiments I performed:

h2. Regular updates vs. in-place updates on branch

First, I added 100,000 documents. Each document contains a numeric id field, a 
numeric version field, a text field with around 1000 words (generated using 
lucene-test-framework's {{TestUtil.randomSimpleString()}}), a stored+indexed 
long field (called stored_l) and a non-stored, non-indexed long docValues (DV) 
field (called inplace_dvo_l).
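
For readers who want to picture the document shape, here is a minimal Python sketch of one such document. It is not the code used for the benchmark: {{random_word}} is only a stand-in for {{TestUtil.randomSimpleString()}}, the field names {{version_l}} and {{text_t}} are hypothetical (only stored_l and inplace_dvo_l are named above), and the initial values of 0 are assumptions.

```python
import random
import string

WORDS_PER_DOC = 1000  # "around 1000 words" per the description above


def random_word(rng, max_len=10):
    """Stand-in for TestUtil.randomSimpleString(): a short lowercase ASCII word."""
    length = rng.randint(1, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_doc(doc_id, rng):
    """Build one document with the five fields described above."""
    return {
        "id": doc_id,
        "version_l": 0,  # hypothetical name for the numeric version field
        "text_t": " ".join(random_word(rng) for _ in range(WORDS_PER_DOC)),  # hypothetical name
        "stored_l": 0,       # stored + indexed long (initial value assumed)
        "inplace_dvo_l": 0,  # non-stored, non-indexed docValues long (initial value assumed)
    }
```

Calling {{make_doc(i, random.Random(seed))}} for i in 0..99,999 would give a corpus of the size used here.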

Then I ran 10 iterations of 25,000 updates to each of the two long 
fields: 25k updates to stored_l, then 25k to inplace_dvo_l, repeated 
10 times. I sent these updates with a ConcurrentUpdateSolrClient (CUSC), using 
a configurable thread count.
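
The loop structure can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual harness: a Python thread pool stands in for the CUSC, {{send}} is a hypothetical caller-supplied function that delivers one update to Solr, and the choice of which document ids to update and which values to write is assumed, since it is not specified above. The {{{"set": value}}} form is Solr's JSON atomic-update syntax.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def atomic_set(doc_id, field, value):
    """Atomic-update document in Solr's JSON syntax: set one field on one id."""
    return {"id": doc_id, field: {"set": value}}


def run_benchmark(send, num_docs=100_000, iterations=10,
                  updates_per_field=25_000, threads=4,
                  fields=("stored_l", "inplace_dvo_l")):
    """Return the cumulative seconds spent updating each field.

    `send` is a hypothetical callable that delivers one update to Solr.
    """
    totals = {field: 0.0 for field in fields}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for iteration in range(iterations):
            for field in fields:
                # Assumption: ids are picked round-robin; the value is arbitrary.
                updates = [atomic_set(i % num_docs, field, iteration)
                           for i in range(updates_per_field)]
                start = time.perf_counter()
                # list() forces every update to finish before the timer stops.
                list(pool.map(send, updates))
                totals[field] += time.perf_counter() - start
    return totals
```

Running {{run_benchmark}} once per thread count and plotting the returned totals per field would produce the kind of comparison described here.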

Repeated this with different values of thread count to control the parallelism 
of requests. Recorded and plotted the cumulative times (in seconds) per field:
!regular-vs-dv-updates.png!

h2. Only regular updates: master branch vs. 5944 branch
To evaluate any impact on regular updates, I performed the same experiment as 
above with one change: only the stored_l field is updated in every 
iteration. I carried out this experiment on master as well as on the 
jira/solr-5944 branch. (Indexing times are in seconds.)
!master-vs-5944-regular-updates.png!

h2. Conclusion
# It seems the in-place updates are much faster than regular updates, esp. when 
the document contains text fields. (Hypothesis: speed of in-place updates is 
not proportional to document size)
# It seems that there is a very slight, but not significant, slowdown for 
regular updates (master vs branch).

h2. Reproducing these results
The solr-upgrade-tests (SOLR-8581) seemed to be easy to extend for these 
benchmarks. It takes in a git commit SHA, checks out the repository, builds a 
package, starts ZooKeeper and Solr, performs the benchmarks, then stops and 
cleans up.

https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md

For these tests, I used the following commits:
master: ca50e5b61c2d8bfb703169cea2fb0ab20fd24c6b
jira/solr-5944: fcf71e34f20ea74f99933b80d5bd43cd487751f1

For the second experiment, I passed in an additional parameter 
{{-onlyRegularUpdates true}}.

My computer setup: Intel Core i7 5820K (6 cores, OC'd to 4.3 GHz), 32GB DDR4 
RAM, Samsung 950 Pro NVMe SSD.


> Support updates of numeric DocValues
> ------------------------------------
>
>                 Key: SOLR-5944
>                 URL: https://issues.apache.org/jira/browse/SOLR-5944
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Shalin Shekhar Mangar
>         Attachments: defensive-checks.log.gz, 
> demo-why-dynamic-fields-cannot-be-inplace-updated-first-time.patch, 
> DUP.patch, hoss.62D328FA1DEA57FD.fail2.txt, hoss.62D328FA1DEA57FD.fail3.txt, 
> hoss.62D328FA1DEA57FD.fail.txt, hoss.D768DD9443A98DC.fail.txt, 
> hoss.D768DD9443A98DC.pass.txt, master-vs-5944-regular-updates.png, 
> regular-vs-dv-updates.png, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> TestStressInPlaceUpdates.eb044ac71.beast-167-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.beast-587-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.failures.tar.gz
>
>
> LUCENE-5189 introduced support for updates to numeric docvalues. It would be 
> really nice to have Solr support this.


