[
https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837400#comment-15837400
]
Ishan Chattopadhyaya edited comment on SOLR-5944 at 1/25/17 9:23 AM:
---------------------------------------------------------------------
I did some multithreaded benchmarks on the jira/solr-5944 branch. Here are the
two main experiments I performed:
h2. Regular updates vs. in-place updates on branch
First, I added 100,000 documents. Each document contains a numeric id field, a
numeric version field, a text field with around 1000 words (generated using
lucene-test-framework's {{TestUtil.randomSimpleString()}}), a stored+indexed
long field (called stored_l) and a non-stored, non-indexed long DV field
(called inplace_dvo_l).
Then I ran 10 iterations of 25,000 updates to each of the two long fields: 25k
updates to stored_l, then 25k to inplace_dvo_l, repeated 10 times. I sent these
updates with a ConcurrentUpdateSolrClient (CUSC), using a configurable thread
count.
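The update schedule described above can be sketched roughly as follows. This is
an illustration only, not the benchmark's actual code: the names {{atomic_set}}
and {{update_schedule}} are hypothetical, the choice of target ids (uniformly
random) and of new values is an assumption, and the payload shape follows Solr's
atomic-update {{set}} syntax expressed as JSON-style dictionaries.

```python
import json
import random

NUM_DOCS = 100_000
ITERATIONS = 10
UPDATES_PER_FIELD = 25_000
FIELDS = ("stored_l", "inplace_dvo_l")


def atomic_set(doc_id, field, value):
    """Build an atomic-update document that sets one field on one id."""
    return {"id": doc_id, field: {"set": value}}


def update_schedule(rng, fields=FIELDS):
    """Yield (iteration, field, update_doc) in the order described above."""
    for iteration in range(ITERATIONS):
        for field in fields:
            for _ in range(UPDATES_PER_FIELD):
                doc_id = rng.randrange(NUM_DOCS)  # assumption: random target id
                value = rng.randrange(1_000_000)  # assumption: arbitrary new value
                yield iteration, field, atomic_set(doc_id, field, value)


if __name__ == "__main__":
    # Print the first update document of the schedule as JSON.
    first = next(update_schedule(random.Random(0)))
    print(json.dumps(first[2]))
```

Passing {{fields=("stored_l",)}} to the hypothetical {{update_schedule}} would
restrict the schedule to regular updates only.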
I repeated this with different thread counts to vary the parallelism of
requests, and recorded and plotted the cumulative times (in seconds) per field:
!regular-vs-dv-updates.png!
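One way to collect cumulative per-field times under a configurable thread count
is sketched below. It is a stand-in under stated assumptions, not the real
harness: the benchmark sent updates through a CUSC, whereas here {{send_update}}
is a hypothetical stub that does no I/O, {{run_batch}} and {{benchmark}} are
illustrative names, and the thread counts in the usage section are made up.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def send_update(field, doc_id):
    """Hypothetical stub; a real run would send an update to Solr here."""
    return field, doc_id


def run_batch(field, doc_ids, threads):
    """Send one batch of updates with `threads` workers; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Consume the iterator so any worker exception surfaces here.
        list(pool.map(lambda doc_id: send_update(field, doc_id), doc_ids))
    return time.perf_counter() - start


def benchmark(fields, iterations, updates_per_field, threads):
    """Return cumulative seconds per field over all iterations."""
    totals = defaultdict(float)
    for _ in range(iterations):
        for field in fields:
            totals[field] += run_batch(field, range(updates_per_field), threads)
    return dict(totals)


if __name__ == "__main__":
    # Small numbers so the sketch finishes quickly; the experiment used 10 x 25,000.
    # The thread counts below are assumptions, not the values used in the benchmark.
    for threads in (1, 2, 4):
        print(threads, benchmark(("stored_l", "inplace_dvo_l"), 2, 100, threads))
```

Because the stub does no work, the numbers this prints say nothing about Solr;
the sketch only shows how times are accumulated per field across iterations.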
h2. Only regular updates: master branch vs. 5944 branch
To evaluate any impact on regular updates, I ran the same experiment as above
with one change: only the stored_l field is updated in every iteration. I ran
this on master as well as on the jira/solr-5944 branch. (Indexing times are in
seconds.)
!master-vs-5944-regular-updates.png!
h2. Conclusion
# In-place updates appear to be much faster than regular updates, especially
when the document contains text fields. (Hypothesis: the speed of in-place
updates is not proportional to document size.)
# There appears to be a very slight, but not significant, slowdown for regular
updates (master vs. branch).
h2. Reproducing these results
The solr-upgrade-tests framework (SOLR-8581) was easy to extend for these
benchmarks. It takes a git commit SHA, checks out the repository, builds a
package, starts ZooKeeper and Solr, runs the benchmarks, then stops and cleans
up.
https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md
For these tests, I used the following commits:
master: ca50e5b61c2d8bfb703169cea2fb0ab20fd24c6b
jira/solr-5944: fcf71e34f20ea74f99933b80d5bd43cd487751f1
For the second experiment, I passed in an additional parameter
{{-onlyRegularUpdates true}}.
My computer setup: Intel Core i7 5820K (6 cores, OC'd to 4.3 GHz), 32GB DDR4
RAM, Samsung 950 Pro NVMe SSD.
> Support updates of numeric DocValues
> ------------------------------------
>
> Key: SOLR-5944
> URL: https://issues.apache.org/jira/browse/SOLR-5944
> Project: Solr
> Issue Type: New Feature
> Reporter: Ishan Chattopadhyaya
> Assignee: Shalin Shekhar Mangar
> Attachments: defensive-checks.log.gz,
> demo-why-dynamic-fields-cannot-be-inplace-updated-first-time.patch,
> DUP.patch, hoss.62D328FA1DEA57FD.fail2.txt, hoss.62D328FA1DEA57FD.fail3.txt,
> hoss.62D328FA1DEA57FD.fail.txt, hoss.D768DD9443A98DC.fail.txt,
> hoss.D768DD9443A98DC.pass.txt, master-vs-5944-regular-updates.png,
> regular-vs-dv-updates.png, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch,
> TestStressInPlaceUpdates.eb044ac71.beast-167-failure.stdout.txt,
> TestStressInPlaceUpdates.eb044ac71.beast-587-failure.stdout.txt,
> TestStressInPlaceUpdates.eb044ac71.failures.tar.gz
>
>
> LUCENE-5189 introduced support for updates to numeric docvalues. It would be
> really nice to have Solr support this.