[ https://issues.apache.org/jira/browse/SOLR-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729477#comment-17729477 ]
Jason Gerlowski edited comment on SOLR-16812 at 6/5/23 9:15 PM:
----------------------------------------------------------------

bq. It has 1100 docs. How often do we index/fetch more than 1100 docs?

For me the relevant number isn't the number of documents; it's the size of the request/response in bytes. "films.json" is hardly half a megabyte. How often does a Solr response exceed that? Absolutely all the time.

bq. Here is a benchmark from the wild.

I appreciate that this golang ser-de experiment found CBOR to be faster than JSON in that one golang library. But a benchmark "from the wild" doesn't really tell the community anything about the performance of the CBOR code that was committed to Solr this morning. And neither does the JUnit test you linked to. (see the PR review [here|https://github.com/apache/solr/pull/1655#pullrequestreview-1462820682] for specific concerns)

How does Solr's new CBOR support compare to Solr's support for JSON (for non-SolrJ users) or for javabin (for our current SolrJ users)? That's what I'm asking about, and it's still an open question as far as I can tell. You're right that Solr-CBOR vs. Solr-JSON should be a slam dunk, but it's an important sanity-check. And Solr-CBOR vs Solr-javabin is an important datapoint to inform how aggressively javabin users might want to switch to CBOR.

bq. The point is most of these binary formats are much better than JSON.

Sure. But that doesn't make them all the same. Binary formats have tradeoffs in performance, popularity, compatibility w/ various languages, etc. Some are going to be better for Solr on the whole than others. I'm sure you considered these tradeoffs in picking CBOR over other binary formats. I just want to hear a little more about that, if I can.

"I have done benchmarks"

Great! Meaning the JUnit tests that I commented on in your PR? Or something else? What did those look like?

"Avro is not considered because there is no jackson support"

[Avro does support Jackson|https://github.com/FasterXML/jackson-dataformats-binary], afaict? As do a number of other formats (Smile, etc.)

bq. javabin must go(if possible) [...but] it's a non-trivial task

Ugh, yeah. Very little in Solr these days is trivial. But at the same time - I think the project would suffer if we were to punt on this entirely. The scope here is waaay smaller, but this is the same dynamic that's given us 3 (or is it 4?) different faceting modules 😛

If you're unwilling to tackle javabin deprecation proper, would you be willing to at least put together a writeup of what the steps would be and what the hurdles are?

> Support CBOR format for update/query
> ------------------------------------
>
>                 Key: SOLR-16812
>                 URL: https://issues.apache.org/jira/browse/SOLR-16812
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Javabin is quite efficient and fast. But non-Java users have to use JSON
> exclusively
>
> [CBOR |http://example.com/] is a widely used format that is supported by most
> languages.
>
> Here is a benchmark of updating using CBOR vs.
> JSON on our films.json
> {code:java}
> Payload Size (bytes)
> ====================
>
> json   : 633600
> cbor   : 290672
> javabin: 234520
>
> time taken to index
> ===================
> JSON   : 583ms
> CBOR   : 509ms
> JAVABIN: 549ms
>
> time taken to query *:* 1100 docs
> =================================
> json   : 92ms
> javabin: 70ms
> cbor   : 63ms
> {code}
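For anyone who wants the reported figures side by side, here is a rough sketch that expresses each format relative to JSON. It only restates the numbers from the issue description; it is not a benchmark of Solr's CBOR code. The helper name `relative_to` is made up for illustration, and the javabin indexing time is assumed to be in milliseconds like the other entries.

```python
# Figures copied from the benchmark in the issue description
# (films.json, 1100 docs). Nothing here is measured; it is arithmetic only.
payload_bytes = {"json": 633600, "cbor": 290672, "javabin": 234520}
# Assumption: the javabin indexing time is in ms, like the other entries.
index_ms = {"json": 583, "cbor": 509, "javabin": 549}
query_ms = {"json": 92, "cbor": 63, "javabin": 70}


def relative_to(baseline, table):
    """Return every entry of `table` as a fraction of the baseline entry."""
    base = table[baseline]
    return {name: value / base for name, value in table.items()}


size_vs_json = relative_to("json", payload_bytes)
index_vs_json = relative_to("json", index_ms)
query_vs_json = relative_to("json", query_ms)

# Size of the JSON payload in MiB, for the "size in bytes" point in the comment.
json_mib = payload_bytes["json"] / (1024 * 1024)

if __name__ == "__main__":
    print(f"JSON payload: {json_mib:.2f} MiB")
    for name in ("json", "cbor", "javabin"):
        print(
            f"{name:8s} size {size_vs_json[name]:.1%}  "
            f"index {index_vs_json[name]:.1%}  "
            f"query {query_vs_json[name]:.1%}  (all relative to JSON)"
        )
```

On these numbers the CBOR payload is a bit under half the JSON size and javabin is smaller still, while the timing gaps are much narrower. A single small file like this cannot say how the formats compare on larger responses.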