[ https://issues.apache.org/jira/browse/SOLR-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729423#comment-17729423 ]
Noble Paul edited comment on SOLR-16812 at 6/5/23 7:24 PM:
-----------------------------------------------------------

Let's be clear about the objectives of this ticket. We use JSON to index/query Solr because we do not use Java, so we need a more efficient way to interact with Solr (especially for indexing, because we are write-heavy). I wanted to pick a format that has libraries in as many languages as possible (Go, Python, C#, etc.).

{quote}What does the CBOR performance look like generally? {quote}
Here is a [benchmark|https://ugorji.net/blog/benchmarking-serialization-in-go] from the wild. The point is that most of these binary formats are much better than JSON.

{quote}"films.json" feels a little small to be testing this. {quote}
It has 1100 docs. How often do we index/fetch more than 1100 docs?

{quote}Can you elaborate at all on why you chose CBOR over other alternatives? {quote}
I have done benchmarks, and they concur with the numbers we see in the wild. Avro was not considered because there is no Jackson support. Since we already use Jackson on the response side, CBOR was an easy fit.

{quote}if we introduce a new binary format, then it should come with a plan to deprecate or replace javabin. {quote}
I wish to see it happen. javabin must go (if possible). We need to do a lot of refactoring of our Solr/SolrJ code before that is possible. It's a non-trivial task.


was (Author: noble.paul):
Let's be clear about the objectives of this ticket. We use JSON to index/query Solr because we do not use Java, so we need a more efficient way to interact with Solr (especially for indexing, because we are write-heavy). I wanted to pick a format that has libraries in as many languages as possible (Go, Python, C#, etc.).

{quote}What does the CBOR performance look like generally? {quote}
Here is a [benchmark|https://ugorji.net/blog/benchmarking-serialization-in-go] from the wild. The point is that most of these binary formats are much better than JSON.
{quote}"films.json" feels a little small to be testing this. {quote}
It has 1100 docs. How often do we index/fetch more than 1100 docs?

{quote}Can you elaborate at all on why you chose CBOR over other alternatives? {quote}
I have done benchmarks, and they concur with the numbers we see in the wild. Avro was not considered because there is no Jackson support.

{quote}if we introduce a new binary format, then it should come with a plan to deprecate or replace javabin. {quote}
I wish to see it happen. javabin must go (if possible). We need to do a lot of refactoring of our Solr/SolrJ code before that is possible. It's a non-trivial task.

> Support CBOR format for update/query
> ------------------------------------
>
>                 Key: SOLR-16812
>                 URL: https://issues.apache.org/jira/browse/SOLR-16812
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Javabin is quite efficient and fast, but non-Java users have to use JSON
> exclusively.
>
> [CBOR|http://example.com/] is a widely used format that is supported by most
> languages.
>
> Here is a benchmark of CBOR vs. JSON (with javabin for reference) using our
> films.json:
> {code:java}
> Payload size (bytes)
> ====================
> json   : 633600
> cbor   : 290672
> javabin: 234520
>
> Time taken to index
> ===================
> json   : 583ms
> cbor   : 509ms
> javabin: 549ms
>
> Time taken to query *:* (1100 docs)
> ===================================
> json   : 92ms
> javabin: 70ms
> cbor   : 63ms{code}
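
To illustrate where the payload-size difference in the benchmark above comes from, here is a minimal sketch in Python that encodes one document both as compact JSON and as CBOR and compares the byte counts. It is not how Solr does it (the ticket relies on Jackson); the `cbor_encode` helper is a hand-written, hypothetical encoder covering only the subset of CBOR (RFC 8949) needed here, and the sample document and its field names are made up for illustration rather than taken from films.json.

```python
import json
import struct


def _head(major: int, n: int) -> bytes:
    """Encode a CBOR initial byte (major type 0-7) with unsigned argument n."""
    if n < 24:
        return bytes([(major << 5) | n])          # argument fits in the initial byte
    if n < 2**8:
        return bytes([(major << 5) | 24, n])      # 1-byte argument follows
    if n < 2**16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 2**32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)


def cbor_encode(value) -> bytes:
    """Hypothetical minimal CBOR encoder: bool, None, int, float, str, list, dict."""
    if isinstance(value, bool):                   # must be checked before int
        return b"\xf5" if value else b"\xf4"
    if value is None:
        return b"\xf6"
    if isinstance(value, int):
        # Major type 0 for non-negative ints, major type 1 encodes -1 - n.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        return b"\xfb" + struct.pack(">d", value)  # always a 64-bit double
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data          # length prefix, no quotes
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        out = _head(5, len(value))                 # pair count, no braces or commas
        for key, item in value.items():
            out += cbor_encode(key) + cbor_encode(item)
        return out
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Made-up sample document; field names are assumptions, not from films.json.
doc = {
    "id": "film-1",
    "name": "Example Film",
    "directed_by": ["A. Director"],
    "genre": ["Drama", "Comedy"],
    "year": 2023,
}

json_bytes = json.dumps(doc, separators=(",", ":")).encode("utf-8")
cbor_bytes = cbor_encode(doc)

print("json bytes:", len(json_bytes))
print("cbor bytes:", len(cbor_bytes))
```

The saving comes from CBOR replacing quotes, colons, commas, and brackets with short length-prefixed headers, and from storing numbers in binary rather than as decimal text. In practice one would use an existing CBOR library for the client language rather than a hand-rolled encoder like this one.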