[jira] [Comment Edited] (SOLR-16812) Support CBOR format for update/query

Noble Paul (Jira) Mon, 05 Jun 2023 12:25:08 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729423#comment-17729423
 ]


Noble Paul edited comment on SOLR-16812 at 6/5/23 7:24 PM:
-----------------------------------------------------------

Let's be clear about the objectives of this ticket.

We use JSON to index/query Solr because we do not use Java. So we need a more 
efficient way to interact with Solr (especially for indexing, because we are 
write-heavy). I wanted to pick a format that has libraries in as many 
languages as possible (Go, Python, C#, etc.).
{quote}What does the CBOR performance look like generally?
{quote}
Here is a [benchmark|https://ugorji.net/blog/benchmarking-serialization-in-go] 
from the wild. The point is most of these binary formats are much better than 
JSON.
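
To make the size difference concrete, here is a minimal sketch of a CBOR encoder 
(following RFC 8949) compared with compact JSON for one document. It is an 
illustration only, not the code used in this ticket: the helper names 
({{_head}}, {{cbor_encode}}) and the sample document fields are hypothetical, 
and only a small subset of CBOR types is handled.

```python
import json
import struct


def _head(major: int, n: int) -> bytes:
    """Encode a CBOR initial byte plus its unsigned argument (RFC 8949, section 3)."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 1 << 8:
        return bytes([(major << 5) | 24]) + struct.pack(">B", n)
    if n < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    if n < 1 << 64:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", n)
    raise ValueError("integer too large for this sketch")


def cbor_encode(value) -> bytes:
    """Encode None, bool, int, float, str, list/tuple and dict as CBOR."""
    # bool must be tested before int, because bool is a subclass of int.
    if value is None:
        return b"\xf6"
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        # Major type 0 for non-negative, major type 1 for negative integers.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        # Always use the 64-bit float form to keep the sketch simple.
        return b"\xfb" + struct.pack(">d", value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, (list, tuple)):
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        out = _head(5, len(value))
        for key, item in value.items():
            out += cbor_encode(key) + cbor_encode(item)
        return out
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical document, loosely shaped like an entry in a film data set.
doc = {
    "id": "film-1",
    "name": "Example Film",
    "directed_by": ["A. Director"],
    "genre": ["Drama", "Comedy"],
    "year": 2001,
}

json_bytes = json.dumps(doc, separators=(",", ":")).encode("utf-8")
cbor_bytes = cbor_encode(doc)
print(f"json: {len(json_bytes)} bytes, cbor: {len(cbor_bytes)} bytes")
```

CBOR drops the quotes, colons and commas and prefixes each value with a short 
length/type header instead, which is where most of the saving comes from. The 
ratio for a single small document will not necessarily match the films.json 
figures.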
{quote}"films.json" feels a little small to be testing this.
{quote}
It has 1100 docs. How often do we index/fetch more than 1100 docs?
{quote}Can you elaborate at all on why you chose CBOR over other alternatives?
{quote}
I have run benchmarks, and the results agree with the numbers we see in the 
wild. Avro was not considered because there is no Jackson support. As we use 
Jackson on the response side, CBOR was an easy fit.
{quote}if we introduce a new binary format, then it should come with a plan to 
deprecate or replace javabin.
{quote}
I would like to see that happen. javabin must go (if possible). We need to do 
a lot of refactoring in our Solr/SolrJ code before that is possible. It's a 
non-trivial task.



> Support CBOR format for update/query
> ------------------------------------
>
>                 Key: SOLR-16812
>                 URL: https://issues.apache.org/jira/browse/SOLR-16812
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Javabin is quite efficient and fast, but non-Java users have to use JSON 
> exclusively.
>  
> [CBOR|http://example.com/] is a widely used format that is supported by most 
> languages. 
>  
> Here is a benchmark of updating with CBOR vs. JSON, using our films.json:
> {code:java}
> Payload size (bytes)
> ====================
> JSON    : 633600
> CBOR    : 290672
> javabin : 234520
>
> Time taken to index
> ===================
> JSON    : 583 ms
> CBOR    : 509 ms
> javabin : 549 ms
>
> Time taken to query *:* (1100 docs)
> ===================================
> JSON    : 92 ms
> javabin : 70 ms
> CBOR    : 63 ms
> {code}
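
The figures above are easier to compare as ratios against JSON. The following 
is a small sketch of that arithmetic; the numbers are copied from the benchmark, 
the javabin indexing time is assumed to be in milliseconds like the others, and 
the function and variable names are made up for illustration.

```python
# Figures copied from the benchmark in the issue description.
NUM_DOCS = 1100
payload_bytes = {"json": 633600, "cbor": 290672, "javabin": 234520}
# Assumption: the javabin indexing time (549) is in ms like the others.
index_ms = {"json": 583, "cbor": 509, "javabin": 549}
query_ms = {"json": 92, "cbor": 63, "javabin": 70}


def relative_to_json(measurements):
    """Return each measurement as a fraction of the JSON measurement."""
    base = measurements["json"]
    return {name: value / base for name, value in measurements.items()}


for label, data in (
    ("payload", payload_bytes),
    ("index", index_ms),
    ("query", query_ms),
):
    for name, ratio in relative_to_json(data).items():
        print(f"{label:8s} {name:8s} {ratio:6.1%} of JSON")

# Average payload size per document for each format.
bytes_per_doc = {name: size / NUM_DOCS for name, size in payload_bytes.items()}
print(bytes_per_doc)
```

On these numbers the CBOR payload is roughly 46% of the JSON payload and the 
javabin payload roughly 37%, while the timing differences are smaller.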



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
