[ https://issues.apache.org/jira/browse/SOLR-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745047#comment-17745047 ]

Mark Robert Miller commented on SOLR-16812:
-------------------------------------------

If you took these benchmarks and data points for this case into a serious room, 
you’d be ignored or kicked out. It’s worth noting, since I don’t see any comment 
acknowledging awareness of how misleading a cherry-pick they are. Not that you’d 
hear JSON complain; the comparison comes off looking relatively fabulous. 

It’s never really the work itself that scares you, terrible or excellent; it’s 
the lack of communication indicating that someone has some grasp of what there 
is to be done and what they have actually done. Lay out those cards, and even 
the most rushed code or worst implementation becomes palatable. 

It’s when you just say: I’ve benchmarked this thing, here is a little data 
point, here is a big one (two synthetic benchmarks that, for the sake of 
argument, I’m happy to grant are the kind the JMH team themselves would 
gold-star in review), and therefore the conclusion is ABC. You can find the 
little code here and the big code there and try it yourself if you need to. 
Case closed, ship it. 

Now that’s scary. 

Wait, what? No mention at all of a realistic gauge of what should be done here 
versus what was done? Even a bare mention, and it’s like all that work is saved 
and the ghosts die: “OK, at least they know what they are doing. I may not 
agree with it, but they know what they are doing.”

You could just go look at how someone involved in a real binary protocol 
project would approach even a minimal performance comparison. This one would 
look like 99% of those Elastic vs. Solr shootouts where a super fan pits a 
Formula One car against a NASCAR and does some super-fan mechanical tweaking to 
make it a “fair” comparison: “We loaded each one up, we pulled back hard, we 
let her rip, and you won’t believe which one defaults to a straight-up query 
cache. The winner, of course.”

If you took this as a PR to anyone involved in any of these protocols, you 
would get back a Hossman-level set of bullet points and a professor’s worth of 
projected content.

I don’t need benchmarks reasonable to the change to be on board with Solr 
getting out of the binary protocol business, though. You could tell me you had 
properly benchmarked the replacement as slower and I’d still be +1 on CBOR, 
CaptainJackProto Hack, or anything else you named. 

> Support CBOR format for update/query
> ------------------------------------
>
>                 Key: SOLR-16812
>                 URL: https://issues.apache.org/jira/browse/SOLR-16812
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>             Fix For: 9.3
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Javabin is quite efficient and fast, but non-Java users have to use JSON 
> exclusively.
>  
> [CBOR|http://example.com/] is a widely used format that is supported by most 
> languages. 
>  
> Here is a benchmark of updating our films.json using CBOR vs. JSON:
> {code:java}
> Payload size (bytes)
> ====================
> json   : 633600
> cbor   : 210439
> javabin: 234520
>
> Time taken to index
> ===================
> json   : 330ms
> javabin: 216ms
> cbor   : 200ms
>
> Time taken to query *:* (1100 docs)
> ===================================
> json   : 85ms
> javabin: 64ms
> cbor   : 53ms
> {code}
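
For anyone who wants to poke at a rough local version of that payload-size
comparison, here is a minimal sketch using Jackson's CBOR support and Java's
built-in HTTP client. The sample documents, the "films" collection name, and
the application/cbor content type on /update are assumptions for illustration,
not details confirmed by this issue.

{code:java}
// Minimal sketch: encode the same docs as JSON and CBOR with Jackson,
// compare payload sizes, then post the CBOR bytes to a local Solr update
// handler. Collection name, URL, and content type are assumptions.
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.cbor.databind.CBORMapper;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Map;

public class CborVsJsonUpdateSketch {
  public static void main(String[] args) throws Exception {
    // Stand-in documents; the films.json example data would be used in practice.
    List<Map<String, Object>> docs = List.of(
        Map.of("id", "1", "name_s", "film one"),
        Map.of("id", "2", "name_s", "film two"));

    byte[] json = new ObjectMapper().writeValueAsBytes(docs);
    byte[] cbor = new CBORMapper().writeValueAsBytes(docs);
    System.out.println("json bytes: " + json.length + ", cbor bytes: " + cbor.length);

    // Post the CBOR payload to a hypothetical local "films" collection.
    HttpRequest req = HttpRequest.newBuilder(
            URI.create("http://localhost:8983/solr/films/update?commit=true"))
        .header("Content-Type", "application/cbor")
        .POST(HttpRequest.BodyPublishers.ofByteArray(cbor))
        .build();
    HttpResponse<String> rsp = HttpClient.newHttpClient()
        .send(req, HttpResponse.BodyHandlers.ofString());
    System.out.println("update HTTP status: " + rsp.statusCode());
  }
}
{code}

Comparing the two serialized byte arrays reproduces only the payload-size half
of the table above; the query-side timings would need a similar request against
/select with the response format switched to CBOR, which is left out here since
the exact parameter is part of the patch rather than this sketch.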


