ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Micha
Hi,

my table has a SHA-1 sum as its partition key. In this case, would the
ByteOrderedPartitioner be a better choice than the Murmur3Partitioner,
since the keys are already quite random?


cheers,
 Michael


Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Edward Capriolo
Probably best to avoid BOP even if you are already hashing keys yourself.
What do you do when checksums collide? It is possible, right?


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Jonathan Haddad
The odds of only using a SHA-1 as your partition key for every table you
ever create are low. You will regret BOP until the end of time.


Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Edward Capriolo
Yes, the odds are low.

https://en.wikipedia.org/wiki/Birthday_problem
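
To put a rough number on "low", here is a minimal Python sketch of the standard birthday-problem upper bound. It assumes SHA-1 behaves like a uniform random 160-bit function on non-adversarial inputs; the function name and the key count of one trillion are only illustrative.

```python
from fractions import Fraction


def collision_probability_upper_bound(num_keys: int, hash_bits: int = 160) -> Fraction:
    """Birthday-problem union bound: p <= n * (n - 1) / (2 * 2**bits)."""
    return Fraction(num_keys * (num_keys - 1), 2 * 2 ** hash_bits)


# Assumption for illustration: one trillion distinct keys, 160-bit digest.
p = collision_probability_upper_bound(10 ** 12)
print(f"collision probability is at most about {float(p):.2e}")
```

The bound says nothing about deliberately constructed collisions, so it only applies if nobody is trying to produce colliding inputs.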

This has already been addressed for the RandomPartitioner (RP):

https://issues.apache.org/jira/browse/CASSANDRA-1034

If you wanted to use BOP and hash the keys yourself, you would have to make
your primary key something like (shavalue, actualvalue) to ensure that two
keys do not overwrite each other.
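
A toy Python model of that point, not Cassandra code: the "tables" are plain dicts keyed by primary key, so a second write with the same key replaces the first, which is how an upsert behaves. The digest is deliberately truncated to one byte (a hypothetical stand-in for SHA-1) so that a collision is easy to find.

```python
import hashlib
from itertools import count


def toy_digest(value: str) -> bytes:
    # Hypothetical stand-in: SHA-1 truncated to one byte to force collisions.
    return hashlib.sha1(value.encode("utf-8")).digest()[:1]


def find_collision():
    """Return two different values with the same toy digest (pigeonhole)."""
    seen = {}
    for i in count():
        value = f"value-{i}"
        digest = toy_digest(value)
        if digest in seen:
            return seen[digest], value
        seen[digest] = value


first, second = find_collision()

by_hash = {}            # primary key = (shavalue)
by_hash_and_value = {}  # primary key = (shavalue, actualvalue)
for value in (first, second):
    by_hash[toy_digest(value)] = value
    by_hash_and_value[(toy_digest(value), value)] = value

print(len(by_hash), len(by_hash_and_value))  # 1 2
```

With the digest alone as the key, the second write silently replaces the first; with the composite key, both rows survive.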


Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Micha
I think I was not clear enough...

I have *one* table for which the row data contains (among other values)
a SHA-1 sum. There are no collisions. I thought computing a Murmur hash
for a SHA-1 sum is just wasted time, as the Murmur hash doesn't make the
data more random than it already is. So it's just one table where this
matters.


 Michael


Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Edward Capriolo
The problem with using BOP is that the partitioner is not set at the
table/keyspace level; it is set cluster-wide. So if you have two tables
with different key distributions, there is no way to balance them out.
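
A simplified Python sketch of that imbalance. It assumes a hypothetical four-node cluster where each node owns an equal slice of the key space, decided by the first byte of the key; real clusters assign tokens explicitly, so this is only a model of byte-ordered placement.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size


def owner(key: bytes) -> int:
    """Byte-ordered placement: equal ranges of the key space by first byte."""
    return key[0] * NODES // 256


def load(keys) -> list:
    """Number of keys that land on each node."""
    counts = Counter(owner(k) for k in keys)
    return [counts[n] for n in range(NODES)]


# Table 1: keys are SHA-1 digests. Table 2: keys are readable strings.
sha_keys = [hashlib.sha1(str(i).encode()).digest() for i in range(10_000)]
domain_keys = [f"com.example.page{i}".encode() for i in range(10_000)]

print("sha-1 keys :", load(sha_keys))     # roughly even across nodes
print("domain keys:", load(domain_keys))  # [0, 10000, 0, 0]
```

The SHA-1 table spreads out on its own, but the second table puts every row on one node, and token ranges chosen to fix that would in turn unbalance the first table.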

I would almost consider BOP quasi-supported at this point:

http://stackoverflow.com/questions/27939234/cassandra-byteorderedpartitioner

"no seriously your doing it wrong"

I have thought about this often. If you really need BOP, for example
because you're generating a web index and you want to co-locate data for
the same domain so you can scan it, Cassandra is a bad fit. I'm not
convinced that a secondary index/MV fills the need. HBase seems a more
logical choice (to me), since the data is logically ordered by key and the
protocol splits regions as they grow.
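
To illustrate why key ordering matters for that use case, here is a small Python sketch of a prefix scan over keys kept in sorted order. It is a toy model, not the HBase or Cassandra API; the row keys (reversed domain plus path) and the function name are made up for the example.

```python
from bisect import bisect_left

# Hypothetical row keys: reversed domain plus path, stored in sorted order.
rows = sorted([
    "com.example/about",
    "com.example/index",
    "com.example/products",
    "org.apache/cassandra",
    "org.apache/hbase",
])


def prefix_scan(sorted_keys, prefix):
    """Return every key starting with prefix as one contiguous slice."""
    start = bisect_left(sorted_keys, prefix)
    # The highest code point sorts after any character that can follow the prefix.
    end = bisect_left(sorted_keys, prefix + chr(0x10FFFF))
    return sorted_keys[start:end]


print(prefix_scan(rows, "org.apache/"))
# ['org.apache/cassandra', 'org.apache/hbase']
```

Because the keys are ordered, all rows for one domain sit next to each other and can be read in one range. With a hashing partitioner the same rows would be scattered across the ring.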


Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2017-02-11 Thread Kant Kodali
Saw this one today...

https://news.ycombinator.com/item?id=13624062

On Tue, Jan 3, 2017 at 6:27 AM, Eric Evans 
wrote:

> On Mon, Jan 2, 2017 at 2:26 PM, Edward Capriolo 
> wrote:
> > Let's be clear:
> > What I am saying is to avoid being loose with the word "free".
> >
> > https://en.wikipedia.org/wiki/Free_software_license
> >
> > Many things with the JVM are free too. Most importantly, it is free to
> > use.
> >
> > https://www.java.com/en/download/faq/distribution.xml
> >
> > As it relates to this conversation: I am not aware of anyone running
> > Cassandra who has modified the upstream JVM to make Cassandra run
> > better/differently *. Thus the license around the Oracle JVM is roughly
> > meaningless to the user/developer of Cassandra.
> >
> > * The only group I know of that took action to modify upstream was Acunu.
> > They had released a modified Linux kernel with a modified Apache
> > Cassandra.
> > http://cloudtweaks.com/2011/02/data-storage-startup-acunu-raises-3-6-million-to-launch-its-first-product/
> > That product no longer exists.
> >
> > "I don't know how to read any of this.  It sounds like you're saying that
> > a JVM is something that cannot be produced as a Free Software project,"
> >
> > What I am saying is that something like the JVM "could" be produced as a
> > "free software project". However, the argument that I was making is that
> > the popular viable languages (including the VMs or runtimes to use them)
> > today, including Java, C#, Go, and Swift, are developed by the largest
> > tech companies in the world, and as such I do believe a platform would be
> > viable. Specifically, I believe that without Oracle driving Java, OpenJDK
> > would not be viable.
> >
> > There are two specific reasons.
> > 1) I do not see large, costly multi-year initiatives like G1 happening.
> > 2) Without the guidance/leadership that Sun/Oracle provide, I do not see
> > new features that change the language, like lambdas and try multi-catch,
> > happening in a sane way.
> >
> > I expanded upon #2 by discussing my experience with standards like C++
> > 11, 14, and 17, and attempting to take compiling, working lambda code
> > from GCC on Linux to Microsoft Visual Studio and having it not compile.
> > In my opinion, Java only wins because as a platform it is very portable
> > as both source and binary code. Without leadership on that front I
> > believe that over time the language would suffer.
>
> I realize that you're trying to be pragmatic about all of this, but
> what I don't think you realize, is that so am I.
>
> Java could change hands at any time (it has once already), or Oracle
> leadership could decide to go in a different direction.  Imagine for
> example that they relicensed it to exclude use by orientation or
> religion, Cassandra would implicitly carry these restrictions as well.
> Imagine that they decided to provide a back-door to the NSA, Cassandra
> would then also contain such a back-door.  These might sound
> hypothetical, but there is plenty of precedent here.
>
> OpenJDK benefits from the same resources and leadership from Oracle
> that you value, but is licensed and distributed in a way that
> safeguards us from a day when Oracle becomes less benevolent, (if that
> were to happen, some other giant company could assume the mantle of
> leadership).
>
> All I'm really suggesting is that we at least soften our requirement
> on the Oracle JVM, and perhaps perform some test runs in CI against
> OpenJDK.  Actively discouraging people from using the Free Software
> alternative here, one that is working well for many, isn't the
> behavior I'd normally expect from a Free Software project.
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>


Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Jonathan Haddad
The time it takes to calculate the hash is so insignificant that it doesn't
even remotely come close to justifying all the drawbacks.

You can, of course, benchmark it. I wouldn't bother though. BOP is
basically dead.
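
If you do want a rough feel for the order of magnitude, a minimal Python timing sketch follows. Murmur3 is not in the Python standard library, so MD5 (the hash RandomPartitioner uses) stands in for it here as an assumption. The numbers mostly reflect Python call overhead and say nothing precise about the JVM.

```python
import hashlib
import os
import timeit

key = os.urandom(20)  # same length as a SHA-1 digest


def hash_once() -> bytes:
    # Stand-in for the partitioner's hash; MD5 is an assumption, not Murmur3.
    return hashlib.md5(key).digest()


runs = 100_000
seconds = timeit.timeit(hash_once, number=runs)
per_key_us = seconds / runs * 1e6
print(f"{per_key_us:.3f} microseconds per 20-byte key")
```

Compare the per-key figure with the network and disk time of a single read or write to see how little room there is for savings.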
