Re: How to perform queries on Cassandra?

Benjamin Black Sun, 11 Apr 2010 11:16:49 -0700

Row keys must be unique.  If your usernames are not unique and you
want to be able to query on them, you either need to figure out a way
to make them unique or treat the username rows themselves as indices,
which refer to a set of actually unique identifiers for users.


On Sun, Apr 11, 2010 at 11:12 AM, vineet daniel <vineetdan...@gmail.com> wrote:
> its not a problem its a scenario, which we need to handle. And all I am
> trying to do is to achieve what is not there with API i.e a workaroud.
>
> On Sun, Apr 11, 2010 at 11:06 PM, Benjamin Black <b...@b3k.us> wrote:
>>
>> A system that permits multiple people to have the same username has a
>> serious problem.
>>
>> On Sun, Apr 11, 2010 at 6:12 AM, vineet daniel <vineetdan...@gmail.com>
>> wrote:
>> > How to handle same usernames. Otherwise seems fine to me.
>> >
>> > On Sun, Apr 11, 2010 at 6:17 PM, Dop Sun <su...@dopsun.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >>
>> >>
>> >> As far as I can see it, the Cassandra API currently supports criterias
>> >> on:
>> >>
>> >> Token – Key – Super Column Name (if applicable) - Column Names
>> >>
>> >>
>> >>
>> >> I guess Token is not usually used for the day to day queries, so, Key
>> >> and
>> >> Column Names are normally used for querying. For the user name and
>> >> password
>> >> case, I guess it can be done like this:
>> >>
>> >>
>> >>
>> >> Define a CF as UserAuth with type as Super, and Key is user name, while
>> >> password can be the SuperKeyName. So, while you receive the user name
>> >> and
>> >> password from the UI (or any other methods), it can be queried via:
>> >> multiget_slice or get_range_slices, if there are anything returned,
>> >> means
>> >> that the user name and password matches.
>> >>
>> >>
>> >>
>> >> If not using the super column name, and put the password as the column
>> >> name, the column name usually not used for these kind of discretionary
>> >> values (actually, I don’t see any definitive documents on how to use
>> >> the
>> >> column Names and Super Columns, flexibility is the good of Cassandra,
>> >> or is
>> >> it bad if abused? :P)
>> >>
>> >>
>> >>
>> >> Not sure whether this is the best way, but I guess it will work.
>> >>
>> >>
>> >>
>> >> Regards,
>> >>
>> >> Dop
>> >>
>> >>
>> >>
>> >> From: Lucifer Dignified [mailto:vineetdan...@gmail.com]
>> >> Sent: Sunday, April 11, 2010 5:33 PM
>> >> To: user@cassandra.apache.org
>> >> Subject: Re: How to perform queries on Cassandra?
>> >>
>> >>
>> >>
>> >> Hi Benjamin
>> >>
>> >> I'll try to make it more clear to you.
>> >> We have a user table with fields 'id', 'username', and 'password'. Now
>> >> if
>> >> use the ideal way to store key/value, like :
>> >> username : vineetdaniel
>> >> timestamp
>> >> password : <password>
>> >> timestamp
>> >>
>> >> second user :
>> >>
>> >> username: <seconduser>
>> >> timestamp
>> >> password:<password>
>> >>
>> >> and so on, here what i assume is that as we cannot make search on
>> >> values
>> >> (as confirmed by guys on cassandra forums) we are not able to perform
>> >> robust
>> >> 'where' queries. Now what i propose is this.
>> >>
>> >> Rather than using a static values for column names use values itself
>> >> and
>> >> unique key as identifier. So, the above example when put in as per me
>> >> would
>> >> be.
>> >>
>> >> vineetdaniel : vineetdaniel
>> >> timestamp
>> >>
>> >> <password>:<password>
>> >> timestamp
>> >>
>> >> second user
>> >> seconduser:seconduser
>> >> timestamp
>> >>
>> >> password:password
>> >> timestamp
>> >>
>> >> By using above methodology we can simply make search on keys itself
>> >> rather
>> >> than going into using different CF's. But to add further, this cannot
>> >> be
>> >> used for every situation. I am still exploring this, and soon will be
>> >> updating the group and my blog with information pertaining to this. As
>> >> cassandra is new, I think every idea or experience should be shared
>> >> with the
>> >> community.
>> >>
>> >> I hope I example is clear this time. Should you have any queries feel
>> >> free
>> >> to revert.
>> >>
>> >> On Sun, Apr 11, 2010 at 2:01 PM, Benjamin Black <b...@b3k.us> wrote:
>> >>
>> >> Sorry, I don't understand your example.
>> >>
>> >> On Sun, Apr 11, 2010 at 12:54 AM, Lucifer Dignified
>> >> <vineetdan...@gmail.com> wrote:
>> >> > Benjamin I quite agree to you, but what in case of duplicate
>> >> > usernames,
>> >> > suppose if I am not using unique names as in email id's . If we have
>> >> > duplicacy in usernames we cannot use it for key, so what should be
>> >> > the
>> >> > solution. I think keeping incremental numeric id as key and keeping
>> >> > the
>> >> > name
>> >> > and value same in the column family.
>> >> >
>> >> > Example :
>> >> > User1 has password as 123456
>> >> >
>> >> > Cassandra structure :
>> >> >
>> >> > 1 as key
>> >> >            user1 - column name
>> >> >            value - user1
>> >> >            123456 - column name
>> >> >             value - 123456
>> >> >
>> >> > I m thinking of doing it this way for my applicaton, this way i can
>> >> > run
>> >> > different sorts of queries too. Any feedback on this is welcome.
>> >> >
>> >> > On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black <b...@b3k.us> wrote:
>> >> >>
>> >> >> You would have a Column Family, not a column for that; let's call it
>> >> >> the Users CF.  You'd use username as the row key and have a column
>> >> >> called 'password'.  For your example query, you'd retrieve row key
>> >> >> 'usr2', column 'password'.  The general pattern is that you create
>> >> >> CFs
>> >> >> to act as indices for each query you want to perform.  There is no
>> >> >> equivalent to a relational store to perform arbitrary queries.  You
>> >> >> must structure things to permit the queries of interest.
>> >> >>
>> >> >>
>> >> >> b
>> >> >>
>> >> >> On Sat, Apr 10, 2010 at 8:34 PM, dir dir <sikerasa...@gmail.com>
>> >> >> wrote:
>> >> >> > I have already read the API spesification. Honestly I do not
>> >> >> > understand
>> >> >> > how to use it. Because there are not an examples.
>> >> >> >
>> >> >> > For example I have a column like this:
>> >> >> >
>> >> >> > UserName    Password
>> >> >> > usr1                abc
>> >> >> > usr2                xyz
>> >> >> > usr3                opm
>> >> >> >
>> >> >> > suppose I want query the user's password using SQL in RDBMS
>> >> >> >
>> >> >> >       Select Password From Users Where UserName = "usr2";
>> >> >> >
>> >> >> > Now I want to get the password using OODBMS DB4o Object Query  and
>> >> >> > Java
>> >> >> >
>> >> >> >      ObjectSet QueryResult = db.query(new Predicate()
>> >> >> >      {
>> >> >> >             public boolean match(Users Myusers)
>> >> >> >             {
>> >> >> >                  return Myuser.getUserName() == "usr2";
>> >> >> >             }
>> >> >> >      });
>> >> >> >
>> >> >> > After we get the Users instance in the QueryResult, hence we can
>> >> >> > get
>> >> >> > the
>> >> >> > usr2's password.
>> >> >> >
>> >> >> > How we perform this query using Cassandra API and Java??
>> >> >> > Would you tell me please??  Thank You.
>> >> >> >
>> >> >> > Dir.
>> >> >> >
>> >> >> >
>> >> >> > On Sat, Apr 10, 2010 at 11:06 AM, Paul Prescod <p...@prescod.net>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> No. Cassandra has an API.
>> >> >> >>
>> >> >> >> http://wiki.apache.org/cassandra/API
>> >> >> >>
>> >> >> >> On Fri, Apr 9, 2010 at 8:00 PM, dir dir <sikerasa...@gmail.com>
>> >> >> >> wrote:
>> >> >> >> > Does Cassandra has a default query language such as SQL in
>> >> >> >> > RDBMS
>> >> >> >> > and Object Query in OODBMS?  Thank you.
>> >> >> >> >
>> >> >> >> > Dir.
>> >> >> >> >
>> >> >> >> > On Sat, Apr 10, 2010 at 7:01 AM, malsmith
>> >> >> >> > <malsm...@treehousesystems.com>
>> >> >> >> > wrote:
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> It's sort of an interesting problem - in RDBMS one relatively
>> >> >> >> >> simple
>> >> >> >> >> approach would be calculate a rectangle that is X km by Y km
>> >> >> >> >> with
>> >> >> >> >> User
>> >> >> >> >> 1's
>> >> >> >> >> location at the center.  So the rectangle is UserX - 10KmX ,
>> >> >> >> >> UserY-10KmY to
>> >> >> >> >> UserX+10KmX , UserY+10KmY
>> >> >> >> >>
>> >> >> >> >> Then you could query the database for all other users where
>> >> >> >> >> that
>> >> >> >> >> each
>> >> >> >> >> user
>> >> >> >> >> considered is curUserX > UserX-10Km and curUserX < UserX+10KmX
>> >> >> >> >> and
>> >> >> >> >> curUserY
>> >> >> >> >> > UserY-10KmY and curUserY < UserY+10KmY
>> >> >> >> >> * Not the 10KmX and 10KmY are really a translation from
>> >> >> >> >> Kilometers
>> >> >> >> >> to
>> >> >> >> >> degrees of  lat and longitude  (that you can find on a google
>> >> >> >> >> search)
>> >> >> >> >>
>> >> >> >> >> With the right indexes this query actually runs pretty well.
>> >> >> >> >>
>> >> >> >> >> Translating that to Cassandra seems a bit complex at first -
>> >> >> >> >> but
>> >> >> >> >> you
>> >> >> >> >> could
>> >> >> >> >> try something like pre-calculating a grid with the right
>> >> >> >> >> resolution
>> >> >> >> >> (like a
>> >> >> >> >> square of 5KM per side) and assign every user to a particular
>> >> >> >> >> grid
>> >> >> >> >> ID.
>> >> >> >> >> That
>> >> >> >> >> way you just calculate with grid ID User1 is in then do a
>> >> >> >> >> direct
>> >> >> >> >> key
>> >> >> >> >> lookup
>> >> >> >> >> to get a list of the users in that same grid id.
>> >> >> >> >>
>> >> >> >> >> A second approach would be to have to column families -- one
>> >> >> >> >> that
>> >> >> >> >> maps
>> >> >> >> >> a
>> >> >> >> >> Latitude to a list of users who are at that latitude and a
>> >> >> >> >> second
>> >> >> >> >> that
>> >> >> >> >> maps
>> >> >> >> >> users who are at a particular longitude.  You could do the
>> >> >> >> >> same
>> >> >> >> >> rectange
>> >> >> >> >> calculation above then do a get_slice range lookup to get a
>> >> >> >> >> list
>> >> >> >> >> of
>> >> >> >> >> users
>> >> >> >> >> from range of latitude and a second list from the range of
>> >> >> >> >> longitudes.
>> >> >> >> >> You would then need to do a in-memory nested loop to find the
>> >> >> >> >> list
>> >> >> >> >> of
>> >> >> >> >> users
>> >> >> >> >> that are in both lists.  This second approach could cause some
>> >> >> >> >> trouble
>> >> >> >> >> depending on where you search and how many users you really
>> >> >> >> >> have
>> >> >> >> >> --
>> >> >> >> >> some
>> >> >> >> >> latitudes and longitudes have many many people in them
>> >> >> >> >>
>> >> >> >> >> So, it seems some version of a chunking / grid id thing would
>> >> >> >> >> be
>> >> >> >> >> the
>> >> >> >> >> better approach.   If you let people zoom in or zoom out - you
>> >> >> >> >> could
>> >> >> >> >> just
>> >> >> >> >> have different column families for each level of zoom.
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> I'm stuck on a stopped train so -- here is even more code:
>> >> >> >> >>
>> >> >> >> >> static Decimal GetLatitudeMiles(Decimal lat)
>> >> >> >> >> {
>> >> >> >> >> Decimal f = 0.0M;
>> >> >> >> >> lat = Math.Abs(lat);
>> >> >> >> >> f = 68.99M;
>> >> >> >> >>          if (lat >= 0.0M && lat < 10.0M) { f = 68.71M; }
>> >> >> >> >> else if (lat >= 10.0M && lat < 20.0M) { f = 68.73M; }
>> >> >> >> >> else if (lat >= 20.0M && lat < 30.0M) { f = 68.79M; }
>> >> >> >> >> else if (lat >= 30.0M && lat < 40.0M) { f = 68.88M; }
>> >> >> >> >> else if (lat >= 40.0M && lat < 50.0M) { f = 68.99M; }
>> >> >> >> >> else if (lat >= 50.0M && lat < 60.0M) { f = 69.12M; }
>> >> >> >> >> else if (lat >= 60.0M && lat < 70.0M) { f = 69.23M; }
>> >> >> >> >> else if (lat >= 70.0M && lat < 80.0M) { f = 69.32M; }
>> >> >> >> >> else if (lat >= 80.0M) { f = 69.38M; }
>> >> >> >> >>
>> >> >> >> >> return f;
>> >> >> >> >> }
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> Decimal MilesPerDegreeLatitude =
>> >> >> >> >> GetLatitudeMiles(zList[0].Latitude);
>> >> >> >> >> Decimal MilesPerDegreeLongitude = ((Decimal)
>> >> >> >> >> Math.Abs(Math.Cos((Double)
>> >> >> >> >> zList[0].Latitude))) * 24900.0M / 360.0M;
>> >> >> >> >>                         dRadius = 10.0M  // ten miles
>> >> >> >> >> Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
>> >> >> >> >> Decimal deltaLong = dRadius / MilesPerDegreeLongitude;
>> >> >> >> >>
>> >> >> >> >> ps.TopLatitude = zList[0].Latitude - deltaLat;
>> >> >> >> >> ps.TopLongitude = zList[0].Longitude - deltaLong;
>> >> >> >> >> ps.BottomLatitude = zList[0].Latitude + deltaLat;
>> >> >> >> >> ps.BottomLongitude = zList[0].Longitude + deltaLong;
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
>> >> >> >> >>
>> >> >> >> >> 2010/4/9 Onur AKTAS <onur.ak...@live.com>:
>> >> >> >> >> > ...
>> >> >> >> >> > I'm trying to find out how do you perform queries with
>> >> >> >> >> > calculations
>> >> >> >> >> > on
>> >> >> >> >> > the
>> >> >> >> >> > fly without inserting the data as calculated from the
>> >> >> >> >> > beginning.
>> >> >> >> >> > Lets say we have latitude and longitude coordinates of all
>> >> >> >> >> > users
>> >> >> >> >> > and
>> >> >> >> >> > we
>> >> >> >> >> > have
>> >> >> >> >> >  Distance(from_lat, from_long, to_lat, to_long) function
>> >> >> >> >> > which
>> >> >> >> >> > gives distance between lat/longs pairs in kilometers.
>> >> >> >> >>
>> >> >> >> >> I'm not an expert, but I think that it boils down to
>> >> >> >> >> "MapReduce"
>> >> >> >> >> and
>> >> >> >> >> "Hadoop".
>> >> >> >> >>
>> >> >> >> >> I don't think that there's any top-down tutorial on those two
>> >> >> >> >> words,
>> >> >> >> >> you'll have to research yourself starting here:
>> >> >> >> >>
>> >> >> >> >>  * http://en.wikipedia.org/wiki/MapReduce
>> >> >> >> >>
>> >> >> >> >>  * http://hadoop.apache.org/
>> >> >> >> >>
>> >> >> >> >>  * http://wiki.apache.org/cassandra/HadoopSupport
>> >> >> >> >>
>> >> >> >> >> I don't think it is all documented in any one place yet...
>> >> >> >> >>
>> >> >> >> >>  Paul Prescod
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >
>
>

Re: How to perform queries on Cassandra?

Reply via email to