Mike are you stuck on a train too? :-)
On Apr 9, 2010, at 8:51 PM, Mike Gallamore <mike.e.gallam...@googlemail.com
> wrote:
I apologize in advance if this goes into esoteric algorithms a bit
too much but I think this will get to an interesting idea to solve
your problem. My background is physics particularly computer
simulations of complex systems. Anyways in cosmology an interesting
algorithm is called an n-body tree code (its been around for at
least 20 years so a lot is available online about it). Since every
object with mass (well in general relativity actually anything with
energy but I digress) interacts with every other object with mass,
you end up with the "n-body" problem. The number of interactions in
a system goes as n(n-1) ~= n^2 where n is the number of elements.
This lead to a nightmare to do simulations of large systems, say two
galaxies colliding. 1 billion X 1 billion minus one is huge and
effectively incalculable since you would have to calculate this each
time you wanted to increment the simulation a tiny bit ahead in
time. How do you get a reasonable approximation to the solution? The
answer or at least one of them is n-body "tree codes".
You take advantage of the fact that the the force that one star
feels from another falls off as 1/r^2 and importantly two stars far
away from the first star but close together relatively have roughly
the same magnitude and direction of the "r" vector. So you can
simply clump them together, ie sum there masses, and the force is GM1
(M"sum")/r^2. To do this efficiently numerically you break down the
system using binary search trees. Thinking in 2D just to keep it
simple, you divide the space into top left, top right, bottom left
bottom right as a first approximation. Then continually do that
until you end up with each element in its own box. When you figure
out the forces you are going to apply to the system you just take
the distance to the middle of the box that contains the ones you are
going to consider together (the closer to the star in question the
smaller the boxes need to be because the direction of r changes
quicker the closer the boxes are to the star, but farther away you
can use larger and larger boxes (which would contain a 2D tree like
structure descending to the point where each of the stars contained
are trapped in there own little box), sum the number of stars in the
box and presto.
How would this help you? Well if you encoded the "box hierachy", say
1 for top left, 2 for top right, 3 for bottom left, 4 for bottom
right, then you could specify the box that someone is in based on a
string like "14234". To find the set of stars/points/whatever that
are at least x away you just would have to do a range search for all
the points with their location "string" larger than or equal to the
location sting corresponding to the closest corner of the biggest
box such that its corner is at least "x" units away. Quite good as a
first approximation and the search algorithm should run as O(nlog
(n)) which is a logirithmic decrease in computation time. Ie the 1
billion times 1 billion -1 problem becomes 1 billion times ~9, much
much nicer. Really difficult thing to explain without looking over a
diagram in person I admit but hopefully it makes sense if you look
up the algorithm online.
On 04/09/2010 05:01 PM, malsmith wrote:
It's sort of an interesting problem - in RDBMS one relatively
simple approach would be calculate a rectangle that is X km by Y km
with User 1's location at the center. So the rectangle is UserX -
10KmX , UserY-10KmY to UserX+10KmX , UserY+10KmY
Then you could query the database for all other users where that
each user considered is curUserX > UserX-10Km and curUserX < UserX
+10KmX and curUserY > UserY-10KmY and curUserY < UserY+10KmY
* Not the 10KmX and 10KmY are really a translation from Kilometers
to degrees of lat and longitude (that you can find on a google
search)
With the right indexes this query actually runs pretty well.
Translating that to Cassandra seems a bit complex at first - but
you could try something like pre-calculating a grid with the right
resolution (like a square of 5KM per side) and assign every user to
a particular grid ID. That way you just calculate with grid ID
User1 is in then do a direct key lookup to get a list of the users
in that same grid id.
A second approach would be to have to column families -- one that
maps a Latitude to a list of users who are at that latitude and a
second that maps users who are at a particular longitude. You
could do the same rectange calculation above then do a get_slice
range lookup to get a list of users from range of latitude and a
second list from the range of longitudes. You would then need to
do a in-memory nested loop to find the list of users that are in
both lists. This second approach could cause some trouble
depending on where you search and how many users you really have --
some latitudes and longitudes have many many people in them
So, it seems some version of a chunking / grid id thing would be
the better approach. If you let people zoom in or zoom out - you
could just have different column families for each level of zoom.
I'm stuck on a stopped train so -- here is even more code:
static Decimal GetLatitudeMiles(Decimal lat)
{
Decimal f = 0.0M;
lat = Math.Abs(lat);
f = 68.99M;
if (lat >= 0.0M && lat < 10.0M) { f = 68.71M; }
else if (lat >= 10.0M && lat < 20.0M) { f = 68.73M; }
else if (lat >= 20.0M && lat < 30.0M) { f = 68.79M; }
else if (lat >= 30.0M && lat < 40.0M) { f = 68.88M; }
else if (lat >= 40.0M && lat < 50.0M) { f = 68.99M; }
else if (lat >= 50.0M && lat < 60.0M) { f = 69.12M; }
else if (lat >= 60.0M && lat < 70.0M) { f = 69.23M; }
else if (lat >= 70.0M && lat < 80.0M) { f = 69.32M; }
else if (lat >= 80.0M) { f = 69.38M; }
return f;
}
Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude);
Decimal MilesPerDegreeLongitude = ((Decimal) Math.Abs(Math.Cos
((Double) zList[0].Latitude))) * 24900.0M / 360.0M;
dRadius = 10.0M // ten miles
Decimal deltaLat = dRadius / MilesPerDegreeLatitude;
Decimal deltaLong = dRadius / MilesPerDegreeLongitude;
ps.TopLatitude = zList[0].Latitude - deltaLat;
ps.TopLongitude = zList[0].Longitude - deltaLong;
ps.BottomLatitude = zList[0].Latitude + deltaLat;
ps.BottomLongitude = zList[0].Longitude + deltaLong;
On Fri, 2010-04-09 at 16:30 -0700, Paul Prescod wrote:
2010/4/9 Onur AKTAS <onur.ak...@live.com>:
> ...
> I'm trying to find out how do you perform queries with
calculations on the
> fly without inserting the data as calculated from the beginning.
> Lets say we have latitude and longitude coordinates of all users
and we have
> Distance(from_lat, from_long, to_lat, to_long) function which
> gives distance between lat/longs pairs in kilometers.
I'm not an expert, but I think that it boils down to "MapReduce"
and "Hadoop".
I don't think that there's any top-down tutorial on those two words,
you'll have to research yourself starting here:
* http://en.wikipedia.org/wiki/MapReduce
* http://hadoop.apache.org/
* http://wiki.apache.org/cassandra/HadoopSupport
I don't think it is all documented in any one place yet...
Paul Prescod