Thanks for the note. Ideally I would prefer not to keep track of the
oldest indexed date, because that means building a bit of infrastructure
on top of my database, with the attendant referential integrity problems.
But I suppose I'll be forced to do that. In addition, I'll h
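For illustration, the bookkeeping being discussed could be as small as a
single marker row that the retirement job updates; in this Pycassa sketch
the keyspace 'jobs_ks', column family 'metadata', and row key 'retention'
are invented names, not taken from the thread:

import pycassa

pool = pycassa.ConnectionPool('jobs_ks', ['localhost:9160'])
meta = pycassa.ColumnFamily(pool, 'metadata')

def advance_oldest_day(new_oldest_day):
    # Called after a day's worth of rows has been retired.
    meta.insert('retention', {'oldest_day': new_oldest_day})

def oldest_day():
    # Queries read this first so they never scan back to the beginning of time.
    return meta.get('retention', columns=['oldest_day'])['oldest_day']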
I think what he means is: do you know what day the 'oldest' day is? E.g.,
if you have a rolling window of, say, 2 weeks, structure your query so
that your slice range only goes back 2 weeks, rather than to the
beginning of time. This would avoid iterating over all the tombstones
from prior to the
> I'm not sure I entirely follow. By the oldest data, do you mean the
> primary key corresponding to the limit of the time horizon? Unfortunately,
> unique IDs and the timestamps do not correlate in the sense that
> chronologically
> "newer" entries might have a smaller sequential ID. That's because
Thanks Peter,
I'm not sure I entirely follow. By the oldest data, do you mean the
primary key corresponding to the limit of the time horizon? Unfortunately,
unique IDs and the timestamps do not correlate in the sense that
chronologically
"newer" entries might have a smaller sequential ID. That's
Brandon,
it won't work in my application, as I need a few indexes on attributes
of the job. In addition, a large portion of queries is based on key-value
lookup, and that key is the unique job ID. I really can't have data packed
in one row per day.
Thanks,
Maxim
On 11/13/2011 8:34 PM, Brandon
> I do limit the number of rows I'm asking for in Pycassa. Queries on primary
> keys still work fine,
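The access pattern being defended here is essentially a direct key-value
lookup on the job ID; a small Pycassa sketch, where the keyspace, column
family, and key are placeholders:

import pycassa

pool = pycassa.ConnectionPool('jobs_ks', ['localhost:9160'])
jobs = pycassa.ColumnFamily(pool, 'jobs')

job = jobs.get('1234567')                          # one row per job, keyed by job ID
status = jobs.get('1234567', columns=['status'])   # or fetch only the columns needed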
Is it feasible in your situation to keep track of the oldest possible
data (for example, if there is a single sequential writer that rotates
old entries away it could keep a record of what the old
On Sun, Nov 13, 2011 at 7:25 PM, Maxim Potekhin wrote:
> Each row represents a computational task (a job) executed on the grid or in
> the cloud. It naturally has a timestamp as one of its attributes,
> representing the time of the last update. This timestamp
> is used to group the data into "buck
Brandon,
thanks for the note.
Each row represents a computational task (a job) executed on the grid or
in the cloud. It naturally has a timestamp as one of its attributes,
representing the time of the last update. This timestamp
is used to group the data into "buckets", each representing one da
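A hedged sketch of the layout just described: one row per job keyed by its
unique ID, with a 'day' column derived from the last-update timestamp so
rows fall into per-day buckets. All names here are illustrative, not the
actual schema:

from datetime import datetime
import pycassa

pool = pycassa.ConnectionPool('jobs_ks', ['localhost:9160'])
jobs = pycassa.ColumnFamily(pool, 'jobs')

def store_job(job_id, attributes, last_update):
    # attributes: dict of job attributes; last_update: datetime of the last status change.
    columns = dict(attributes)
    columns['last_update'] = last_update.isoformat()
    columns['day'] = last_update.strftime('%Y-%m-%d')  # the "bucket" used when retiring data
    jobs.insert(str(job_id), columns)

store_job(1234567, {'status': 'running', 'site': 'site_a'}, datetime(2011, 11, 13, 20, 34))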
On Sun, Nov 13, 2011 at 6:55 PM, Maxim Potekhin wrote:
> Thanks to all for valuable insight!
>
> Two comments:
> a) this is not actually time series data, but yes, each item has
> a timestamp and thus chronological attribution.
>
> b) so, what do you practically recommend? I need to delete
> half
Thanks to all for the valuable insight!
Two comments:
a) this is not actually time series data, but yes, each item has
a timestamp and thus chronological attribution.
b) so, what do you practically recommend? I need to delete
half a million to a million entries daily, then insert fresh data.
What's
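A rough sketch of the daily churn described in (b), with the deletes and
fresh inserts grouped through Pycassa's batch mutator; the column family
name and helper arguments are assumptions, not actual code from the thread:

import pycassa

pool = pycassa.ConnectionPool('jobs_ks', ['localhost:9160'])
jobs = pycassa.ColumnFamily(pool, 'jobs')

def retire_and_reload(expired_job_ids, fresh_rows):
    # expired_job_ids: iterable of row keys to drop; fresh_rows: dict of key -> columns.
    with jobs.batch(queue_size=500) as b:    # flushes every 500 queued mutations
        for job_id in expired_job_ids:
            b.remove(job_id)                 # whole-row tombstone per retired job
        for job_id, columns in fresh_rows.items():
            b.insert(job_id, columns)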
Deletions in Cassandra imply the use of tombstones (see
http://wiki.apache.org/cassandra/DistributedDeletes), and under some
circumstances reads can become O(n) with respect to the number of
columns deleted. It sounds like this is what you're seeing.
For example, suppose you're inserting a
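The failure mode can be illustrated with a deliberately bad wide row (all
names invented): delete most of a row's columns, and even a tiny slice
read has to step over every tombstone until compaction after
gc_grace_seconds removes them:

import pycassa

pool = pycassa.ConnectionPool('jobs_ks', ['localhost:9160'])
wide = pycassa.ColumnFamily(pool, 'wide_rows')

# Fill one wide row, then delete almost all of it.
wide.insert('bucket-2011-11-13', dict(('col%06d' % i, 'x') for i in range(10000)))
wide.remove('bucket-2011-11-13', columns=['col%06d' % i for i in range(9990)])

# Asks for only 10 live columns, but the read path must still skip the
# 9,990 tombstones sitting in front of them.
first_live = wide.get('bucket-2011-11-13', column_count=10)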
On Sun, Nov 13, 2011 at 5:57 PM, Maxim Potekhin wrote:
> I've done more experimentation and the behavior persists: I start with a
> normal dataset which is searchable by a secondary index. I select by that
> index the entries that match a certain criterion, then delete those. I tried
> two method
I've done more experimentation and the behavior persists: I start with a
normal dataset which is searchable by a secondary index. I select by
that index the entries that match a certain criterion, then delete
those. I tried two methods of deletion -- individual cf.remove() as well
as batch rem
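A minimal sketch of the experiment described above: pick rows through a
secondary-index expression, then delete them either one at a time or via
a batch mutator. The column family name and the 'status' criterion are
placeholders for whatever was actually used:

import pycassa
from pycassa.index import create_index_expression, create_index_clause

pool = pycassa.ConnectionPool('jobs_ks', ['localhost:9160'])
jobs = pycassa.ColumnFamily(pool, 'jobs')

expr = create_index_expression('status', 'finished')   # stand-in for the real criterion
clause = create_index_clause([expr], count=100000)
matching_keys = [key for key, _ in jobs.get_indexed_slices(clause)]

# Method 1: individual removes.
for key in matching_keys:
    jobs.remove(key)

# Method 2: batched removes.
with jobs.batch(queue_size=1000) as b:
    for key in matching_keys:
        b.remove(key)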
Hello,
My data load comes in batches representing one day in the life of a
large computing facility.
I index the data by the day it was produced, to be able to quickly pull
data for a specific day
within the last year or two. There are 6 other indexes.
When it comes to retiring the data, I in
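For context, a schema along these lines could be declared with Pycassa's
SystemManager: a 'jobs' column family plus a secondary index on the 'day'
column so a specific day can be pulled quickly. The keyspace, names, and
types are illustrative only, and just one of the seven indexes is shown:

from pycassa.system_manager import SystemManager, SIMPLE_STRATEGY, UTF8_TYPE

sys_mgr = SystemManager('localhost:9160')
sys_mgr.create_keyspace('jobs_ks', SIMPLE_STRATEGY, {'replication_factor': '1'})
sys_mgr.create_column_family('jobs_ks', 'jobs',
                             comparator_type=UTF8_TYPE,
                             default_validation_class=UTF8_TYPE)
sys_mgr.create_index('jobs_ks', 'jobs', 'day', UTF8_TYPE, index_name='jobs_day_idx')
# ... plus the six other attribute indexes mentioned above.
sys_mgr.close()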