DF) [mailto:alaa.zuba...@pdf.com]
> *Sent:* Monday, 11 May 2015 20:32
> *To:* user@cassandra.apache.org
> *Subject:* CQL Data Model question
>
>
>
> Hi,
>
>
>
> I am trying to port an Oracle Table to Cassandra.
>
> the table is a wide table (931 columns) and could h
Porting an SQL data model to Cassandra is an anti-pattern - don't do it!
Instead, focus on developing a new data model that capitalizes on the key
strengths of Cassandra - distributed, scalable, fast writes, fast direct
access. Complex and ad-hoc queries are anti-patterns as well. I'll leave it
to
From: Alaa Zubaidi (PDF) [mailto:alaa.zuba...@pdf.com]
Sent: Monday, 11 May 2015 20:32
To: user@cassandra.apache.org
Subject: CQL Data Model question
Hi,
I am trying to port an Oracle Table to Cassandra.
the table is a wide table (931 columns) and could have millions of rows.
name, filter1, filter2
Hi,
I am trying to port an Oracle Table to Cassandra.
the table is a wide table (931 columns) and could have millions of rows.
name, filter1, filter2...filter30, data1, data2...data900
The user would retrieve multiple rows from this table and filter (30 filter
columns) by one or more (up to 3)
Hi Cass,
just a hint from the off - if I got it right you have:
Table 1: PRIMARY KEY ( (event_day,event_hr),event_time)
Table 2: PRIMARY KEY (event_day,event_time)
Assuming your events to write come in by wall clock time, the first
table design will have a hotspot on a specific node getting al
, velocity and variety. It doesn’t look like
your data has the volume or velocity that a standard RDBMS cannot handle.
Mohammed
From: Kai Wang [mailto:dep...@gmail.com]
Sent: Thursday, February 19, 2015 6:06 AM
To: user@cassandra.apache.org
Subject: Re: Data tiered compaction and data model question
ohammed
>>>
>>>
>>>
>>> *From:* cass savy [mailto:casss...@gmail.com]
>>> *Sent:* Wednesday, February 18, 2015 4:21 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Data tiered compaction and data model question
>>>
>>>
>>>
t in a day? What is
>> the worst-case scenario?
>>
>>
>>
>> Mohammed
>>
>>
>>
>> *From:* cass savy [mailto:casss...@gmail.com]
>> *Sent:* Wednesday, February 18, 2015 4:21 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Dat
avy [mailto:casss...@gmail.com]
> *Sent:* Wednesday, February 18, 2015 4:21 PM
> *To:* user@cassandra.apache.org
> *Subject:* Data tiered compaction and data model question
>
>
>
> We want to track events in log Cf/table and should be able to query for
> events that occurred in r
What is the maximum number of events that you expect in a day? What is the
worst-case scenario?
Mohammed
From: cass savy [mailto:casss...@gmail.com]
Sent: Wednesday, February 18, 2015 4:21 PM
To: user@cassandra.apache.org
Subject: Data tiered compaction and data model question
We want to track
We want to track events in log Cf/table and should be able to query for
events that occurred in range of mins or hours for given day. Multiple
events can occur in a given minute. I have listed 2 table designs and am leaning
towards table 1 to avoid a large wide row. Please advise on
*Table 1*: not very wid
I think there is not an extremely simple solution to your problem. You
will probably need to use multiple tables to get the view you need. One
keyed just by file UUID, which tracks some basic metadata about the file
including the last modified time. Another as a materialized view of the
most rece
>
>
>
>
> -Original Message-
> From: y2k...@gmail.com on behalf of Jimmy Lin
> Sent: Thu 11-Jul-13 13:09
> To: user@cassandra.apache.org
> Subject: Re: data model question : finding out the n most recent changes
> items
>
> what I mean is, I really just w
-Original Message-
From: y2k...@gmail.com on behalf of Jimmy Lin
Sent: Thu 11-Jul-13 13:09
To: user@cassandra.apache.org
Subject: Re: data model question : finding out the n most recent changes items
what I mean is, I really just want the last modified date instead of series
of timestamp and still
what I mean is, I really just want the last modified date instead of series
of timestamp and still able to sort or order by it.
(maybe I should rephrase my question as how to sort or order by last
modified column in a row)
CREATE TABLE user_file (
user_id uuid,
modified_date timest
From what you described, this sounds like the most appropriate:
CREATE TABLE user_file (
    user_id uuid,
    modified_date timestamp,
    file_id timeuuid,
    PRIMARY KEY (user_id, modified_date)
);
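As a rough illustration of why clustering on modified_date helps with the "n most recent" query, here is a minimal in-memory sketch in Python. It only models the ordering (rows grouped by user and kept sorted by modification time); the class and method names are hypothetical, and it does not reproduce Cassandra's upsert semantics or any driver API.

```python
import bisect
from collections import defaultdict
from datetime import datetime


class UserFileIndex:
    """In-memory stand-in for a table partitioned by user_id and
    ordered by modified_date (hypothetical helper, not a Cassandra API)."""

    def __init__(self):
        # user_id -> list of (modified_date, file_id), kept sorted ascending
        self._rows = defaultdict(list)

    def record_change(self, user_id, modified_date, file_id):
        # Insert while preserving sort order, like a clustering column would.
        bisect.insort(self._rows[user_id], (modified_date, file_id))

    def most_recent(self, user_id, n):
        # Read the tail of the ordered partition, newest first.
        if n <= 0:
            return []
        rows = self._rows.get(user_id, [])
        return [file_id for _, file_id in reversed(rows[-n:])]


index = UserFileIndex()
index.record_change("u1", datetime(2013, 7, 11, 9, 0), "a")
index.record_change("u1", datetime(2013, 7, 11, 13, 9), "c")
index.record_change("u1", datetime(2013, 7, 11, 11, 0), "b")
print(index.most_recent("u1", 2))
```

Because the partition is already ordered, the read is just a slice of the newest entries rather than a sort at query time.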
If you normally need more information about the file then either store that as
addit
I have an application that needs to find the n most recently modified
files for a given user id. I started out with a few tables but still couldn't get
what I want; I hope someone can point me in the right direction...
See my tables below.
#1 won't work, because file_id's timeuuid contains creation time,
ssage-
From: Hiller, Dean [mailto:dean.hil...@nrel.gov]
Sent: Tuesday, 07 May 2013 05:52 p.m.
To: user@cassandra.apache.org
Subject: Re: CQL3 Data Model Question
Playorm is not yet on CQL3 and cassandra doesn't work well with +10,000
CF's as we went down that path and cassandra can'
't believe that
>>we really have any hotspots from what I can tell.
>>
>>Dean
>>
>>From: Keith Wright <kwri...@nanigans.com>
>>Reply-To: user@cassandra.apache.org
ndra.apache.org
>Date: Tuesday, May 7, 2013 2:02 PM
>To: user@cassandra.apache.org
>Subject: CQL3 Data Model Q
right <kwri...@nanigans.com>
Reply-To: user@cassandra.apache.org
Date: Tuesday, May 7, 2013 2:02 PM
To: user@cassandra.apache.org
Hi all,
I was hoping you could provide some assistance with a data modeling
question (my apologies if a similar question has already been posed). I have
time based data that I need to store on a per customer (aka app id ) basis so
that I can easily return it in sorted order by event time.
> Isn't Kafka too young for production use?
The best way to advance the project is to use it and contribute your experience
and time.
btw, checking out kafka is a great idea. There are people around having Fun
Times with Kafka in production
Cheers
-
Aaron Morton
Fre
Isn't Kafka too young for production use?
Clearly that would fit my needs much better, but I can't afford an early-stage
project that is not ready for production. Is it?
On 30 Apr 2012, at 14:28, samal wrote:
>
>
> On Mon, Apr 30, 2012 at 5:52 PM, Morgan Segalis wrote:
> Hi Samal,
>
> Th
On Mon, Apr 30, 2012 at 5:52 PM, Morgan Segalis wrote:
> Hi Samal,
>
> Thanks for the TTL feature, I wasn't aware of its existence.
>
> Day partitioning will be less wide than month partitioning (about 30
> times less give or take ;-) )
> Per day it should have something like 100 000 message
Hi Samal,
Thanks for the TTL feature, I wasn't aware of its existence.
Day partitioning will be less wide than month partitioning (about 30 times
less give or take ;-) )
Per day it should have something like 100 000 messages stored, most of it would
be retrieved so deleted before the TTL f
On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis wrote:
> Hi Aaron,
>
> Thank you for your answer, I was beginning to think that my question would
> never be answered ;-)
>
> Actually, this is what I was going for, except one thing, instead of
> partitioning rows per month, I thought about partition
Hi Aaron,
Thank you for your answer, I was beginning to think that my question would
never be answered ;-)
Actually, this is what I was going for, except one thing, instead of
partitioning rows per month, I thought about partitioning per day, so that
every day I launch the cleaning tool, and it
A message queue is often not a great use case for Cassandra. For information on
how to handle high delete workloads see
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
It is hard to create a model without some idea of the data load, but I would
suggest you start with:
CF: Us
Hi everyone !
I'm fairly new to Cassandra and I'm not yet familiar with the column-oriented
NoSQL model.
I have worked on it for a while, but I can't seem to find the best model for what
I'm looking for.
I have Erlang software that lets users connect and communicate with each
other, whe
x column data value as column name and book_id as value?
>>
>> You do not need a different CF for each custom secondary index. Try putting
>> the name of the index in the row key.
>>
>> What will you recommend?
>>
>> Take another look at the queries you *need* to support. Then b
. Then build a small
> proof of concept to see if Cassandra will work for you.
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/04/2012, at 6:46 AM, Data Craftsman wrote:
>
> Howdy,
>
ill you recommend?
Take another look at the queries you *need* to support. Then build a small
proof of concept to see if Cassandra will work for you.
Hope that helps.
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 6/04/2012, at 6:46 AM, Data Craftsm
Will 1500 bytes row size be large or small for Cassandra from your
understanding?
Performance degradation starts at 500MB rows; it's very slow if you hit
this limit.
Thanks!
Better than mine, as it considered later additions of services!
Will update my code,
Thanks
*Tamar Fraenkel *
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956
On Mon, Mar 12, 2012 at 11:
Alternate would be to add another row to your user CF specific for Facebook
ids. Column ID would be the Facebook identifier and value would be your
internal uuid.
Consider when you want to add another service like twitter. Will you then
add another CF per service or just another row specific now
In this case, where you know the query upfront, I add a custom secondary index
using another CF to support the query. It's a little easier here because the
data won't change.
UserLookupCF (using composite types for the key value)
row_key: e.g. "facebook:12345" or "twitter:12345"
col_name : e.g
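The lookup-row idea above can be sketched in a few lines of Python. This is only a minimal in-memory model, assuming row keys of the form "service:external_id" as in the examples given; the function and class names are hypothetical and no Cassandra client API is used.

```python
import uuid


def lookup_key(service, external_id):
    """Build a row key such as 'facebook:12345' (format taken from the
    examples in the message; the helper itself is hypothetical)."""
    if ":" in service:
        raise ValueError("service name must not contain ':'")
    return f"{service}:{external_id}"


class UserLookup:
    """Dictionary standing in for a lookup CF: row key -> internal user UUID."""

    def __init__(self):
        self._rows = {}

    def add(self, service, external_id, user_id):
        # Written once when the external account is linked to the user.
        self._rows[lookup_key(service, external_id)] = user_id

    def find(self, service, external_id):
        # Single key read; returns None when no mapping exists.
        return self._rows.get(lookup_key(service, external_id))


lookup = UserLookup()
internal_id = uuid.uuid4()
lookup.add("facebook", "12345", internal_id)
print(lookup.find("facebook", "12345") == internal_id)
```

Putting the service name in the key means a new service such as Twitter needs no new column family, only new keys.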
Hi!
Thanks for the response.
From what I read, secondary indices are good only for columns with few
possible values. Is this a good fit for my case? I have a unique Facebook id
for every user.
Thanks
*Tamar Fraenkel *
Senior Software Engineer, TOK Media
ta...@tok-media.com
Either you do that or you could think about using a secondary index on the
fb user name in your primary cf.
See http://www.datastax.com/docs/1.0/ddl/indexes
Cheers
On 11.03.2012 at 09:51, Tamar Fraenkel wrote:
Hi!
I need some advice:
I have user CF, which has a UUID key which is my internal u
Hi!
I need some advice:
I have a user CF, which has a UUID key which is my internal user id.
One of the columns is the facebook_id of the user (if it exists).
I need to have the reverse mapping from facebook_id to my UUID.
My intention is to add a CF for the mapping from Facebook Id to my id:
user_by_fbid =
> 1. regarding time slicing, if at any point of time I am interested in what
> happened in the last T minutes, then I will need to query more than one row
> of the DimentionUpdates, right?
Yerp.
Sometimes that's what's needed.
> 2. What did you mean by "You will also want to partition the
Hi!
Thank you very much for your response!
I have a couple of questions regarding it, some are just to make sure I understood
you:
1. regarding time slicing, if at any point of time I am interested in what
happened in the last T minutes, then I will need to query more than one row of
the Dimention
In general if you are collecting data over time you should consider
partitioning the rows to avoid creating very large rows. Also if you have a
common request you want to support consider modeling it directly rather than
using secondary indexes.
Assuming my understanding of the problem is in
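To make the row-partitioning advice concrete, here is a small Python sketch of time-bucketed row keys. It assumes one row per hour and a "name:YYYYMMDDHH" key format; both are illustrative choices, not something specified in the thread. It also shows why a "last T minutes" query may need to read more than one row.

```python
from datetime import datetime, timedelta

# Assumed bucket size: one row per hour (illustrative choice).
BUCKET = timedelta(hours=1)


def bucket_start(ts):
    """Truncate a timestamp to the start of its hourly bucket."""
    return ts.replace(minute=0, second=0, microsecond=0)


def row_key(name, ts):
    """Row key for the bucket containing ts, e.g. 'updates:2012012410'."""
    return f"{name}:{bucket_start(ts):%Y%m%d%H}"


def row_keys_for_window(name, end, minutes):
    """All row keys that must be read to cover the last `minutes` before `end`."""
    start = end - timedelta(minutes=minutes)
    keys = []
    cursor = bucket_start(start)
    while cursor <= end:
        keys.append(row_key(name, cursor))
        cursor += BUCKET
    return keys


now = datetime(2012, 1, 24, 10, 5)
print(row_keys_for_window("updates", now, 10))
```

A window that crosses a bucket boundary yields two keys, so the client issues one read per key and merges the results.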
On Sat, Jan 21, 2012 at 7:49 PM, Jean-Nicolas Boulay Desjardins <
jnbdzjn...@gmail.com> wrote:
> Milind Parikh, Rainbird is backed by Twitter... My worry is that you
> might not be around in the future... Also, do you have evidence that
> your system is better? Because Rainbird is used by Twitter.
>
Hi
It may be my lack of knowledge but both have to do with counting, which is not
what I need.
What is wrong with the two models I suggested?
Tamar
Sent from my iPod
On Jan 22, 2012, at 2:49 AM, Jean-Nicolas Boulay Desjardins
wrote:
> Milind Parikh, Rainbird is backed by Twitter... My worry is
Milind Parikh, Rainbird is backed by Twitter... My worry is that you
might not be around in the future... Also, do you have evidence that
your system is better? Because Rainbird is used by Twitter.
On Sat, Jan 21, 2012 at 6:55 PM, Milind Parikh wrote:
>
> I used rainbird as inspiration for Countand
I used rainbird as inspiration for Countandra (& some of publicly available
data structures from rainbird preso). That said, there are significant
differences between the two architectures. Additionally, as Cassandra begins
to provide triggers, some very interesting things will become possible in
Co
But What about: Rainbird?
On Sat, Jan 21, 2012 at 10:52 AM, R. Verlangen wrote:
>
> A couple of days ago I came across Countandra ( http://countandra.org/ ). It
> seems that it might be a solution for you.
>
> Gr. Robin
>
>
> 2012/1/20 Tamar Fraenkel
>>
>> Hi!
>>
>> I am a newbie to Cassandra a
A couple of days ago I came across Countandra ( http://countandra.org/ ).
It seems that it might be a solution for you.
Gr. Robin
2012/1/20 Tamar Fraenkel
> **
>
> Hi!
>
> I am a newbie to Cassandra and seeking some advice regarding the data
> model I should use to best address my needs.
>
>
Hi!
I am a newbie to Cassandra and seeking some advice regarding the data model I
should use to best address my needs.
For simplicity, what I want to accomplish is:
I have a system that has users (potentially ~10,000 per day) and they perform
actions in the system (total of ~50,000 a day).
Each Use
Hello Aaron,
Thanks for your reply. I will try it.
Greetings,
Pablo
2010/12/2 Aaron Morton
> I say yes to all your questions about what you can do with Solr.
>
> Some background on the technology...
>
> Lucene is a Java library for doing full text search
> http://lucene.apache.org/java/doc
I say yes to all your questions about what you can do with Solr.
Some background on the technology...
Lucene is a Java library for doing full text search
http://lucene.apache.org/java/docs/index.html
Solr turns Lucene into an HTTP server and adds a bunch of other features such as making it easier
Hello Aaron and Jake,
Thank you for your reply. I've worked with Cassandra for 6 months but I
have never used Lucandra. I will try Lucandra, but I must ask (before starting): is
it possible to meet my searching/pagination/sorting requirements with Lucandra?
Thank you in advance,
Pablo
2010/12/2 Jake Luciani
You can also run Solr with Cassandra as the backend:
https://github.com/tjake/Lucandra/tree/solandra
-Jake
On Thu, Dec 2, 2010 at 6:27 AM, aaron morton wrote:
> Have you considered using Solr / lucene for the search? It has a lot more
> search features, and it is really good at faceted navigatio
Have you considered using Solr / lucene for the search? It has a lot more
search features, and it is really good at faceted navigation through a product
catalogue. It sounds like it would be a better fit for this task.
You can build facets for your price ranges, do the product name thing and
filt
Hello,
I need to store "products" data (product.name, product.price, product.state
and product.owner) in Cassandra 0.7 rc1.
The problem is that I need to get "products" where product.price > XX AND
product.price < XX AND product.name = XXX AND product.state = XXX. Also I
need to return the products
On Thu, Apr 15, 2010 at 6:01 PM, Sonny Heer wrote:
> Need a way to have two different types of indexes.
>
> Key: aTextKey
> ColumnName: aTextColumnName:55
> Value: ""
>
> Key: aTextKey
> ColumnName: 55:aTextColumnName
> Value: ""
>
> All the valuable information is stored in the column name itself
Need a way to have two different types of indexes.
Key: aTextKey
ColumnName: aTextColumnName:55
Value: ""
Key: aTextKey
ColumnName: 55:aTextColumnName
Value: ""
All the valuable information is stored in the column name itself.
Above two can be in different column families...
Queries:
Given a ke
Erez,
To make this work you have to make your model fit Cassandra, not the
other way around. As a rule, you either do complex queries via client
code to process the results of several, simpler queries or via a CF
you create to act as an index. Yes, this means you have to write data
to each index
Do you mean on the client? It really depends on how many items you're
sorting. In terms of computer runtime, client-side will likely always be
faster, but if you take bandwidth into account, having a pre-sorted
list will be better for large lists.
Creating 0-padded numbers is pretty straightf
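For illustration, the zero-padding trick can be shown in Python: when numbers are stored as text and compared lexicographically (as column names sorted by name would be), padding them to a fixed width makes string order match numeric order. The width of 4 and the function names are assumptions made for this sketch.

```python
def padded(count, width=4):
    """Render a non-negative count as fixed-width, zero-padded text."""
    if count < 0 or len(str(count)) > width:
        raise ValueError("count does not fit in the chosen width")
    return str(count).zfill(width)


def sort_as_strings(counts):
    """Lexicographic sort of unpadded numbers: order is not numeric."""
    return sorted(str(c) for c in counts)


def sort_padded(counts, width=4):
    """Lexicographic sort of padded numbers: order matches numeric order."""
    return sorted(padded(c, width) for c in counts)


counts = [50, 10, 25, 9]
print(sort_as_strings(counts))
print(sort_padded(counts))
```

The width has to be chosen large enough for the biggest value expected, since a count that outgrows it would sort incorrectly.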
You are correct Chris.
I am a newbie too in this field.
I like the Cassandra/NoSQL way and I am trying to see if it can fit my
model.
Thanks,
Erez
On Thu, Mar 25, 2010 at 11:03 AM, Christopher Brind <
christopher.br...@googlemail.com> wrote:
> Hi,
>
> I wondered if you were alluding to something
Hi,
I wondered if you were alluding to something more complex. You'd probably
want to create a index using something along the lines that Peter suggested.
:)
But I'm a Cassandra / Column DB newbie, so my experience ends just about ...
here. :)
Cheers,
Chris
On 25 March 2010 08:59, Erez Efrati
I am not clear on how this works when I want to increase the count of
user-1.
Thanks
Erez
On Thu, Mar 25, 2010 at 12:57 AM, Peter Chang wrote:
> If there's not much overhead, I recommend client side as well.
>
> Otherwise, you can only sort on column. Therefore, you could create some
> sort of
Hi Chris,
So, if I get it right, you suggest that I pull all the columns for a
single row and do the sorting client side?
The user-friends-messages was just an example and maybe not the best I could
come up with cause I agree that there are not too many friends in general
that send you messages
Peter,
Do you think 0-padding the entries would be more efficient than just
implementing your own comparator?
On Wed, Mar 24, 2010 at 10:57 PM, Peter Chang wrote:
> If there's not much overhead, I recommend client side as well.
> Otherwise, you can only sort on column. Therefore, you could creat
If there's not much overhead, I recommend client side as well.
Otherwise, you can only sort on column. Therefore, you could create some
sort of inverted index based on the message count.
User 1 sent 50 messages.
User 2 sent 10 messages.
User 3 sent 25 messages.
Then store a separate index that l
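One way to picture such an inverted index is the Python sketch below. It is an in-memory model under stated assumptions: index entries are named "paddedcount:friend" so that sorting by name orders them by count, and increasing a count means removing the stale entry and writing a new one. The class and its methods are hypothetical and are not taken from the thread or from any Cassandra API.

```python
import bisect


class MessageCountIndex:
    """Sorted list of 'paddedcount:friend' names, standing in for an
    index row whose columns are sorted by name (hypothetical model)."""

    def __init__(self, width=10):
        self._width = width   # counts longer than this would sort incorrectly
        self._counts = {}     # friend -> current message count
        self._columns = []    # sorted index entries

    def _column(self, friend, count):
        return f"{count:0{self._width}d}:{friend}"

    def increment(self, friend, by=1):
        old = self._counts.get(friend)
        if old is not None:
            # Remove the stale index entry before writing the new one.
            stale = self._column(friend, old)
            del self._columns[bisect.bisect_left(self._columns, stale)]
        new = (old or 0) + by
        self._counts[friend] = new
        bisect.insort(self._columns, self._column(friend, new))

    def top(self, n):
        # Highest counts first; strip the padded count prefix.
        highest_first = self._columns[::-1][:max(n, 0)]
        return [col.partition(":")[2] for col in highest_first]


index = MessageCountIndex()
index.increment("user1", 50)
index.increment("user2", 10)
index.increment("user3", 25)
print(index.top(3))
```

The cost of this design is that every count change is a delete plus an insert in the index, which is the trade-off against sorting on the client.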
Hi Erez,
Don't know how many friends a user in your system is likely to have, but are
they likely to have received so many messages from friends that you can't
sort it in your client app?
See:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/Collections.html#sort(java.util.List)
Assuming the us
Hi,
I can't figure out how to model the following using a column family and
the way the columns are sorted (by their name).
Let's say I have a list of users and for each user I wish to display a list
of all the friends he has ordered by the number of messages they sent him so
far (desc from most