Re: 答复: 答复: A difficult data model with C*

Carlos Alonso Thu, 10 Nov 2016 05:39:08 -0800

Hi Ben, you're right, but in my example the last_time timestamp field is
actually part of the primary key.


Regards

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 10 November 2016 at 11:50, Benjamin Roth <benjamin.r...@jaumo.com> wrote:

> I pretty much guess the CQL you posted is invalid. You cannot set a
> clustering column that is not part of the primary key.
> But you can use a materialized view to append the last_time to the primary
> key and still preserver uniqueness of username + vedio_id (guess it is a
> typo in vedio).
>
> 2016-11-10 10:47 GMT+00:00 Carlos Alonso <i...@mrcalonso.com>:
>
>> What about having something like
>>
>> CREATE TABLE user_views (
>>   user_name text,
>>   video_id text,
>>   position int,
>>   last_time timestamp,
>>   PRIMARY KEY(user_name, last_time)
>> ) WITH CLUSTERING ORDER BY (last_time DESC);
>>
>> Where you insert a record everytime a user watches a video and then
>> having a batch task (every night maybe?) that deletes the extra rows that
>> are not needed anymore.
>> The query pattern for this is quite efficient as something like SELECT *
>> FROM user_views WHERE user_name = ? LIMIT 10;
>>
>> Regards
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 10 November 2016 at 09:19, Vladimir Yudovin <vla...@winguzone.com>
>> wrote:
>>
>>> >Do you mean the oldest one should be removed when a new play is added?
>>> Sure. As you described the issue "the last ten items may be adequate for
>>> the business"
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>>> CassandraLaunch your cluster in minutes.*
>>>
>>>
>>> ---- On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
>>> <diamond....@outlook.com <diamond....@outlook.com>>* wrote ----
>>>
>>> The solution maybe work. However, the play list will grow over time and
>>> somebody maybe has ten thousands that will slow down the query and sort .
>>> Do you mean the oldest one should be removed when a new play is added?
>>>
>>> BTW, the version is 2.1.16 in our live system.
>>>
>>>
>>> BRs,
>>>
>>> BEN
>>> ------------------------------
>>>
>>> *发件人:* Vladimir Yudovin <vla...@winguzone.com>
>>> *发送时间:* 2016年11月9日 18:11:26
>>> *收件人:* user
>>> *主题:* Re: 答复: A difficult data model with C*
>>>
>>> You are welcome! )
>>>
>>> >recent ten movies watched by the user within 30 days.
>>> In this case you can't use PRIMARY KEY (user_name, video_id), as
>>> video_id is demanded to fetch row, so all this stuff may be
>>>
>>> CREATE TYPE play (video_id text, position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
>>> LIST<frozen<play>>);
>>>
>>> You can easily retrieve play list for specific user by his ID. Instead
>>> of LIST you can use MAP, I don't think that for ten entries it matters.
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>> ---- On Tue, 08 Nov 2016 22:29:48 -0500*ben ben
>>> <diamond....@outlook.com <diamond....@outlook.com>>* wrote ----
>>>
>>> Hi Vladimir Yudovin,
>>>
>>>
>>>     Thank you very much for your detailed explaining. Maybe I didn't
>>> describe the requirement clearly. The use cases should be:
>>>
>>> 1. a user login our app.
>>>
>>> 2. show the recent ten movies watched by the user within 30 days.
>>>
>>> 3. the user can click any one of the ten movie and continue to watch
>>> from the last position she/he did. BTW, a movie can be watched several
>>> times by a user and the last positon is needed indeed.
>>>
>>> BRs,
>>>
>>> BEN
>>> ------------------------------
>>>
>>> *发件人:* Vladimir Yudovin <vla...@winguzone.com>
>>> *发送时间:* 2016年11月8日 22:35:48
>>> *收件人:* user
>>> *主题:* Re: A difficult data model with C*
>>>
>>> Hi Ben,
>>>
>>> if need very limited number of positions (as you said ten) may be you
>>> can store them in LIST of UDT? Or just as JSON string?
>>> So you'll have one row per each pair user-video.
>>>
>>> It can be something like this:
>>>
>>> CREATE TYPE play (position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text, video_id text, review
>>> LIST<frozen<play>>, PRIMARY KEY (user_name, video_id));
>>>
>>> UPDATE recent set review = review + [(1234,12345)] where user_name='some
>>> user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,123456)] where
>>> user_name='some user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,1234567)] where
>>> user_name='some user' AND video_id='great video';
>>>
>>> You can delete the oldest entry by index:
>>> DELETE review[0] FROM recent WHERE user_name='some user' AND
>>> video_id='great video';
>>>
>>> or by value, if you know the oldest entry:
>>>
>>> UPDATE recent SET review = review - [(1234,12345)]  WHERE
>>> user_name='some user' AND video_id='great video';
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>> ---- On Mon, 07 Nov 2016 21:54:08 -0500*ben ben
>>> <diamond....@outlook.com <diamond....@outlook.com>>* wrote ----
>>>
>>>
>>> Hi guys,
>>>
>>> We are maintaining a system for an on-line video service. ALL users'
>>> viewing records of every movie are stored in C*. So she/he can continue to
>>> enjoy the movie from the last point next time. The table is designed as
>>> below:
>>> CREATE TABLE recent (
>>> user_name text,
>>> vedio_id text,
>>> position int,
>>> last_time timestamp,
>>> PRIMARY KEY (user_name, vedio_id)
>>> )
>>>
>>> It worked well before. However, the records increase every day and the
>>> last ten items may be adequate for the business. The current model use
>>> vedio_id as cluster key to keep a row for a movie, but as you know, the
>>> business prefer to order by the last_time desc. If we use last_time as
>>> cluster key, there will be many records for a singe movie and the recent
>>> one is actually desired. So how to model that? Do you have any suggestions?
>>> Thanks!
>>>
>>>
>>> BRs,
>>> BEN
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>

Re: 答复: 答复: A difficult data model with C*

Reply via email to