Some questions to updating and tombstone

2016-11-14 Thread Lu, Boying
Hi, All,

Will Cassandra generate a new tombstone when a column is updated using a 
CQL UPDATE statement?

And is there any way to get the number of tombstones in a column family? We 
want to avoid generating too many tombstones within gc_grace_period.

Thanks

Boying


Re: Some questions to updating and tombstone

2016-11-14 Thread Vladimir Yudovin
Hi Boying,



UPDATE writes a new value with a new timestamp. The old value is not a tombstone, but 
remains on disk until compaction. gc_grace_period is not related to this.
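
The behaviour described above can be sketched with a toy model. This is not Cassandra code: the Cell class and the update/read/compact helpers are hypothetical names, and a single column in a single partition is assumed. It only illustrates that an UPDATE adds a newer cell, that reads pick the newest timestamp, and that the older cell disappears at compaction without any tombstone being involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None would model a tombstone (written by DELETE, not UPDATE)
    timestamp: int        # write timestamp


def update(sstables: List[List[Cell]], value: str, timestamp: int) -> None:
    """Model an UPDATE: append a new cell, leave older cells untouched."""
    sstables.append([Cell(value, timestamp)])


def read(sstables: List[List[Cell]]) -> Cell:
    """Model a read: the cell with the highest timestamp wins."""
    cells = [cell for table in sstables for cell in table]
    return max(cells, key=lambda cell: cell.timestamp)


def compact(sstables: List[List[Cell]]) -> List[List[Cell]]:
    """Model compaction: merge all files and keep only the newest cell."""
    return [[read(sstables)]]


sstables: List[List[Cell]] = []
update(sstables, "v1", 100)
update(sstables, "v2", 200)

cells_before = sum(len(table) for table in sstables)
tombstones = sum(1 for table in sstables for cell in table if cell.value is None)
sstables = compact(sstables)
cells_after = sum(len(table) for table in sstables)
```

In this model the two updates leave two cells on disk and zero tombstones; compaction reduces them to the single newest cell.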



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





On Mon, 14 Nov 2016 03:02:21 -0500, Lu, Boying wrote:

> Hi, All,
>
> Will Cassandra generate a new tombstone when a column is updated using a
> CQL UPDATE statement?
>
> And is there any way to get the number of tombstones in a column family? We
> want to avoid generating too many tombstones within gc_grace_period.
>
> Thanks
>
> Boying


Re: Too High resident memory of cassandra 2.2.8

2016-11-14 Thread Jeff Jirsa
nodetool cfstats will show it per table.

 

The bloom filter / compression data is typically (unless you have very unusual 
settings in your schema) 1-3GB each per TB of data, so with 235’ish GB/server, 
it’s unlikely bloom filter or compression data.

 

The memTable is AT LEAST 1MB per columnfamily/table, so if you know how many 
tables you have, that may be an initial lower bound guess.
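
As a rough worked version of the estimate above: the sketch below applies the 1-3 GB per TB rule of thumb and the 1 MB per table memtable floor quoted in this message to the 235.79 GB load reported in the thread. The function name, the use of 1024 GB per TB, and the table count of 200 are assumptions made for illustration only.

```python
def offheap_lower_bounds_gb(data_gb: float, table_count: int) -> dict:
    """Rough off-heap estimates in GB, using the rules of thumb from this thread."""
    data_tb = data_gb / 1024.0  # assumption: 1 TB = 1024 GB
    return {
        # "1-3GB each per TB of data" -> (low, high) range per structure
        "bloom_filter": (1.0 * data_tb, 3.0 * data_tb),
        "compression_metadata": (1.0 * data_tb, 3.0 * data_tb),
        # "AT LEAST 1MB per columnfamily/table" -> lower bound only
        "memtable_floor": table_count * 1.0 / 1024.0,
    }


# 235.79 GB is the load reported for one node; 200 tables is a made-up figure.
estimate = offheap_lower_bounds_gb(data_gb=235.79, table_count=200)
for name, value in estimate.items():
    print(name, value)
```

With these inputs each of the two data-size-dependent structures comes out well under 1 GB, which is consistent with the conclusion that they are unlikely to explain a 40 GB resident size.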

 

 

 

From: ankit tyagi 
Reply-To: "user@cassandra.apache.org" 
Date: Sunday, November 13, 2016 at 11:33 PM
To: "user@cassandra.apache.org" 
Subject: Re: Too High resident memory of cassandra 2.2.8

 

Hi Jeff,

Below is the output of the nodetool status command.

 

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns  Host ID                               Rack
UN  192.168.68.156  235.79 GB  256     ?     e7b1a44d-0cd2-4b60-b322-4f989933fc51  rack1
UN  192.168.68.157  234.65 GB  256     ?     70406f0b-3620-401e-beaa-15deb4b799ce  rack1
UN  192.168.69.146             256     ?     d32e1e4d-ec86-4c3f-9397-11f37ff7b4d3  rack1
UN  192.168.69.147  242.77 GB  256     ?     646d9416-a467-4526-9656-959aa98404d0  rack1
UN  192.168.69.148  249.84 GB  256     ?     9b0ab632-75f4-4781-a987-a00b8246ae97  rack1
UN  192.168.69.149  240.62 GB  256     ?     406c4d3e-0933-4cba-935f-bfba16e6d878  rack1

 

 

Is there any command to find out the size of the off-heap memtables?

 

On Mon, Nov 14, 2016 at 12:30 PM, Jeff Jirsa  wrote:

Cassandra keeps certain data structures offheap, including bloom filters 
(scales with data size), compression metadata (scales with data size), and 
potentially memtables (scales with # of keyspaces/tables).

 

How much data on your node? Onheap or offheap memtables?

 

 

 

From: ankit tyagi 
Reply-To: "user@cassandra.apache.org" 
Date: Sunday, November 13, 2016 at 10:55 PM
To: "user@cassandra.apache.org" 
Subject: Too High resident memory of cassandra 2.2.8

 

Hi, 

 

We are using Cassandra 2.2.8 in production. We are seeing that the resident 
memory of the Cassandra process is very high (40 GB), while the heap size is only 8 GB.

 

root  23339  1 80 Nov11 ?2-09:38:08 /opt/java8/bin/java -ea 
-javaagent:bin/../lib/jamm-0.3.0.jar -XX:+CMSClassUnloadingEnabled 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
-XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-XX:+UseTLAB -XX:+PerfDisableSharedMem 
-XX:CompileCommandFile=bin/../conf/hotspot_compiler -XX:CMSWaitDuration=1 
-XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways 
-XX:CMSWaitDuration=1 -XX:+UseCondCardMark -XX:+PrintGCDetails 
-XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution 
-XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure 
-Xloggc:bin/../logs/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 
-XX:GCLogFileSize=10M -Djava.net.preferIPv4Stack=true 
-Dcassandra.jmx.local.port=7199 -XX:+DisableExplicitGC 
-Djava.library.path=bin/../lib/sigar-bin 
-javaagent:/myntra/currentCassandra/lib/agent-1.2.jar=statsd.myntra.com:8125 
-Dlogback.configurationFile=logback.xml -Dcassandra.logdir=bin/../logs 
-Dcassandra.storagedir=bin/../data -cp 
bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/agent-1.2.jar:bin/../lib/airline-0.6.jar:bin/../lib/antlr-runtime-3.5.2.jar:bin/../lib/apache-cassandra-2.2.8.jar:bin/../lib/apache-cassandra-clientutil-2.2.8.jar:bin/../lib/apache-cassandra-thrift-2.2.8.jar:bin/../lib/cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang3-3.1.jar:bin/../lib/commons-math3-3.2.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.4.jar:bin/../lib/crc32ex-0.1.1.jar:bin/../lib/disruptor-3.0.1.jar:bin/../lib/ecj-4.4.2.jar:bin/../lib/guava-16.0.jar:bin/../lib/high-scale-lib-1.0.6.jar:bin/../lib/jackson-core-asl-1.9.2.jar:bin/../lib/jackson-mapper-asl-1.9.2.jar:bin/../lib/jamm-0.3.0.jar:bin/../lib/java-statsd-client-3.1.0.jar:bin/../lib/javax.inject.jar:bin/../lib/jbcrypt-0.3m.jar:bin/../lib/jcl-over-slf4j-1.7.7.jar:bin/../lib/jna-4.0.0.jar:bin/../lib/joda-time-2.4.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.9.2.jar:bin/../lib/log4j-over-slf4j-1.7.7.jar:bin/../lib/logback-classic-1.1.3.jar:bin/../lib/logback-core-1.1.3.jar:bin/../lib/lz4-1.3.0.jar:bin/../lib/metrics-core-2.2.0.jar:bin/../lib/metrics-core-3.1.0.jar:bin/../lib/metrics-jvm-3.1.0.jar:bin/../lib/metrics-logback-3.1.0.jar:bin/../lib/metrics-statsd-2.3.0.jar:bin/../lib/netty-all-4.0.23.Final.jar:bin/../lib/ohc-core-0.3.4.jar:bin/../lib/ohc-core-j8-0.3.4.jar:bin/../lib/reporter-config3-3.0.0.jar:bin/../lib/reporter-conf

Re: Some questions to updating and tombstone

2016-11-14 Thread Anuj Wadehra
Hi Boying,
I agree with Vladimir. If compaction does not merge the two SSTables holding the 
updates soon, disk space will be wasted. For example, if the updates are 
not close in time, the first update might already be in a big SSTable by the time 
the second update is written to a new small SSTable. STCS won't compact them 
together soon.
Just adding column values with a new timestamp shouldn't create any tombstones. 
But if data is not merged for a long time, disk space issues may arise. If you are 
using STCS, just to get an idea of the extent of the problem you can run a major 
compaction and see how much disk space it frees (don't do this in 
production, as major compaction has its own side effects).
Which compaction strategy are you using? Are these updates done with TTL?
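
To illustrate why STCS may leave an old and a new version of the same value in separate files for a long time, here is a much simplified sketch of size-tiered bucketing. It is not the actual Cassandra implementation: the function names are hypothetical, the min_sstable_size rule is left out, and the values 0.5, 1.5 and 4 are assumed to match the usual bucket_low, bucket_high and min_threshold defaults.

```python
from typing import List


def stcs_buckets(sizes_mb: List[float],
                 bucket_low: float = 0.5,
                 bucket_high: float = 1.5) -> List[List[float]]:
    """Group SSTable sizes into buckets of similar size (simplified)."""
    buckets: List[List[float]] = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            # An SSTable joins a bucket if it is within [low, high] of the average.
            if bucket_low * average <= size <= bucket_high * average:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(sizes_mb: List[float],
                          min_threshold: int = 4) -> List[List[float]]:
    """Only buckets with at least min_threshold SSTables get compacted."""
    return [b for b in stcs_buckets(sizes_mb) if len(b) >= min_threshold]


# One big, old SSTable (first update) and a few small, new ones (later updates).
sizes = [10000, 10, 11, 12]
buckets = stcs_buckets(sizes)
candidates_now = compaction_candidates(sizes)
candidates_later = compaction_candidates(sizes + [13])
```

In this example the big SSTable always sits alone in its own bucket, so the small SSTables are compacted among themselves once there are four of them, while the old value in the big file stays on disk.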
Thanks
Anuj 
 
On Mon, 14 Nov, 2016 at 1:54 PM, Vladimir Yudovin wrote:

> Hi Boying,
>
> UPDATE writes a new value with a new timestamp. The old value is not a tombstone, but
> remains on disk until compaction. gc_grace_period is not related to this.
>
> Best regards, Vladimir Yudovin,
> Winguzone - Hosted Cloud Cassandra
> Launch your cluster in minutes.
>
> On Mon, 14 Nov 2016 03:02:21 -0500, Lu, Boying wrote:
>
>> Hi, All,
>>
>> Will Cassandra generate a new tombstone when a column is updated using a
>> CQL UPDATE statement?
>>
>> And is there any way to get the number of tombstones in a column family? We
>> want to avoid generating too many tombstones within gc_grace_period.
>>
>> Thanks
>>
>> Boying



  


Tombstones impact on repairs both anti-entropy and read repair

2016-11-14 Thread K F
Hi Folks,
I have a table that has generated a lot of tombstones, and this has caused inconsistent 
data across various datacenters. We run anti-entropy repairs and also have 
read_repair_chance tuned up during our non-busy hours. Yet when we 
compare data residing in various replicas across DCs, we see inconsistency.
My question to the community is: will tombstones cause issues in data consistency 
across the DCs?
Thanks.

Storing videos in cassandra

2016-11-14 Thread raghavendra vutti
Hi,

 Just wanted to know how Hulu or Netflix store videos in Cassandra.

Do they just store references to the video files, in the form of URLs, in
the DB?

Could someone please help me on this.


Thanks,
Raghavendra.


Re: Storing videos in cassandra

2016-11-14 Thread Oskar Kjellin
The actual video is not stored in Cassandra. You need to use a proper origin 
like s3. 

Although you can probably store it in Cassandra, it's not a good idea. 

Sent from my iPhone

> On 14 nov. 2016, at 18:02, raghavendra vutti  
> wrote:
> 
> Hi,
> 
>  Just wanted to know How does hulu or netflix store videos in cassandra.
> 
> Do they just use references to the video files in the form of URL's and store 
> in the DB??
> 
> could someone please me on this.
> 
> 
> Thanks,
> Raghavendra.


Re: Storing videos in cassandra

2016-11-14 Thread Ali Akhtar
The video can be written to floppy diskettes, and the serial numbers of the
diskettes can be written to cassandra.

On Mon, Nov 14, 2016 at 11:00 PM, Oskar Kjellin 
wrote:

> The actual video is not stored in Cassandra. You need to use a proper
> origin like s3.
>
> Although you can probably store it in Cassandra, it's not a good idea.
>
> Sent from my iPhone
>
> > On 14 nov. 2016, at 18:02, raghavendra vutti <
> raghu9raghaven...@gmail.com> wrote:
> >
> > Hi,
> >
> >  Just wanted to know How does hulu or netflix store videos in cassandra.
> >
> > Do they just use references to the video files in the form of URL's and
> store in the DB??
> >
> > could someone please me on this.
> >
> >
> > Thanks,
> > Raghavendra.


Re: Storing videos in cassandra

2016-11-14 Thread Ali Akhtar
Another solution could be to print the raw bytes to paper, and write the
page numbers to cassandra. Playback will be challenging with this method
however, unless interns are available to transcribe the papers back to a
digital format.

On Mon, Nov 14, 2016 at 11:06 PM, Ali Akhtar  wrote:

> The video can be written to floppy diskettes, and the serial numbers of
> the diskettes can be written to cassandra.
>
> On Mon, Nov 14, 2016 at 11:00 PM, Oskar Kjellin 
> wrote:
>
>> The actual video is not stored in Cassandra. You need to use a proper
>> origin like s3.
>>
>> Although you can probably store it in Cassandra, it's not a good idea.
>>
>> Sent from my iPhone
>>
>> > On 14 nov. 2016, at 18:02, raghavendra vutti <
>> raghu9raghaven...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> >  Just wanted to know How does hulu or netflix store videos in cassandra.
>> >
>> > Do they just use references to the video files in the form of URL's and
>> store in the DB??
>> >
>> > could someone please me on this.
>> >
>> >
>> > Thanks,
>> > Raghavendra.


Re: Storing videos in cassandra

2016-11-14 Thread l...@airstreamcomm.net
We store videos and files in Cassandra by chunking them into small portions and 
saving them as blobs.  As for video you could track the file byte offset of 
each chunk and request the relevant pieces when scrubbing to a particular 
portion of the video.  
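
A minimal sketch of the chunking idea described above, in plain Python with no Cassandra access. The helper names and the tiny chunk size are hypothetical and chosen only to make the example easy to follow; in a real table each (file id, byte offset) pair would presumably key one blob.

```python
from typing import List, Tuple

Chunk = Tuple[int, bytes]  # (byte offset within the file, chunk payload)


def split_into_chunks(data: bytes, chunk_size: int) -> List[Chunk]:
    """Split a file into fixed-size chunks, remembering each chunk's byte offset."""
    return [(offset, data[offset:offset + chunk_size])
            for offset in range(0, len(data), chunk_size)]


def chunks_for_range(chunks: List[Chunk], start: int, end: int,
                     chunk_size: int) -> List[Chunk]:
    """Return only the chunks that overlap the byte range [start, end)."""
    first = start // chunk_size
    last = (end - 1) // chunk_size
    return chunks[first:last + 1]


video = b"0123456789"          # stand-in for real video bytes
chunks = split_into_chunks(video, chunk_size=4)
needed = chunks_for_range(chunks, start=5, end=9, chunk_size=4)
```

Because the offsets are multiples of the chunk size, scrubbing to a position only requires integer division to find which chunks to request.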

> On Nov 14, 2016, at 11:02 AM, raghavendra vutti  
> wrote:
> 
> Hi,
> 
>  Just wanted to know How does hulu or netflix store videos in cassandra.
> 
> Do they just use references to the video files in the form of URL's and store 
> in the DB??
> 
> could someone please me on this.
> 
> 
> Thanks,
> Raghavendra.




Re: Storing videos in cassandra

2016-11-14 Thread Ali Akhtar
Excuse me? I did not make fun of anyone. I gave valid suggestions that are
all theoretically possible.

If it came off in a condescending way, I am genuinely sorry.

On 14 Nov 2016 11:22 pm, "Jon Haddad"  wrote:

> You’ve asked a lot of questions on this mailing list, and you’ve gotten
> help on a ton of beginner issues.  Making fun of someone for asking similar
> beginner questions is not cool at all.  Cut it out.
>
>
>
> On Nov 14, 2016, at 10:13 AM, Ali Akhtar  wrote:
>
> Another solution could be to print the raw bytes to paper, and write the
> page numbers to cassandra. Playback will be challenging with this method
> however, unless interns are available to transcribe the papers back to a
> digital format.
>
> On Mon, Nov 14, 2016 at 11:06 PM, Ali Akhtar  wrote:
>
>> The video can be written to floppy diskettes, and the serial numbers of
>> the diskettes can be written to cassandra.
>>
>> On Mon, Nov 14, 2016 at 11:00 PM, Oskar Kjellin 
>> wrote:
>>
>>> The actual video is not stored in Cassandra. You need to use a proper
>>> origin like s3.
>>>
>>> Although you can probably store it in Cassandra, it's not a good idea.
>>>
>>> Sent from my iPhone
>>>
>>> > On 14 nov. 2016, at 18:02, raghavendra vutti <
>>> raghu9raghaven...@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> >  Just wanted to know How does hulu or netflix store videos in
>>> cassandra.
>>> >
>>> > Do they just use references to the video files in the form of URL's
>>> and store in the DB??
>>> >
>>> > could someone please me on this.
>>> >
>>> >
>>> > Thanks,
>>> > Raghavendra.


Re: Storing videos in cassandra

2016-11-14 Thread Jon Haddad
You’ve asked a lot of questions on this mailing list, and you’ve gotten help on 
a ton of beginner issues.  Making fun of someone for asking similar beginner 
questions is not cool at all.  Cut it out.



> On Nov 14, 2016, at 10:13 AM, Ali Akhtar  wrote:
> 
> Another solution could be to print the raw bytes to paper, and write the page 
> numbers to cassandra. Playback will be challenging with this method however, 
> unless interns are available to transcribe the papers back to a digital 
> format.
> 
> On Mon, Nov 14, 2016 at 11:06 PM, Ali Akhtar wrote:
> The video can be written to floppy diskettes, and the serial numbers of the 
> diskettes can be written to cassandra.
> 
> On Mon, Nov 14, 2016 at 11:00 PM, Oskar Kjellin wrote:
> The actual video is not stored in Cassandra. You need to use a proper origin 
> like s3.
> 
> Although you can probably store it in Cassandra, it's not a good idea.
> 
> Sent from my iPhone
> 
> > On 14 nov. 2016, at 18:02, raghavendra vutti wrote:
> >
> > Hi,
> >
> >  Just wanted to know How does hulu or netflix store videos in cassandra.
> >
> > Do they just use references to the video files in the form of URL's and 
> > store in the DB??
> >
> > could someone please me on this.
> >
> >
> > Thanks,
> > Raghavendra.



Re: Storing videos in cassandra

2016-11-14 Thread l...@airstreamcomm.net
Seconded.  It is completely unhelpful to spam this list.  Please stop.

> On Nov 14, 2016, at 12:21 PM, Jon Haddad  wrote:
> 
> You’ve asked a lot of questions on this mailing list, and you’ve gotten help 
> on a ton of beginner issues.  Making fun of someone for asking similar 
> beginner questions is not cool at all.  Cut it out.
> 
> 
> 
>> On Nov 14, 2016, at 10:13 AM, Ali Akhtar  wrote:
>> 
>> Another solution could be to print the raw bytes to paper, and write the 
>> page numbers to cassandra. Playback will be challenging with this method 
>> however, unless interns are available to transcribe the papers back to a 
>> digital format.
>> 
>> On Mon, Nov 14, 2016 at 11:06 PM, Ali Akhtar  wrote:
>>> The video can be written to floppy diskettes, and the serial numbers of the 
>>> diskettes can be written to cassandra.
>>> 
>>>> On Mon, Nov 14, 2016 at 11:00 PM, Oskar Kjellin wrote:
>>>> The actual video is not stored in Cassandra. You need to use a proper 
>>>> origin like s3.
>>>> 
>>>> Although you can probably store it in Cassandra, it's not a good idea.
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>> > On 14 nov. 2016, at 18:02, raghavendra vutti wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> >  Just wanted to know How does hulu or netflix store videos in cassandra.
>>>> >
>>>> > Do they just use references to the video files in the form of URL's and 
>>>> > store in the DB??
>>>> >
>>>> > could someone please me on this.
>>>> >
>>>> >
>>>> > Thanks,
>>>> > Raghavendra.


Re: Storing videos in cassandra

2016-11-14 Thread Jon Haddad
Think about it like this.  You just started using Cassandra for the first time. 
 You have a question, you find there’s a mailing list, and you ask.  You have 
zero experience with the DB and are an outsider to a community.  You ask 
anyways, because it’s where the Apache website says to go.  You get back 2 
sarcastic responses which aren’t helpful at all.  You, Ali, are the first 
contact with the community and it's a negative one.  Your joke, however funny 
it is, excludes someone who isn’t on the inside.  They don’t get the elbow to 
the ribs, haha, we’re just having fun, they get the “wow, all I did was ask a 
question and I got made fun of” feeling.

Everyone is a beginner, and an outsider, at some point.  Please keep that in 
mind: no one has any understanding of the intent of your jokes when all they 
have is a 2 sentence response that is obviously not meant to be helpful.

Jon

> On Nov 14, 2016, at 10:25 AM, Ali Akhtar  wrote:
> 
> Excuse me? I did not make fun of anyone. I gave valid suggestions that are 
> all theoretically possible.
> 
> If it came off in a condescending way, i am genuinely sorry.
> 
> 
> On 14 Nov 2016 11:22 pm, "Jon Haddad" wrote:
> You’ve asked a lot of questions on this mailing list, and you’ve gotten help 
> on a ton of beginner issues.  Making fun of someone for asking similar 
> beginner questions is not cool at all.  Cut it out.
> 
> 
> 
>> On Nov 14, 2016, at 10:13 AM, Ali Akhtar wrote:
>> 
>> Another solution could be to print the raw bytes to paper, and write the 
>> page numbers to cassandra. Playback will be challenging with this method 
>> however, unless interns are available to transcribe the papers back to a 
>> digital format.
>> 
>> On Mon, Nov 14, 2016 at 11:06 PM, Ali Akhtar wrote:
>> The video can be written to floppy diskettes, and the serial numbers of the 
>> diskettes can be written to cassandra.
>> 
>> On Mon, Nov 14, 2016 at 11:00 PM, Oskar Kjellin wrote:
>> The actual video is not stored in Cassandra. You need to use a proper origin 
>> like s3.
>> 
>> Although you can probably store it in Cassandra, it's not a good idea.
>> 
>> Sent from my iPhone
>> 
>> > On 14 nov. 2016, at 18:02, raghavendra vutti wrote:
>> >
>> > Hi,
>> >
>> >  Just wanted to know How does hulu or netflix store videos in cassandra.
>> >
>> > Do they just use references to the video files in the form of URL's and 
>> > store in the DB??
>> >
>> > could someone please me on this.
>> >
>> >
>> > Thanks,
>> > Raghavendra.



Re: Storing videos in cassandra

2016-11-14 Thread Jon Haddad
While Cassandra *can* be used this way, I don’t recommend it.  It’s going to be 
far cheaper and easier to maintain to store data in an Object store like S3, 
like Oskar recommended.

> On Nov 14, 2016, at 10:16 AM, l...@airstreamcomm.net wrote:
> 
> We store videos and files in Cassandra by chunking them into small portions 
> and saving them as blobs.  As for video you could track the file byte offset 
> of each chunk and request the relevant pieces when scrubbing to a particular 
> portion of the video.  
> 
>> On Nov 14, 2016, at 11:02 AM, raghavendra vutti 
>>  wrote:
>> 
>> Hi,
>> 
>> Just wanted to know How does hulu or netflix store videos in cassandra.
>> 
>> Do they just use references to the video files in the form of URL's and 
>> store in the DB??
>> 
>> could someone please me on this.
>> 
>> 
>> Thanks,
>> Raghavendra.



Re: Storing videos in cassandra

2016-11-14 Thread Ali Akhtar
I am truly sorry, Raghavendra. It didn't occur to me that you could be a
beginner.

On Mon, Nov 14, 2016 at 11:46 PM, Jon Haddad 
wrote:

> Think about it like this.  You just started using Cassandra for the first
> time.  You have a question, you find there’s a mailing list, and you ask.
> You have zero experience with the DB and are an outsider to a community.
> You ask anyways, because it’s where the Apache website says to go.  You get
> back 2 sarcastic responses which aren’t helpful at all.  You, Ali, are the
> first contact with the community and it's a negative one.  Your joke,
> however funny it is, excludes someone who isn’t on the inside.  They don’t
> get the elbow to the ribs, haha, we’re just having fun, they get the “wow,
> all I did was ask a question and I got made fun of” feeling.
>
> Everyone is a beginner, and an outsider, at some point.  Please keep that
> in mind no-one has any understanding to the intent on your jokes when all
> they have is a 2 sentence response that is obviously not meant to be
> helpful.
>
> Jon
>
> On Nov 14, 2016, at 10:25 AM, Ali Akhtar  wrote:
>
> Excuse me? I did not make fun of anyone. I gave valid suggestions that are
> all theoretically possible.
>
> If it came off in a condescending way, i am genuinely sorry.
>
> On 14 Nov 2016 11:22 pm, "Jon Haddad"  wrote:
>
>> You’ve asked a lot of questions on this mailing list, and you’ve gotten
>> help on a ton of beginner issues.  Making fun of someone for asking similar
>> beginner questions is not cool at all.  Cut it out.
>>
>>
>>
>> On Nov 14, 2016, at 10:13 AM, Ali Akhtar  wrote:
>>
>> Another solution could be to print the raw bytes to paper, and write the
>> page numbers to cassandra. Playback will be challenging with this method
>> however, unless interns are available to transcribe the papers back to a
>> digital format.
>>
>> On Mon, Nov 14, 2016 at 11:06 PM, Ali Akhtar 
>> wrote:
>>
>>> The video can be written to floppy diskettes, and the serial numbers of
>>> the diskettes can be written to cassandra.
>>>
>>> On Mon, Nov 14, 2016 at 11:00 PM, Oskar Kjellin wrote:
>>>
>>>> The actual video is not stored in Cassandra. You need to use a proper
>>>> origin like s3.
>>>>
>>>> Although you can probably store it in Cassandra, it's not a good idea.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> > On 14 nov. 2016, at 18:02, raghavendra vutti <
>>>> raghu9raghaven...@gmail.com> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> >  Just wanted to know How does hulu or netflix store videos in
>>>> cassandra.
>>>> >
>>>> > Do they just use references to the video files in the form of URL's
>>>> and store in the DB??
>>>> >
>>>> > could someone please me on this.
>>>> >
>>>> >
>>>> > Thanks,
>>>> > Raghavendra.


Re: Storing videos in cassandra

2016-11-14 Thread Benjamin Roth
Some time ago, I stumbled across this:
https://github.com/chrislusf/seaweedfs
It is an open source implementation of Facebook's Haystack design. We have no
experience with it yet, but we will evaluate it as a blob store to replace our
MogileFS installation, which stores over one billion images. From my point
of view it looks very promising and probably much more resource-friendly
for this use case.

Maybe that helps ...

2016-11-14 19:52 GMT+01:00 Jon Haddad :

> While Cassandra *can* be used this way, I don’t recommend it.  It’s going
> to be far cheaper and easier to maintain to store data in an Object store
> like S3, like Oskar recommended.
>
> > On Nov 14, 2016, at 10:16 AM, l...@airstreamcomm.net wrote:
> >
> > We store videos and files in Cassandra by chunking them into small
> portions and saving them as blobs.  As for video you could track the file
> byte offset of each chunk and request the relevant pieces when scrubbing to
> a particular portion of the video.
> >
> >> On Nov 14, 2016, at 11:02 AM, raghavendra vutti <
> raghu9raghaven...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> Just wanted to know How does hulu or netflix store videos in cassandra.
> >>
> >> Do they just use references to the video files in the form of URL's and
> store in the DB??
> >>
> >> could someone please me on this.
> >>
> >>
> >> Thanks,
> >> Raghavendra.


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Storing videos in cassandra

2016-11-14 Thread Paulo Motta
For the record, there is an interesting use case of globo.com using
Cassandra to store video payload and stream live video at scale (in
particular, the FIFA World Cup + Olympics), but it's a pretty
non-conventional/advanced use case:
- https://leandromoreira.com.br/2015/04/26/fifa-2014-world-cup-live-stream-architecture/
- https://www.javacodegeeks.com/2016/06/cassandra-heart-globos-live-streaming-platform.html

2016-11-14 16:56 GMT-02:00 Benjamin Roth :

> Some time ago, I stumbled across this: https://github.com/
> chrislusf/seaweedfs
> It is an open source implementation of Facebooks Haystack design. Have no
> experience yet but we will evaluate it as a blob-store to replace our
> Mogile-FS installation which stores over one billion images. From my point
> of view it looks very promising and probably much more resource-friendly
> for this use case.
>
> Maybe that helps ...
>
> 2016-11-14 19:52 GMT+01:00 Jon Haddad :
>
>> While Cassandra *can* be used this way, I don’t recommend it.  It’s going
>> to be far cheaper and easier to maintain to store data in an Object store
>> like S3, like Oskar recommended.
>>
>> > On Nov 14, 2016, at 10:16 AM, l...@airstreamcomm.net wrote:
>> >
>> > We store videos and files in Cassandra by chunking them into small
>> portions and saving them as blobs.  As for video you could track the file
>> byte offset of each chunk and request the relevant pieces when scrubbing to
>> a particular portion of the video.
>> >
>> >> On Nov 14, 2016, at 11:02 AM, raghavendra vutti <
>> raghu9raghaven...@gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Just wanted to know How does hulu or netflix store videos in cassandra.
>> >>
>> >> Do they just use references to the video files in the form of URL's
>> and store in the DB??
>> >>
>> >> could someone please me on this.
>> >>
>> >>
>> >> Thanks,
>> >> Raghavendra.
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: Storing videos in cassandra

2016-11-14 Thread Michael Shuler
Forward thinking, I would also suggest not storing the full URL, just
the video ID of some sort. The application code can create the URL as
needed, using the ID. If the full URL is stored in Cassandra and some
day in the future, the video file storage system needs to be changed,
this would require updating all the records. One could also use multiple
storage systems, chosen based on some characteristic of the ID.
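
A small sketch of this suggestion: only the video ID is stored, and the application derives the URL at read time. Everything concrete here is made up for illustration, including the example.com hosts, the "old-" prefix rule, and the function names.

```python
# Hypothetical URL templates, one per storage system.
STORAGE_BACKENDS = {
    "s3": "https://videos.example.com/{video_id}.mp4",
    "legacy": "https://legacy.example.com/files/{video_id}",
}


def backend_for(video_id: str) -> str:
    """Pick a storage system from a characteristic of the ID (made-up rule)."""
    return "legacy" if video_id.startswith("old-") else "s3"


def video_url(video_id: str) -> str:
    """Build the full URL from the stored ID; nothing but the ID is persisted."""
    return STORAGE_BACKENDS[backend_for(video_id)].format(video_id=video_id)


print(video_url("abc123"))
print(video_url("old-42"))
```

If the storage system changes later, only the templates or the routing rule need to change; the stored records stay as they are.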

-- 
Michael

On 11/14/2016 12:00 PM, Oskar Kjellin wrote:
> The actual video is not stored in Cassandra. You need to use a proper origin 
> like s3. 
> 
> Although you can probably store it in Cassandra, it's not a good idea. 
> 
> 
>> On 14 nov. 2016, at 18:02, raghavendra vutti  
>> wrote:
>>
>> Hi,
>>
>>  Just wanted to know How does hulu or netflix store videos in cassandra.
>>
>> Do they just use references to the video files in the form of URL's and 
>> store in the DB??
>>
>> could someone please me on this.
>>
>>
>> Thanks,
>> Raghavendra.



Re: Storing videos in cassandra

2016-11-14 Thread Johan Edstrom
URI comes in pretty handy ; 

video://videoprovider:codecSomething:myConverter:videoId 


Or XRI but what Michael said.
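
For illustration, a URI of the shape shown above could be taken apart like this. The field names (provider, codec, converter, video_id) are only a guess at what the four colon-separated parts mean, and the parsing is done with plain string operations rather than a URL library, since the colon-separated authority is not a standard host:port form.

```python
from typing import NamedTuple


class VideoRef(NamedTuple):
    provider: str
    codec: str
    converter: str
    video_id: str


def parse_video_uri(uri: str) -> VideoRef:
    """Parse video://provider:codec:converter:videoId into its four parts."""
    scheme, separator, rest = uri.partition("://")
    if scheme != "video" or not separator:
        raise ValueError("not a video:// URI: " + uri)
    parts = rest.split(":")
    if len(parts) != 4:
        raise ValueError("expected 4 colon-separated parts: " + uri)
    return VideoRef(*parts)


ref = parse_video_uri("video://videoprovider:codecSomething:myConverter:videoId")
```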

> On Nov 14, 2016, at 11:59 AM, Michael Shuler  wrote:
> 
> Forward thinking, I would also suggest not storing the full URL, just
> the video ID of some sort. The application code can create the URL as
> needed, using the ID. If the full URL is stored in Cassandra and some
> day in the future, the video file storage system needs to be changed,
> this would require updating all the records. One could also use multiple
> storage systems, based on if the ID has some characteristic..
> 
> -- 
> Michael
> 
> On 11/14/2016 12:00 PM, Oskar Kjellin wrote:
>> The actual video is not stored in Cassandra. You need to use a proper origin 
>> like s3. 
>> 
>> Although you can probably store it in Cassandra, it's not a good idea. 
>> 
>> 
>>> On 14 nov. 2016, at 18:02, raghavendra vutti  
>>> wrote:
>>> 
>>> Hi,
>>> 
>>> Just wanted to know How does hulu or netflix store videos in cassandra.
>>> 
>>> Do they just use references to the video files in the form of URL's and 
>>> store in the DB??
>>> 
>>> could someone please me on this.
>>> 
>>> 
>>> Thanks,
>>> Raghavendra.
> 



Re: cassandra python driver routing requests to one node?

2016-11-14 Thread Alex Popescu
I'm wondering if what you are seeing is
https://datastax-oss.atlassian.net/browse/PYTHON-643 (that could still be a
sign of a potential data hotspot)
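
To make the "data hotspot" possibility concrete, here is a toy simulation of token-based routing. It does not use the Python driver and is not how the driver is implemented: md5 stands in for the real partitioner hash, only the primary owner of each key is modelled (no replication), and the node names, vnode count and key distribution are invented.

```python
import hashlib
from bisect import bisect_left
from collections import Counter
from typing import List, Tuple

Ring = List[Tuple[int, str]]  # sorted (token, node) pairs


def token(key: str) -> int:
    """Stand-in hash; the real partitioner uses a different function."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(nodes: List[str], vnodes: int = 64) -> Ring:
    """Give every node several positions (vnodes) on the token ring."""
    return sorted((token(f"{node}-{i}"), node)
                  for node in nodes for i in range(vnodes))


def owner(ring: Ring, key: str) -> str:
    """The node owning the first ring position at or after the key's token."""
    tokens = [t for t, _ in ring]
    index = bisect_left(tokens, token(key)) % len(ring)
    return ring[index][1]


def writes_per_node(ring: Ring, keys: List[str]) -> Counter:
    """Count how many writes a token-aware client would send to each node."""
    return Counter(owner(ring, key) for key in keys)


ring = build_ring(["node1", "node2", "node3"])
# 90% of the writes hit one partition key, 10% are spread over other keys.
keys = ["hot-key"] * 900 + [f"user-{i}" for i in range(100)]
counts = writes_per_node(ring, keys)
hot_node = owner(ring, "hot-key")
```

Even with evenly balanced ownership, the node that owns the hot partition receives at least 900 of the 1000 writes in this model, which is the kind of skew a token-aware client would produce for a hot key.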

On Sun, Nov 13, 2016 at 10:57 PM, Andrew Bialecki <
andrew.biale...@klaviyo.com> wrote:

> We're using the "default" TokenAwarePolicy. Our nodes are spread across
> different racks within one datacenter. I've turned on debug logging for the
> Python driver, but it doesn't look like it logs which Cassandra node each
> request goes to, but maybe I haven't got the right logging set to debug.
>
> On Mon, Nov 14, 2016 at 12:39 AM, Ben Slater 
> wrote:
>
>> What load balancing policies are you using in your client code (
>> https://datastax.github.io/python-driver/api/cassandra/policies.html)?
>>
>> Cheers
>> Ben
>>
>> On Mon, 14 Nov 2016 at 16:22 Andrew Bialecki 
>> wrote:
>>
>>> We have an odd situation where all of a sudden our cluster started
>>> seeing a disproportionate number of writes go to one node. We're using the
>>> Python driver version 3.7.1. I'm not sure if this is a driver issue or
>>> possibly a network issue causing requests to get routed in an odd way. It's
>>> not absolute, there are requests going to all nodes.
>>>
>>> Tried restarting the problematic node, no luck (those are the quiet
>>> periods). Tried restarting the clients, also no luck. Checked nodetool
>>> status and ownership is even across the cluster.
>>>
>>> Curious if anyone's seen this behavior before. Seems like the next step
>>> will be to debug the client and see why it's choosing that node.
>>>
>>> [image: Inline image 1]
>>>
>>>
>>> --
>>> AB
>>>
>>
>
>
> --
> AB
>



-- 
Bests,

Alex Popescu | @al3xandru
Sen. Product Manager @ DataStax


Cassandra 3.6 Repair issue with Reaper

2016-11-14 Thread Abhishek Aggarwal
Hi All,

We tried a sequential repair on a very small table, having only 20 rows, using
the Reaper tool, but the repair got stuck while generating the snapshot.


Similarly, when we tried a parallel repair, the run worked fine in the
beginning for a few segments, but later it got stuck in compaction and
never completed. One of the nodes was shown as down to the other nodes due
to a gossip issue, and we had to do a rolling restart of all the nodes.


In both cases, 2 of the 6 nodes get stuck, either in
compaction or in generating the snapshot.


Abhishek Aggarwal

*Senior Software Engineer*
*M*: +91 8861212073 , 8588840304
*T*: 0124 6600600 *EXT*: 12128
ASF Center -A, ASF Center Udyog Vihar Phase IV,


High system CPU during high write workload

2016-11-14 Thread Abhishek Gupta
Hi,

We are seeing an issue where the system CPU shoots up to a figure of
> 90% when the cluster is subjected to a relatively high write workload, i.e.
4k write requests/sec.

2016-11-14T13:27:47.900+0530 Process summary
  process cpu=695.61%
  application cpu=676.11% (user=200.63% sys=475.49%)  <== Very High System CPU
  other: cpu=19.49%
  heap allocation rate 403mb/s
[000533] user= 1.43% sys= 6.91% alloc= 2216kb/s - SharedPool-Worker-129
[000274] user= 0.38% sys= 7.78% alloc= 2415kb/s - SharedPool-Worker-34
[000292] user= 1.24% sys= 6.77% alloc= 2196kb/s - SharedPool-Worker-56
[000487] user= 1.24% sys= 6.69% alloc= 2260kb/s - SharedPool-Worker-79
[000488] user= 1.24% sys= 6.56% alloc= 2064kb/s - SharedPool-Worker-78
[000258] user= 1.05% sys= 6.66% alloc= 2250kb/s - SharedPool-Worker-41

Running strace showed that the following system call is consuming almost
all of the system CPU:
 timeout 10s strace -f -p 5954 -c -q
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 88.33 1712.798399       16674    102723     22191 futex
  3.98   77.098730        4356     17700           read
  3.27   63.474795      394253       161        29 restart_syscall
  3.23   62.601530       29768      2103           epoll_wait
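
As a sanity check on how the summary columns line up, usecs/call should equal seconds divided by calls. The sketch below assumes the rows are read as (syscall, seconds, calls, errors), with a blank errors column treated as 0; the variable and function names are illustrative only.

```python
# Rows taken from the strace -c summary: (syscall, seconds, calls, errors).
# A blank "errors" column is represented as 0 (an assumption of this sketch).
ROWS = [
    ("futex", 1712.798399, 102723, 22191),
    ("read", 77.098730, 17700, 0),
    ("restart_syscall", 63.474795, 161, 29),
    ("epoll_wait", 62.601530, 2103, 0),
]


def usecs_per_call(seconds, calls):
    """Average time per call in microseconds, rounded to the nearest integer."""
    return round(seconds / calls * 1_000_000)


def error_ratio(calls, errors):
    """Fraction of calls that returned an error."""
    return errors / calls


if __name__ == "__main__":
    for name, seconds, calls, errors in ROWS:
        print(f"{name:16s} usecs/call={usecs_per_call(seconds, calls):>7d} "
              f"errors={error_ratio(calls, errors):.1%}")
```

The recomputed per-call figures agree with the usecs/call column, which supports reading the futex row as 102723 calls with 22191 errors.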

On searching, we found that the following bug in the RHEL 6.6 / CentOS 6.6
kernel is a probable cause of the issue:

https://docs.datastax.com/en/landing_page/doc/landing_page/troubleshooting/cassandra/fetuxWaitBug.html

The patch mentioned in the doc is also not present in our kernel:

sudo rpm -q --changelog kernel-`uname -r` | grep futex | grep ref
- [kernel] futex_lock_pi() key refcnt fix (Danny Feng) [566347]
{CVE-2010-0623}
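
The same changelog check can be scripted if it has to be run across many hosts. This is only a sketch: it filters changelog text the way the grep pipeline does, and the marker string `get_futex_key_refs` is an assumption about how the fix is named in the changelog, not something confirmed in this thread. Substitute whatever entry the linked doc tells you to look for.

```python
def futex_ref_entries(changelog_text):
    """Return changelog lines containing both 'futex' and 'ref',
    mirroring `grep futex | grep ref`."""
    return [
        line.strip()
        for line in changelog_text.splitlines()
        if "futex" in line and "ref" in line
    ]


def has_patch(changelog_text, marker):
    """True if any matching changelog entry mentions the given marker."""
    return any(marker in line for line in futex_ref_entries(changelog_text))


# The single entry reported for our kernel, joined onto one line.
SAMPLE_CHANGELOG = (
    "- [kernel] futex_lock_pi() key refcnt fix (Danny Feng) [566347] "
    "{CVE-2010-0623}\n"
)

# Hypothetical marker: assumed name of the fix in the changelog entry.
ASSUMED_MARKER = "get_futex_key_refs"

if __name__ == "__main__":
    print(futex_ref_entries(SAMPLE_CHANGELOG))
    print("patched:", has_patch(SAMPLE_CHANGELOG, ASSUMED_MARKER))
```

In practice the text would come from `rpm -q --changelog` for the running kernel rather than from a hard-coded sample.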

Can someone who has faced and resolved this issue help us here?

Thanks,
Abhishek


Re: High system CPU during high write workload

2016-11-14 Thread Ben Bromhead
Hi Abhishek

The article with the futex bug description lists the solution, which is to
upgrade to a version of RHEL or CentOS that has the specified patch.

What help do you specifically need? If you need help upgrading the OS I
would look at the documentation for RHEL or CentOS.

Ben

On Mon, 14 Nov 2016 at 22:48 Abhishek Gupta 
wrote:

Hi,

We are seeing an issue where system CPU shoots up to over 90% when the
cluster is subjected to a relatively high write workload, i.e. 4k write
requests/sec.

2016-11-14T13:27:47.900+0530 Process summary
  process cpu=695.61%
  application cpu=676.11% (user=200.63% sys=475.49%)  <== very high system CPU
  other: cpu=19.49%
  heap allocation rate 403mb/s
[000533] user= 1.43% sys= 6.91% alloc= 2216kb/s - SharedPool-Worker-129
[000274] user= 0.38% sys= 7.78% alloc= 2415kb/s - SharedPool-Worker-34
[000292] user= 1.24% sys= 6.77% alloc= 2196kb/s - SharedPool-Worker-56
[000487] user= 1.24% sys= 6.69% alloc= 2260kb/s - SharedPool-Worker-79
[000488] user= 1.24% sys= 6.56% alloc= 2064kb/s - SharedPool-Worker-78
[000258] user= 1.05% sys= 6.66% alloc= 2250kb/s - SharedPool-Worker-41

Running strace showed that the following system call is consuming almost
all of the system CPU:
 timeout 10s strace -f -p 5954 -c -q
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 88.33 1712.798399       16674    102723     22191 futex
  3.98   77.098730        4356     17700           read
  3.27   63.474795      394253       161        29 restart_syscall
  3.23   62.601530       29768      2103           epoll_wait

On searching, we found that the following bug in the RHEL 6.6 / CentOS 6.6
kernel is a probable cause of the issue:

https://docs.datastax.com/en/landing_page/doc/landing_page/troubleshooting/cassandra/fetuxWaitBug.html

The patch mentioned in the doc is also not present in our kernel:

sudo rpm -q --changelog kernel-`uname -r` | grep futex | grep ref
- [kernel] futex_lock_pi() key refcnt fix (Danny Feng) [566347]
{CVE-2010-0623}

Can someone who has faced and resolved this issue help us here?

Thanks,
Abhishek


-- 
Ben Bromhead
CTO | Instaclustr 
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer


Re: Cassandra 3.6 Repair issue with Reaper

2016-11-14 Thread Alexander Dejanovski
Hi Abhishek,

Can you check whether you get the same behavior on this cluster when you
start the repair with nodetool commands? (Don't forget to add --full to
make sure you're not running incremental repair, if full repair is indeed
what you're doing with Reaper.)
Could you also open an issue on GitHub for this, with the useful logs you
can get from your Cassandra nodes?
https://github.com/thelastpickle/cassandra-reaper/issues

Thanks,

On Tue, Nov 15, 2016 at 7:00 AM Abhishek Aggarwal <
abhishek.aggarwa...@snapdeal.com> wrote:

> Hi All,
>
> We tried a sequential repair on a very small table with only 20 rows using
> the Reaper tool, but the repair got stuck while generating the snapshot.
>
>
> When we tried a parallel repair, the run worked fine at the beginning for a
> few segments, but later it got stuck in compaction and never completed. One
> of the nodes was shown as down to the other nodes due to a gossip issue, and
> we had to do a rolling restart of all the nodes.
>
>
> In both cases, 2 of the 6 nodes get stuck, either in compaction or in
> generating the snapshot.
>
>
> Abhishek Aggarwal
>
> *Senior Software Engineer*
> *M*: +91 8861212073 , 8588840304
> *T*: 0124 6600600 *EXT*: 12128
> ASF Center -A, ASF Center Udyog Vihar Phase IV,
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: cassandra python driver routing requests to one node?

2016-11-14 Thread Andrew Bialecki
Is the node selection based on key deterministic across multiple clients?
If it is, that sounds plausible. For this particular workload it's
definitely possible to have a hot key / spot, but it was surprising it
wasn't three nodes that got hot, it was just one.
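
To make the question concrete, here is a toy model of token-aware routing. It is not the Python driver's code: the node names, the MD5-based token and the simple six-node ring are all assumptions made for illustration (Cassandra's default partitioner and vnodes work differently). It shows that the key-to-replica mapping is deterministic for every client, and that a client which always picks the first replica sends a hot key's requests to a single coordinator, whereas picking randomly among the replicas spreads them over three nodes.

```python
import hashlib
import random
from collections import Counter

# Hypothetical cluster layout for the toy model.
NODES = ["node1", "node2", "node3", "node4", "node5", "node6"]
REPLICATION_FACTOR = 3


def replicas_for(key, nodes=NODES, rf=REPLICATION_FACTOR):
    """Deterministically map a partition key to rf consecutive nodes.

    MD5 stands in for the real partitioner hash; any client running this
    function gets the same replica list for the same key.
    """
    token = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")
    start = token % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


def coordinator_counts(keys, shuffle, seed=0):
    """Count which node would coordinate each request.

    shuffle=False: always use the first replica in the list.
    shuffle=True:  pick one of the replicas at random.
    """
    rng = random.Random(seed)
    counts = Counter()
    for key in keys:
        replicas = replicas_for(key)
        chosen = rng.choice(replicas) if shuffle else replicas[0]
        counts[chosen] += 1
    return counts


if __name__ == "__main__":
    hot_key_requests = ["user:42"] * 1000  # a single hot partition key
    print("first replica only:", dict(coordinator_counts(hot_key_requests, False)))
    print("random replica:    ", dict(coordinator_counts(hot_key_requests, True)))
```

If the real driver behaves like the first case, one hot node rather than three would be consistent with a hot key, but that is a hypothesis to verify against the driver, not a conclusion.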

On Mon, Nov 14, 2016 at 6:26 PM, Alex Popescu  wrote:

> I'm wondering if what you are seeing is
> https://datastax-oss.atlassian.net/browse/PYTHON-643 (that could still be a
> sign of a potential data hotspot)
>
> On Sun, Nov 13, 2016 at 10:57 PM, Andrew Bialecki <
> andrew.biale...@klaviyo.com> wrote:
>
>> We're using the "default" TokenAwarePolicy. Our nodes are spread across
>> different racks within one datacenter. I've turned on debug logging for the
>> Python driver, but it doesn't look like it logs which Cassandra node each
>> request goes to. Maybe I haven't got the right logging set to debug.
>>
>> On Mon, Nov 14, 2016 at 12:39 AM, Ben Slater 
>> wrote:
>>
>>> What load balancing policies are you using in your client code (
>>> https://datastax.github.io/python-driver/api/cassandra/policies.html)?
>>>
>>> Cheers
>>> Ben
>>>
>>> On Mon, 14 Nov 2016 at 16:22 Andrew Bialecki <
>>> andrew.biale...@klaviyo.com> wrote:
>>>
>>>> We have an odd situation where all of a sudden of our cluster started
>>>> seeing a disproportionate number of writes go to one node. We're using the
>>>> Python driver version 3.7.1. I'm not sure if this is a driver issue or
>>>> possibly a network issue causing requests to get routed in an odd way. It's
>>>> not absolute, there are requests going to all nodes.
>>>>
>>>> Tried restarting the problematic node, no luck (those are the quiet
>>>> periods). Tried restarting the clients, also no luck. Checked nodetool
>>>> status and ownership is even across the cluster.
>>>>
>>>> Curious if anyone's seen this behavior before. Seems like the next step
>>>> will be to debug the client and see why it's choosing that node.
>>>>
>>>> [image: Inline image 1]
>>>>
>>>>
>>>> --
>>>> AB
>>>>
>>>
>>
>>
>> --
>> AB
>>
>
>
>
> --
> Bests,
>
> Alex Popescu | @al3xandru
> Sen. Product Manager @ DataStax
>
>
>
>


-- 
AB