Re: TWCS sstables get merged following node removal
Interesting, Jeff, thank you. OK, so this is regarding new data merging with old data. What about old sstables that were suddenly merged on many nodes (as if I had run ALTER TABLE to switch to size-tiered)? I do not have the sstables themselves now, but it is definitely something that happened: one day we had sstables grouped by windows, all working as planned, and a week later all the sstables had timestamps from the last couple of days. All of this happened on multiple tables configured with TWCS. Is there something you know of that might cause such a thing? If I understand correctly, once I have an sstable with a max timestamp that is older than the defined window, it should never be part of a compaction set.

Dor - thanks, when is that version planned to be released?

Gil

On Wed, Dec 19, 2018 at 8:38 PM Jeff Jirsa wrote:
> Yes it can cause issues.
> No, there's no way to disable it in any current release (I think it finally landed to disable it in 4.0, but don't have the JIRA handy).
>
> https://issues.apache.org/jira/browse/CASSANDRA-13418 was added to 3.11.1 and higher to let people consciously say "ignore overlaps from read repair and just drop expired data when it's expired".
>
> On Wed, Dec 19, 2018 at 3:40 AM Gil Ganz wrote:
>> Sounds like the foreground read repair can cause issues for TWCS (mixing old and new data in the same sstable). Is there a way to disable the foreground read repair? Is it indeed the case that it's problematic?
>>
>> On Mon, Dec 17, 2018 at 9:21 AM Gil Ganz wrote:
>>> Hey Jeff, attaching more information.
>>> This is the situation before: 3 nodes in the cluster (3.11.3 in this case, but I saw the same thing in 2.1 and 3.0). There is a script writing one row every minute and another script doing nodetool flush every 10 minutes. The window is defined as two hours, so after a few days this is how the directory listing looks:
>>>
>>> drwxr-xr-x 2 cassandra cassandra 4096 Dec 11 10:38 backups
>>> -rw-r--r-- 1 cassandra cassandra  646 Dec 12 05:25 mc-171-big-Index.db
>>> -rw-r--r-- 1 cassandra cassandra  104 Dec 12 05:25 mc-171-big-Filter.db
>>> -rw-r--r-- 1 cassandra cassandra   56 Dec 12 05:25 mc-171-big-Summary.db
>>> -rw-r--r-- 1 cassandra cassandra 3561 Dec 12 05:25 mc-171-big-Data.db
>>> -rw-r--r-- 1 cassandra cassandra   10 Dec 12 05:25 mc-171-big-Digest.crc32
>>> -rw-r--r-- 1 cassandra cassandra   59 Dec 12 05:25 mc-171-big-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra cassandra 4893 Dec 12 05:25 mc-171-big-Statistics.db
>>> -rw-r--r-- 1 cassandra cassandra   92 Dec 12 05:25 mc-171-big-TOC.txt
>>> -rw-r--r-- 1 cassandra cassandra  565 Dec 12 05:25 mc-172-big-Index.db
>>> -rw-r--r-- 1 cassandra cassandra   96 Dec 12 05:25 mc-172-big-Filter.db
>>> -rw-r--r-- 1 cassandra cassandra   56 Dec 12 05:25 mc-172-big-Summary.db
>>> -rw-r--r-- 1 cassandra cassandra 3475 Dec 12 05:25 mc-172-big-Data.db
>>> -rw-r--r-- 1 cassandra cassandra   10 Dec 12 05:25 mc-172-big-Digest.crc32
>>> -rw-r--r-- 1 cassandra cassandra   59 Dec 12 05:25 mc-172-big-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra cassandra 4865 Dec 12 05:25 mc-172-big-Statistics.db
>>> -rw-r--r-- 1 cassandra cassandra   92 Dec 12 05:25 mc-172-big-TOC.txt
>>> -rw-r--r-- 1 cassandra cassandra  637 Dec 12 05:25 mc-173-big-Index.db
>>> -rw-r--r-- 1 cassandra cassandra  104 Dec 12 05:25 mc-173-big-Filter.db
>>> -rw-r--r-- 1 cassandra cassandra   56 Dec 12 05:25 mc-173-big-Summary.db
>>> -rw-r--r-- 1 cassandra cassandra 3678 Dec 12 05:25 mc-173-big-Data.db
>>> -rw-r--r-- 1 cassandra cassandra   10 Dec 12 05:25 mc-173-big-Digest.crc32
>>> -rw-r--r-- 1 cassandra cassandra   59 Dec 12 05:25 mc-173-big-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra cassandra   92 Dec 12 05:25 mc-173-big-TOC.txt
>>> -rw-r--r-- 1 cassandra cassandra 4888 Dec 12 05:25 mc-173-big-Statistics.db
>>> .
>>> .
>>> -rw-r--r-- 1 cassandra cassandra  340 Dec 15 20:10 mc-873-big-Index.db
>>> -rw-r--r-- 1 cassandra cassandra   64 Dec 15 20:10 mc-873-big-Filter.db
>>> -rw-r--r-- 1 cassandra cassandra   56 Dec 15 20:10 mc-873-big-Summary.db
>>> -rw-r--r-- 1 cassandra cassandra 1910 Dec 15 20:10 mc-873-big-Data.db
>>> -rw-r--r-- 1 cassandra cassandra   10 Dec 15 20:10 mc-873-big-Digest.crc32
>>> -rw-r--r-- 1 cassandra cassandra   51 Dec 15 20:10 mc-873-big-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra cassandra 4793 Dec 15 20:10 mc-873-big-Statistics.db
>>> -rw-r--r-- 1 cassandra cassandra   92 Dec 15 20:10 mc-873-big-TOC.txt
>>> .
>>> .
>>> .
>>> -rw-r--r-- 1 cassandra cassandra   24 Dec 17 06:50 mc-1150-big-Filter.db
>>> -rw-r--r-- 1 cassandra cassandra   51 Dec 17 06:50 mc-1150-big-Index.db
>>> -rw-r--r-- 1 cassandra cassandra   56 Dec 17 06:50 mc-1150-big-Summary.db
>>> -rw-r--r-- 1 cassandra cassandra   10 Dec 17 06:50 mc-1150-big-Digest.crc32
>>> -rw-r--r-- 1 cassandra cassandra  226 Dec 17 06:50 mc-1150-big-Data.db
>>> -rw-r--r-- 1 cassandra cassandra   43 Dec 17 06:50 mc-1150-big-CompressionInfo.db
>>> -rw-r
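For readers following along: the two-hour-window setup Gil describes, plus the CASSANDRA-13418 escape hatch Jeff mentions, would look roughly like the CQL below. This is a sketch only - the keyspace/table name is made up, and as far as I recall the option must also be allowed via a JVM system property at startup, so verify both against the JIRA and your version's documentation before relying on it.

    -- Sketch with a hypothetical table. The last option is the opt-in
    -- added by CASSANDRA-13418 (3.11.1+): drop fully expired sstables
    -- even when they overlap newer ones (e.g. after read repair).
    -- If memory serves, the node must also be started with something like
    -- -Dcassandra.allow_unsafe_aggressive_sstable_expiration=true (check the JIRA).
    ALTER TABLE ks.sensor_data WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'HOURS',
        'compaction_window_size': '2',
        'unsafe_aggressive_sstable_expiration': 'true'
    };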
Re: Optimizing for connections
See inline.

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.

On Dec 9, 2018, 2:02 PM -0500, Devaki, Srinivas, wrote:
> Hi guys,
>
> I have a couple of questions regarding connections to Cassandra.
>
> 1. What is the recommended number of connections per Cassandra node?

Depends on hardware.

> 2. Is it a good idea to create coordinator nodes (with `num_tokens: 0`) and whitelist only those hosts from the client side, so that the main worker nodes don't need to handle connection threads?

Defeats the purpose of having a masterless system.

> 3. Does the request time on the client side include connect time?

Who is measuring?

> 4. Is there any hard limit on the number of connections that can be set on Cassandra?

Read: https://stackoverflow.com/questions/33562374/cassandra-throttling-workload

> Thanks a lot for your help
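For what it's worth, on question 4: out of the box there is no hard cap, but cassandra.yaml does expose per-node throttles on native-protocol connections. A minimal sketch (defaults shown; verify the setting names against your version's cassandra.yaml):

    # cassandra.yaml - caps on client (native protocol) connections.
    # -1 (the default) means unlimited.
    native_transport_max_concurrent_connections: -1
    native_transport_max_concurrent_connections_per_ip: -1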
Re: Alter table
If you use collections, such as a map, you can get by with just upserts. A collection in a column gives you the ability to have a "flexible" schema for your "documents", as in Mongo, while the regular fields can act as "records", as in a more traditional table.

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.

On Dec 17, 2018, 4:45 PM -0500, Mark Furlong, wrote:
> Why would I want to use ALTER TABLE vs upserts with the new document format?
>
> Mark Furlong
> Sr. Database Administrator
> mfurl...@ancestry.com
> M: 801-859-7427
> O: 801-705-7115
>
> 1300 W Traverse Pkwy
> Lehi, UT 84043
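To make the trade-off concrete, here is a minimal sketch (hypothetical table and column names) of the two approaches: a map column absorbs new "document" fields via plain upserts, while fixed fields require a schema change.

    CREATE TABLE docs (
        id    uuid PRIMARY KEY,
        name  text,               -- fixed "record" field, as in a traditional table
        attrs map<text, text>     -- flexible "document" fields, as in Mongo
    );

    -- New attribute: no schema change needed, just upsert a map entry.
    UPDATE docs SET attrs['color'] = 'blue'
    WHERE id = 123e4567-e89b-12d3-a456-426655440000;

    -- The ALTER TABLE alternative: every new field is a schema change.
    ALTER TABLE docs ADD color text;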
Re: C* as fluent data storage, 10MB/sec/node?
Agree with Jeff on TWCS. Also look at https://github.com/paradoxical-io/cassieq for reference - good ideas for a queue on Cassandra.

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.

On Nov 28, 2018, 5:33 PM -0500, Adam Smith, wrote:
> Thanks for the excellent advice, this was extremely helpful! I did not know about TWCS... that cures a lot of headaches.
>
> Adam
>
> On Wed, Nov 28, 2018 at 20:47, Jeff Jirsa wrote:
>
>> Probably fine as long as there's some concept of time in the partition key to keep partitions from growing unbounded.
>>
>> Use TWCS, TTLs, and something like 5-10 minute buckets. Don't use RF=1, but you can write at CL ONE. TWCS will largely just drop whole sstables as they expire (especially with 3.11 and the more aggressive expiration logic there).
>>
>> --
>> Jeff Jirsa
>>
>>> On Nov 28, 2018, at 11:24 AM, Adam Smith wrote:
>>>
>>> Hi all,
>>>
>>> I need to use C* as a kind of fluent data store - maybe this is different from the queue antipattern? Lots of data comes in (10 MB/sec/node), remains for e.g. 1 hour, and should then be evicted. It is not critical if data occasionally disappears/gets lost.
>>>
>>> Thankful for any advice!
>>>
>>> Is this nowadays possible without suffering too much from compaction? I would not have range tombstones, and depending on a possible solution, would only use point deletes (PK+CK). There is only one CK, which could also be empty.
>>>
>>> 1) The data is usually 1 MB. Can I just update with empty data? PK + CK would remain, but I would not care about that. Would this create tombstones, or is it equivalent to a DELETE?
>>>
>>> 2) Like 1), and then later set a TTL == only a small amount of data to be deleted, and hopefully little compaction?
>>>
>>> 3) Simply set a TTL of 1h and hope for the best, because my worries are unfounded?
>>>
>>> 4) Any optimization strategies, like setting the RF to 1? Which compaction strategy is advised?
>>>
>>> 5) Are there any recent performance benchmarks for any of these scenarios?
>>>
>>> What else could I do?
>>>
>>> Thanks a lot!
>>> Adam
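For reference, Jeff's suggestion translates into something like the sketch below (hypothetical table and column names): a time bucket in the partition key keeps partitions bounded, the TWCS windows line up with the buckets, and a table-level TTL expires the data. Letting the TTL drop whole sstables is what avoids the tombstone and compaction pain Adam is worried about.

    CREATE TABLE firehose (
        bucket  timestamp,   -- event time rounded down to, e.g., 5 minutes
        source  text,        -- spreads each bucket across partitions/nodes
        id      timeuuid,
        payload blob,
        PRIMARY KEY ((bucket, source), id)
    ) WITH compaction = {
          'class': 'TimeWindowCompactionStrategy',
          'compaction_window_unit': 'MINUTES',
          'compaction_window_size': '10'
      }
      AND default_time_to_live = 3600;  -- 1 hour; whole sstables expire, no per-row deletes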