Re: TWCS sstables get merged following node removal
Interesting, Jeff, thank you. OK, so this is regarding new data merging with old data. What about old sstables that were suddenly merged on many nodes (as if I had run ALTER TABLE to switch to size-tiered)? I do not have the sstables themselves now, but it is definitely something that happened: one day we had sstables grouped by windows, all working as planned, and a week later all the sstables had timestamps from the last couple of days. All of this happened on multiple tables configured with TWCS. Is there something you know of that might cause such a thing? If I understand correctly, once I have an sstable with a max timestamp that is older than the defined window, it should never be part of a compaction set.

Dor - thanks, when is that version planned to be released?

Gil

On Wed, Dec 19, 2018 at 8:38 PM Jeff Jirsa wrote:
> Yes it can cause issues.
> No, there's no way to disable it in any current release (I think it finally landed to disable it in 4.0, but don't have the JIRA handy).
>
> https://issues.apache.org/jira/browse/CASSANDRA-13418 was added to 3.11.1 and higher to let people consciously say "ignore overlaps from read repair and just drop expired data when it's expired".
>
> On Wed, Dec 19, 2018 at 3:40 AM Gil Ganz wrote:
>> Sounds like the foreground read repair can cause issues for TWCS (mixing old and new data in the same sstable). Is there a way to disable the foreground read repair? Is it indeed the case that it's problematic?
>>
>> On Mon, Dec 17, 2018 at 9:21 AM Gil Ganz wrote:
>>> Hey Jeff, attaching more information.
>>> This is the situation before: 3 nodes in the cluster (3.11.3 in this case, but I saw the same thing in 2.1 and 3.0). There is a script writing one row every minute and another script doing nodetool flush every 10 minutes. The window is defined as two hours, so after a few days this is how the directory listing looks:
>>>
>>> drwxr-xr-x 2 cassandra cassandra 4096 Dec 11 10:38 backups
>>> -rw-r--r-- 1 cassandra cassandra  646 Dec 12 05:25 mc-171-big-Index.db
>>> -rw-r--r-- 1 cassandra cassandra  104 Dec 12 05:25 mc-171-big-Filter.db
>>> -rw-r--r-- 1 cassandra cassandra   56 Dec 12 05:25 mc-171-big-Summary.db
>>> -rw-r--r-- 1 cassandra cassandra 3561 Dec 12 05:25 mc-171-big-Data.db
>>> -rw-r--r-- 1 cassandra cassandra   10 Dec 12 05:25 mc-171-big-Digest.crc32
>>> -rw-r--r-- 1 cassandra cassandra   59 Dec 12 05:25 mc-171-big-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra cassandra 4893 Dec 12 05:25 mc-171-big-Statistics.db
>>> -rw-r--r-- 1 cassandra cassandra   92 Dec 12 05:25 mc-171-big-TOC.txt
>>> -rw-r--r-- 1 cassandra cassandra  565 Dec 12 05:25 mc-172-big-Index.db
>>> -rw-r--r-- 1 cassandra cassandra   96 Dec 12 05:25 mc-172-big-Filter.db
>>> -rw-r--r-- 1 cassandra cassandra   56 Dec 12 05:25 mc-172-big-Summary.db
>>> -rw-r--r-- 1 cassandra cassandra 3475 Dec 12 05:25 mc-172-big-Data.db
>>> -rw-r--r-- 1 cassandra cassandra   10 Dec 12 05:25 mc-172-big-Digest.crc32
>>> -rw-r--r-- 1 cassandra cassandra   59 Dec 12 05:25 mc-172-big-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra cassandra 4865 Dec 12 05:25 mc-172-big-Statistics.db
>>> -rw-r--r-- 1 cassandra cassandra   92 Dec 12 05:25 mc-172-big-TOC.txt
>>> -rw-r--r-- 1 cassandra cassandra  637 Dec 12 05:25 mc-173-big-Index.db
>>> -rw-r--r-- 1 cassandra cassandra  104 Dec 12 05:25 mc-173-big-Filter.db
>>> -rw-r--r-- 1 cassandra cassandra   56 Dec 12 05:25 mc-173-big-Summary.db
>>> -rw-r--r-- 1 cassandra cassandra 3678 Dec 12 05:25 mc-173-big-Data.db
>>> -rw-r--r-- 1 cassandra cassandra   10 Dec 12 05:25 mc-173-big-Digest.crc32
>>> -rw-r--r-- 1 cassandra cassandra   59 Dec 12 05:25 mc-173-big-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra cassandra   92 Dec 12 05:25 mc-173-big-TOC.txt
>>> -rw-r--r-- 1 cassandra cassandra 4888 Dec 12 05:25 mc-173-big-Statistics.db
>>> .
>>> .
>>> -rw-r--r-- 1 cassandra cassandra  340 Dec 15 20:10 mc-873-big-Index.db
>>> -rw-r--r-- 1 cassandra cassandra   64 Dec 15 20:10 mc-873-big-Filter.db
>>> -rw-r--r-- 1 cassandra cassandra   56 Dec 15 20:10 mc-873-big-Summary.db
>>> -rw-r--r-- 1 cassandra cassandra 1910 Dec 15 20:10 mc-873-big-Data.db
>>> -rw-r--r-- 1 cassandra cassandra   10 Dec 15 20:10 mc-873-big-Digest.crc32
>>> -rw-r--r-- 1 cassandra cassandra   51 Dec 15 20:10 mc-873-big-CompressionInfo.db
>>> -rw-r--r-- 1 cassandra cassandra 4793 Dec 15 20:10 mc-873-big-Statistics.db
>>> -rw-r--r-- 1 cassandra cassandra   92 Dec 15 20:10 mc-873-big-TOC.txt
>>> .
>>> .
>>> .
>>> -rw-r--r-- 1 cassandra cassandra   24 Dec 17 06:50 mc-1150-big-Filter.db
>>> -rw-r--r-- 1 cassandra cassandra   51 Dec 17 06:50 mc-1150-big-Index.db
>>> -rw-r--r-- 1 cassandra cassandra   56 Dec 17 06:50 mc-1150-big-Summary.db
>>> -rw-r--r-- 1 cassandra cassandra   10 Dec 17 06:50 mc-1150-big-Digest.crc32
>>> -rw-r--r-- 1 cassandra cassandra  226 Dec 17 06:50 mc-1150-big-Data.db
>>> -rw-r--r-- 1 cassandra cassandra   43 Dec 17 06:50 mc-1150-big-CompressionInfo.db
>>> -rw-r
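For readers following along: the two-hour-window setup Gil describes, plus the CASSANDRA-13418 escape hatch Jeff mentions, would look roughly like the CQL below. This is a sketch only - the keyspace/table name is made up, and as far as I recall the option must also be allowed via a JVM system property at startup, so verify both against the JIRA and your version's documentation before relying on it.

    -- Sketch with a hypothetical table. The last option is the opt-in
    -- added by CASSANDRA-13418 (3.11.1+): drop fully expired sstables
    -- even when they overlap newer ones (e.g. after read repair).
    -- If memory serves, the node must also be started with something like
    -- -Dcassandra.allow_unsafe_aggressive_sstable_expiration=true (check the JIRA).
    ALTER TABLE ks.sensor_data WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'HOURS',
        'compaction_window_size': '2',
        'unsafe_aggressive_sstable_expiration': 'true'
    };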
Re: Optimizing for connections
See inline.

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.

On Dec 9, 2018, 2:02 PM -0500, Devaki, Srinivas, wrote:
> Hi guys,
>
> I have a couple of questions regarding connections to Cassandra.
>
> 1. What is the recommended number of connections per Cassandra node?

Depends on hardware.

> 2. Is it a good idea to create coordinator nodes (with `num_tokens: 0`) and whitelist only those hosts from the client side, so that the main worker nodes don't need to handle connection threads?

Defeats the purpose of having a masterless system.

> 3. Does the request time on the client side include connect time?

Who is measuring?

> 4. Is there any hard limit on the number of connections that can be set on Cassandra?

Read: https://stackoverflow.com/questions/33562374/cassandra-throttling-workload

> Thanks a lot for your help
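For what it's worth, on question 4: out of the box there is no hard cap, but cassandra.yaml does expose per-node throttles on native-protocol connections. A minimal sketch (defaults shown; verify the setting names against your version's cassandra.yaml):

    # cassandra.yaml - caps on client (native protocol) connections.
    # -1 (the default) means unlimited.
    native_transport_max_concurrent_connections: -1
    native_transport_max_concurrent_connections_per_ip: -1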
Re: Alter table
If you use collections, such as a map, you can get by with just upserts. A collection in a column gives you the ability to have a "flexible" schema for your "documents", as in Mongo, while the regular fields can act as "records", as in a more traditional table.

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.

On Dec 17, 2018, 4:45 PM -0500, Mark Furlong, wrote:
> Why would I want to use ALTER TABLE vs upserts with the new document format?
>
> Mark Furlong
> Sr. Database Administrator
> mfurl...@ancestry.com
> M: 801-859-7427
> O: 801-705-7115
>
> 1300 W Traverse Pkwy
> Lehi, UT 84043
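To make the trade-off concrete, here is a minimal sketch (hypothetical table and column names) of the two approaches: a map column absorbs new "document" fields via plain upserts, while fixed fields require a schema change.

    CREATE TABLE docs (
        id    uuid PRIMARY KEY,
        name  text,               -- fixed "record" field, as in a traditional table
        attrs map<text, text>     -- flexible "document" fields, as in Mongo
    );

    -- New attribute: no schema change needed, just upsert a map entry.
    UPDATE docs SET attrs['color'] = 'blue'
    WHERE id = 123e4567-e89b-12d3-a456-426655440000;

    -- The ALTER TABLE alternative: every new field is a schema change.
    ALTER TABLE docs ADD color text;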
Re: C* as fluent data storage, 10MB/sec/node?
Agree with Jeff on TWCS. Also look at https://github.com/paradoxical-io/cassieq for reference - good ideas for a queue on Cassandra.

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.

On Nov 28, 2018, 5:33 PM -0500, Adam Smith, wrote:
> Thanks for the excellent advice, this was extremely helpful! I did not know about TWCS... that cures a lot of headaches.
>
> Adam
>
> On Wed, Nov 28, 2018 at 20:47, Jeff Jirsa wrote:
>
>> Probably fine as long as there's some concept of time in the partition key to keep partitions from growing unbounded.
>>
>> Use TWCS, TTLs, and something like 5-10 minute buckets. Don't use RF=1, but you can write at CL ONE. TWCS will largely just drop whole sstables as they expire (especially with 3.11 and the more aggressive expiration logic there).
>>
>> --
>> Jeff Jirsa
>>
>>> On Nov 28, 2018, at 11:24 AM, Adam Smith wrote:
>>>
>>> Hi all,
>>>
>>> I need to use C* as a kind of fluent data store - maybe this is different from the queue antipattern? Lots of data comes in (10 MB/sec/node), remains for e.g. 1 hour, and should then be evicted. It is not critical if data occasionally disappears/gets lost.
>>>
>>> Thankful for any advice!
>>>
>>> Is this nowadays possible without suffering too much from compaction? I would not have range tombstones, and depending on a possible solution, would only use point deletes (PK+CK). There is only one CK, which could also be empty.
>>>
>>> 1) The data is usually 1 MB. Can I just update with empty data? PK + CK would remain, but I would not care about that. Would this create tombstones, or is it equivalent to a DELETE?
>>>
>>> 2) Like 1), and then later set a TTL == only a small amount of data to be deleted, and hopefully little compaction?
>>>
>>> 3) Simply set a TTL of 1h and hope for the best, because my worries are unfounded?
>>>
>>> 4) Any optimization strategies, like setting the RF to 1? Which compaction strategy is advised?
>>>
>>> 5) Are there any recent performance benchmarks for any of these scenarios?
>>>
>>> What else could I do?
>>>
>>> Thanks a lot!
>>> Adam
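For reference, Jeff's suggestion translates into something like the sketch below (hypothetical table and column names): a time bucket in the partition key keeps partitions bounded, the TWCS windows line up with the buckets, and a table-level TTL expires the data. Letting the TTL drop whole sstables is what avoids the tombstone and compaction pain Adam is worried about.

    CREATE TABLE firehose (
        bucket  timestamp,   -- event time rounded down to, e.g., 5 minutes
        source  text,        -- spreads each bucket across partitions/nodes
        id      timeuuid,
        payload blob,
        PRIMARY KEY ((bucket, source), id)
    ) WITH compaction = {
          'class': 'TimeWindowCompactionStrategy',
          'compaction_window_unit': 'MINUTES',
          'compaction_window_size': '10'
      }
      AND default_time_to_live = 3600;  -- 1 hour; whole sstables expire, no per-row deletes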