Re: Work in Progress - Awesome Cassandra Resources w/ Outline

2018-08-09 Thread Horia Mocioi
Hello Rahul, Great compilation of resources. Maybe add this one to the Blogs category? https://lostechies.com/ryansvihla/tags This one is also quite good, I would say: https://academy.datastax.com/support-blog/deeper-dive-diagnosing-dse-performance-issues-ttop-and-multidump And since now ther

Re: Compression Tuning Tutorial

2018-08-09 Thread Kyrylo Lebediev
Thank you Jon, great article as usual! One topic that was discussed in the article is the filesystem cache, which is traditionally leveraged for data caching in Cassandra (with row caching disabled by default). IIRC mmap() is used. Some RDBMSs and NoSQL DBs also use direct I/O + async I/O + m

RE: [EXTERNAL] Re: ETL options from Hive/Presto/s3 to cassandra

2018-08-09 Thread Durity, Sean R
DataStax Enterprise 6.0 has a new bulk loader tool. DSE is a commercial product, but maybe your needs are worth the investigation. Sean Durity From: Rahul Singh Sent: Tuesday, August 07, 2018 9:37 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: ETL options from Hive/Presto/s3 to cassa

Re: about cassandra..

2018-08-09 Thread Elliott Sims
Deflate instead of LZ4 will probably give you somewhat better compression at the cost of a lot of CPU. Larger chunk length might also help, but in most cases you probably won't see much benefit above 64K (and it will increase I/O load). On Wed, Aug 8, 2018 at 11:18 PM, Eunsu Kim wrote: > Hi all

Re: about cassandra..

2018-08-09 Thread Jeff Jirsa
Agreed about deflate. Also you can adjust your chunk size, which may help ratios as well, especially if you expect your data to compress well - often larger chunks will compress better, but it depends on the nature of your data. In the near future, look for work from Sushma @ Instagram to make av
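
The chunk-size point in the two replies above can be checked roughly outside Cassandra. The following is a minimal sketch, not Cassandra's own code path: it uses Python's zlib (a DEFLATE implementation) to compress the same data in independent chunks of different lengths. The helper names and the sample row format are hypothetical, chosen only for illustration, and real ratios depend on the nature of your data.

```python
import zlib


def chunked_compressed_size(data: bytes, chunk_length: int, level: int = 6) -> int:
    """Compress data in independent chunks and return the total compressed bytes.

    Each chunk is compressed on its own, loosely mirroring how a chunked
    on-disk format compresses fixed-size blocks separately.
    """
    total = 0
    for start in range(0, len(data), chunk_length):
        total += len(zlib.compress(data[start:start + chunk_length], level))
    return total


def sample_rows(n: int) -> bytes:
    """Build hypothetical, repetitive row-like text (an assumption, not real data)."""
    return "".join(
        f"sensor-{i % 50},2018-08-09T00:{i % 60:02d}:00,value={(i * 7) % 1000}\n"
        for i in range(n)
    ).encode()


if __name__ == "__main__":
    data = sample_rows(20000)
    for kb in (4, 16, 64, 256):
        size = chunked_compressed_size(data, kb * 1024)
        # Ratio here is compressed size divided by original size (lower is better).
        print(f"{kb:>3} KB chunks: ratio {size / len(data):.3f}")
```

Running the script prints one ratio per chunk length. Larger chunks give the compressor more context per block, which is why they often compress better, while also meaning more data must be read and decompressed per lookup.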

Re: Compression Tuning Tutorial

2018-08-09 Thread Jonathan Haddad
There's a discussion about direct I/O here you might find interesting: https://issues.apache.org/jira/browse/CASSANDRA-14466 I suspect the main reason is that O_DIRECT wasn't added till Java 10, and while it could be used with some workarounds, there's a lot of entropy around changing something li

Re: Huge daily outbound network traffic

2018-08-09 Thread Behnam B.Marandi
I don't have any external process or planned repair in that time period. On the network side, I can see outbound traffic on the Cassandra node's network interface, but couldn't find any way to check the VPC network to make sure it is not leaving the network. Maybe the only way is analysing VPC Flow Logs. B.