Thank you, Scott and everyone, for all the feedback you have given me.

Answering Scott’s questions,

– The CEP includes a section labeled "Setup used to generate results above", 
but I don't see a summary of results, graphs, or benchmark details. Could you 
point me to them or upload to the Confluence wiki?

I have updated the CEP document with results and workloads we used. Please let 
me know if you need additional information.

– How could the Apache Cassandra project exercise this feature with 
acceleration in our CI? I doubt our current hardware supports this feature, 
which would make it untestable as part of the project's release process if so.

There are a few options available, such as Intel Developer Cloud, which 
provides virtual access to Intel's latest hardware, including QAT-enabled 
processors. This allows developers to test and optimize applications without 
needing physical hardware.

– Are there cloud providers that support it; and if so who and which instance 
types? I see that AWS offers QAT in the r7i instance family – but only for 
"metal" instances which are much more expensive and uncommon to use: 
https://aws.amazon.com/ec2/instance-types/r7i/.

Currently, US-based cloud providers like GCP and AWS do not offer QAT-enabled 
VMs. As you mentioned, AWS provides QAT support only in metal instances within 
the r7i family. However, cloud service providers in other geographies, such as 
Alibaba Cloud, do offer QAT VM access, which is a more flexible option for 
using QAT in the cloud.

– Outside of cloud providers, how does one determine if a given processor 
supports QAT (i.e., a specific list of chips)? I've tried browsing the QAT 
website but the PDFs don't seem to contain this information, and I don't see 
QAT as a filterable attribute on ark.intel.com.

To determine if a given processor supports QAT, the lspci command can be used, 
which is documented in Intel's QAT Library 
Requirements<https://intel.github.io/quickassist/qatlib/requirements.html#supported-devices>.
The command is:
echo `(lspci -d 8086:4940 && lspci -d 8086:4941 && lspci -d 8086:4942 &&
lspci -d 8086:4943 && lspci -d 8086:4944 && lspci -d 8086:4945 &&
lspci -d 8086:4946 && lspci -d 8086:4947) | wc -l` supported devices found.
This will help identify supported devices on the system.
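
As a rough sketch of the same check in script form, the function below counts 
matching devices in `lspci -n` output read from standard input. The function 
name count_qat_devices is hypothetical (it is not part of any Intel tooling), 
and the device IDs 8086:4940 through 8086:4947 are simply the ones listed in 
the command above; other QAT generations may use different IDs.

```shell
# Hypothetical helper: count PCI devices whose vendor:device ID is one of
# 8086:4940 .. 8086:4947 (the IDs used in the command above).
# Expects `lspci -n` style lines on stdin, for example:
#   6b:00.0 0b40: 8086:4940 (rev 40)
count_qat_devices() {
    # Field 3 of `lspci -n` output is the numeric vendor:device pair.
    # n+0 forces a numeric 0 when nothing matched.
    awk '$3 ~ /^8086:494[0-7]$/ { n++ } END { print n + 0 }'
}
```

On a real system it would be used as `lspci -n | count_qat_devices`; a result 
of 0 means none of the listed device IDs were found.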

– Can you describe Intel's commitment to long-term maintenance/support of this 
feature?

That is a valid point. Intel is committed to the long-term evolution and 
maintenance of its technologies, including QAT, and to maintaining and testing 
the associated software, whether in-tree or out-of-tree.

– I'd feel more comfortable with this proposal in-tree if it were to take 
advantage of a standard and widely-available instruction set or extension (like 
AES-NI, AVX, or NEON).
But if it requires specific chip models or a dedicated PCIe card, metal-only 
cloud instances, and a custom Linux driver to enable, it would seem most 
appropriate for the feature to live out of tree but possible for users of QAT 
to enable via adding the jar to their classpath.

Regarding integration, QAT kernel drivers are integrated in-tree, meaning they 
are part of the standard Linux kernel distribution. The drivers will be 
maintained and updated alongside the kernel. Also, user-space libraries for QAT 
are available through standard package repositories, making them easy to 
install and manage.


Based on all the feedback, it looks like the general consensus is to make the 
capability pluggable for now. We will modify the code and come back with the 
changes in a few weeks.

Thank you,
Shylaja


From: C. Scott Andreas <sc...@paradoxica.net>
Sent: Monday, June 09, 2025 8:51 PM
To: dev@cassandra.apache.org
Cc: dev@cassandra.apache.org
Subject: Re: [DISCUSS] CEP-49: Hardware-accelerated compression

Shylaja, thanks for your proposal and messages on this thread.

I share Jeff's questions and have a couple more.

I appreciate that there is a software fallback, but want to ensure that members 
of the development community are able to test this feature; and to understand 
Intel's long-term commitment to evolving and maintaining a vendor- and 
model-specific codec family in Apache Cassandra.

– The CEP includes a section labeled "Setup used to generate results above", 
but I don't see a summary of results, graphs, or benchmark details. Could you 
point me to them or upload to the Confluence wiki?

– How could the Apache Cassandra project exercise this feature with 
acceleration in our CI? I doubt our current hardware supports this feature, 
which would make it untestable as part of the project's release process if so.

– Are there cloud providers that support it; and if so who and which instance 
types? I see that AWS offers QAT in the r7i instance family – but only for 
"metal" instances which are much more expensive and uncommon to use: 
https://aws.amazon.com/ec2/instance-types/r7i/.

– Outside of cloud providers, how does one determine if a given processor 
supports QAT (i.e., a specific list of chips)? I've tried browsing the QAT 
website but the PDFs don't seem to contain this information, and I don't see 
QAT as a filterable attribute on ark.intel.com.

– Is there a member of the Cassandra community (or Intel directly) who commits 
to run the database in a production capacity using one or more QAT codecs?

– Can you describe Intel's commitment to long-term maintenance/support of this 
feature?

I'd feel more comfortable with this proposal in-tree if it were to take 
advantage of a standard and widely-available instruction set or extension (like 
AES-NI, AVX, or NEON).

But if it requires specific chip models or a dedicated PCIe card, metal-only 
cloud instances, and a custom Linux driver to enable, it would seem most 
appropriate for the feature to live out of tree but possible for users of QAT 
to enable via adding the jar to their classpath.

– Scott

On Jun 9, 2025, at 3:13 PM, "Kokoori, Shylaja" 
<shylaja.koko...@intel.com<mailto:shylaja.koko...@intel.com>> wrote:


Hi Jeff,
Thank you very much for your response. I understand your concern. Here are some 
details; I hope they help.
Customers have access to Intel Product SKUs with or without Intel® QAT.  Our SW 
Architecture abstracts the use of the Intel® QAT Hardware such that if it is 
not present on a given Intel CPU, or the Customer is using a competitive Intel 
Architecture CPU, the QATZip Library shown below can call the SW Compression 
Library instead of calling for a Hardware response of the Intel® QAT 
Accelerator.  Therefore customers can use the same SW architecture whether or 
not the HW accelerator is present.
<image001.png>

See the following Technical Paper on using Intel® QAT and the QATZip Library.

This paper discusses why QATzip exists, its applications, value for developers, 
and reviews the performance gains.
Intel® QuickAssist Technology - Deliver Compression Efficiencies in the Cloud 
with Intel® QAT and QATzip Solution 
Brief<https://www.intel.com/content/www/us/en/content-details/767068/intel-quickassist-technology-deliver-compression-efficiencies-in-the-cloud-with-intel-qat-and-qatzip-solution-brief.html>

Thank you,
Shylaja

From: Jeff Jirsa <jji...@gmail.com<mailto:jji...@gmail.com>>
Sent: Thursday, June 05, 2025 12:13 PM
To: dev@cassandra.apache.org<mailto:dev@cassandra.apache.org>
Cc: dev@cassandra.apache.org<mailto:dev@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP-49: Hardware-accelerated compression

One perpetual challenge with customizing a codebase for dedicated hardware is 
ongoing support / testing / maintenance, followed by ensuring vendor-agnostic / 
neutral access.

QAT is one of those things that’s great for a set of people paying for it, but 
I don’t know any current contributors who have access - does anyone not at 
Intel actually have access to QAT or does Intel expect to use this in 
production yourselves (are you running a cluster where this fix improves your 
life or are you proposing this as a way to benefit your customers, or both)?





On Jun 5, 2025, at 11:52 AM, Kokoori, Shylaja 
<shylaja.koko...@intel.com<mailto:shylaja.koko...@intel.com>> wrote:

Hi everyone,

We would like to propose hardware accelerated compression in Cassandra, CEP-49: 
Hardware-accelerated 
compression<https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-49%3A+Hardware-accelerated+compression>

As load on Cassandra servers increases, the performance of compress/decompress 
operations starts to become a bottleneck.
Our goal via this CEP is to offload these operations to dedicated hardware 
accelerators and free up the CPUs.

We'd really appreciate your feedback on this proposal.

Thank you,
Shylaja

