Hi Mohammed,
Quoting Mohammed Guller <moham...@glassbeam.com>:
Hi -
We have an ETL application that reads all rows from Cassandra
(2.1.2), filters them, and stores a small subset in an RDBMS. Our
application uses Datastax's Java driver (2.1.4) to fetch data
from the C* nodes. Since the Java driver supports automatic paging,
I was under the impression that SELECT queries should not cause an
OOM error on the C* nodes. However, even with just 16GB of data on
each node, the C* nodes start throwing OOM errors as soon as the
application starts iterating through the rows of a table.
The application code looks something like this:
Statement stmt = new SimpleStatement("SELECT x, y, z FROM cf").setFetchSize(5000);
ResultSet rs = session.execute(stmt);
while (!rs.isExhausted()) {
    Row row = rs.one();
    process(row);
}
Even after we reduced the page size to 1000, the C* nodes still
crash. C* is running on M3.xlarge machines (4 cores, 15GB RAM).
I've been running a few tests to determine the effect of
setFetchSize() on heap pressure on the Cassandra nodes, and my
conclusion is that a limit of 500 is much more helpful than values
above 1000... with values that were too high, we put so much
pressure on the nodes that we had to restart them.
This, btw, leaves a lot of operational risk for production use. For
example, I've found no way to influence time-outs or fetch size with
the Datastax JDBC driver, with corresponding consequences for the
queries (time-outs) and for C* node behavior (esp. heap pressure).
Hence, operating a C* cluster requires a lot of trust in the skills
of the "users" (developers/maintainers of the client-side solutions)
and in their tools :(.
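
For comparison, the native Java driver (as opposed to the JDBC
wrapper) does expose both knobs through Cluster.Builder. A minimal
sketch, assuming the 2.1 API; the contact point and timeout values
are just placeholders:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.SocketOptions;

public class TimeoutSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")          // placeholder contact point
                .withSocketOptions(new SocketOptions()
                        .setConnectTimeoutMillis(5000)  // fail fast on connect
                        .setReadTimeoutMillis(12000))   // per-request read timeout
                .build();
        // Inspect what the cluster will actually use.
        System.out.println("read timeout (ms): " + cluster.getConfiguration()
                .getSocketOptions().getReadTimeoutMillis());
        cluster.close();
    }
}

So the knobs exist in the native driver; the operational risk is
that nothing forces client-side teams to actually use them.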
Regards,
Jens