Hi,
here are some Scala code snippets that should get you started.
Jirka H.
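
// Page through one token range: the first call has no last row, later calls
// continue after the last row of the previous page.
loop { lastRow =>
  val query = lastRow match {
    case Some(row) => nextPageQuery(row, upperLimit)
    case None      => initialQuery(lowerLimit)
  }
  session.execute(query).all
}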
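
The loop helper itself is not part of the snippets. A minimal sketch of what such a helper could look like, assuming it simply keeps asking for the next page until an empty page comes back (the name Paging and the generic row type A are made up for illustration; with the Java driver, the java.util.List returned by .all would first have to be converted to a Scala collection):

```scala
object Paging {
  // Hypothetical helper, not the original implementation.
  // Calls fetchPage(None) for the first page, then fetchPage(Some(lastRow))
  // with the last row of the previous page, and stops at the first empty page.
  def loop[A](fetchPage: Option[A] => Seq[A]): Iterator[A] =
    Iterator
      .iterate(fetchPage(None))(page => fetchPage(page.lastOption))
      .takeWhile(_.nonEmpty)
      .flatMap(_.iterator)
}
```

The iterator is lazy, so the next page is only requested once the rows of the previous one have been consumed.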
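
// Next page: continue strictly after the token of the last row seen,
// and stay below the upper limit of this range.
private def nextPageQuery(row: Row, upperLimit: String): String = {
  val tokenPart = "token(%s) > token(0x%s) and token(%s) < %s".format(
    rowKeyName, hex(row.getBytes(rowKeyName)), rowKeyName, upperLimit)
  basicQuery.format(tokenPart)
}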
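
The hex function used above is not shown either. A possible stand-in, assuming all it has to do is render the row key bytes as hex digits for the 0x... blob literal (the object name Hex is hypothetical):

```scala
import java.nio.ByteBuffer

object Hex {
  // Hypothetical stand-in for the `hex` helper used in nextPageQuery.
  // Renders the remaining bytes of the buffer as lowercase hex digits.
  def hex(buffer: ByteBuffer): String = {
    val copy = buffer.duplicate() // leave the caller's position untouched
    val sb = new StringBuilder(copy.remaining() * 2)
    while (copy.hasRemaining) {
      sb.append("%02x".format(copy.get() & 0xff)) // mask to get an unsigned byte
    }
    sb.toString
  }
}
```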

// First page of a range: start at the lower limit (inclusive).
private def initialQuery(lowerLimit: String): String = {
  val tokenPart = "token(%s) >= %s".format(rowKeyName, lowerLimit)
  basicQuery.format(tokenPart)
}

// Splits either the given token range or the partitioner's whole token space
// into `concurrency` equally sized sub-ranges.
// Returns (tokenSpaceSize, rangeSize, ranges).
private def calculateRanges: (BigDecimal, BigDecimal, IndexedSeq[(BigDecimal, BigDecimal)]) = {
  tokenRange match {
    case Some((start, end)) =>
      Logger.info("Token range given: {}",
        "<" + start.underlying.toPlainString + ", " + end.underlying.toPlainString + ">")
      val tokenSpaceSize = end - start
      val rangeSize = tokenSpaceSize / concurrency
      val ranges = for (i <- 0 until concurrency)
        yield (start + (i * rangeSize), start + ((i + 1) * rangeSize))
      (tokenSpaceSize, rangeSize, ranges)
    case None =>
      val tokenSpaceSize = partitioner.max - partitioner.min
      val rangeSize = tokenSpaceSize / concurrency
      val ranges = for (i <- 0 until concurrency)
        yield (partitioner.min + (i * rangeSize), partitioner.min + ((i + 1) * rangeSize))
      (tokenSpaceSize, rangeSize, ranges)
  }
}
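
To see the range arithmetic in isolation, here is a small self-contained sketch that applies the same formula to the Murmur3 token space. The names TokenRanges and split are made up for the example; only the arithmetic is taken from calculateRanges:

```scala
object TokenRanges {
  // Bounds of the Murmur3 token space, same values as in the Murmur3 object.
  val min: BigDecimal = BigDecimal(-2).pow(63)
  val max: BigDecimal = BigDecimal(2).pow(63) - 1

  // Splits [start, end] into `parts` contiguous sub-ranges of equal size,
  // using the same formula as calculateRanges.
  def split(start: BigDecimal, end: BigDecimal, parts: Int): IndexedSeq[(BigDecimal, BigDecimal)] = {
    val rangeSize = (end - start) / parts
    for (i <- 0 until parts)
      yield (start + (rangeSize * i), start + (rangeSize * (i + 1)))
  }
}
```

For example, TokenRanges.split(TokenRanges.min, TokenRanges.max, 4) gives four ranges where the end of each range is the start of the next one. The boundaries are generally not whole numbers, so they may need rounding before they go into a CQL statement.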
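
// Query template: everything except the token condition is filled in here.
// The literal "%s" argument leaves a placeholder that nextPageQuery and
// initialQuery fill in later.
private val basicQuery = {
  "select %s, %s, %s, writetime(%s) from %s where %s%s limit %d%s".format(
    rowKeyName,
    columnKeyName,
    columnValueName,
    columnValueName,
    columnFamily,
    "%s", // template
    whereCondition,
    pageSize,
    if (cqlAllowFiltering) " allow filtering" else "")
}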
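
The template is formatted in two stages, which is easy to miss. A self-contained sketch with made-up table and column names (key, column1, value and my_table are placeholders, not the real schema):

```scala
object QueryTemplate {
  // Stage 1: fill in everything known up front. Passing the literal "%s"
  // as an argument re-inserts a placeholder for the token condition.
  val basicQuery: String =
    "select %s, %s, %s, writetime(%s) from %s where %s%s limit %d%s".format(
      "key", "column1", "value", "value", "my_table",
      "%s", // slot kept open for stage 2
      "",   // no additional where condition in this example
      1000,
      "")   // no "allow filtering" suffix in this example

  // Stage 2: plug the token condition into the remaining slot.
  def withTokenPart(tokenPart: String): String = basicQuery.format(tokenPart)
}
```

One caveat of this approach: if any of the stage 1 values contained a % character, the second format call would misinterpret it.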

case object Murmur3 extends Partitioner {
  override val min = BigDecimal(-2).pow(63)
  override val max = BigDecimal(2).pow(63) - 1
}

case object Random extends Partitioner {
  override val min = BigDecimal(0)
  override val max = BigDecimal(2).pow(127) - 1
}

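
The snippets do not include the part that runs the ranges concurrently. A rough sketch of one way to do it with plain Scala futures, assuming one task per range on a fixed-size thread pool; ParallelScan, scanAll and scanRange are hypothetical names, and scanRange stands for whatever pages through a single range and returns the number of rows it handled:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelScan {
  // Hypothetical driver: runs scanRange once per token range on a pool of
  // `concurrency` threads and returns the sum of the per-range row counts.
  def scanAll(ranges: Seq[(BigDecimal, BigDecimal)], concurrency: Int)(
      scanRange: (BigDecimal, BigDecimal) => Long): Long = {
    val pool = Executors.newFixedThreadPool(concurrency)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val counts = ranges.map { case (start, end) => Future(scanRange(start, end)) }
      // Blocks until every range has been scanned.
      Await.result(Future.sequence(counts), Duration.Inf).sum
    } finally {
      pool.shutdown()
    }
  }
}
```

The pool size is the knob to play with, together with the page size of the individual queries.
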
On 02/11/2015 02:21 PM, Ja Sam wrote:
> Your answer looks very promising
>
> How do you calculate start and stop?
>
> On Wed, Feb 11, 2015 at 12:09 PM, Jiri Horky wrote:
>
> The fastest way I am aware of is to run the queries in parallel against
> multiple Cassandra nodes and make sure that you only ask each node for
> keys it is responsible for. Otherwise, the node has to forward your
> query to the node that owns the data, which is much slower and creates
> unnecessary objects (and thus GC pressure).
>
> You can take advantage of the token range information manually if the
> driver does not take it into account for you. Then you can play with
> the concurrency and the batch size of a single query against one node.
> Basically, what you (or the driver) should do is transform the query
> into a series of
> "SELECT * FROM TABLE WHERE TOKEN(key) > start AND TOKEN(key) <= stop".
>
> I will need to look up the actual code, but the idea should be clear :)
>
> Jirka H.
>
>
> On 02/11/2015 11:26 AM, Ja Sam wrote:
> > Is there a simple way (or even a complicated one) to speed up a
> > SELECT * FROM [table] query?
> > I need to get all rows from one table every day. I split the tables and
> > create one for each day, but the query is still quite slow (200 million
> > records).
> >
> > I was thinking about running this query in parallel, but I don't know if
> > that is possible.
>
>