The data won't change much, but the queries will be different.
I am not working on the rendering tool myself, so I don't know many details
about it.

Also, as you suggested, I tried fetching the data with a fetch size of 500
or 1000 using the Java driver's automatic pagination.
It fails when the number of records is high (around 100,000) with the
following error:

Exception in thread "main"
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
tried for query failed (tried: localhost/127.0.0.1:9042
(com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
server response))
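
For reference, this is roughly what my fetching code looks like (a
simplified sketch; the real query also restricts Area, as in the cqlsh
example quoted below):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class TestQuery {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("images");

            // Page through the partition instead of pulling everything at once.
            Statement stmt = new SimpleStatement(
                    "SELECT * FROM results"
                    + " WHERE image_caseid = 'TCGA-HN-A2NL-01Z-00-DX1'");
            stmt.setFetchSize(1000);  // rows per page; I also tried 500

            long count = 0;
            for (Row row : session.execute(stmt)) {
                // Iterating transparently fetches the next page from the server.
                count++;
            }
            System.out.println("Fetched " + count + " rows");
            cluster.close();
        }
    }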


On Wed, Mar 18, 2015 at 4:47 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> How often does the data change?
>
> I would still recommend caching of some kind, but without knowing more
> details (how often the data changes, what you're doing with the 1m rows
> after getting them, etc.) I can't recommend a specific solution.
>
> I did see your other thread. I would also vote for Elasticsearch / Solr;
> they are better suited to the kind of analytics you seem to be doing.
> Cassandra is more for storing data; it isn't all that great for complex
> queries / analytics.
>
> If you want to stick with Cassandra, you might have better luck making
> your range columns part of the primary key, i.e. something like PRIMARY
> KEY(caseId, x, y).
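>
> For example, something along these lines (an untested sketch; the new
> table name and the literal values are just illustrative, and it assumes
> an open driver Session):
>
>     // Denormalized table keyed for lookups by case and coordinates.
>     session.execute(
>         "CREATE TABLE images.results_by_xy ("
>         + " caseid varchar, x double, y double, uuid uuid,"
>         + " PRIMARY KEY ((caseid), x, y, uuid))");
>
>     // CQL only allows a range restriction on one clustering column per
>     // query (here x), so the y range would be filtered client-side.
>     ResultSet rs = session.execute(
>         "SELECT * FROM images.results_by_xy"
>         + " WHERE caseid = 'case1' AND x >= 100 AND x < 200");
>
> That lack of real 2d predicates is another reason Solr / Elasticsearch
> tend to fit this kind of query better.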
>
> On Wed, Mar 18, 2015 at 1:41 PM, Mehak Mehta <meme...@cs.stonybrook.edu>
> wrote:
>
>> The rendering tool renders a portion of a very large image. It may fetch
>> different data each time from billions of rows,
>> so I don't think I can cache such large results, since the same results
>> will rarely be fetched again.
>>
>> Also, do you know how I can do 2D range queries in Cassandra? Some other
>> users suggested using Solr,
>> but is there any way I can achieve this without using any other
>> technology?
>>
>> On Wed, Mar 18, 2015 at 4:33 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> Sorry, meant to say "that way when you have to render, you can just
>>> display the latest cache."
>>>
>>> On Wed, Mar 18, 2015 at 1:30 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>>> I would probably do this in a background thread and cache the results,
>>>> that way when you have to render, you can just cache the latest results.
>>>>
>>>> I don't know why Cassandra doesn't seem to be able to handle large fetch
>>>> sizes; I've also run into these timeouts, but reducing the fetch size to
>>>> 2k seemed to work for me.
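>>>>
>>>> Roughly what I mean by the background-thread approach (a minimal
>>>> sketch; the refresh interval and the fetch function are placeholders):
>>>>
>>>>     import java.util.concurrent.Executors;
>>>>     import java.util.concurrent.ScheduledExecutorService;
>>>>     import java.util.concurrent.TimeUnit;
>>>>     import java.util.concurrent.atomic.AtomicReference;
>>>>     import java.util.function.Supplier;
>>>>
>>>>     public class ResultCache<T> {
>>>>         private final AtomicReference<T> latest = new AtomicReference<>();
>>>>         private final ScheduledExecutorService scheduler =
>>>>                 Executors.newSingleThreadScheduledExecutor();
>>>>
>>>>         public ResultCache(Supplier<T> fetch, long refreshMinutes) {
>>>>             // Re-run the (slow) query in the background so the renderer
>>>>             // never blocks on Cassandra.
>>>>             scheduler.scheduleAtFixedRate(() -> latest.set(fetch.get()),
>>>>                     0, refreshMinutes, TimeUnit.MINUTES);
>>>>         }
>>>>
>>>>         // The renderer just displays whatever was fetched most recently.
>>>>         public T latest() { return latest.get(); }
>>>>     }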
>>>>
>>>> On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta <meme...@cs.stonybrook.edu
>>>> > wrote:
>>>>
>>>>> We have a UI which needs this data for rendering,
>>>>> so the efficiency of pulling this data matters a lot. It should be
>>>>> fetched within a minute.
>>>>> Is there a way to achieve such efficiency?
>>>>>
>>>>>
>>>>> On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Perhaps just fetch them in batches of 1000 or 2000? For 1m rows, it
>>>>>> seems like the difference would only be a few minutes. Do you have to do
>>>>>> this all the time, or only once in a while?
>>>>>>
>>>>>> On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta <
>>>>>> meme...@cs.stonybrook.edu> wrote:
>>>>>>
>>>>>>> Yes, it works for 1000 but not for more than that.
>>>>>>> How can I fetch all of the rows efficiently this way?
>>>>>>>
>>>>>>> On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Have you tried a smaller fetch size, such as 2k-5k?
>>>>>>>>
>>>>>>>> On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta <
>>>>>>>> meme...@cs.stonybrook.edu> wrote:
>>>>>>>>
>>>>>>>>> Hi Jens,
>>>>>>>>>
>>>>>>>>> I have tried with a fetch size of 10000 and it's still not giving any
>>>>>>>>> results.
>>>>>>>>> My expectation was that Cassandra could handle a million rows easily.
>>>>>>>>>
>>>>>>>>> Is there any mistake in the way I am defining the keys or querying
>>>>>>>>> them?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Mehak
>>>>>>>>>
>>>>>>>>> On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil <jens.ran...@tink.se>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Try setting the fetch size before querying. Assuming you don't set it
>>>>>>>>>> too high, and you don't have too many tombstones, that should do it.
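>>>>>>>>>>
>>>>>>>>>> For example (a sketch; 500 is just a starting point to tune):
>>>>>>>>>>
>>>>>>>>>>     // per statement:
>>>>>>>>>>     statement.setFetchSize(500);
>>>>>>>>>>
>>>>>>>>>>     // or as a cluster-wide default when building the Cluster:
>>>>>>>>>>     Cluster cluster = Cluster.builder()
>>>>>>>>>>             .addContactPoint("127.0.0.1")
>>>>>>>>>>             .withQueryOptions(new QueryOptions().setFetchSize(500))
>>>>>>>>>>             .build();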
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Jens
>>>>>>>>>>
>>>>>>>>>> –
>>>>>>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta <
>>>>>>>>>> meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have a requirement to fetch a million rows as the result of my query,
>>>>>>>>>>> which is giving timeout errors.
>>>>>>>>>>> I am fetching results by selecting on clustering columns, so why are the
>>>>>>>>>>> queries taking so long? I can change the timeout settings, but I need
>>>>>>>>>>> the data to be fetched faster than that.
>>>>>>>>>>>
>>>>>>>>>>> My table definition is:
>>>>>>>>>>> CREATE TABLE images.results (
>>>>>>>>>>>     uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid,
>>>>>>>>>>>     x double, y double, loc varchar, w double, h double,
>>>>>>>>>>>     normalized varchar, type varchar, filehost varchar, filename varchar,
>>>>>>>>>>>     image_uuid uuid, image_uri varchar, image_caseid varchar,
>>>>>>>>>>>     image_mpp_x double, image_mpp_y double, image_width double,
>>>>>>>>>>>     image_height double, objective double, cancer_type varchar,
>>>>>>>>>>>     Area float, submit_date timestamp, points list<double>,
>>>>>>>>>>>     PRIMARY KEY ((image_caseid), Area, uuid));
>>>>>>>>>>>
>>>>>>>>>>> Here each row is uniquely identified by its uuid, but since my data is
>>>>>>>>>>> generally queried by image_caseid, I have made that the partition key.
>>>>>>>>>>> I am currently using the DataStax Java API to fetch the results, but
>>>>>>>>>>> the query is taking a lot of time, resulting in timeout errors:
>>>>>>>>>>>
>>>>>>>>>>>  Exception in thread "main"
>>>>>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All 
>>>>>>>>>>> host(s)
>>>>>>>>>>> tried for query failed (tried: localhost/127.0.0.1:9042
>>>>>>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed out 
>>>>>>>>>>> waiting for
>>>>>>>>>>> server response))
>>>>>>>>>>>  at
>>>>>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
>>>>>>>>>>>  at
>>>>>>>>>>> com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
>>>>>>>>>>>  at
>>>>>>>>>>> com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
>>>>>>>>>>>  at
>>>>>>>>>>> com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
>>>>>>>>>>>  at QueryDB.queryArea(TestQuery.java:59)
>>>>>>>>>>>  at TestQuery.main(TestQuery.java:35)
>>>>>>>>>>> Caused by:
>>>>>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All 
>>>>>>>>>>> host(s)
>>>>>>>>>>> tried for query failed (tried: localhost/127.0.0.1:9042
>>>>>>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed out 
>>>>>>>>>>> waiting for
>>>>>>>>>>> server response))
>>>>>>>>>>>  at
>>>>>>>>>>> com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>>>>>>>>  at
>>>>>>>>>>> com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>>>>>>>>  at
>>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>>>  at
>>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>>>  at java.lang.Thread.run(Thread.java:744)
>>>>>>>>>>>
>>>>>>>>>>> Also, when I try the same query in the console, it fails even with a
>>>>>>>>>>> limit of 2000 rows:
>>>>>>>>>>>
>>>>>>>>>>> cqlsh:images> select count(*) from results where
>>>>>>>>>>> image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 
>>>>>>>>>>> limit 2000;
>>>>>>>>>>> errors={}, last_host=127.0.0.1
>>>>>>>>>>>
>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>> Mehak
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
