Perhaps just fetch them in batches of 1000 or 2000? For 1m rows, it seems like the difference would only be a few minutes. Do you have to do this all the time, or only once in a while?
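For example, here is a minimal, self-contained sketch of that batching pattern: fetch at most 1000 rows at a time, ordered by the clustering column, and use the last value seen as the lower bound of the next fetch. The names here (`BatchFetch`, `fetchBatch`, the in-memory `table`) are illustrative stand-ins, not anything from the thread; in real code `fetchBatch` would be a driver query along the lines of `... AND Area > ? LIMIT ?`.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of client-side batching: instead of pulling a million rows in
// one query, repeatedly fetch at most BATCH_SIZE rows ordered by the
// clustering key, using the last key seen as the lower bound of the
// next query. The in-memory "table" stands in for Cassandra.
public class BatchFetch {
    static final int BATCH_SIZE = 1000;

    // Simulated rows, keyed by the Area clustering column, kept sorted.
    static List<Double> table = new ArrayList<>();

    // Stand-in for: SELECT ... WHERE ... AND Area > ? LIMIT ?
    static List<Double> fetchBatch(double afterArea, int limit) {
        List<Double> page = new ArrayList<>();
        for (double area : table) {
            if (area > afterArea) {
                page.add(area);
                if (page.size() == limit) break;
            }
        }
        return page;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5000; i++) table.add((double) i);

        double last = -1;          // lower bound for the next page
        int total = 0;
        while (true) {
            List<Double> page = fetchBatch(last, BATCH_SIZE);
            if (page.isEmpty()) break;
            total += page.size();  // process the page here
            last = page.get(page.size() - 1);
        }
        System.out.println("fetched " + total + " rows");
    }
}
```

Each batch is a cheap, bounded query, so no single request comes near the server timeout.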
On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:

> Yes, it works for 1000 but not for more than that.
> How can I fetch all the rows efficiently using this?
>
> On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Have you tried a smaller fetch size, such as 5k - 2k?
>>
>> On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>
>>> Hi Jens,
>>>
>>> I have tried with a fetch size of 10000 and it is still not returning
>>> any results. My expectation was that Cassandra could handle a million
>>> rows easily.
>>>
>>> Is there any mistake in the way I am defining the keys or querying them?
>>>
>>> Thanks,
>>> Mehak
>>>
>>> On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil <jens.ran...@tink.se> wrote:
>>>
>>>> Hi,
>>>>
>>>> Try setting the fetch size before querying. Assuming you don't set it
>>>> too high, and you don't have too many tombstones, that should do it.
>>>>
>>>> Cheers,
>>>> Jens
>>>>
>>>> –
>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>
>>>> On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a requirement to fetch a million rows as the result of my
>>>>> query, which is giving timeout errors. I am fetching results by
>>>>> selecting clustering columns, so why are the queries taking so long?
>>>>> I can change the timeout settings, but I need the data to be fetched
>>>>> faster as per my requirement.
>>>>>
>>>>> My table definition is:
>>>>>
>>>>> CREATE TABLE images.results (
>>>>>     uuid uuid,
>>>>>     analysis_execution_id varchar,
>>>>>     analysis_execution_uuid uuid,
>>>>>     x double,
>>>>>     y double,
>>>>>     loc varchar,
>>>>>     w double,
>>>>>     h double,
>>>>>     normalized varchar,
>>>>>     type varchar,
>>>>>     filehost varchar,
>>>>>     filename varchar,
>>>>>     image_uuid uuid,
>>>>>     image_uri varchar,
>>>>>     image_caseid varchar,
>>>>>     image_mpp_x double,
>>>>>     image_mpp_y double,
>>>>>     image_width double,
>>>>>     image_height double,
>>>>>     objective double,
>>>>>     cancer_type varchar,
>>>>>     Area float,
>>>>>     submit_date timestamp,
>>>>>     points list<double>,
>>>>>     PRIMARY KEY ((image_caseid), Area, uuid)
>>>>> );
>>>>>
>>>>> Here each row is uniquely identified by its uuid, but since my data
>>>>> is generally queried by image_caseid, I have made that the partition
>>>>> key. I am currently using the DataStax Java API to fetch the results,
>>>>> but the query is taking a lot of time, resulting in timeout errors:
>>>>>
>>>>> Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
>>>>>     at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
>>>>>     at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
>>>>>     at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
>>>>>     at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
>>>>>     at QueryDB.queryArea(TestQuery.java:59)
>>>>>     at TestQuery.main(TestQuery.java:35)
>>>>> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
>>>>>     at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>>     at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>     at java.lang.Thread.run(Thread.java:744)
>>>>>
>>>>> The same query also fails on the console, even with a limit of 2000
>>>>> rows:
>>>>>
>>>>> cqlsh:images> select count(*) from results where image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000;
>>>>> errors={}, last_host=127.0.0.1
>>>>>
>>>>> Thanks and Regards,
>>>>> Mehak
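For what it's worth, the batching suggested above can also be expressed directly in CQL against this schema, since (Area, uuid) are the clustering columns. This is an untested sketch: the `?` markers are placeholders to be bound with the Area and uuid of the last row of the previous page, and depending on the Cassandra version a tuple restriction may not combine with an additional single-column restriction on Area, in which case the upper bound (Area < 100) has to be checked client-side.

```sql
-- First page: restrict by partition key and the Area clustering column
SELECT * FROM images.results
WHERE image_caseid = 'TCGA-HN-A2NL-01Z-00-DX1'
  AND Area > 20 AND Area < 100
LIMIT 1000;

-- Subsequent pages: resume after the last (Area, uuid) pair returned,
-- binding those values to the ? markers in a prepared statement
SELECT * FROM images.results
WHERE image_caseid = 'TCGA-HN-A2NL-01Z-00-DX1'
  AND (Area, uuid) > (?, ?)
LIMIT 1000;
```

The client stops issuing pages once a returned row has Area >= 100 or a page comes back with fewer than 1000 rows.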