Re: Timeout error in fetching million rows as results using clustering keys

Ali Akhtar Wed, 18 Mar 2015 02:35:48 -0700

What's your memory / CPU usage at? And how much RAM and CPU do you have on
this server?




On Wed, Mar 18, 2015 at 2:31 PM, Mehak Mehta <meme...@cs.stonybrook.edu>
wrote:

> Currently there is only a single node, which I am calling directly, with
> around 150000 rows. The full data set will be in the billions of rows per node.
> The code works only for fetch sizes of 100/200. Also, each consecutive fetch
> takes around 5-10 secs.
>
> I have a parallel script that inserts data while I am reading
> it. When I stopped the script, it worked for 500/1000 but not more than
> that.
>
>
>
> On Wed, Mar 18, 2015 at 5:08 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> If even 500-1000 isn't working, then your Cassandra node might not be
>> up.
>>
>> 1) Try running nodetool status from a shell on your Cassandra server, and
>> make sure the nodes are up.
>>
>> 2) Are you calling this on the same server where Cassandra is running?
>> It's trying to connect to localhost. If you're running it on a different
>> server, try passing in the IP address of your Cassandra server.
>>
>> On Wed, Mar 18, 2015 at 2:05 PM, Mehak Mehta <meme...@cs.stonybrook.edu>
>> wrote:
>>
>>> The data won't change much, but the queries will be different.
>>> I am not working on the rendering tool myself, so I don't know many
>>> details about it.
>>>
>>> Also, as you suggested, I tried to fetch data in pages of 500 or 1000
>>> with the Java driver's automatic paging.
>>> It fails when the number of records is high (around 100000) with the
>>> following error:
>>>
>>> Exception in thread "main"
>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>> tried for query failed (tried: localhost/127.0.0.1:9042
>>> (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
>>> server response))
>>>
>>>
>>> On Wed, Mar 18, 2015 at 4:47 AM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>>> How often does the data change?
>>>>
>>>> I would still recommend caching of some kind, but without knowing
>>>> more details (how often the data changes, what you're doing with the 1m
>>>> rows after getting them, etc.) I can't recommend a solution.
>>>>
>>>> I did see your other thread. I would also vote for Elasticsearch / Solr;
>>>> they are more suited to the kind of analytics you seem to be doing.
>>>> Cassandra is more for storing data; it isn't all that great for complex
>>>> queries / analytics.
>>>>
>>>> If you want to stick with Cassandra, you might have better luck if you
>>>> made your range columns part of the primary key, so something like
>>>> PRIMARY KEY(caseId, x, y)
>>>>
>>>> On Wed, Mar 18, 2015 at 1:41 PM, Mehak Mehta <meme...@cs.stonybrook.edu
>>>> > wrote:
>>>>
>>>>> The rendering tool renders a portion of a very large image. It may fetch
>>>>> different data each time from billions of rows.
>>>>> So I don't think I can cache such large results, since the same results
>>>>> will rarely be fetched again.
>>>>>
>>>>> Also, do you know how I can do 2D range queries using Cassandra? Some
>>>>> other users suggested using Solr.
>>>>> But is there any way I can achieve that without using any other
>>>>> technology?
>>>>>
>>>>> On Wed, Mar 18, 2015 at 4:33 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Sorry, meant to say "that way when you have to render, you can just
>>>>>> display the latest cache."
>>>>>>
>>>>>> On Wed, Mar 18, 2015 at 1:30 PM, Ali Akhtar <ali.rac...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I would probably do this in a background thread and cache the
>>>>>>> results, that way when you have to render, you can just cache the latest
>>>>>>> results.
>>>>>>>
>>>>>>> I don't know why Cassandra can't seem to fetch large batch sizes.
>>>>>>> I've also run into these timeouts, but reducing the batch size
>>>>>>> to 2k seemed to work for me.
>>>>>>>
>>>>>>> On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta <
>>>>>>> meme...@cs.stonybrook.edu> wrote:
>>>>>>>
>>>>>>>> We have a UI that needs this data for rendering,
>>>>>>>> so the efficiency of pulling this data matters a lot. It should be
>>>>>>>> fetched within a minute.
>>>>>>>> Is there a way to achieve such efficiency?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Perhaps just fetch them in batches of 1000 or 2000? For 1m rows,
>>>>>>>>> it seems like the difference would only be a few minutes. Do you
>>>>>>>>> have to do this all the time, or only once in a while?
>>>>>>>>>
>>>>>>>>> On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta <
>>>>>>>>> meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, it works for 1000 but not more than that.
>>>>>>>>>> How can I fetch all rows efficiently using this?
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar <ali.rac...@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Have you tried a smaller fetch size, such as 2k - 5k?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta <
>>>>>>>>>>> meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Jens,
>>>>>>>>>>>>
>>>>>>>>>>>> I have tried a fetch size of 10000, but it's still not giving any
>>>>>>>>>>>> results.
>>>>>>>>>>>> My expectation was that Cassandra can handle a million rows
>>>>>>>>>>>> easily.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there any mistake in the way I am defining the keys or
>>>>>>>>>>>> querying them?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Mehak
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil <
>>>>>>>>>>>> jens.ran...@tink.se> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Try setting the fetch size before querying. Assuming you don't
>>>>>>>>>>>>> set it too high, and you don't have too many tombstones, that
>>>>>>>>>>>>> should do it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Jens
>>>>>>>>>>>>>
>>>>>>>>>>>>> –
>>>>>>>>>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta <
>>>>>>>>>>>>> meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a requirement to fetch a million rows as the result of my
>>>>>>>>>>>>>> query, which is giving timeout errors.
>>>>>>>>>>>>>> I am fetching results by selecting on clustering columns, so why
>>>>>>>>>>>>>> are the queries taking so long? I can change the timeout settings,
>>>>>>>>>>>>>> but I need the data to be fetched faster as per my requirement.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My table definition is:
>>>>>>>>>>>>>> CREATE TABLE images.results (
>>>>>>>>>>>>>>   uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid,
>>>>>>>>>>>>>>   x double, y double, loc varchar, w double, h double,
>>>>>>>>>>>>>>   normalized varchar, type varchar, filehost varchar, filename varchar,
>>>>>>>>>>>>>>   image_uuid uuid, image_uri varchar, image_caseid varchar,
>>>>>>>>>>>>>>   image_mpp_x double, image_mpp_y double,
>>>>>>>>>>>>>>   image_width double, image_height double,
>>>>>>>>>>>>>>   objective double, cancer_type varchar, Area float,
>>>>>>>>>>>>>>   submit_date timestamp, points list<double>,
>>>>>>>>>>>>>>   PRIMARY KEY ((image_caseid), Area, uuid)
>>>>>>>>>>>>>> );
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Each row is uniquely identified by its uuid. But since my data is
>>>>>>>>>>>>>> generally queried by image_caseid, I have made that the partition key.
>>>>>>>>>>>>>> I am currently using the DataStax Java driver to fetch the results,
>>>>>>>>>>>>>> but the query takes a long time, resulting in timeout errors:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
>>>>>>>>>>>>>>  at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
>>>>>>>>>>>>>>  at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
>>>>>>>>>>>>>>  at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
>>>>>>>>>>>>>>  at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
>>>>>>>>>>>>>>  at QueryDB.queryArea(TestQuery.java:59)
>>>>>>>>>>>>>>  at TestQuery.main(TestQuery.java:35)
>>>>>>>>>>>>>> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
>>>>>>>>>>>>>>  at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>>>>>>>>>>>  at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>>>>>>>>>>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>>>>>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>>>>>>  at java.lang.Thread.run(Thread.java:744)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, I get an error when I try the same query on the console,
>>>>>>>>>>>>>> even with a limit of 2000 rows:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cqlsh:images> select count(*) from results where
>>>>>>>>>>>>>> image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000;
>>>>>>>>>>>>>> errors={}, last_host=127.0.0.1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>> Mehak
>
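
The figures quoted in this thread give a rough sense of why paging alone may not meet the one-minute target. The sketch below is only a back-of-the-envelope estimate in Java. It assumes pages are fetched strictly one after another, and that the 5-10 seconds per fetch reported above would also hold at a fetch size of 1000, which the thread does not confirm. The class and method names (PagingEstimate, pageCount, totalSeconds) are hypothetical and are not part of the DataStax driver.

```java
// Back-of-the-envelope estimate of sequential paging time.
// PagingEstimate, pageCount and totalSeconds are hypothetical names for this sketch;
// nothing here talks to Cassandra or uses the DataStax driver.
class PagingEstimate {

    // Number of pages needed to read totalRows at the given fetch size (ceiling division).
    static long pageCount(long totalRows, int fetchSize) {
        if (totalRows < 0 || fetchSize <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and fetchSize > 0");
        }
        return (totalRows + fetchSize - 1) / fetchSize;
    }

    // Total time in seconds if every page is fetched one after another.
    static double totalSeconds(long totalRows, int fetchSize, double secondsPerPage) {
        return pageCount(totalRows, fetchSize) * secondsPerPage;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;      // "million rows" from the original question
        int fetchSize = 1000;        // largest fetch size reported to work in the thread
        double budgetSeconds = 60.0; // "fetched within a minute"

        long pages = pageCount(rows, fetchSize);
        double best = totalSeconds(rows, fetchSize, 5.0);   // assumed 5 s per fetch
        double worst = totalSeconds(rows, fetchSize, 10.0); // assumed 10 s per fetch

        System.out.println("pages: " + pages);
        System.out.println("estimated seconds: " + best + " to " + worst);
        System.out.println("seconds per page allowed by the budget: " + (budgetSeconds / pages));
    }
}
```

Under these assumptions, one million rows at a fetch size of 1000 means 1000 sequential round trips, or roughly 5000 to 10000 seconds, while a one-minute budget leaves about 0.06 seconds per page. That gap is large enough that tuning the fetch size alone is unlikely to close it; the per-page latency itself (node load from the concurrent insert script, tombstones, hardware) would have to come down first.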
