What's your memory / CPU usage at? And how much RAM and CPU do you have on this server?
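As a rough sanity check on the numbers in the quoted message below (around 150000 rows, pages of 100/200, 5-10 secs per fetch), here is a minimal Java sketch of the arithmetic. It assumes the 5-10 seconds applies to each page and stays constant across pages, which is my reading of the message and not a measurement. The class and method names are made up for illustration and are not part of the DataStax driver.

```java
public class PagingEstimate {
    // Number of pages (round trips) needed to read `rows` rows
    // when each page returns at most `fetchSize` rows.
    public static long pages(long rows, int fetchSize) {
        return (rows + fetchSize - 1) / fetchSize; // ceiling division
    }

    // Total wall-clock seconds, assuming every page takes `secondsPerPage`
    // seconds and pages are fetched one after another.
    public static long totalSeconds(long rows, int fetchSize, int secondsPerPage) {
        return pages(rows, fetchSize) * secondsPerPage;
    }

    public static void main(String[] args) {
        long rows = 150_000; // rows currently on the node, per the quoted message
        int fetchSize = 200; // largest page size reported to work while inserts run

        System.out.println("pages: " + pages(rows, fetchSize));
        System.out.println("at 5 s/page:  " + totalSeconds(rows, fetchSize, 5) / 60.0 + " min");
        System.out.println("at 10 s/page: " + totalSeconds(rows, fetchSize, 10) / 60.0 + " min");
    }
}
```

Under those assumptions that is 750 pages and roughly one to two hours in total, far from the one-minute target, so the per-page latency looks like the thing to investigate rather than the page size alone.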

On Wed, Mar 18, 2015 at 2:31 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:

> Currently there is only a single node, which I am calling directly, with
> around 150000 rows. The full data will be in the billions of rows per node.
> The code works only for a fetch size of 100/200. Also, each consecutive
> fetch takes around 5-10 secs.
>
> I have a parallel script which is inserting data while I am reading
> it. When I stopped the script it worked for 500/1000 but not more than
> that.
>
> On Wed, Mar 18, 2015 at 5:08 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> If even 500-1000 isn't working, then your Cassandra node might not be
>> up.
>>
>> 1) Try running nodetool status from a shell on your Cassandra server to
>> make sure the nodes are up.
>>
>> 2) Are you calling this on the same server where Cassandra is running?
>> It's trying to connect to localhost. If you're running it on a different
>> server, try passing in the direct IP of your Cassandra server.
>>
>> On Wed, Mar 18, 2015 at 2:05 PM, Mehak Mehta <meme...@cs.stonybrook.edu>
>> wrote:
>>
>>> Data won't change much but queries will be different.
>>> I am not working on the rendering tool myself so I don't know many
>>> details about it.
>>>
>>> Also, as you suggested, I tried to fetch data in pages of 500 or 1000
>>> with the Java driver's auto pagination.
>>> It fails when the number of records is high (around 100000) with the
>>> following error:
>>>
>>> Exception in thread "main"
>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>> tried for query failed (tried: localhost/127.0.0.1:9042
>>> (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
>>> server response))
>>>
>>> On Wed, Mar 18, 2015 at 4:47 AM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>>> How often does the data change?
>>>>
>>>> I would still recommend caching of some kind, but without knowing
>>>> more details (how often the data is changing, what you're doing with the
>>>> 1m rows after getting them, etc.) I can't recommend a solution.
>>>>
>>>> I did see your other thread. I would also vote for Elasticsearch / Solr;
>>>> they are more suited to the kind of analytics you seem to be doing.
>>>> Cassandra is more for storing data; it isn't all that great for complex
>>>> queries / analytics.
>>>>
>>>> If you want to stick to Cassandra, you might have better luck if you
>>>> made your range columns part of the primary key, so something like
>>>> PRIMARY KEY(caseId, x, y)
>>>>
>>>> On Wed, Mar 18, 2015 at 1:41 PM, Mehak Mehta <meme...@cs.stonybrook.edu>
>>>> wrote:
>>>>
>>>>> The rendering tool renders a portion of a very large image. It may fetch
>>>>> different data each time from billions of rows.
>>>>> So I don't think I can cache such large results, since the same results
>>>>> will rarely be fetched again.
>>>>>
>>>>> Also, do you know how I can do 2D range queries using Cassandra? Some
>>>>> other users suggested using Solr.
>>>>> But is there any way I can achieve that without using any other
>>>>> technology?
>>>>>
>>>>> On Wed, Mar 18, 2015 at 4:33 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Sorry, meant to say "that way when you have to render, you can just
>>>>>> display the latest cache."
>>>>>>
>>>>>> On Wed, Mar 18, 2015 at 1:30 PM, Ali Akhtar <ali.rac...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I would probably do this in a background thread and cache the
>>>>>>> results, that way when you have to render, you can just cache the latest
>>>>>>> results.
>>>>>>>
>>>>>>> I don't know why Cassandra can't seem to fetch large batch sizes.
>>>>>>> I've also run into these timeouts, but reducing the batch size
>>>>>>> to 2k seemed to work for me.
>>>>>>>
>>>>>>> On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta <meme...@cs.stonybrook.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> We have a UI which needs this data for rendering,
>>>>>>>> so the efficiency of pulling this data matters a lot. It should be
>>>>>>>> fetched within a minute.
>>>>>>>> Is there a way to achieve such efficiency?
>>>>>>>>
>>>>>>>> On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Perhaps just fetch them in batches of 1000 or 2000? For 1m rows,
>>>>>>>>> it seems like the difference would only be a few minutes. Do you have
>>>>>>>>> to do this all the time, or only once in a while?
>>>>>>>>>
>>>>>>>>> On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta <meme...@cs.stonybrook.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, it works for 1000 but not more than that.
>>>>>>>>>> How can I fetch all rows using this efficiently?
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Have you tried a smaller fetch size, such as 5k - 2k?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta <meme...@cs.stonybrook.edu>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Jens,
>>>>>>>>>>>>
>>>>>>>>>>>> I have tried with a fetch size of 10000; it still isn't giving any
>>>>>>>>>>>> results.
>>>>>>>>>>>> My expectation was that Cassandra can handle a million rows
>>>>>>>>>>>> easily.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there any mistake in the way I am defining the keys or
>>>>>>>>>>>> querying them?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Mehak
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil <jens.ran...@tink.se>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Try setting fetchsize before querying. Assuming you don't set
>>>>>>>>>>>>> it too high, and you don't have too many tombstones, that should
>>>>>>>>>>>>> do it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Jens
>>>>>>>>>>>>>
>>>>>>>>>>>>> –
>>>>>>>>>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta <meme...@cs.stonybrook.edu>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a requirement to fetch a million rows as the result of my
>>>>>>>>>>>>>> query, which is giving timeout errors.
>>>>>>>>>>>>>> I am fetching results by selecting on clustering columns, so
>>>>>>>>>>>>>> why are the queries taking so long? I can change the timeout
>>>>>>>>>>>>>> settings, but I need the data to be fetched faster as per my
>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My table definition is:
>>>>>>>>>>>>>> CREATE TABLE images.results (
>>>>>>>>>>>>>>   uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid,
>>>>>>>>>>>>>>   x double, y double, loc varchar, w double, h double,
>>>>>>>>>>>>>>   normalized varchar, type varchar, filehost varchar, filename varchar,
>>>>>>>>>>>>>>   image_uuid uuid, image_uri varchar, image_caseid varchar,
>>>>>>>>>>>>>>   image_mpp_x double, image_mpp_y double, image_width double,
>>>>>>>>>>>>>>   image_height double, objective double, cancer_type varchar,
>>>>>>>>>>>>>>   Area float, submit_date timestamp, points list<double>,
>>>>>>>>>>>>>>   PRIMARY KEY ((image_caseid), Area, uuid));
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here each row is uniquely identified by its uuid. But since my
>>>>>>>>>>>>>> data is generally queried by image_caseid, I have made it the
>>>>>>>>>>>>>> partition key.
>>>>>>>>>>>>>> I am currently using the DataStax Java API to fetch the results.
>>>>>>>>>>>>>> But the query is taking a lot of time, resulting in timeout
>>>>>>>>>>>>>> errors:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
>>>>>>>>>>>>>>     at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
>>>>>>>>>>>>>>     at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
>>>>>>>>>>>>>>     at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
>>>>>>>>>>>>>>     at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
>>>>>>>>>>>>>>     at QueryDB.queryArea(TestQuery.java:59)
>>>>>>>>>>>>>>     at TestQuery.main(TestQuery.java:35)
>>>>>>>>>>>>>> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
>>>>>>>>>>>>>>     at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>>>>>>>>>>>     at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>>>>>>     at java.lang.Thread.run(Thread.java:744)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, when I try the same query on the console, even with a
>>>>>>>>>>>>>> limit of 2000 rows, I get:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cqlsh:images> select count(*) from results where image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000;
>>>>>>>>>>>>>> errors={}, last_host=127.0.0.1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>> Mehak