4 GB also seems small for the kind of load you are trying to handle (billions of rows).
I would also try adding more nodes to the cluster.

On Wed, Mar 18, 2015 at 2:53 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> Yeah, it may be that the process is being limited by swap. This page:
>
> https://gist.github.com/aliakhtar/3649e412787034156cbb#file-cassandra-install-sh-L42
>
> Lines 42 - 48 list a few settings that you could try out for increasing /
> reducing the memory limits (assuming you're on Linux).
>
> Also, are you using an SSD? If so, make sure the IO scheduler is noop or
> deadline.
>
> On Wed, Mar 18, 2015 at 2:48 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>
>> Currently the Cassandra java process is taking 1% of CPU (total 8% is being
>> used) and 14.3% memory (out of total 4G memory).
>> As you can see there is not much load from other processes.
>>
>> Should I try changing the default memory parameters in the Cassandra settings?
>>
>> On Wed, Mar 18, 2015 at 5:33 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> What's your memory / CPU usage at? And how much RAM + CPU do you have on
>>> this server?
>>>
>>> On Wed, Mar 18, 2015 at 2:31 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>
>>>> Currently there is only a single node, which I am calling directly, with
>>>> around 150000 rows. Full data will be around billions of rows per node.
>>>> The code is working only for size 100/200. Also the consecutive
>>>> fetching is taking around 5-10 secs.
>>>>
>>>> I have a parallel script which is inserting the data while I am reading
>>>> it. When I stopped the script it worked for 500/1000 but not more than
>>>> that.
>>>>
>>>> On Wed, Mar 18, 2015 at 5:08 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>>
>>>>> If even 500-1000 isn't working, then your Cassandra node might not be
>>>>> up.
>>>>>
>>>>> 1) Try running nodetool status from shell on your Cassandra server,
>>>>> make sure the nodes are up.
>>>>>
>>>>> 2) Are you calling this on the same server where Cassandra is running?
>>>>> It's trying to connect to localhost. If you're running it on a different
>>>>> server, try passing in the direct IP of your Cassandra server.
>>>>>
>>>>> On Wed, Mar 18, 2015 at 2:05 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>>>
>>>>>> Data won't change much but queries will be different.
>>>>>> I am not working on the rendering tool myself so I don't know many
>>>>>> details about it.
>>>>>>
>>>>>> Also, as you suggested, I tried to fetch data in sizes of 500 or 1000
>>>>>> with Java driver auto pagination.
>>>>>> It fails when the number of records is high (around 100000) with the
>>>>>> following error:
>>>>>>
>>>>>> Exception in thread "main"
>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>>>>> tried for query failed (tried: localhost/127.0.0.1:9042
>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed out waiting
>>>>>> for server response))
>>>>>>
>>>>>> On Wed, Mar 18, 2015 at 4:47 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>>>>
>>>>>>> How often does the data change?
>>>>>>>
>>>>>>> I would still recommend caching of some kind, but without knowing
>>>>>>> more details (how often the data is changing, what you're doing with
>>>>>>> the 1m rows after getting them, etc.) I can't recommend a solution.
>>>>>>>
>>>>>>> I did see your other thread. I would also vote for Elasticsearch /
>>>>>>> Solr, they are more suited for the kind of analytics you seem to be
>>>>>>> doing. Cassandra is more for storing data, it isn't all that great
>>>>>>> for complex queries / analytics.
>>>>>>>
>>>>>>> If you want to stick to Cassandra, you might have better luck if you
>>>>>>> made your range columns part of the primary key, so something like
>>>>>>> PRIMARY KEY(caseId, x, y)
>>>>>>>
>>>>>>> On Wed, Mar 18, 2015 at 1:41 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>>>>>
>>>>>>>> The rendering tool renders a portion of a very large image.
>>>>>>>> It may fetch different data each time from billions of rows.
>>>>>>>> So I don't think I can cache such large results, since the same results
>>>>>>>> will rarely be fetched again.
>>>>>>>>
>>>>>>>> Also, do you know how I can do 2D range queries using Cassandra?
>>>>>>>> Some other users suggested using Solr.
>>>>>>>> But is there any way I can achieve that without using any other
>>>>>>>> technology?
>>>>>>>>
>>>>>>>> On Wed, Mar 18, 2015 at 4:33 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sorry, meant to say "that way when you have to render, you can
>>>>>>>>> just display the latest cache."
>>>>>>>>>
>>>>>>>>> On Wed, Mar 18, 2015 at 1:30 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I would probably do this in a background thread and cache the
>>>>>>>>>> results, that way when you have to render, you can just cache the
>>>>>>>>>> latest results.
>>>>>>>>>>
>>>>>>>>>> I don't know why Cassandra can't seem to fetch large
>>>>>>>>>> batch sizes. I've also run into these timeouts, but reducing the
>>>>>>>>>> batch size to 2k seemed to work for me.
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> We have a UI which needs this data for rendering.
>>>>>>>>>>> So efficiency of pulling this data matters a lot. It should be
>>>>>>>>>>> fetched within a minute.
>>>>>>>>>>> Is there a way to achieve such efficiency?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Perhaps just fetch them in batches of 1000 or 2000? For 1m
>>>>>>>>>>>> rows, it seems like the difference would only be a few minutes. Do
>>>>>>>>>>>> you have to do this all the time, or only once in a while?
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, it works for 1000 but not more than that.
>>>>>>>>>>>>> How can I fetch all rows using this efficiently?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Have you tried a smaller fetch size, such as 5k - 2k?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Jens,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have tried with a fetch size of 10000; it is still not giving
>>>>>>>>>>>>>>> any results.
>>>>>>>>>>>>>>> My expectation was that Cassandra can handle a million
>>>>>>>>>>>>>>> rows easily.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there any mistake in the way I am defining the keys or
>>>>>>>>>>>>>>> querying them?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> Mehak
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil <jens.ran...@tink.se> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Try setting fetchsize before querying. Assuming you don't
>>>>>>>>>>>>>>>> set it too high, and you don't have too many tombstones, that
>>>>>>>>>>>>>>>> should do it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Jens
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> –
>>>>>>>>>>>>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have a requirement to fetch a million rows as the result of my
>>>>>>>>>>>>>>>>> query, which is giving timeout errors.
>>>>>>>>>>>>>>>>> I am fetching results by selecting on clustering columns,
>>>>>>>>>>>>>>>>> so why are the queries taking so long? I can change the timeout settings,
>>>>>>>>>>>>>>>>> but I need the data to be fetched faster as per my requirement.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My table definition is:
>>>>>>>>>>>>>>>>> *CREATE TABLE images.results (uuid uuid,
>>>>>>>>>>>>>>>>> analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
>>>>>>>>>>>>>>>>> double, loc varchar, w double, h double, normalized varchar, type varchar,
>>>>>>>>>>>>>>>>> filehost varchar, filename varchar, image_uuid uuid, image_uri varchar,
>>>>>>>>>>>>>>>>> image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width
>>>>>>>>>>>>>>>>> double, image_height double, objective double, cancer_type varchar, Area
>>>>>>>>>>>>>>>>> float, submit_date timestamp, points list<double>, PRIMARY KEY
>>>>>>>>>>>>>>>>> ((image_caseid),Area,uuid));*
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here each row is uniquely identified on the basis of its
>>>>>>>>>>>>>>>>> unique uuid. But since my data is generally queried based upon
>>>>>>>>>>>>>>>>> *image_caseid*, I have made it the partition key.
>>>>>>>>>>>>>>>>> I am currently using the Java Datastax API to fetch the results.
>>>>>>>>>>>>>>>>> But the query is taking a lot of time, resulting in timeout errors:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>>>>>>>>>>>>>>>> tried for query failed (tried: localhost/127.0.0.1:9042
>>>>>>>>>>>>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
>>>>>>>>>>>>>>>>> server response))
>>>>>>>>>>>>>>>>> at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
>>>>>>>>>>>>>>>>> at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
>>>>>>>>>>>>>>>>> at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
>>>>>>>>>>>>>>>>> at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
>>>>>>>>>>>>>>>>> at QueryDB.queryArea(TestQuery.java:59)
>>>>>>>>>>>>>>>>> at TestQuery.main(TestQuery.java:35)
>>>>>>>>>>>>>>>>> Caused by:
>>>>>>>>>>>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>>>>>>>>>>>>>>>> tried for query failed (tried: localhost/127.0.0.1:9042
>>>>>>>>>>>>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
>>>>>>>>>>>>>>>>> server response))
>>>>>>>>>>>>>>>>> at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>>>>>>>>>>>>>> at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>> java.lang.Thread.run(Thread.java:744)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Also when I try the same query on console even while using
>>>>>>>>>>>>>>>>> limit of 2000 rows:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> cqlsh:images> select count(*) from results where
>>>>>>>>>>>>>>>>> image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000;
>>>>>>>>>>>>>>>>> errors={}, last_host=127.0.0.1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>>> Mehak
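
For reference, here is one way to read the PRIMARY KEY(caseId, x, y) suggestion quoted above. This is only a sketch under assumptions: the table name results_by_xy is hypothetical, the column list is trimmed to a handful of columns from the original definition, and the numeric bounds are made-up values. The partition key stays image_caseid, so each query still hits a single partition, and x becomes the first clustering column so a range on x is read in sorted order.

```cql
-- Hypothetical table (name and trimmed column list are assumptions):
-- same partition key as images.results, but clustered on x, then y.
CREATE TABLE images.results_by_xy (
    image_caseid varchar,
    x double,
    y double,
    uuid uuid,
    area float,
    PRIMARY KEY ((image_caseid), x, y, uuid)
);

-- Range restriction on the first clustering column within one partition.
-- The bounds 1000 and 2000 are illustrative values only.
SELECT x, y, uuid, area
FROM images.results_by_xy
WHERE image_caseid = 'TCGA-HN-A2NL-01Z-00-DX1'
  AND x >= 1000 AND x < 2000;
```

As far as I know, Cassandra versions from around the time of this thread reject a second range restriction on y once x is restricted by a range, so the y bounds would probably have to be applied on the client, or x would need to be bucketed into tiles so it can be matched with equality. Treat this as a starting point for experiments, not a confirmed fix for the timeouts.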