Yes, I have a cluster with 10 nodes in total, but I am just testing with one node currently. The total data across all nodes will exceed 5 billion rows. But I may have more memory available on the other nodes.
On Wed, Mar 18, 2015 at 6:06 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> 4 GB also seems small for the kind of load you are trying to handle (billions of rows, etc.).
> I would also try adding more nodes to the cluster.

> On Wed, Mar 18, 2015 at 2:53 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

>> Yeah, it may be that the process is being limited by swap. This page:
>> https://gist.github.com/aliakhtar/3649e412787034156cbb#file-cassandra-install-sh-L42
>> Lines 42-48 list a few settings that you could try out for increasing / reducing the memory limits (assuming you're on Linux).
>> Also, are you using an SSD? If so, make sure the IO scheduler is noop or deadline.

>> On Wed, Mar 18, 2015 at 2:48 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:

>>> Currently the Cassandra java process is taking 1% of CPU (8% is in use in total) and 14.3% of memory (out of 4 GB in total).
>>> As you can see, there is not much load from other processes.
>>> Should I try changing the default memory parameters in the Cassandra settings?

>>> On Wed, Mar 18, 2015 at 5:33 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

>>>> What's your memory / CPU usage at? And how much RAM + CPU do you have on this server?

>>>> On Wed, Mar 18, 2015 at 2:31 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:

>>>>> Currently there is only a single node, which I am calling directly, with around 150000 rows. The full data will be around billions of rows per node.
>>>>> The code is working only for fetch sizes of 100/200, and each consecutive fetch is taking around 5-10 seconds.
>>>>> I have a parallel script which is inserting data while I am reading it. When I stopped the script, it worked for 500/1000 but not for more than that.

>>>>> On Wed, Mar 18, 2015 at 5:08 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

>>>>>> If even 500-1000 isn't working, then your Cassandra node might not be up.
>>>>>> 1) Try running nodetool status from a shell on your Cassandra server, and make sure the nodes are up.
>>>>>> 2) Are you calling this on the same server where Cassandra is running? It's trying to connect to localhost. If you're running it on a different server, try passing in the direct IP of your Cassandra server.

>>>>>> On Wed, Mar 18, 2015 at 2:05 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:

>>>>>>> The data won't change much, but the queries will be different.
>>>>>>> I am not working on the rendering tool myself, so I don't know many details about it.
>>>>>>> Also, as you suggested, I tried to fetch the data in pages of 500 or 1000 with the Java driver's auto pagination.
>>>>>>> It fails when the number of records is high (around 100000) with the following error:
>>>>>>> Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
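The "Timed out waiting for server response" above is the driver's per-request read timeout; with auto pagination, each page is one read request. A minimal sketch of such a paged read with the DataStax Java driver 2.x (assumed from the stack traces in this thread; the contact point, keyspace, and case id are taken from the thread, while the class name PagedRead and the page size are illustrative):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class PagedRead {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("images");

            // Read one partition's rows page by page. Each page is a single
            // read request, so a small page is unlikely to hit the timeout.
            Statement stmt = new SimpleStatement(
                    "SELECT uuid, x, y FROM results"
                  + " WHERE image_caseid = 'TCGA-HN-A2NL-01Z-00-DX1'");
            stmt.setFetchSize(500);

            long count = 0;
            // The driver fetches the next page transparently whenever the
            // iterator crosses a page boundary.
            for (Row row : session.execute(stmt)) {
                count++;
            }
            System.out.println("rows read: " + count);
            cluster.close();
        }
    }

A smaller page trades more round trips for a smaller, safer read per request.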
>>>>>>> On Wed, Mar 18, 2015 at 4:47 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

>>>>>>>> How often does the data change?
>>>>>>>> I would still recommend caching of some kind, but without knowing more details (how often the data is changing, what you're doing with the 1m rows after getting them, etc.) I can't recommend a solution.
>>>>>>>> I did see your other thread. I would also vote for Elasticsearch / Solr; they are better suited for the kind of analytics you seem to be doing. Cassandra is more for storing data; it isn't all that great for complex queries / analytics.
>>>>>>>> If you want to stick with Cassandra, you might have better luck if you made your range columns part of the primary key, so something like PRIMARY KEY(caseId, x, y).
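A minimal sketch of the kind of table that suggestion points at, applied to the schema from later in this thread (the table name results_by_xy is hypothetical, only a subset of the columns is shown, and Cassandra 2.x semantics are assumed):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class XyTable {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("images");

            // Hypothetical variant of the results table with x and y as
            // clustering columns (subset of columns shown); uuid stays in
            // the key so rows remain unique within a case.
            session.execute(
                "CREATE TABLE IF NOT EXISTS results_by_xy ("
              + " image_caseid varchar, x double, y double, uuid uuid,"
              + " w double, h double,"
              + " PRIMARY KEY ((image_caseid), x, y, uuid))");

            // A range on x is now served in clustering order, without
            // ALLOW FILTERING:
            //   SELECT * FROM results_by_xy
            //   WHERE image_caseid = ? AND x >= ? AND x < ?;
            // Caveat: CQL allows a range on only the first unrestricted
            // clustering column, so the y bounds of a 2D window still have
            // to be applied client-side (or via Solr, as discussed here).
            cluster.close();
        }
    }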
>>>>>>>> On Wed, Mar 18, 2015 at 1:41 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:

>>>>>>>>> The rendering tool renders a portion of a very large image. It may fetch different data each time from billions of rows.
>>>>>>>>> So I don't think I can cache such large results, since the same results will rarely be fetched again.
>>>>>>>>> Also, do you know how I can do 2D range queries using Cassandra? Some other users suggested using Solr, but is there any way I can achieve that without using any other technology?

>>>>>>>>> On Wed, Mar 18, 2015 at 4:33 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

>>>>>>>>>> Sorry, meant to say "that way when you have to render, you can just display the latest cache."

>>>>>>>>>> On Wed, Mar 18, 2015 at 1:30 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

>>>>>>>>>>> I would probably do this in a background thread and cache the results, that way when you have to render, you can just cache the latest results.
>>>>>>>>>>> I don't know why Cassandra can't seem to fetch large batch sizes; I've also run into these timeouts, but reducing the batch size to 2k seemed to work for me.

>>>>>>>>>>> On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:

>>>>>>>>>>>> We have a UI which needs this data for rendering, so the efficiency of pulling this data matters a lot. It should be fetched within a minute.
>>>>>>>>>>>> Is there a way to achieve such efficiency?

>>>>>>>>>>>> On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

>>>>>>>>>>>>> Perhaps just fetch them in batches of 1000 or 2000? For 1m rows, it seems like the difference would only be a few minutes. Do you have to do this all the time, or only once in a while?

>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:

>>>>>>>>>>>>>> Yes, it works for 1000 but not for more than that.
>>>>>>>>>>>>>> How can I fetch all rows efficiently using this?

>>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

>>>>>>>>>>>>>>> Have you tried a smaller fetch size, such as 5k - 2k?

>>>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:

>>>>>>>>>>>>>>>> Hi Jens,
>>>>>>>>>>>>>>>> I have tried with a fetch size of 10000 and it is still not giving any results.
>>>>>>>>>>>>>>>> My expectation was that Cassandra could handle a million rows easily.
>>>>>>>>>>>>>>>> Is there any mistake in the way I am defining the keys or querying them?
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Mehak

>>>>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil <jens.ran...@tink.se> wrote:

>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> Try setting the fetch size before querying. Assuming you don't set it too high, and you don't have too many tombstones, that should do it.
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Jens
>>>>>>>>>>>>>>>>> – Sent from Mailbox <https://www.dropbox.com/mailbox>

>>>>>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:

>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>> I have a requirement to fetch a million rows as the result of my query, and it is giving timeout errors.
>>>>>>>>>>>>>>>>>> I am fetching results by selecting on clustering columns, so why are the queries taking so long? I can change the timeout settings, but I need the data to be fetched faster to meet my requirement.
>>>>>>>>>>>>>>>>>> My table definition is:
>>>>>>>>>>>>>>>>>> CREATE TABLE images.results (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar, Area float, submit_date timestamp, points list<double>, PRIMARY KEY ((image_caseid), Area, uuid));
>>>>>>>>>>>>>>>>>> Here each row is uniquely identified by its uuid, but since my data is generally queried by image_caseid, I have made that the partition key.
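With that layout, a slice on Area for one case stays inside a single partition. A minimal sketch of issuing that slice with the DataStax Java driver (2.x assumed; the bounds and case id come from the cqlsh query at the end of this message, and the class name AreaQuery is illustrative):

    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class AreaQuery {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("images");

            // Area is the first clustering column, so this slice needs no
            // ALLOW FILTERING and touches only the one partition.
            PreparedStatement ps = session.prepare(
                    "SELECT uuid, area, x, y FROM results"
                  + " WHERE image_caseid = ? AND area > ? AND area < ?");

            // area was declared as float, so bind floats, not doubles.
            BoundStatement bound = ps.bind("TCGA-HN-A2NL-01Z-00-DX1", 20.0f, 100.0f);
            bound.setFetchSize(1000); // one page per read request

            long n = 0;
            for (Row row : session.execute(bound)) {
                n++; // e.g. row.getUUID("uuid"), row.getFloat("area")
            }
            System.out.println("rows in slice: " + n);
            cluster.close();
        }
    }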
But the query is taking a lot of time resulting in >>>>>>>>>>>>>>>>>> timeout errors: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Exception in thread "main" >>>>>>>>>>>>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: >>>>>>>>>>>>>>>>>> All host(s) >>>>>>>>>>>>>>>>>> tried for query failed (tried: localhost/127.0.0.1:9042 >>>>>>>>>>>>>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed >>>>>>>>>>>>>>>>>> out waiting for >>>>>>>>>>>>>>>>>> server response)) >>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84) >>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>> com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289) >>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>> com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205) >>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>> com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) >>>>>>>>>>>>>>>>>> at QueryDB.queryArea(TestQuery.java:59) >>>>>>>>>>>>>>>>>> at TestQuery.main(TestQuery.java:35) >>>>>>>>>>>>>>>>>> Caused by: >>>>>>>>>>>>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: >>>>>>>>>>>>>>>>>> All host(s) >>>>>>>>>>>>>>>>>> tried for query failed (tried: localhost/127.0.0.1:9042 >>>>>>>>>>>>>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed >>>>>>>>>>>>>>>>>> out waiting for >>>>>>>>>>>>>>>>>> server response)) >>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>> com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108) >>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>> com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179) >>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>>>>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:744) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Also when I try the same query on console even while >>>>>>>>>>>>>>>>>> using limit of 2000 rows: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> cqlsh:images> select count(*) from results where >>>>>>>>>>>>>>>>>> image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and >>>>>>>>>>>>>>>>>> Area>20 limit 2000; >>>>>>>>>>>>>>>>>> errors={}, last_host=127.0.0.1 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks and Regards, >>>>>>>>>>>>>>>>>> Mehak >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >