Currently the Cassandra java process is taking 1% of CPU (8% total in use) and 14.3% of memory (out of 4 GB total). As you can see, there is not much load from other processes.
Should I try changing the default memory parameters in the Cassandra settings?

On Wed, Mar 18, 2015 at 5:33 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> What's your memory / CPU usage at? And how much RAM + CPU do you have on
> this server?
>
> On Wed, Mar 18, 2015 at 2:31 PM, Mehak Mehta <meme...@cs.stonybrook.edu>
> wrote:
>
>> Currently there is only a single node, which I am calling directly, with
>> around 150,000 rows. The full data will be around billions of rows per
>> node. The code works only for fetch sizes of 100/200, and each
>> consecutive fetch takes around 5-10 seconds.
>>
>> I have a parallel script inserting the data while I am reading it. When
>> I stopped the script, it worked for 500/1000 but not more than that.
>>
>> On Wed, Mar 18, 2015 at 5:08 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> If even 500-1000 isn't working, then your Cassandra node might not be
>>> up.
>>>
>>> 1) Try running nodetool status from a shell on your Cassandra server,
>>> and make sure the nodes are up.
>>>
>>> 2) Are you calling this on the same server where Cassandra is running?
>>> It's trying to connect to localhost. If you're running it on a
>>> different server, try passing in the direct IP of your Cassandra
>>> server.
>>>
>>> On Wed, Mar 18, 2015 at 2:05 PM, Mehak Mehta <meme...@cs.stonybrook.edu>
>>> wrote:
>>>
>>>> The data won't change much, but the queries will be different.
>>>> I am not working on the rendering tool myself, so I don't know many
>>>> details about it.
>>>>
>>>> Also, as you suggested, I tried to fetch the data with a fetch size of
>>>> 500 or 1000 using the Java driver's auto pagination.
>>>> It fails when the number of records is high (around 100,000) with the
>>>> following error:
>>>>
>>>> Exception in thread "main"
>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All
>>>> host(s) tried for query failed (tried: localhost/127.0.0.1:9042
>>>> (com.datastax.driver.core.exceptions.DriverException: Timed out
>>>> waiting for server response))
>>>>
>>>> On Wed, Mar 18, 2015 at 4:47 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>> wrote:
>>>>
>>>>> How often does the data change?
>>>>>
>>>>> I would still recommend caching of some kind, but without knowing
>>>>> more details (how often the data is changing, what you're doing with
>>>>> the 1m rows after getting them, etc.) I can't recommend a solution.
>>>>>
>>>>> I did see your other thread. I would also vote for Elasticsearch /
>>>>> Solr; they are more suited for the kind of analytics you seem to be
>>>>> doing. Cassandra is more for storing data; it isn't all that great
>>>>> for complex queries / analytics.
>>>>>
>>>>> If you want to stick with Cassandra, you might have better luck if
>>>>> you made your range columns part of the primary key, so something
>>>>> like PRIMARY KEY (caseId, x, y).
>>>>>
>>>>> On Wed, Mar 18, 2015 at 1:41 PM, Mehak Mehta
>>>>> <meme...@cs.stonybrook.edu> wrote:
>>>>>
>>>>>> The rendering tool renders a portion of a very large image. It may
>>>>>> fetch different data each time from billions of rows, so I don't
>>>>>> think I can cache such large results, since the same results will
>>>>>> rarely be fetched again.
>>>>>>
>>>>>> Also, do you know how I can do 2D range queries using Cassandra?
>>>>>> Some other users suggested using Solr, but is there any way I can
>>>>>> achieve that without using any other technology?
>>>>>>
>>>>>> On Wed, Mar 18, 2015 at 4:33 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sorry, meant to say "that way when you have to render, you can just
>>>>>>> display the latest cache."
>>>>>>> On Wed, Mar 18, 2015 at 1:30 PM, Ali Akhtar <ali.rac...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I would probably do this in a background thread and cache the
>>>>>>>> results; that way when you have to render, you can just cache the
>>>>>>>> latest results.
>>>>>>>>
>>>>>>>> I don't know why Cassandra can't seem to fetch large batch sizes.
>>>>>>>> I've also run into these timeouts, but reducing the batch size to
>>>>>>>> 2k seemed to work for me.
>>>>>>>>
>>>>>>>> On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta
>>>>>>>> <meme...@cs.stonybrook.edu> wrote:
>>>>>>>>
>>>>>>>>> We have a UI which needs this data for rendering, so the
>>>>>>>>> efficiency of pulling this data matters a lot. It should be
>>>>>>>>> fetched within a minute.
>>>>>>>>> Is there a way to achieve such efficiency?
>>>>>>>>>
>>>>>>>>> On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar
>>>>>>>>> <ali.rac...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Perhaps just fetch them in batches of 1000 or 2000? For 1m rows,
>>>>>>>>>> it seems like the difference would only be a few minutes. Do you
>>>>>>>>>> have to do this all the time, or only once in a while?
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta
>>>>>>>>>> <meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes, it works for 1000 but not more than that.
>>>>>>>>>>> How can I fetch all rows efficiently?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar
>>>>>>>>>>> <ali.rac...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Have you tried a smaller fetch size, such as 5k - 2k?
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta
>>>>>>>>>>>> <meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Jens,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have tried with a fetch size of 10000 and it's still not
>>>>>>>>>>>>> giving any results.
>>>>>>>>>>>>> My expectation was that Cassandra could handle a million rows
>>>>>>>>>>>>> easily.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there any mistake in the way I am defining the keys or
>>>>>>>>>>>>> querying them?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Mehak
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil
>>>>>>>>>>>>> <jens.ran...@tink.se> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Try setting the fetch size before querying. Assuming you
>>>>>>>>>>>>>> don't set it too high, and you don't have too many
>>>>>>>>>>>>>> tombstones, that should do it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Jens
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta
>>>>>>>>>>>>>> <meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a requirement to fetch a million rows as the result
>>>>>>>>>>>>>>> of my query, which is giving timeout errors.
>>>>>>>>>>>>>>> I am fetching results by selecting on clustering columns,
>>>>>>>>>>>>>>> so why are the queries taking so long? I can change the
>>>>>>>>>>>>>>> timeout settings, but I need the data to be fetched faster
>>>>>>>>>>>>>>> as per my requirement.
>>>>>>>>>>>>>>> My table definition is:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> CREATE TABLE images.results (uuid uuid,
>>>>>>>>>>>>>>> analysis_execution_id varchar, analysis_execution_uuid uuid,
>>>>>>>>>>>>>>> x double, y double, loc varchar, w double, h double,
>>>>>>>>>>>>>>> normalized varchar, type varchar, filehost varchar,
>>>>>>>>>>>>>>> filename varchar, image_uuid uuid, image_uri varchar,
>>>>>>>>>>>>>>> image_caseid varchar, image_mpp_x double, image_mpp_y
>>>>>>>>>>>>>>> double, image_width double, image_height double, objective
>>>>>>>>>>>>>>> double, cancer_type varchar, Area float, submit_date
>>>>>>>>>>>>>>> timestamp, points list<double>,
>>>>>>>>>>>>>>> PRIMARY KEY ((image_caseid), Area, uuid));
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Each row is uniquely identified by its uuid, but since my
>>>>>>>>>>>>>>> data is generally queried by image_caseid, I have made that
>>>>>>>>>>>>>>> the partition key.
>>>>>>>>>>>>>>> I am currently using the DataStax Java driver to fetch the
>>>>>>>>>>>>>>> results.
>>>>>>>>>>>>>>> But the query is taking a long time, resulting in timeout
>>>>>>>>>>>>>>> errors:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException:
>>>>>>>>>>>>>>> All host(s) tried for query failed (tried:
>>>>>>>>>>>>>>> localhost/127.0.0.1:9042
>>>>>>>>>>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed
>>>>>>>>>>>>>>> out waiting for server response))
>>>>>>>>>>>>>>>   at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
>>>>>>>>>>>>>>>   at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
>>>>>>>>>>>>>>>   at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
>>>>>>>>>>>>>>>   at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
>>>>>>>>>>>>>>>   at QueryDB.queryArea(TestQuery.java:59)
>>>>>>>>>>>>>>>   at TestQuery.main(TestQuery.java:35)
>>>>>>>>>>>>>>> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
>>>>>>>>>>>>>>> All host(s) tried for query failed (tried:
>>>>>>>>>>>>>>> localhost/127.0.0.1:9042
>>>>>>>>>>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed
>>>>>>>>>>>>>>> out waiting for server response))
>>>>>>>>>>>>>>>   at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>>>>>>>>>>>>   at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>>>>>>>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>>>>>>>   at java.lang.Thread.run(Thread.java:744)
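[Editor's note: the batched fetching discussed in this thread can be sketched independently of the driver. In the real code the page size would be set with Statement.setFetchSize(...) on the DataStax Java driver and rows would stream from a ResultSet; the stand-in iterator and all names below are illustrative only.]

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class PagedFetch {
    // Consume an iterator in fixed-size pages -- the same access pattern the
    // driver's auto-pagination gives you once a fetch size is set. Returns
    // the number of pages delivered to the callback.
    static <T> int consumeInPages(Iterator<T> rows, int pageSize,
                                  Consumer<List<T>> onPage) {
        int pages = 0;
        List<T> page = new ArrayList<>(pageSize);
        while (rows.hasNext()) {
            page.add(rows.next());
            if (page.size() == pageSize) {
                onPage.accept(page);        // e.g. hand this page to the renderer
                page = new ArrayList<>(pageSize);
                pages++;
            }
        }
        if (!page.isEmpty()) {              // flush the final partial page
            onPage.accept(page);
            pages++;
        }
        return pages;
    }

    public static void main(String[] args) {
        // Stand-in for a ResultSet: 150,000 dummy rows, pages of 1000,
        // matching the row count and fetch sizes mentioned in the thread.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 150_000; i++) rows.add(i);
        int pages = consumeInPages(rows.iterator(), 1000, p -> {});
        System.out.println(pages); // prints 150
    }
}
```

Each page stays small enough that no single server round-trip has to materialize the whole result, which is what the per-request timeout above is punishing.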
>>>>>>>>>>>>>>> Also, when I try the same query on the console, it fails
>>>>>>>>>>>>>>> even with a limit of 2000 rows:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> cqlsh:images> select count(*) from results where
>>>>>>>>>>>>>>> image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and
>>>>>>>>>>>>>>> Area>20 limit 2000;
>>>>>>>>>>>>>>> errors={}, last_host=127.0.0.1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>> Mehak
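[Editor's note: one way to act on the smaller-batch advice in this thread, given the (image_caseid, Area, uuid) key, is to split a wide Area range into narrower slices and issue one bounded clustering-column query per slice (e.g. "... AND Area >= ? AND Area < ?"). This slicing idea is a sketch, not something spelled out verbatim in the thread; the bounds and slice count are illustrative.]

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSlicer {
    // Split [lo, hi) into n contiguous sub-ranges. Each sub-range becomes a
    // separate bounded query, so no single request has to stream the whole
    // result before the server-side timeout fires.
    static List<double[]> slices(double lo, double hi, int n) {
        List<double[]> out = new ArrayList<>();
        double step = (hi - lo) / n;
        for (int i = 0; i < n; i++) {
            double a = lo + i * step;
            double b = (i == n - 1) ? hi : lo + (i + 1) * step; // exact upper bound
            out.add(new double[] { a, b });
        }
        return out;
    }

    public static void main(String[] args) {
        // The thread's Area range 20..100, split into 8 slices of width 10.
        List<double[]> s = slices(20, 100, 8);
        System.out.println(s.size());                          // prints 8
        System.out.println(s.get(0)[0] + ".." + s.get(0)[1]);  // prints 20.0..30.0
    }
}
```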