Currently there is only a single node, which I am querying directly; it holds around 150,000 rows. The full data set will run to billions of rows per node. The code works only for a fetch size of 100 or 200, and each consecutive fetch takes around 5-10 seconds.
I have a parallel script that inserts data while I am reading it. When I stopped that script, the query worked for a fetch size of 500 or 1000, but not for anything larger.

On Wed, Mar 18, 2015 at 5:08 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> If even 500-1000 isn't working, then your cassandra node might not be up.
>
> 1) Try running nodetool status from shell on your cassandra server, make sure the nodes are up.
>
> 2) Are you calling this on the same server where cassandra is running? Its trying to connect to localhost . If you're running it on a different server, try passing in the direct ip of your cassandra server.
>
> On Wed, Mar 18, 2015 at 2:05 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>
>> Data won't change much but queries will be different.
>> I am not working on the rendering tool myself so I don't know much details about it.
>>
>> Also as suggested by you I tried to fetch data in size of 500 or 1000 with java driver auto pagination.
>> It fails when the number of records are high (around 100000) with following error:
>>
>> Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
>>
>> On Wed, Mar 18, 2015 at 4:47 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> How often does the data change?
>>>
>>> I would still recommend a caching of some kind, but without knowing more details (how often the data is changing, what you're doing with the 1m rows after getting them, etc) I can't recommend a solution.
>>>
>>> I did see your other thread. I would also vote for elasticsearch / solr , they are more suited for the kind of analytics you seem to be doing. Cassandra is more for storing data, it isn't all that great for complex queries / analytics.
>>>
>>> If you want to stick to cassandra, you might have better luck if you made your range columns part of the primary key, so something like PRIMARY KEY(caseId, x, y)
>>>
>>> On Wed, Mar 18, 2015 at 1:41 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>
>>>> The rendering tool renders a portion a very large image. It may fetch different data each time from billions of rows.
>>>> So I don't think I can cache such large results. Since same results will rarely fetched again.
>>>>
>>>> Also do you know how I can do 2d range queries using Cassandra. Some other users suggested me using Solr.
>>>> But is there any way I can achieve that without using any other technology.
>>>>
>>>> On Wed, Mar 18, 2015 at 4:33 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>>
>>>>> Sorry, meant to say "that way when you have to render, you can just display the latest cache."
>>>>>
>>>>> On Wed, Mar 18, 2015 at 1:30 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>>>
>>>>>> I would probably do this in a background thread and cache the results, that way when you have to render, you can just cache the latest results.
>>>>>>
>>>>>> I don't know why Cassandra can't seem to be able to fetch large batch sizes, I've also run into these timeouts but reducing the batch size to 2k seemed to work for me.
>>>>>>
>>>>>> On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>>>>
>>>>>>> We have UI interface which needs this data for rendering.
>>>>>>> So efficiency of pulling this data matters a lot. It should be fetched within a minute.
>>>>>>> Is there a way to achieve such efficiency
>>>>>>>
>>>>>>> On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Perhaps just fetch them in batches of 1000 or 2000? For 1m rows, it seems like the difference would only be a few minutes.
>>>>>>>> Do you have to do this all the time, or only once in a while?
>>>>>>>>
>>>>>>>> On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>>>>>>
>>>>>>>>> yes it works for 1000 but not more than that.
>>>>>>>>> How can I fetch all rows using this efficiently?
>>>>>>>>>
>>>>>>>>> On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Have you tried a smaller fetch size, such as 5k - 2k ?
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Jens,
>>>>>>>>>>>
>>>>>>>>>>> I have tried with fetch size of 10000 still its not giving any results.
>>>>>>>>>>> My expectations were that Cassandra can handle a million rows easily.
>>>>>>>>>>>
>>>>>>>>>>> Is there any mistake in the way I am defining the keys or querying them.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Mehak
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil <jens.ran...@tink.se> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Try setting fetchsize before querying. Assuming you don't set it too high, and you don't have too many tombstones, that should do it.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Jens
>>>>>>>>>>>>
>>>>>>>>>>>> –
>>>>>>>>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have requirement to fetch million row as result of my query which is giving timeout errors.
>>>>>>>>>>>>> I am fetching results by selecting clustering columns, then why the queries are taking so long.
>>>>>>>>>>>>> I can change the timeout settings but I need the data to fetched faster as per my requirement.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My table definition is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> CREATE TABLE images.results (
>>>>>>>>>>>>>     uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid,
>>>>>>>>>>>>>     x double, y double, loc varchar, w double, h double,
>>>>>>>>>>>>>     normalized varchar, type varchar, filehost varchar, filename varchar,
>>>>>>>>>>>>>     image_uuid uuid, image_uri varchar, image_caseid varchar,
>>>>>>>>>>>>>     image_mpp_x double, image_mpp_y double, image_width double, image_height double,
>>>>>>>>>>>>>     objective double, cancer_type varchar, Area float, submit_date timestamp,
>>>>>>>>>>>>>     points list<double>,
>>>>>>>>>>>>>     PRIMARY KEY ((image_caseid),Area,uuid));
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here each row is uniquely identified on the basis of unique uuid. But since my data is generally queried based upon *image_caseid* I have made it partition key.
>>>>>>>>>>>>> I am currently using Java Datastax api to fetch the results.
>>>>>>>>>>>>> But the query is taking a lot of time resulting in timeout errors:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
>>>>>>>>>>>>>     at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
>>>>>>>>>>>>>     at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
>>>>>>>>>>>>>     at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
>>>>>>>>>>>>>     at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
>>>>>>>>>>>>>     at QueryDB.queryArea(TestQuery.java:59)
>>>>>>>>>>>>>     at TestQuery.main(TestQuery.java:35)
>>>>>>>>>>>>> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
>>>>>>>>>>>>>     at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>>>>>>>>>>     at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>>>>>     at java.lang.Thread.run(Thread.java:744)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also when I try the same query on console even while using limit of 2000 rows:
>>>>>>>>>>>>>
>>>>>>>>>>>>> cqlsh:images> select count(*) from results where image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000;
>>>>>>>>>>>>> errors={}, last_host=127.0.0.1
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>> Mehak
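
The PRIMARY KEY(caseId, x, y) suggestion earlier in this thread can be illustrated with a small in-memory sketch. It does not use the DataStax driver: a sorted set stands in for the rows of one partition kept in clustering order, and the class name, method names, and sample points are all hypothetical. As far as I understand CQL's restrictions, a range on the first clustering column (x) maps to one contiguous slice, while the bounds on y cannot narrow that slice and have to be checked row by row.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory model of ONE partition (one caseId) whose rows are
// kept in clustering order (x, then y). Not the DataStax driver API.
class ClusteringRangeSketch {

    static final class Point {
        final double x;
        final double y;
        final String id;

        Point(double x, double y, String id) {
            this.x = x;
            this.y = y;
            this.id = id;
        }
    }

    // Rows sorted by x, then y, then id, mimicking clustering order.
    private final TreeSet<Point> rows = new TreeSet<>(
            Comparator.comparingDouble((Point p) -> p.x)
                    .thenComparingDouble(p -> p.y)
                    .thenComparing(p -> p.id));

    void insert(double x, double y, String id) {
        rows.add(new Point(x, y, id));
    }

    // Returns ids of rows with xMin <= x <= xMax and yMin <= y <= yMax.
    List<String> query(double xMin, double xMax, double yMin, double yMax) {
        // The x bounds select one contiguous slice of the sorted rows.
        Point lo = new Point(xMin, Double.NEGATIVE_INFINITY, "");
        Point hi = new Point(xMax, Double.POSITIVE_INFINITY, "");
        List<String> hits = new ArrayList<>();
        for (Point p : rows.subSet(lo, true, hi, true)) {
            // The y bounds are applied to every row in that slice.
            if (p.y >= yMin && p.y <= yMax) {
                hits.add(p.id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ClusteringRangeSketch partition = new ClusteringRangeSketch();
        partition.insert(10, 5, "a");
        partition.insert(20, 50, "b");
        partition.insert(30, 500, "c");
        partition.insert(40, 60, "d");
        partition.insert(90, 55, "e");
        System.out.println(partition.query(15, 45, 40, 100));
    }
}
```

In this model a narrow x range keeps the scanned slice small, but a wide x range with a narrow y range still walks every row in the slice, which is the usual argument for a spatial index or a bucketed key when true 2D range queries are needed.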