Data won't change much, but the queries will differ each time. I am not working on the rendering tool myself, so I don't know many details about it.

Also, as you suggested, I tried to fetch the data in pages of 500 or 1000 using the Java driver's auto-pagination. It fails when the number of records is high (around 100,000) with the following error:

Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException:
All host(s) tried for query failed (tried: localhost/127.0.0.1:9042
(com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
server response))
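For reference, a minimal sketch of how this kind of paged fetch is typically
wired up with the 2.x DataStax Java driver. This is not the code from the
thread: the class name, contact point, and fetch size are placeholders, and
the SocketOptions read-timeout override is just one knob that can keep a slow
page from surfacing as the NoHostAvailableException above.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.Statement;

public class PagedFetch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                // Raise the client-side read timeout so a slow page does not
                // bubble up as "Timed out waiting for server response".
                .withSocketOptions(new SocketOptions().setReadTimeoutMillis(60000))
                .build();
        Session session = cluster.connect("images");

        Statement stmt = new SimpleStatement(
                "SELECT * FROM results"
                + " WHERE image_caseid = 'TCGA-HN-A2NL-01Z-00-DX1'"
                + " AND Area > 20 AND Area < 100");
        stmt.setFetchSize(1000); // rows per page, not a LIMIT

        ResultSet rs = session.execute(stmt);
        for (Row row : rs) {
            // Iterating past the end of a page transparently fetches the
            // next one ("auto pagination"); only one page is held in memory.
            float area = row.getFloat("area");
        }
        cluster.close();
    }
}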
On Wed, Mar 18, 2015 at 4:47 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> How often does the data change?
>
> I would still recommend caching of some kind, but without knowing more
> details (how often the data is changing, what you're doing with the 1m
> rows after getting them, etc.) I can't recommend a solution.
>
> I did see your other thread. I would also vote for Elasticsearch / Solr;
> they are more suited for the kind of analytics you seem to be doing.
> Cassandra is more for storing data; it isn't all that great for complex
> queries / analytics.
>
> If you want to stick with Cassandra, you might have better luck if you
> made your range columns part of the primary key, so something like
> PRIMARY KEY (caseId, x, y).
>
> On Wed, Mar 18, 2015 at 1:41 PM, Mehak Mehta <meme...@cs.stonybrook.edu>
> wrote:
>
>> The rendering tool renders a portion of a very large image, and it may
>> fetch different data each time from billions of rows. So I don't think
>> I can cache such large results, since the same results will rarely be
>> fetched again.
>>
>> Also, do you know how I can do 2D range queries using Cassandra? Some
>> other users suggested using Solr, but is there any way I can achieve
>> this without using any other technology?
>>
>> On Wed, Mar 18, 2015 at 4:33 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>>
>>> Sorry, meant to say "that way when you have to render, you can just
>>> display the latest cache."
>>>
>>> On Wed, Mar 18, 2015 at 1:30 PM, Ali Akhtar <ali.rac...@gmail.com>
>>> wrote:
>>>
>>>> I would probably do this in a background thread and cache the
>>>> results; that way when you have to render, you can just cache the
>>>> latest results.
>>>>
>>>> I don't know why Cassandra can't seem to be able to fetch large batch
>>>> sizes; I've also run into these timeouts, but reducing the batch size
>>>> to 2k seemed to work for me.
>>>>
>>>> On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta
>>>> <meme...@cs.stonybrook.edu> wrote:
>>>>
>>>>> We have a UI interface which needs this data for rendering, so the
>>>>> efficiency of pulling this data matters a lot. It should be fetched
>>>>> within a minute. Is there a way to achieve such efficiency?
>>>>>
>>>>> On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Perhaps just fetch them in batches of 1000 or 2000? For 1m rows, it
>>>>>> seems like the difference would only be a few minutes. Do you have
>>>>>> to do this all the time, or only once in a while?
>>>>>>
>>>>>> On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta
>>>>>> <meme...@cs.stonybrook.edu> wrote:
>>>>>>
>>>>>>> Yes, it works for 1000 but not for more than that. How can I fetch
>>>>>>> all rows efficiently this way?
>>>>>>>
>>>>>>> On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar <ali.rac...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Have you tried a smaller fetch size, such as 5k - 2k?
>>>>>>>>
>>>>>>>> On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta
>>>>>>>> <meme...@cs.stonybrook.edu> wrote:
>>>>>>>>
>>>>>>>>> Hi Jens,
>>>>>>>>>
>>>>>>>>> I have tried with a fetch size of 10000 and still it's not
>>>>>>>>> giving any results.
>>>>>>>>> My expectation was that Cassandra could handle a million rows
>>>>>>>>> easily.
>>>>>>>>>
>>>>>>>>> Is there any mistake in the way I am defining the keys or
>>>>>>>>> querying them?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Mehak
>>>>>>>>>
>>>>>>>>> On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil <jens.ran...@tink.se>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Try setting the fetch size before querying. Assuming you don't
>>>>>>>>>> set it too high, and you don't have too many tombstones, that
>>>>>>>>>> should do it.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Jens
>>>>>>>>>>
>>>>>>>>>> –
>>>>>>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta
>>>>>>>>>> <meme...@cs.stonybrook.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have a requirement to fetch a million rows as the result of
>>>>>>>>>>> my query, and it is giving timeout errors. I am fetching
>>>>>>>>>>> results by selecting on clustering columns, so why are the
>>>>>>>>>>> queries taking so long? I can change the timeout settings, but
>>>>>>>>>>> I need the data to be fetched faster as per my requirement.
>>>>>>>>>>>
>>>>>>>>>>> My table definition is:
>>>>>>>>>>>
>>>>>>>>>>> CREATE TABLE images.results (uuid uuid, analysis_execution_id
>>>>>>>>>>> varchar, analysis_execution_uuid uuid, x double, y double, loc
>>>>>>>>>>> varchar, w double, h double, normalized varchar, type varchar,
>>>>>>>>>>> filehost varchar, filename varchar, image_uuid uuid, image_uri
>>>>>>>>>>> varchar, image_caseid varchar, image_mpp_x double, image_mpp_y
>>>>>>>>>>> double, image_width double, image_height double, objective
>>>>>>>>>>> double, cancer_type varchar, Area float, submit_date timestamp,
>>>>>>>>>>> points list<double>, PRIMARY KEY ((image_caseid), Area, uuid));
>>>>>>>>>>>
>>>>>>>>>>> Here each row is uniquely identified by its uuid, but since my
>>>>>>>>>>> data is generally queried by image_caseid, I have made that the
>>>>>>>>>>> partition key. I am currently using the DataStax Java API to
>>>>>>>>>>> fetch the results.
>>>>>>>>>>> But the query is taking a lot of time, resulting in timeout
>>>>>>>>>>> errors:
>>>>>>>>>>>
>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException:
>>>>>>>>>>> All host(s) tried for query failed (tried: localhost/127.0.0.1:9042
>>>>>>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed out
>>>>>>>>>>> waiting for server response))
>>>>>>>>>>> at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
>>>>>>>>>>> at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
>>>>>>>>>>> at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
>>>>>>>>>>> at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
>>>>>>>>>>> at QueryDB.queryArea(TestQuery.java:59)
>>>>>>>>>>> at TestQuery.main(TestQuery.java:35)
>>>>>>>>>>> Caused by:
>>>>>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException:
>>>>>>>>>>> All host(s) tried for query failed (tried: localhost/127.0.0.1:9042
>>>>>>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed out
>>>>>>>>>>> waiting for server response))
>>>>>>>>>>> at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>>>>>>>> at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>>> at java.lang.Thread.run(Thread.java:744)
>>>>>>>>>>>
>>>>>>>>>>> It also fails when I try the same query on the console, even
>>>>>>>>>>> with a limit of 2000 rows:
>>>>>>>>>>>
>>>>>>>>>>> cqlsh:images> select count(*) from results where
>>>>>>>>>>> image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20
>>>>>>>>>>> limit 2000;
>>>>>>>>>>> errors={}, last_host=127.0.0.1
>>>>>>>>>>>
>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>> Mehak
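Following up on the PRIMARY KEY (caseId, x, y) suggestion in the thread
above: pulling the spatial columns into the clustering key gives a
server-side range on the first of them, but CQL only allows a range on the
last restricted clustering column, so a true 2D range still needs Solr /
Elasticsearch or client-side filtering. A sketch with the same 2.x Java
driver; the results_by_xy table name and the coordinate values are
hypothetical, not from the thread:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class XyKeySketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Variant of the results table with x and y in the clustering key;
        // uuid stays in the key so rows remain unique.
        session.execute(
                "CREATE TABLE IF NOT EXISTS images.results_by_xy ("
                + " image_caseid varchar, x double, y double, uuid uuid,"
                + " area float, points list<double>,"
                + " PRIMARY KEY ((image_caseid), x, y, uuid))");

        // A range on the first restricted clustering column is served
        // directly from the partition, so this works:
        session.execute(
                "SELECT * FROM images.results_by_xy"
                + " WHERE image_caseid = 'TCGA-HN-A2NL-01Z-00-DX1'"
                + " AND x > 100 AND x < 200");

        // Adding "AND y > 100 AND y < 200" to the query above is rejected,
        // because only the last restricted clustering column may carry a
        // range; the y filter has to happen in Solr/Elasticsearch or in
        // the client after fetching the x slice.
        cluster.close();
    }
}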