Perhaps just fetch them in batches of 1000 or 2000? For 1m rows, it seems like the difference would only be a few minutes. Do you have to do this all the time, or only once in a while?
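For example, here is a minimal, self-contained sketch of that batching pattern: fetch at most 1000 rows at a time, ordered by the clustering column, and use the last value seen as the lower bound of the next fetch. The names here (`BatchFetch`, `fetchBatch`, the in-memory `table`) are illustrative stand-ins, not anything from the thread; in real code `fetchBatch` would be a driver query along the lines of `... AND Area > ? LIMIT ?`.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of client-side batching: instead of pulling a million rows in
// one query, repeatedly fetch at most BATCH_SIZE rows ordered by the
// clustering key, using the last key seen as the lower bound of the
// next query. The in-memory "table" stands in for Cassandra.
public class BatchFetch {
    static final int BATCH_SIZE = 1000;

    // Simulated rows, keyed by the Area clustering column, kept sorted.
    static List<Double> table = new ArrayList<>();

    // Stand-in for: SELECT ... WHERE ... AND Area > ? LIMIT ?
    static List<Double> fetchBatch(double afterArea, int limit) {
        List<Double> page = new ArrayList<>();
        for (double area : table) {
            if (area > afterArea) {
                page.add(area);
                if (page.size() == limit) break;
            }
        }
        return page;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5000; i++) table.add((double) i);

        double last = -1;          // lower bound for the next page
        int total = 0;
        while (true) {
            List<Double> page = fetchBatch(last, BATCH_SIZE);
            if (page.isEmpty()) break;
            total += page.size();  // process the page here
            last = page.get(page.size() - 1);
        }
        System.out.println("fetched " + total + " rows");
    }
}
```

Each batch is a cheap, bounded query, so no single request comes near the server timeout.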
On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:

> Yes, it works for 1000 but not for more than that.
> How can I fetch all the rows efficiently using this?
>
> On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:
>
>> Have you tried a smaller fetch size, such as 5k - 2k?
>>
>> On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>
>>> Hi Jens,
>>>
>>> I have tried with a fetch size of 10000 and it is still not returning
>>> any results. My expectation was that Cassandra could handle a million
>>> rows easily.
>>>
>>> Is there any mistake in the way I am defining the keys or querying them?
>>>
>>> Thanks,
>>> Mehak
>>>
>>> On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil <jens.ran...@tink.se> wrote:
>>>
>>>> Hi,
>>>>
>>>> Try setting the fetch size before querying. Assuming you don't set it
>>>> too high, and you don't have too many tombstones, that should do it.
>>>>
>>>> Cheers,
>>>> Jens
>>>>
>>>> –
>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>
>>>> On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta <meme...@cs.stonybrook.edu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a requirement to fetch a million rows as the result of my
>>>>> query, which is giving timeout errors. I am fetching results by
>>>>> selecting clustering columns, so why are the queries taking so long?
>>>>> I can change the timeout settings, but I need the data to be fetched
>>>>> faster as per my requirement.
>>>>>
>>>>> My table definition is:
>>>>>
>>>>> CREATE TABLE images.results (
>>>>>     uuid uuid,
>>>>>     analysis_execution_id varchar,
>>>>>     analysis_execution_uuid uuid,
>>>>>     x double,
>>>>>     y double,
>>>>>     loc varchar,
>>>>>     w double,
>>>>>     h double,
>>>>>     normalized varchar,
>>>>>     type varchar,
>>>>>     filehost varchar,
>>>>>     filename varchar,
>>>>>     image_uuid uuid,
>>>>>     image_uri varchar,
>>>>>     image_caseid varchar,
>>>>>     image_mpp_x double,
>>>>>     image_mpp_y double,
>>>>>     image_width double,
>>>>>     image_height double,
>>>>>     objective double,
>>>>>     cancer_type varchar,
>>>>>     Area float,
>>>>>     submit_date timestamp,
>>>>>     points list<double>,
>>>>>     PRIMARY KEY ((image_caseid), Area, uuid)
>>>>> );
>>>>>
>>>>> Here each row is uniquely identified by its uuid, but since my data
>>>>> is generally queried by image_caseid, I have made that the partition
>>>>> key. I am currently using the DataStax Java API to fetch the results,
>>>>> but the query is taking a lot of time, resulting in timeout errors:
>>>>>
>>>>> Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
>>>>>     at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
>>>>>     at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
>>>>>     at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
>>>>>     at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
>>>>>     at QueryDB.queryArea(TestQuery.java:59)
>>>>>     at TestQuery.main(TestQuery.java:35)
>>>>> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response))
>>>>>     at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>>     at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>     at java.lang.Thread.run(Thread.java:744)
>>>>>
>>>>> The same query also fails on the console, even with a limit of 2000
>>>>> rows:
>>>>>
>>>>> cqlsh:images> select count(*) from results where image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000;
>>>>> errors={}, last_host=127.0.0.1
>>>>>
>>>>> Thanks and Regards,
>>>>> Mehak
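For what it's worth, the batching suggested above can also be expressed directly in CQL against this schema, since (Area, uuid) are the clustering columns. This is an untested sketch: the `?` markers are placeholders to be bound with the Area and uuid of the last row of the previous page, and depending on the Cassandra version a tuple restriction may not combine with an additional single-column restriction on Area, in which case the upper bound (Area < 100) has to be checked client-side.

```sql
-- First page: restrict by partition key and the Area clustering column
SELECT * FROM images.results
WHERE image_caseid = 'TCGA-HN-A2NL-01Z-00-DX1'
  AND Area > 20 AND Area < 100
LIMIT 1000;

-- Subsequent pages: resume after the last (Area, uuid) pair returned,
-- binding those values to the ? markers in a prepared statement
SELECT * FROM images.results
WHERE image_caseid = 'TCGA-HN-A2NL-01Z-00-DX1'
  AND (Area, uuid) > (?, ?)
LIMIT 1000;
```

The client stops issuing pages once a returned row has Area >= 100 or a page comes back with fewer than 1000 rows.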