I think you'll need to show us how to reproduce without your custom LoadFunc, e.g., with normal index scans outside of pig.
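One way to run that check outside Pig is sketched below. It assumes that the manually maintained index is a row whose column names are the keys of the data rows (the pre-0.7 style mentioned further down). `RowReader`, `get_row` and `multiget_rows` are hypothetical stand-ins for whatever client calls are actually in use (over Thrift that would be `get_slice` / `multiget_slice`). They are not a real client API. `InMemoryClient` only exists so the logic can be exercised without a cluster.

```python
from typing import Dict, Iterable, List, Mapping, Protocol

Row = Dict[str, str]  # column name -> value


class RowReader(Protocol):
    """Hypothetical minimal client interface; NOT the real Thrift API."""

    def get_row(self, key: str) -> Row: ...

    def multiget_rows(self, keys: Iterable[str]) -> Mapping[str, Row]: ...


def find_missing_rows(client: RowReader, index_key: str) -> List[str]:
    """Read a manually maintained index row and return the referenced
    data-row keys that come back with no columns when fetched directly."""
    # Assumption: each column name in the index row is a data-row key.
    referenced_keys = sorted(client.get_row(index_key))
    rows = client.multiget_rows(referenced_keys)
    # An absent or column-less row is what showed up as an "empty row" in Pig.
    return [k for k in referenced_keys if not rows.get(k)]


class InMemoryClient:
    """Local stand-in for a cluster, used only to exercise the check."""

    def __init__(self, data: Dict[str, Row]) -> None:
        self.data = data

    def get_row(self, key: str) -> Row:
        return dict(self.data.get(key, {}))

    def multiget_rows(self, keys: Iterable[str]) -> Mapping[str, Row]:
        return {k: dict(self.data.get(k, {})) for k in keys}


if __name__ == "__main__":
    store = {
        "idx:color=red": {"row1": "", "row2": "", "row3": ""},
        "row1": {"color": "red"},
        "row3": {"color": "red"},
    }
    print(find_missing_rows(InMemoryClient(store), "idx:color=red"))  # ['row2']
```

If the same keys come back empty through a plain client like this, the problem lies in the cluster rather than in the custom LoadFunc.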
On Wed, Nov 17, 2010 at 3:56 PM, Christian Decker <decker.christ...@gmail.com> wrote:
> On Tue, Nov 16, 2010 at 6:58 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> I'm pretty sure that "reading an index" and "using pig" are not
>> compatible right now. the m/r support that pig builds on always does
>> sequential-scan range queries.
>
> Yes it does, I have a specialized LoadFunc to read and load manually
> maintained indices (pre-0.7 style), and it works like a charm as long as I
> don't do nodetool loadbalance or add new nodes to the cluster.
>>
>> can you see the missing rows if you do a normal get_slice query for it
>> without pig?
>
> They are empty, I suspect that the "eventual" in "eventual consistency" hit
> me in the head, the empty rows are disappearing at an incredibly slow rate,
> I guess it's repairing in the background, but it's taking forever
> (100'000'000 rows in the cluster, 2 nodes added and after 3 days it's still
> not done migrating to the new nodes).
>
> Could this actually be the case?
>
> Regards,
> Chris
>
> B.T.W.: M/R and indices might mix well if we can just fetch the size of the
> index, and then we could create the splits telling them to "fetch from index
> starting from col n and fetch a max of m" any plans on implementing it?
>>
>> On Mon, Nov 15, 2010 at 7:03 AM, Christian Decker
>> <decker.christ...@gmail.com> wrote:
>> > I'm using tag cassandra-0.7.0-beta3. I wouldn't know why I need range
>> > scans since I perform a multi_get on the indexed keys.
>> >
>> > Regards,
>> > Chris
>> >
>> > On Sun, Nov 14, 2010 at 9:51 AM, Jonathan Ellis <jbel...@gmail.com>
>> > wrote:
>> >>
>> >> Are you using a version with working range scans?
>> >>
>> >> On Sat, Nov 13, 2010 at 6:11 PM, Christian Decker
>> >> <decker.christ...@gmail.com> wrote:
>> >> > Hi all,
>> >> >
>> >> > I'm having some doubts about the current state of my cluster. I started
>> >> > with one node, filled it with some 10 million rows, then flushed and
>> >> > compacted the node. Then I ran a small pig script that read an index
>> >> > and fetched the matching rows, no problem until this point. Now I add
>> >> > a new node with AutoBootStrap turned on, it all seems to work as it
>> >> > chooses a token to take over some of the first nodes responsibilities,
>> >> > it seems to transfer all the relevant data and everything looks fine.
>> >> > Now if I run the pig script again it'll produce many empty rows, which
>> >> > points me to believe that these rows were read from the new node
>> >> > which doesn't yet have the corresponding data. Now this puzzles me,
>> >> > since I thought the bootstrap would transfer the needed data, will
>> >> > this eventually return to give me no empty rows or have I done
>> >> > something terribly wrong?
>> >> >
>> >> > Regards,
>> >> > Chris
>> >>
>> >> --
>> >> Jonathan Ellis
>> >> Project Chair, Apache Cassandra
>> >> co-founder of Riptano, the source for professional Cassandra support
>> >> http://riptano.com
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
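The index-split idea from the B.T.W. in the thread ("fetch from index starting from col n and fetch a max of m") can be sketched as plain arithmetic over column offsets. This is a hypothetical illustration, not an existing Cassandra or Pig feature. `IndexSplit` and `make_index_splits` are made-up names, and the sketch assumes the index's column count is known up front.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IndexSplit:
    """Hypothetical input split: read up to `count` index columns from `start`."""
    start: int  # offset of the first index column this split reads
    count: int  # maximum number of index columns this split reads


def make_index_splits(index_size: int, max_per_split: int) -> List[IndexSplit]:
    """Partition an index of `index_size` columns into consecutive splits
    of at most `max_per_split` columns each."""
    if max_per_split <= 0:
        raise ValueError("max_per_split must be positive")
    if index_size < 0:
        raise ValueError("index_size must be non-negative")
    return [
        # The last split may be shorter than max_per_split.
        IndexSplit(start, min(max_per_split, index_size - start))
        for start in range(0, index_size, max_per_split)
    ]


if __name__ == "__main__":
    for split in make_index_splits(index_size=10, max_per_split=4):
        print(split)
    # IndexSplit(start=0, count=4)
    # IndexSplit(start=4, count=4)
    # IndexSplit(start=8, count=2)
```

One caveat on the design: a Thrift `SliceRange` is bounded by start/finish column names plus a count, not by a numeric offset. A real implementation would therefore need the column name at each split boundary in place of `start`, for example collected by a separate pass over the index row.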