Re: Rows missing after new node bootstrapped

Jonathan Ellis Mon, 22 Nov 2010 11:12:03 -0800

I think you'll need to show us how to reproduce without your custom
LoadFunc, e.g., with normal index scans outside of pig.


On Wed, Nov 17, 2010 at 3:56 PM, Christian Decker
<decker.christ...@gmail.com> wrote:
> On Tue, Nov 16, 2010 at 6:58 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> I'm pretty sure that "reading an index" and "using pig" are not
>> compatible right now.  The m/r support that pig builds on always does
>> sequential-scan range queries.
>
> Yes it does, I have a specialized LoadFunc to read and load manually
> maintained indices (pre-0.7 style), and it works like a charm as long as I
> don't do nodetool loadbalance or add new nodes to the cluster.
>>
>> can you see the missing rows if you do a normal get_slice query for it
>> without pig?
>
> They are empty. I suspect that the "eventual" in "eventual consistency" hit
> me in the head: the empty rows are disappearing at an incredibly slow rate.
> I guess it's repairing in the background, but it's taking forever
> (100'000'000 rows in the cluster, 2 nodes added, and after 3 days it's still
> not done migrating to the new nodes).
>
> Could this actually be the case?
>
> Regards,
> Chris
>
> B.T.W.: M/R and indices might mix well if we could just fetch the size of
> the index and then create the splits, telling each one to "fetch from index
> starting from col n and fetch a max of m". Any plans on implementing it?
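
A minimal sketch of the split idea quoted above, in plain Python and
independent of Cassandra, Hadoop, or pig. The function name `index_splits`
and its parameters are hypothetical and exist only for illustration. It
assumes the index size is already known and that index columns can be
addressed by position, which the thread does not confirm.

```python
from typing import List, Tuple


def index_splits(index_size: int, max_per_split: int) -> List[Tuple[int, int]]:
    """Divide an index of `index_size` columns into (start, count) splits.

    Each tuple means "fetch from the index starting at column `start`
    and fetch at most `count` columns". Hypothetical helper, not part
    of any Cassandra or Hadoop API.
    """
    if index_size < 0:
        raise ValueError("index_size must be non-negative")
    if max_per_split <= 0:
        raise ValueError("max_per_split must be positive")
    return [
        (start, min(max_per_split, index_size - start))
        for start in range(0, index_size, max_per_split)
    ]


if __name__ == "__main__":
    # Print the splits for a 10-column index read 4 columns at a time.
    for start, count in index_splits(10, 4):
        print(f"fetch from col {start}, max {count}")
```

Each (start, count) pair would correspond to one input split; how a real
LoadFunc would turn such a pair into a query is left open here.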
>>
>> On Mon, Nov 15, 2010 at 7:03 AM, Christian Decker
>> <decker.christ...@gmail.com> wrote:
>> > I'm using tag cassandra-0.7.0-beta3. I wouldn't know why I need range
>> > scans since I perform a multi_get on the indexed keys.
>> >
>> > Regards,
>> > Chris
>> >
>> > On Sun, Nov 14, 2010 at 9:51 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> >>
>> >> Are you using a version with working range scans?
>> >>
>> >> On Sat, Nov 13, 2010 at 6:11 PM, Christian Decker
>> >> <decker.christ...@gmail.com> wrote:
>> >> > Hi all,
>> >> >
>> >> > I'm having some doubts about the current state of my cluster. I started
>> >> > with one node, filled it with some 10 million rows, then flushed and
>> >> > compacted the node. Then I ran a small pig script that read an index and
>> >> > fetched the matching rows; no problem up to this point. Then I added a
>> >> > new node with AutoBootStrap turned on. It all seems to work: it chooses
>> >> > a token to take over some of the first node's responsibilities, it seems
>> >> > to transfer all the relevant data, and everything looks fine. But if I
>> >> > run the pig script again it produces many empty rows, which leads me to
>> >> > believe that these rows were read from the new node, which doesn't yet
>> >> > have the corresponding data. This puzzles me, since I thought the
>> >> > bootstrap would transfer the needed data. Will this eventually go back
>> >> > to returning no empty rows, or have I done something terribly wrong?
>> >> >
>> >> > Regards,
>> >> > Chris
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jonathan Ellis
>> >> Project Chair, Apache Cassandra
>> >> co-founder of Riptano, the source for professional Cassandra support
>> >> http://riptano.com
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
