Querying Multiple Indices

2012-01-25 Thread Runar Jordahl
Are there any plans for allowing Riak to query multiple indices in one
operation? If there are, how will these queries work?

Kind regards
Runar

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Should Riak have used dedicated nodes for secondary indices?

2012-01-25 Thread Runar Jordahl
Siddharth Anand says that secondary indices (for a key-value store) are
best placed on a separate node, avoiding the need to look up 1/N
nodes during a query:

"Systems that shard data based on a primary key will do well when
routed by that key. When routed by a secondary key, the system will
need to “spray” a query across all shards. If one of the shards is
experiencing high latency, the system will return either no results or
incomplete (i.e. inconsistent) results. For this reason, it would make
sense to store the secondary index on an unsharded (but replicated)
system."

http://highscalability.com/blog/2012/1/24/the-state-of-nosql-in-2012.html

If I understand Riak correctly, it takes the opposite approach,
storing secondary indices together with the data.

To me it appears that Riak’s approach gives a more uniform system,
with all nodes having the same responsibilities. Does anyone else have
any thoughts on this?

Kind regards
Runar Jordahl
blog.epigent.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Should Riak have used dedicated nodes for secondary indices?

2012-01-25 Thread Jeremiah Peschka
Good news! Riak doesn't use sharding.

Data locality is critical in a distributed system. When you create an
index, your structure looks something like:

indexed_value:record_id

Reading from an index requires locating indexed_value, finding all matching
values, and then retrieving all matching record_ids. By keeping index data
on the same node as the source data, Riak avoids having to remote the query
to retrieve object data. This is a Good Thing. The network is slow and
unreliable. Just ask an Australian.
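
A sketch of that read path in Erlang-flavored pseudocode; lookup_index/2 and
local_get/2 are hypothetical helpers for illustration, not Riak internals:

    %% 1. find the index entries for the queried value: indexed_value -> record_ids
    RecordIds = lookup_index(Vnode, {<<"age_int">>, 42}),
    %% 2. the matching objects live on the same node as the index entries,
    %%    so no extra network hop is needed to fetch them
    Objects = [local_get(Vnode, Id) || Id <- RecordIds].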

Riak's approach is intended to provide a uniform system where you can treat
any node equally. The idea that there should be an unsharded index node is
a bit ludicrous. Let's say you have 1TB of raw data. Your indexes are
pretty light and are only about 20% of your data size. This means that you
need 200GB of good storage (not some cheap $150 SATA HDD you found on
NewEgg). 200GB of RAID 10 SAS storage isn't that pricey to put in a single
unsharded machine. Over time as your data grows and your indexing changes,
you may have 10TB and your index size is ~40% of your data. Your unsharded
index server now has to have 4TB of fast, reliable storage. And, since this
is an unsharded system, you'll want multiple replicas of your unsharded
index server to make sure that a hardware hiccup doesn't take down your
ability to perform fast lookups. Besides - a single indexing server becomes
a single bottleneck and a single point of failure in your system.

Most people using Lucene as their indexing store are sharding Lucene. From
an anecdotal standpoint, about 70% of the people I've talked to using
Lucene are getting to the point of sharding their replicated Lucene indexes.

I'm not saying that either approach is good or bad; just remember that
every solution has drawbacks.
---
Jeremiah Peschka, SQL Server MVP
Managing Director, Brent Ozar PLF, LLC


On Wed, Jan 25, 2012 at 5:15 AM, Runar Jordahl wrote:

> Siddharth Anand says that secondary indices (for a key-value store) are
> best placed on a separate node, avoiding the need to look up 1/N
> nodes during a query:
>
> "Systems that shard data based on a primary key will do well when
> routed by that key. When routed by a secondary key, the system will
> need to “spray” a query across all shards. If one of the shards is
> experiencing high latency, the system will return either no results or
> incomplete (i.e. inconsistent) results. For this reason, it would make
> sense to store the secondary index on an unsharded (but replicated)
> system."
>
> http://highscalability.com/blog/2012/1/24/the-state-of-nosql-in-2012.html
>
> If I understand Riak correctly, it takes the opposite approach,
> storing secondary indices together with the data.
>
> To me it appears that Riak’s approach gives a more uniform system,
> with all nodes having the same responsibilities. Does anyone else have
> any thoughts on this?
>
> Kind regards
> Runar Jordahl
> blog.epigent.com
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Querying Multiple Indices

2012-01-25 Thread Jeremiah Peschka
I have no idea if there are plans for this.

Writing these queries yourself should be pretty trivial. Just spin up n
threads, query n indexes, and then use some join algorithm [1] to merge the
result sets together to produce the key list that you need to perform your
lookups.

[1]: https://en.wikipedia.org/wiki/Hash_join#Grace_hash_join
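
A minimal Erlang sketch of that approach, assuming the riak-erlang-client and
that riakc_pb_socket:get_index/4 returns {ok, Keys}; intersecting the key sets
gives you an AND across indexes (a real hash join only matters once the key
sets no longer fit comfortably in memory):

    %% Query each secondary index in its own process, then intersect the
    %% resulting key sets to produce the final key list.
    multi_index_query(Host, Bucket, IndexQueries) ->
        Parent = self(),
        Workers = [spawn_link(fun() ->
                       %% one connection per query so the requests run in parallel
                       {ok, Pid} = riakc_pb_socket:start_link(Host, 8087),
                       {ok, Keys} = riakc_pb_socket:get_index(Pid, Bucket, Index, Value),
                       riakc_pb_socket:stop(Pid),
                       Parent ! {self(), ordsets:from_list(Keys)}
                   end) || {Index, Value} <- IndexQueries],
        %% collect one key set per worker, then intersect them all
        Sets = [receive {W, Set} -> Set end || W <- Workers],
        ordsets:intersection(Sets).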
---
Jeremiah Peschka
Managing Director, Brent Ozar PLF, LLC


On Wed, Jan 25, 2012 at 5:04 AM, Runar Jordahl wrote:

> Are there any plans for allowing Riak to query multiple indices in one
> operation? If there are, how will these queries work?
>
> Kind regards
> Runar
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Can't load data after enabling indexing on bucket with precommit hook

2012-01-25 Thread charlesugwuh
I figured this out.

So I'm using the PHP Riak client (https://github.com/basho/riak-php-client).
I was using the setProperties method to set the commit hooks for the bucket
to enable search.

However, I was defining the bucket properties like so: '$prop_arr =
array('precommit' => array('mod' => 'riak_search_kv_hook', 'fun' =>
'precommit'));' instead of like so: '$prop_arr = array('precommit' =>
array(array('mod' => 'riak_search_kv_hook', 'fun' => 'precommit')));'.

It's a very subtle difference but the first one creates this:
'{"props":{"precommit":{"mod":"riak_search_kv_hook","fun":"precommit"}}}'
while the second creates this
'{"props":{"precommit":[{"mod":"riak_search_kv_hook","fun":"precommit"}]}}'.
The difference is the square brackets (precommit must be an array of hooks),
and this threw the whole thing out of whack when they weren't present.

So now everything is working ok.

--
View this message in context: 
http://riak-users.197444.n3.nabble.com/Cant-load-data-after-enabling-indexing-on-bucket-with-precommit-hook-tp3686584p3687878.html
Sent from the Riak Users mailing list archive at Nabble.com.

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


unsubscribe

2012-01-25 Thread Roberto Calero



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Querying Multiple Indices

2012-01-25 Thread Sean Cribbs
We have always intended to have the ability to query multiple indexes at
once, but the feature is not pinned on the roadmap yet. I'm sure there will
be an announcement preceding its release.

On Wed, Jan 25, 2012 at 9:50 AM, Jeremiah Peschka <
jeremiah.pesc...@gmail.com> wrote:

> I have no idea if there are plans for this.
>
> Writing these queries yourself should be pretty trivial. Just spin up n
> threads, query n indexes, and then use some join algorithm [1] to merge the
> result sets together to produce the key list that you need to perform your
> lookups.
>
> [1]: https://en.wikipedia.org/wiki/Hash_join#Grace_hash_join
> ---
> Jeremiah Peschka
> Managing Director, Brent Ozar PLF, LLC
>
>
>
> On Wed, Jan 25, 2012 at 5:04 AM, Runar Jordahl wrote:
>
>> Are there any plans for allowing Riak to query multiple indices in one
>> operation? If there are, how will these queries work?
>>
>> Kind regards
>> Runar
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>


-- 
Sean Cribbs 
Software Engineer
Basho Technologies, Inc.
http://basho.com/
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Can't load data after enabling indexing on bucket with precommit hook

2012-01-25 Thread Sean Cribbs
Also note that on 1.0 and later you can set the "search" bucket property to
true and that will install the hook for you.
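
For the curious, that looks something like this over the HTTP interface (host
and bucket name here are placeholders):

    curl -X PUT -H "Content-Type: application/json" \
         -d '{"props":{"search":true}}' \
         http://127.0.0.1:8098/riak/mybucket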

On Wed, Jan 25, 2012 at 9:51 AM, charlesugwuh wrote:

> I figured this out.
>
> So I'm using the PHP Riak client (https://github.com/basho/riak-php-client).
> I was using the setProperties method to set the commit hooks for the bucket
> to enable search.
>
> However, I was defining the bucket properties like so: '$prop_arr =
> array('precommit' => array('mod' => 'riak_search_kv_hook', 'fun' =>
> 'precommit'));' instead of like so: '$prop_arr = array('precommit' =>
> array(array('mod' => 'riak_search_kv_hook', 'fun' => 'precommit')));'.
>
> It's a very subtle difference but the first one creates this:
> '{"props":{"precommit":{"mod":"riak_search_kv_hook","fun":"precommit"}}}'
> while the second creates this
>
> '{"props":{"precommit":[{"mod":"riak_search_kv_hook","fun":"precommit"}]}}'.
> The difference is the square brackets (precommit must be an array of hooks),
> and this threw the whole thing out of whack when they weren't present.
>
> So now everything is working ok.
>
> --
> View this message in context:
> http://riak-users.197444.n3.nabble.com/Cant-load-data-after-enabling-indexing-on-bucket-with-precommit-hook-tp3686584p3687878.html
> Sent from the Riak Users mailing list archive at Nabble.com.
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>



-- 
Sean Cribbs 
Software Engineer
Basho Technologies, Inc.
http://basho.com/
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: unsubscribe

2012-01-25 Thread Ryan Zezeski
I think this is a reasonable question to ask, re the 1/N question during
query time.  This is currently an implementation detail of secondary
indices in Riak.  Document- vs. term-based index partitioning seems to
be a common debate.  It almost feels like the emacs/vim flamewar of the
distributed index field.  A couple of issues with the document-based
approach: since you must query 1/N of your nodes, you a) reduce the
throughput of concurrent queries across the cluster (because queries must
contend for resources) and b) as N grows you increase your chance of
hitting the TCP incast problem [1], which can cause major problems.  A
couple of issues with term-based: a) your data and its index are no
longer co-located (giving up a property that can be nice to have) and b)
indexing a single document will often cause a write to all nodes.
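
A toy sketch of the two routing schemes, assuming a plain hash-partitioned
cluster of N nodes (illustrative only; this is not how Riak's ring actually
assigns partitions):

    %% document-partitioned: the index entry lives where the object's key
    %% hashes, so a term lookup must ask every node (the 1/N problem)
    doc_partition(ObjKey, N) -> erlang:phash2(ObjKey, N).

    %% term-partitioned: the index entry lives where the term hashes, so a
    %% term lookup hits one node, but indexing a document with M distinct
    %% terms writes to up to M different nodes
    term_partition(Term, N) -> erlang:phash2(Term, N).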

To fight 1/N in the document-based approach, perhaps you could do a
chained query, where the query is sent around the ring to avoid the incast
problem, but at the cost of higher latencies (although you could
parallelize it to some set number X and skip nodes in the ring, with X
nodes converging on the coordinator at the end).  To fight term-based's
write-to-all-nodes-on-index problem, perhaps you could hash on something
less granular, like index/field or even just index, but replicate to more
nodes and play some tricks at the index level for better concurrent
access.  If this all seems hand-wavy, that's because it is.

I also think it's perfectly reasonable to keep your index completely
separate from the data itself.  Look no further than a library for a
real-life example of this: the card catalog.  Yes, you have potential
consistency problems, you end up having special nodes in the system (e.g.
nodes dedicated to indexing), and multiple hops to get from index lookup
to object retrieval, but at the end of the day these are the tradeoffs you
must weigh in light of your system.

These are questions I continue to ponder myself and it's my belief that
Riak will continue to get stronger at querying your data.  However,
sometimes you may also need to use another solution to store your index
that works alongside Riak and I also think that is a perfectly reasonable
choice as long as you understand the system you're building and the
tradeoffs you are making.  I think in the future it would even be good for
Riak to have integration points with other solutions to make stuff like
this easier.  Please voice your opinions on the mailing list if you have
them.  I would love to hear them.

[1]: http://www.snookles.com/slf-blog/2012/01/05/tcp-incast-what-is-it/

-Ryan




secondary index read with limit?

2012-01-25 Thread Michael Radford
I'm wondering if there's any way to achieve a limited secondary index
read in Riak, i.e., up to N records in some key range.  With that
primitive, it seems like it would be possible for the user to
implement things like pagination, incremental joins over multiple
indices, etc.  (For efficiency I guess it would have to be something
like "up to N records from each vnode"?)

Looking at the source code, I think the answer is no, but since range
queries are already implemented using iterators, it doesn't seem like
it would be too hard to add. Can anyone comment on the roadmap for a
feature like this, and/or the possibility of getting a patch accepted?

Mike

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: secondary index read with limit?

2012-01-25 Thread Alexander Sicular
Afaik, range queries the likes of which you suggest are not possible in riak 
because riak is an inherently unordered system. That being said, it is a major 
cause of concern for those looking at riak as a potential solution. One 
ultimately needs to implement a secondary ordered index elsewhere to achieve 
desired results. I pair riak with redis, like pairing a sizzling steak with a 
fine red wine. 

-Alexander Sicular

@siculars

On Jan 25, 2012, at 1:08 PM, Michael Radford wrote:

> I'm wondering if there's any way to achieve a limited secondary index
> read in Riak, i.e., up to N records in some key range.  With that
> primitive, it seems like it would be possible for the user to
> implement things like pagination, incremental joins over multiple
> indices, etc.  (For efficiency I guess it would have to be something
> like "up to N records from each vnode"?)
> 
> Looking at the source code, I think the answer is no, but since range
> queries are already implemented using iterators, it doesn't seem like
> it would be too hard to add. Can anyone comment on the roadmap for a
> feature like this, and/or the possibility of getting a patch accepted?
> 
> Mike
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: secondary index read with limit?

2012-01-25 Thread Reid Draper

On Jan 25, 2012, at 1:20 PM, Alexander Sicular wrote:

> Afaik, range queries the likes of which you suggest are not possible in riak 
> because riak is an inherently unordered system. That being said, it is a 
> major cause of concern for those looking at riak as a potential solution. One 
> ultimately needs to implement a secondary ordered index elsewhere to achieve 
> desired results. I pair riak with redis, like pairing a sizzling steak with a 
> fine red wine. 
Values are sorted by key (per vnode) when using the eleveldb backend. As Mike 
pointed out, doing numerically limited range queries across the cluster would 
require getting (unknown at query time) data from each vnode, and sorting the 
results as they come in. We've talked about ideas like this, but there is 
nothing specific currently on the roadmap. Range queries are possible now using 
2i, but the results aren't sorted and there is no way to numerically limit the 
results.
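
In the meantime a limit can only be emulated client-side. A rough sketch with
the Erlang client, assuming the range variant riakc_pb_socket:get_index/5
returns {ok, Keys}; note that every matching key is still transferred, and only
the caller-visible result is limited:

    %% fetch the whole 2i range, sort the keys, and keep the first N
    limited_range(Pid, Bucket, Index, StartKey, EndKey, N) ->
        {ok, Keys} = riakc_pb_socket:get_index(Pid, Bucket, Index, StartKey, EndKey),
        lists:sublist(lists:sort(Keys), N).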
> 
> -Alexander Sicular
> 
> @siculars
> 
> On Jan 25, 2012, at 1:08 PM, Michael Radford wrote:
> 
>> I'm wondering if there's any way to achieve a limited secondary index
>> read in Riak, i.e., up to N records in some key range.  With that
>> primitive, it seems like it would be possible for the user to
>> implement things like pagination, incremental joins over multiple
>> indices, etc.  (For efficiency I guess it would have to be something
>> like "up to N records from each vnode"?)
>> 
>> Looking at the source code, I think the answer is no, but since range
>> queries are already implemented using iterators, it doesn't seem like
>> it would be too hard to add. Can anyone comment on the roadmap for a
>> feature like this, and/or the possibility of getting a patch accepted?
>> 
>> Mike
>> 
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


v1.0.3 search merge index preventing partition handoff?

2012-01-25 Thread Fisher, Ryan
Hello all,

We are hitting an issue with a riak 1.0.3 cluster when adding new nodes to
the ring.  Specifically the handoff appears stuck and isn't making any
progress.

I have read a number of the threads on here and realize handoff will take a
while, and have also tried attaching to the console and doing a force_update
along w/ force_handoffs.  However over 12 hours later the nodes haven't made
any progress.  After digging through the log files it appears that the
search merge_index could be my problem?  Possibly the compaction isn't
occurring properly?

We are running a riak 1.0.3 cluster for a research project, where we are
utilizing the python client for reads, writes, and queries of the cluster.
Using a small data set of 20k keys things were humming along nicely.

We then started to ramp up the number of objects and ended up getting to
around 1M objects.  At this same time I added an additional node (w/ plans
to expand to 8 nodes total).

However it appears that the partition handoff is stuck after performing the
'join' command on the 5th node I was adding.

So currently it is a 4 + 1 node cluster with 4 gig of memory per node, am
running the bitcask backend with 'search' enabled on some of the buckets.
Specifically I am using the 'out of the box' JSON encoding schema by simply
setting the mime-type to "application/json", when I do the store from the
python client.

I'm wondering if enabling search and using the default JSON schema was too
much data to index?  Outside of increasing the linux file limit on the
nodes, enabling 'search' (in the config file and w/ the pre-commit hook),
and upping the ring_creation_size to 256 (before I started or added any
nodes) there shouldn't be much else out of the ordinary going on.  This was
an original 1.0 riak cluster which I have been performing rolling upgrades
on as the bug fix versions come out.  However currently all 4 + 1 nodes are
1.0.3

Here are the *I hope* relevant error logs?

Riak error log:
http://pastebin.com/99cdPdCk

Riak crash log:
http://pastebin.com/07FRZkf2

Riak erlang log:
http://pastebin.com/DvdasWyR

Does anyone have any ideas on how to 'unstick' the partition handoff?  Or
maybe the bigger question is indexing all of the incoming data (outside of
the disk space requirements) a bad idea?  Perhaps I need to write a custom
schema that limits what gets indexed?

I should mention that the search is a 'nice-to-have' but the data is
structured in a way that we know the keys we need at lookup time (for the
most part) and I can probably use m/r to query the rest… With that I'm
wondering if it comes down to it can search be easily 'undone' on the
cluster?  Maybe as simply as disabling the pre-commit hook, turning it off
in the app.config and then deleting the riak/merge_index directories on each
node?  


Thanks,
ryan




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: v1.0.3 search merge index preventing partition handoff?

2012-01-25 Thread Kresten Krab Thorup
Hi Ryan,

From the looks of the crash log, it seems that one of your merge index files
may be corrupt (did you run out of disk space, or crash a node?)

At any rate, what seems to be happening is that the search vnode is in the
middle of a handoff (presumably to the new machine), and while it is doing a
full scan of the merge index segment files to transfer data, it encounters a
bad file.  The result in the crash log is that it tries to do
binary_to_term(<<131,109,0,0,128,40,...>>) on a 46-byte binary; but the encoded
stream says that the data length should be 128*256+40 bytes, i.e. 32,808 bytes
long.
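
The arithmetic comes straight from the external term format header; a quick
shell illustration using the bytes from the crash log:

    %% 131 = external term format version, 109 = BINARY_EXT, then a
    %% 4-byte big-endian length field claiming 32808 bytes of payload
    1> <<131, 109, Len:32/big>> = <<131,109,0,0,128,40>>.
    <<131,109,0,0,128,40>>
    2> Len.
    32808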

So, something is too short, which I would guess could have happened because 
either the server crashed or ran out of disk.  From a casual inspection of the 
code, it doesn't look like merge indexes are resilient to a node crashing while 
it is writing to disk.

I don't know search intimately, but I have seen mention of problems before that 
were caused by "bad indexes", and the resolution seems to be to delete the 
merge index files (the search index in your case 
/var/lib/riak/merge_index/159851741583067506678528028578343455274867621888), 
and then iterate over all values and re-write them.  Bummer.

Perhaps someone from Basho can chime in and tell us (A) if it seems plausible 
that the merge index segment files are indeed corrupt, and (B) if so, what is 
the right way to recover from that.

Kresten








Mobile: + 45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
Trifork A/S  |  Margrethepladsen 4  | DK- 8000 Aarhus C |  Phone : +45 8732 
8787  |  www.trifork.com





___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Querying Multiple Indices

2012-01-25 Thread Carl
It would be more efficient in disk reads if the keys output by one
index search could somehow constrain the second one, and so on.

On Wed, Jan 25, 2012 at 9:50 AM, Jeremiah Peschka wrote:

> I have no idea if there are plans for this.
>
> Writing these queries yourself should be pretty trivial. Just spin up n
> threads, query n indexes, and then use some join algorithm [1] to merge the
> result sets together to produce the key list that you need to perform your
> lookups.
>
> [1]: https://en.wikipedia.org/wiki/Hash_join#Grace_hash_join
> ---
> Jeremiah Peschka
> Managing Director, Brent Ozar PLF, LLC

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Querying Multiple Indices

2012-01-25 Thread Sean Cribbs
See also: http://features.basho.com/entries/20516853-bitmap-index-queries

On Wed, Jan 25, 2012 at 5:05 PM, Carl  wrote:

>  It would be more efficient in disk reads if the keys output by one index
> search could somehow constrain the second one, and so on.
>
>
> On Wed, Jan 25, 2012 at 9:50 AM, Jeremiah Peschka <
> jeremiah.pesc...@gmail.com> wrote:
>
>  I have no idea if there are plans for this.
>>
>> Writing these queries yourself should be pretty trivial. Just spin up n
>> threads, query n indexes, and then use some join algorithm [1] to merge the
>> result sets together to produce the key list that you need to perform your
>> lookups.
>>
>> [1]: https://en.wikipedia.org/wiki/Hash_join#Grace_hash_join
>> ---
>> Jeremiah Peschka
>> Managing Director, Brent Ozar PLF, LLC
>>
>>
>>
>> On Wed, Jan 25, 2012 at 5:04 AM, Runar Jordahl 
>> wrote:
>>
>>> Are there any plans for allowing Riak to query multiple indices in one
>>> operation? If there are, how will these queries work?
>>>
>>> Kind regards
>>> Runar
>>>
>>> ___
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>


-- 
Sean Cribbs 
Software Engineer
Basho Technologies, Inc.
http://basho.com/
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak sizing considerations

2012-01-25 Thread Ian Plosker
Sebastian,

With all requests, Riak will attempt to read from or write to all replicas
regardless of the specified r or w value. The r and w values affect how many
reads from or writes to partitions must be completed before the operation is
considered successful.

As a result, the get (read) and put (write) handlers outlive the client
request. They will continue to wait for either all vnodes (replicas/partitions)
to respond or for the 60-second timeout to elapse. As such, network traffic
after a large number of reads with r=1 shouldn't be surprising: the request
handlers are continuing to await responses from vnodes that are working through
their request queues.
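
For example, the r value can be set per request on the HTTP API (mirroring the
benchmark command quoted below):

    # returns as soon as one replica responds; the get handler keeps
    # waiting on the remaining replicas in the background
    curl -s "http://interface:8098/buckets/test-01/keys/10001.jpg?r=1" > /dev/null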

On modest hardware, I've seen Riak clusters perform multiples of 500 ops per 
second. I'm curious, what are you using to perform your benchmark? Does it 
perform requests in parallel? Are requests being made to all nodes in the 
cluster or just one? To find your maximum throughput, you should experiment 
with various ratios of parallel requests per node.

Hope that helps.  

--  
Ian Plosker 
Developer Advocate
Basho Technologies


On Tuesday, January 24, 2012 at 5:21 AM, Sebastian Gerlach wrote:

> Dear Riak-Users,
>  
> we are considering saving a large number (50,000,000) of binary data
> objects (images) in a riak cluster. Each image has a size of 648 KB. We
> want to store 3 copies of each image.
>  
> In this case I need to store 50,000,000 * 648 KB * 3 = 90.5 TB of data. This
> calculation doesn't include any overhead for reorganisation and other stuff.
>  
> On the other hand there is the network. I ran some benchmarks on a 4-node
> cluster, each with a 1 Gbps interface. In addition to the benchmarks
> I made some calculations.
>  
> Some information for the benchmark:
> - I use the same interface for clustercommunication and benchmarking.
> - I use the riak http api interface
> - time curl -s
> HTTP://interface:8098/buckets/test-01/keys/[10001-2].jpg > /dev/null
>  
> In theory, a 1 Gbps interface provides 125 MB per second. In my
> calculation I only use 50 percent of the theoretically available
> bandwidth. This fits my benchmark results very well.
>  
> I experimented for a while with the '{"props":{"r":X}}' setting.
>  
> Calculation “r=2”
> available bandwidth = 62.5 MB per second / (3*648 KB) = 33 requests per
> second per node = 132 requests per second over the cluster.
>  
> Calculation “r=1”
> available bandwidth = 62.5 MB per second / (2*648 KB) = 50 requests per
> second per node = 200 requests per second over the cluster.
>  
> In this second case I see some strange effects in the network. My send
> and receive queues grow very fast, and after finishing the benchmark
> there is a lot of traffic between the riak nodes for a while.
>  
> Does anyone have experience with these data sets and can give a few
> hints at a possible setup? The goal is to processed at least 500
> requests per second.
>  
> Some other points in my considerations are the time required for a
> reorganization after a new node is added to the cluster or a node has
> been replaced.
>  
> Many thanks for your reply and your attention.
>  
> Kind regards
> Sebastian
>  
>  
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>  
>  


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak sizing considerations

2012-01-25 Thread Ian Plosker

On Tuesday, January 24, 2012 at 5:21 AM, Sebastian Gerlach wrote:

> Some other points in my considerations are the time required for a
> reorganization after a new node is added to the cluster or a node has
> been replaced.
> 
> 

Sebastian,

Sorry I missed this part of your question.

The time required to rebalance a ring after adding or removing a node depends 
on a number of factors. These include but are not limited to: size of dataset, 
ring_creation_size, n_val, number of nodes, network throughput, I/O throughput, 
and load. Unfortunately, it's impossible to say with much accuracy how long a 
rebalance will take without accounting for these factors. If you want to get 
some idea how long it will take, you should perform a benchmark trying to mimic 
your hypothetical production environment as closely as possible. 

-- 
Ian Plosker 
Developer Advocate
Basho Technologies







___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: how does vnode_vclock work?

2012-01-25 Thread Ian Plosker
Adam, 

There's a basic explanation in our wiki: 
http://wiki.basho.com/Vector-Clocks.html#VNode-Vector-Clocks. 

Requests to a vnode are processed in serial, but for a given bucket-key pair 
there's no guarantee that the same vnode will coordinate separate requests. 
Thus, two writes to the same bucket-key pair can take place at the same time, 
resulting in siblings. The main advantage of vnode_vclocks is that it places a 
limit on the length of vclocks as the actors identified in the vclock are the 
coordinating vnodes rather than clients. Long vclocks are a problem for two 
reasons: they are space inefficent and they induce vclock pruning, a process 
wherein the number of entries in a vclock are reduced to save space. There are 
conditions where vclock pruning can result in sibling generation.

-- 
Ian Plosker 
Developer Advocate
Basho Technologies


On Tuesday, January 24, 2012 at 1:00 PM, Adam Schepis wrote:

> Hey all,
> 
> i'm having trouble articulating for colleagues why using vnode_vclock=true in 
> Riak 1.0 avoids the issues outlined in the Vector Clocks are Hard page. This 
> is probably a sign that i don't understand it well enough myself.  
> 
> My understanding is that requests to a vnode are processed in serial so that 
> the vector clock updates in a vnode are guaranteed to not create duplicates 
> for a particular object (causing the data loss.) If this is correct, does it 
> mean that write operations are done in serial, or that both read and writes 
> are processed in serial? Also is it still just using an incrementing number 
> or using something like a timestamp as the vector clock value for the client? 
> 
>  I would love to be able to get a good explanation for this so I can 
> understand it better myself and so that I can articulate it to colleagues. 
> 
> Thanks,
> Adam
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Fastest way to count number of items in Secondary Index query

2012-01-25 Thread Ian Plosker
Jeremy, 

Requests to 2i's HTTP interface will return keys rather than the objects
themselves. You may find better performance counting the returned keys rather
than employing a M/R job.

-- 
Ian Plosker 
Developer Advocate
Basho Technologies


On Monday, January 23, 2012 at 2:27 PM, Jeremy Raymond wrote:

> What is the fastest way to count the number of items in a secondary
> index range query? I'm currently feeding an index input directly into
> the riak_kv_mapreduce:reduce_count_inputs/2. Anything faster than
> doing this? Anyway to query an index without loading the objects
> themselves?
> 
> --
> Jeremy
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


RE: Fastest way to count number of items in Secondary Index query

2012-01-25 Thread Jeremy Raymond
I’m using the Erlang PB interface in my application. The Erlang client gets the 
keys via this Input/Query in riakc_pb_socket:get_index/4 at 
https://github.com/basho/riak-erlang-client/blob/master/src/riakc_pb_socket.erl#L677

 

Input = {index, Bucket, Index, Key},
IdentityQuery = [{reduce,
                  {modfun, riak_kv_mapreduce, reduce_identity},
                  [{reduce_phase_only_1, true}],
                  true}],

 

My queries can return hundreds of thousands of keys. Transferring all the keys
to the client and then doing a length/1 on them is slower than feeding the
index query directly into riak_kv_mapreduce:reduce_count_inputs/2 and returning
only the sum. I thought there might be a shortcut to avoid mapred altogether.
Does the HTTP interface have access to some internals not available to a PB
client that avoid this mapreduce, or is the HTTP interface doing the same thing
as the Erlang client, just doing an index query fed into reduce_identity?
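
For reference, the count query I'm feeding it looks roughly like this (the
exact shape of riakc_pb_socket:mapred/3's return value is assumed here):

    Input = {index, Bucket, Index, Key},
    CountQuery = [{reduce,
                   {modfun, riak_kv_mapreduce, reduce_count_inputs},
                   none, true}],
    %% single reduce phase, so the count comes back tagged with phase 0
    {ok, [{0, [Count]}]} = riakc_pb_socket:mapred(Pid, Input, CountQuery).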

 

--

Jeremy

 

From: Ian Plosker [mailto:i...@basho.com] 
Sent: Wednesday, January 25, 2012 5:55 PM
To: Jeremy Raymond
Cc: riak-users@lists.basho.com
Subject: Re: Fastest way to count number of items in Secondary Index query

 

Jeremy,

Requests to 2i's HTTP interface will return keys rather than the objects
themselves. You may find better performance counting the returned keys rather
than employing a M/R job.

--
Ian Plosker
Developer Advocate
Basho Technologies

On Monday, January 23, 2012 at 2:27 PM, Jeremy Raymond wrote:

What is the fastest way to count the number of items in a secondary
index range query? I'm currently feeding an index input directly into
the riak_kv_mapreduce:reduce_count_inputs/2. Anything faster than
doing this? Anyway to query an index without loading the objects
themselves?

--
Jeremy

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

 

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Riak Recap for January 20 - 25

2012-01-25 Thread Mark Phillips
Morning, Afternoon, Evening To All -

Coming to you live from ScaleConf in the stunningly-beautiful Cape
Town, here's a short Recap from the last few days.

Enjoy.

Mark

Community Manager
Basho Technologies
wiki.basho.com
twitter.com/pharkmillups
---

Riak Recap for January 20 - 25
===

1) We put out a press release a few days ago with details on how Auric
Systems is using Riak.

* Read here --->
http://www.marketwatch.com/story/auric-systems-selects-riak-to-meet-pci-compliance-requirements-for-archway-the-marketing-logistics-and-fulfillment-leader-to-the-fortune-1000-2012-01-24

2) This is less technical than most items on the Recap but I wanted to
pass it along anyways: If you happen to be driving up the 101 in the
Bay Area over the next month, you might see Basho and Voxer on a
billboard. :)

* Take a look --->
https://twitter.com/#!/argv0/status/162282422900232193/photo/1

3) Srdjan Pejic continues to be a great voice for Riak on StackOverflow.

* Q & A here -->
http://stackoverflow.com/questions/8989297/riak-search-giving-me-not-found-error-for-available-data
* Q & A here -->
http://stackoverflow.com/questions/8987643/backing-up-riak-data-when-changing-backends
* This one is still unanswered if anyone has a moment --->
http://stackoverflow.com/questions/9010259/securing-riak-on-ubuntu-using-shorewall-firewall

# Bugs

1) New

* 1327: bad argument in riak_core_stat:'-vnodeq_stats/0-lc$^0/1-0-'/1
- https://issues.basho.com/show_bug.cgi?id=1327

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com