High throughput with small updates

2013-09-28 Thread Jason Campbell
I have an application that pushes a large number of small updates (usually 
below 1KB).  Instead of writing massive numbers of keys, the updates are 
aggregated into keys of roughly 1MB.  The problem is that as these keys near 
the 1MB limit, throughput obviously drops, both from a disk and a network 
perspective.

So, two questions.  First off, is there any downside to using the memory backend 
as a temporary data store?  Obviously if all nodes for a vnode go down at the 
same time there will be data loss, but it isn't really any worse than using an 
external buffer.  Is there anything else I should be aware of?

Secondly, is there a downside to abusing siblings?  Instead of reading and 
writing rather large keys, are there any issues with writing several hundred 
siblings and resolving them via a scheduled task?
  ___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Forcing Siblings to Occur

2013-11-12 Thread Jason Campbell
I am currently forcing siblings for time series data.  The maximum bucket sizes 
are very predictable due to the nature of the data.  I originally used the 
get/update/set cycle, but as I approach the end of the interval, reading and 
writing 1MB+ objects at a high frequency kills network bandwidth.  So now I 
append siblings, and I have a cron job that merges the previous siblings (a 
simple set union works for me; only entire objects are ever deleted).

I can see how it can be dangerous to insert siblings, but if you have some 
other method of knowing how much data is in one, I don't see size being an 
issue.  I have also considered using a counter to know how large an object is 
without fetching it, which shouldn't be off by more than a few siblings unless 
there is a network partition.

So aside from size issues, which can be roughly predicted or worked around, is 
there any reason not to create hundreds or thousands of siblings and resolve 
them later?  I realise sets could work well for my use case, but they seem 
overkill for simple append operations when I don't need delete functionality. 
Creating your own CRDTs is trivial if you never need to delete.

Thoughts are welcome,
Jason

From: John Daily
Sent: Wednesday, 13 November 2013 3:10 AM
To: Olav Frengstad
Cc: riak-users
Subject: Re: Forcing Siblings to Occur

Forcing siblings other than for testing purposes is not typically a good idea; 
as you indicate, the object size can easily become a problem as all siblings 
will live inside the same Riak value.

Your counter-example sounds a lot like a use case for server-side CRDTs: data 
structures that allow the application to add values without retrieving the 
server-side content first, with siblings resolved by Riak.

These will arrive with Riak 2.0; see 
https://gist.github.com/russelldb/f92f44bdfb619e089a4d for an overview.

-John

On Nov 12, 2013, at 7:13 AM, Olav Frengstad wrote:

Do you consider forcing siblings a good idea?  I would like to get some input 
on possible use cases and pitfalls.  For instance, I have considered forcing 
siblings and then merging them on read instead of fetching an object every 
time I want to update it (especially with larger objects).

It's not clear from the docs if there are any limitations; will the maximum 
object size be the limitation?  A section of the docs[1] comes to mind:

"Having an enormous object in your node can cause reads of that object to 
crash the entire node. Other issues are increased cluster latency as the 
object is replicated and out of memory errors."

[1] http://docs.basho.com/riak/latest/theory/concepts/Vector-Clocks/#Siblings
2013/11/9 Brian Roach 
On Fri, Nov 8, 2013 at 11:38 AM, Russell Brown  wrote:

> If you’re using a well behaved client like the Riak-Java-Client, or any other that gets a vclock before doing a put, use whatever option stops that.

for (int i = 0; i < numReplicasWanted; i++) {
    bucket.store("key", "value").withoutFetch().execute();
}

:)

- Roach
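
To make the append-and-merge pattern from Jason's message above concrete, here 
is a minimal sketch of the scheduled resolution step, assuming each sibling 
value is a JSON array of append-only records; the fetch and store calls are 
omitted and would come from whatever Riak client is in use.

import json

def merge_siblings(sibling_values):
    """Resolve siblings by set union.

    Each sibling value is assumed to be a JSON array of records, and
    records are only ever appended (never updated or deleted), so a
    plain union loses nothing.
    """
    canonical = set()
    for raw in sibling_values:
        for record in json.loads(raw):
            # Serialise with sorted keys so identical records compare equal.
            canonical.add(json.dumps(record, sort_keys=True))
    # Deterministic output: the same siblings always merge to the same value.
    return json.dumps([json.loads(s) for s in sorted(canonical)])

if __name__ == "__main__":
    siblings = [
        '[{"t": 1, "v": 10}, {"t": 2, "v": 11}]',
        '[{"t": 2, "v": 11}, {"t": 3, "v": 12}]',
    ]
    print(merge_siblings(siblings))
    # -> [{"t": 1, "v": 10}, {"t": 2, "v": 11}, {"t": 3, "v": 12}]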

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



Re: May allow_mult cause DoS?

2013-12-19 Thread Jason Campbell
Can anyone familiar with Riak internals describe how siblings are handled in 
LevelDB?  I think part of this issue is a misunderstanding of how that works 
and false expectations.  As far as I understand, siblings are light at write 
time and resolved at read time, meaning if I had an object with 100 siblings, 
it would result in 100 GETs from the LevelDB backend when reading that object. 
Also, an additional sibling would only require writing the new data to an 
additional object.

Am I correct in this, or does Riak do some special merging in the backend on 
write to make reads more efficient?  Also, how does it handle siblings, since 
LevelDB doesn't support them?  Is there an intermediate object that stores 
links to all siblings?

I think this information would allow people to better utilise siblings and 
understand why certain workloads can crash Riak or make it dreadfully slow.

Thanks,
Jason Campbell

From: Rune Skou Larsen
Sent: Thursday, 19 December 2013 7:10 AM
To: Viable Nisei; riak-users@lists.basho.com
Subject: SV: May allow_mult cause DoS?

Save the transaction list inside the customer object keyed by customer id. 
Index this object with 2i on store ids for each contained tx.  If some 
customer objects grow too big, you can move old txs into archive objects keyed 
by customerid_seqno.  For your low-latency customer reads, you probably only 
need the newest txs anyway.

That's just one idea.  Trifork will be happy to help you find a suitable model 
for your use cases.  We usually do this by stress-testing a simulation with 
realistic data sizes/shapes and access patterns.  It's fastest if we come 
onsite for a couple of days and work with you to set it up, but we can also 
help you offsite.  Write me if you're interested, then we can do a call.

Rune Skou Larsen
Trifork, Denmark

- Reply message -
Fra: "Viable Nisei"
Til: "riak-users@lists.basho.com"
Emne: May allow_mult cause DoS?
Dato: ons., dec. 18, 2013 20:13

-- Forwarded message --
From: Viable Nisei
Date: Thu, Dec 19, 2013 at 2:11 AM
Subject: Re: May allow_mult cause DoS?
To: Russell Brown

Hi.  Thank you very much for your descriptive and informative answer.

On Wed, Dec 18, 2013 at 3:29 PM, Russell Brown wrote:

Hi, can you describe your use case a little?  Maybe it would be easier for us 
to help.

Yeah, let me describe an abstract case equivalent to ours.  Say we have 
CUSTOMER objects, STORE objects and TRANSACTION objects, and each TRANSACTION 
has one tribool attribute STATE = {ACTIVE, COMPLETED, ROLLED_BACK}.

We should be able to list all the TRANSACTIONs of a given CUSTOMER, for 
example (so we should establish a 1-many relation; this list should not be 
long, 10^2-10^3 records, but we should be able to obtain it fast enough). 
Also, we should be able to list all the TRANSACTIONs in a given STATE made in 
a given STORE (these lists may be very long, up to 10^8 records), but they may 
be computed with some latency.  Predictable latency is surely preferred but is 
not a show-stopper.  So, that's all.

Another pain is races and/or operation atomicity, but it's not so important at 
the current time.

On 18 Dec 2013, at 04:32, Viable Nisei wrote:
> On Wed, Dec 18, 2013 at 8:32 AM, Erik Søe Sørensen wrote:
> It really is not a good idea to use siblings to represent 1-to-many 
> relations. That's not what it's intended for, nor what it's optimized for...
> Ok, understood.
>
> Can you tell us exactly why you need Bitcask rather than LevelDB? 2i would 
> probably do it.
> 1) According to 
> http://docs.basho.com/riak/latest/ops/running/backups/#LevelDB-Backups , 
> it's a real pain to implement backups with leveldb.
> 2) According to 
> http://docs.basho.com/riak/latest/ops/advanced/backends/leveldb/ , reads may 
> be slower compared to bitcask, which is critical for us.
>
> Otherwise, storing a list of items under each key could be a solution, 
> depending of course on the number of items per key. (But do perform conflict 
> resolution.)
> Why is any conflict resolution required? As far as I understood, with 
> allow_mult=true riak should just collect all the values written to a key 
> without any additional work? What design decision leads to exponential 
> slowdown and crashes when multiple values are allowed for a single key?.. 
> So, what's the REAL purpose of allow_mult=true if it's a bad idea to use it 
> for unlimited values per single key?

The real purpose of allow_mult=true is so that writes are never dropped.  In 
the case where your 

Re: May allow_mult cause DoS?

2013-12-20 Thread Jason Campbell

- Original Message -

From: "Sean Cribbs"  
To: "Jason Campbell"  
Sent: Saturday, 21 December, 2013 3:17:57 AM 
Subject: Re: May allow_mult cause DoS? 





> No, the behavior in LevelDB is no different than the behavior of any of our 
> other backends, namely, all siblings occupy the same key. Every write 
> involves a fetch from the backend so that the existing value can be 
> superseded by or merged with the incoming write. 

I'm confused, then: why is reading hundreds or thousands of siblings a problem 
if they are stored in the same object anyway?  What is the difference between 
reading a large number of siblings and reading a merged object? 
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: May allow_mult cause DoS?

2013-12-20 Thread Jason Campbell

- Original Message -
From: "Andrew Stone" 
To: "Jason Campbell" 
Cc: "Sean Cribbs" , "riak-users" , 
"Viable Nisei" 
Sent: Saturday, 21 December, 2013 10:01:29 AM
Subject: Re: May allow_mult cause DoS?


> Think of an object with thousands of siblings. That's an object that has 1 
> copy of the data for each sibling. That object could be on the order of 100s 
> of megabytes. Every time an object is read off disk and returned to the client, 
> 100MB is being transferred. Furthermore, leveldb must rewrite the entire 100MB 
> to disk every time a new sibling is added. And it just got larger with that 
> write. If a merge occurs, the amount of data is a single copy of the data at 
> that key instead of what amounts to approximately 10,000 copies of the same 
> sized data, when all you care about is one of those 10,000. 

This makes sense for concurrent writes, but the use case that was being talked 
about was siblings with no parent object.  In that case, there shouldn't be 
much difference at all, since each sibling is just the data that was inserted.  
I understand the original use case being discussed was tens of millions of 
objects, and the metadata alone would likely exceed recommended object sizes in 
Riak.

I've mentioned my use case before, which is trying to get fast writes on large 
objects.  I abuse siblings to some extent, although by the nature of the data, 
there will never be more than a few thousand small siblings (under a hundred 
bytes).  I merge them on read and write the updated object back.  Even with 
sibling metadata, I doubt the bloated object is over a few MB, especially with 
snappy compression which handles duplicate content quite well.  Even if Riak 
merges the object on every write, it's still much faster than transferring the 
whole object over the network every time I want to do a small write.  Is there 
a more efficient way to do this?  I thought about writing single objects and 
using a custom index, but that results in a read and 2 writes, and the index 
could grow quite large compared to the amount of data I'm writing.

Thanks,
Jason

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: May allow_mult cause DoS?

2013-12-21 Thread Jason Campbell
- Original Message -
From: "Russell Brown" 
To: "Jason Campbell" 
Cc: "riak-users" , "Viable Nisei" 

Sent: Saturday, 21 December, 2013 8:25:01 PM
Subject: Re: May allow_mult cause DoS?

>Hi,

>What is a “sibling with no parent object”? I think I understand what you’re 
>getting at: when each sibling is some fragment of the whole, is that it?

Within Riak, I mean siblings created without a supplied vclock, usually with no 
data related to any other sibling.

>This is similar, I suppose, to Riak 2.0 data types. We send an operation to 
>Riak, and apply that inside the database rather than fetching, mutating, and 
>writing at the client. Think of adding to a Set: you just send the thing to be 
>added and Riak merges it for you. For your use case, would a user-defined merge 
>function in the database be a valuable feature? It would be even better if 
>Riak stored data differently (incrementally, append-only rather than 
>read-merge-write at the vnode). These are things we’re going to be working on 
>soon (I hope!) I had no idea that people used siblings this way. It’s 
>interesting.

I actually wrote a reply that was a few pages long, but it probably doesn't 
belong on this mailing list.  I'll write up a blog post about the problems I 
was trying to solve, and why I chose Riak despite it not being the best fit, 
and share it when I get some time.

I haven't tried Riak sets yet; it's on my to-do list.  I think it will replace 
siblings if it works the way I expect, but that needs to be validated.  
As far as a custom merge goes, I'm not sure I would want my Riak cluster doing 
that.  Personally, I would rather the memory and CPU time of my database be 
used to service requests and not to merge data, which should be done in the 
application (usually once per object, not on every write).  I'm sure some 
people would find it useful, but I wouldn't use it, even if it were there.

For things like logging and time-series data though, an append-only set would 
be helpful.  It would be easier to implement in Riak than the current CRDT set, 
require much less metadata, and fit certain use cases better.  Entire object 
keys can still be deleted when data expires, but data would never be updated or 
deleted within a single object.

Thanks,
Jason
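
A grow-only set of the kind described above is about the smallest CRDT there 
is.  The generic sketch below (not Riak code) shows why append-only makes the 
merge trivial: union is commutative, associative and idempotent, and it is 
exactly deletion that would pull tombstone metadata back in.

class GSet:
    """Grow-only set: add and merge only, no removal.

    Merge is set union, which is commutative, associative and
    idempotent, so replicas can be combined in any order.
    """

    def __init__(self, elements=()):
        self._elements = set(elements)

    def add(self, element):
        self._elements.add(element)

    def merge(self, other):
        return GSet(self._elements | other._elements)

    def value(self):
        return frozenset(self._elements)

# Two replicas diverge, then merge to the same state either way round.
a, b = GSet({"x"}), GSet({"y"})
a.add("z")
assert a.merge(b).value() == b.merge(a).value() == {"x", "y", "z"}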

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Performance Tuning in OmniOS

2014-01-21 Thread Jason Campbell
First off, make sure you are comparing apples to apples.  I am assuming the 
"default" RAID is RAIDZ on ZFS, so make sure the LVM raid is using RAID5 for 
comparable performance.

Generally, I wouldn't be using RAIDZ on AWS at all.  The primary issue with 
RAID5/RAIDZ is that the IO speeds are not only limited to that of a single EBS 
volume, but that of the slowest EBS volume.  That speed can vary with which 
host you get assigned to, time of day, contention from other customers, and 
seemingly the phase of the moon.  I don't think anyone would be surprised if 
you ran the same test again and got completely different results.  Provisioned 
IOPS disks will help even out extremely slow disks, but you can still get quite 
a bit of variance.

I would suggest moving to mirrored disks (RAID1) in both ZFS and Ubuntu.  I'm 
not sure about LVM, but ZFS will use the mirror to even out reads (writes are 
harder) which should fix some of the high latency, even on normal EBS volumes.  
I would suggest 4 disks in a RAID 10 configuration (striping 2 mirrored pairs). 
 Even better would be a 4-way mirror in ZFS if price isn't much of a concern.  
This will limit you to the capacity of a single EBS volume, but reads will use 
the fastest disk of the 4 disks, instead of the slowest.  It also has a nice 
side effect of being extremely fault tolerant.

The other thing to keep in mind is that ZFS is extremely RAM hungry and read 
performance drops considerably when it is RAM starved, so I would ensure that 
Riak doesn't use the last 1 - 1.5 GB of RAM so ZFS can use it for caches.

So in summary:
 - Test, test and test again on different instances, this will help even out 
EBS issues.
 - Use mirrors, not RAID5/RAIDZ
 - Use dedicated IOPS (at least for testing purposes)
 - Ensure ZFS has some RAM to play with

Hope this helps,
Jason Campbell

- Original Message -
From: "Jared Morrow" 
To: ejh...@gmail.com
Cc: "riak-users" 
Sent: Wednesday, 22 January, 2014 7:51:19 AM
Subject: Re: Performance Tuning in OmniOS



Oh I think OmniOS is far from hopeless. The problem you are having is the same 
problem you'd have if you were on ubuntu and you made a LVM raid on vanilla 
EBS. EBS is the problem when it comes to predictable write / read speed. People 
still use it, but not without careful thought and consideration. You can try 
using provisioned IOPS for EBS, which the m1.large supports, or ask in 
riak-users what other AWS users have set up. I know we have a lot of customers 
and OSS users running on AWS, so they are far more knowledgeable about 
real-world performance than I am. 


Good luck, 
Jared 








On Tue, Jan 21, 2014 at 12:05 PM, Hari John Kuriakose < ejh...@gmail.com > 
wrote: 




I am using the default raid itself. 

Well, if this is the case, I will run the tests again with a different setup as 
you said, and get back as soon as possible. I would just like to believe that 
OmniOS is not too hopeless. 

Thank you. 


On Jan 21, 2014 11:17 PM, "Jared Morrow" < ja...@basho.com > wrote: 



What type of RAID did you choose for your zpool of 5 volumes? If you chose the 
default of raidz, you will not be getting much of a performance boost over 
vanilla EBS, just a big integrity boost. Also, unless you are using provisioned 
IOPS for EBS, you are starting from an extremely slow base-case, so adding ZFS 
on top might not help matters much. 


If speed is the concern, as a test I'm willing to bet if you do another test 
run against the two instance storage disks on that m1.large, you will probably 
beat those 5 EBS volumes pretty easily. 


-Jared 



On Tue, Jan 21, 2014 at 9:22 AM, Hari John Kuriakose < ejh...@gmail.com > 
wrote: 



Hello, 


I am using standard EBS devices, with a zpool in an instance comprising five 
40GB volumes. 
Each of the Riak instance is of m1.large type. 


I have made the following changes in zfs properties: 



# My reason: the default sst block size for leveldb is 4k. 
zfs set recordsize=4k tank/riak 
# My reason: by default, leveldb verifies checksums automatically. 
zfs set checksum=off tank/riak 
zfs set atime=off tank/riak 
zfs set snapdir=visible tank/riak 


And I did the following with help from Basho AWS tuning docs: 


projadd -c "riak" -K "process.max-file-descriptor=(basic,65536,deny)" user.riak 

bash -c "echo 'set rlim_fd_max=65536' >> /etc/system" 


bash -c "echo 'set rlim_fd_cur=65536' >> /etc/system" 
ndd -set /dev/tcp tcp_conn_req_max_q0 4 


ndd -set /dev/tcp tcp_conn_req_max_q 4000 
ndd -set /dev/tcp tcp_tstamp_always 0 
ndd -set /dev/tcp tcp_sack_permitted 2 
ndd -set /dev/tcp tcp_wscale_always 1 
ndd -set /dev/tcp tcp_time_wait_interval 6 
ndd -set /dev/tcp tcp_keepalive_interval 12 
ndd -set /dev/tcp tcp_xmit_hiwat 2097152 
ndd -set /dev/tcp tcp_recv_hiwat 2097152 
ndd -set /dev/tcp 

Re: last_write_wins

2014-01-30 Thread Jason Campbell
I'm not sure Riak is the best fit for this.  Riak is great for applications 
where it is the source of data, and has very strong consistency when used in 
this way.  You are using it as a cache, where Riak will be significantly slower 
than other cache solutions.  Especially since you say that each worker will 
have a set of documents it is responsible for.  Something like a local memcache 
or redis would likely suit this use case just as well, but do it much faster 
with less overhead.

Riak will guarantee 3 writes to disk (by default), where something like 
memcache or redis will stay in memory, and if local, won't have network latency 
either.  In the worst case where a node goes offline, the real data can be 
pulled from the backend again, so it isn't a big deal.  It will also simplify 
your application, because node.js can always request from cache and not worry 
about the speed, instead of maintaining its own cache layer.

I'm as happy as the next person on this list to see Riak being used for all 
sorts of uses, but I believe in the right tool for the right job.  Unless there 
is something I don't understand, Riak is probably the wrong tool.  It will 
work, but there is other software that will work much better.

I hope this helps,
Jason Campbell

- Original Message -
From: "Edgar Veiga" 
To: "Russell Brown" 
Cc: "riak-users" 
Sent: Friday, 31 January, 2014 3:20:42 AM
Subject: Re: last_write_wins



I'll try to explain this the best I can; although it's a simple architecture, 
I'm not describing it in my native language :) 


I have a set of node.js workers (64 for now) that serve as a cache/middleware 
layer for a dozen of php applications. Each worker deals with a set of 
documents (it's not a distributed cache system). Each worker updates the 
documents in memory, and tags them as dirty (just like OS file cache), and from 
time to time (for now, it's a 5 seconds window interval), a persister module 
will deal with the persistence of those dirty documents to riak. 
If the document isn't in memory, it will be fetched from riak. 


If you want document X, you need to ask to the corresponding worker dealing 
with it. Two different workers, don't deal with the same document. 
That way we can guarantee that there will be no concurrent writes to riak. 


Best Regards, 







On 30 January 2014 10:46, Russell Brown < russell.br...@me.com > wrote: 







On 30 Jan 2014, at 10:37, Edgar Veiga < edgarmve...@gmail.com > wrote: 



Also, 


Using last_write_wins = true, do I need to always send the vclock with a 
PUT request? The official documentation says that Riak will look only at 
the timestamp of the requests. 


Ok, from what you’ve said it sounds like you are always wanting to replace what 
is at a key with the new information you are putting. If that is the case, then 
you have the perfect use case for LWW=true. And indeed, you do not need to pass 
a vclock with your put request. And it sounds like there is no need for you to 
fetch-before-put since that is only to get context /resolve siblings. Curious 
about your use case if you can share more. 


Cheers 


Russell 










Best regards, 



On 29 January 2014 10:29, Edgar Veiga < edgarmve...@gmail.com > wrote: 



Hi Russel, 


No, it doesn't depend. It's always a new value. 


Best regards 





On 29 January 2014 10:10, Russell Brown < russell.br...@me.com > wrote: 







On 29 Jan 2014, at 09:57, Edgar Veiga < edgarmve...@gmail.com > wrote: 



tl;dr 


If I guarantee that the same key is only written with a 5 second interval, is 
last_write_wins=true profitable? 

It depends. Does the value you write depend in anyway on the value you read, or 
is it always that you are just getting a totally new value that replaces what 
is in Riak (regardless what is in Riak)? 








On 27 January 2014 23:25, Edgar Veiga < edgarmve...@gmail.com > wrote: 



Hi there everyone! 


I would like to know, if my current application is a good use case to set 
last_write_wins to true. 


Basically I have a cluster of node.js workers reading and writing to riak. Each 
node.js worker is responsible for a set of keys, so I can guarantee some kind 
of non distributed cache... 
The real deal here is that the write operation is not run every time an object 
is changed but every 5 seconds, in a "batch insertion/update" style. This 
guarantees that the same object cannot be written to Riak at the same time, 
not even in the same second; there's always a 5 second window between each 
insertion/update. 


That said, is it profitable for me to set last_write_wins to true? I've been 
facing some massive write delays under high load and it would be nice to have 
some way to tune Riak. 


Thanks a lot and keep up the good work! 


___

Re: last_write_wins

2014-01-30 Thread Jason Campbell
Oh, I completely misunderstood, I'm sorry for that.  I was thinking of your 
application as a typical web application which could regenerate the data at any 
time (making that the authoritative source, not Riak).

In that case, Riak does sound perfect, but I would definitely not use the 
memory backend if that is the only copy of the data.

Eric, I'm sorry if I made it sound like Riak is a poor cache in all situations, 
I just didn't think it fit here (although I clearly misunderstood).  There is a 
tradeoff between speed and consistency/reliability, and the whole application 
has to take advantage of the extra consistency and reliability for it to make 
sense.

Sorry again,
Jason Campbell

- Original Message -
From: "Edgar Veiga" 
To: "Eric Redmond" 
Cc: "Jason Campbell" , "riak-users" 
, "Russell Brown" 
Sent: Friday, 31 January, 2014 9:54:33 AM
Subject: Re: last_write_wins


Hi! 


I think there's some confusion here... I'm not using Riak 
for cache purposes, that's exactly the opposite! Riak is my end persistence 
system; I need to store the documents in a strong, secure, available and 
consistent place. That's Riak. 


It's like I've said before, just make an analogy with the linux file cache 
system. Node.js workers simulate that in-memory cache, php applications write 
and read from them and when something is dirty, it's persisted to riak... 


Best regards 







On 30 January 2014 22:26, Eric Redmond < eredm...@basho.com > wrote: 




Actually people use Riak as a distributed cache all the time. In fact, many 
customers use it exclusively as a cache system. Not all backends write to disk. 
Riak supports a main memory backend[1], complete with size limits and TTL. 


Eric 


[1]: http://docs.basho.com/riak/latest/ops/advanced/backends/memory/ 






On Jan 30, 2014, at 1:48 PM, Jason Campbell < xia...@xiaclo.net > wrote: 


I'm not sure Riak is the best fit for this. Riak is great for applications 
where it is the source of data, and has very strong consistency when used in 
this way. You are using it as a cache, where Riak will be significantly slower 
than other cache solutions. Especially since you say that each worker will have 
a set of documents it is responsible for. Something like a local memcache or 
redis would likely suit this use case just as well, but do it much faster with 
less overhead. 

Riak will guarantee 3 writes to disk (by default), where something like 
memcache or redis will stay in memory, and if local, won't have network latency 
either. In the worst case where a node goes offline, the real data can be 
pulled from the backend again, so it isn't a big deal. It will also simplify 
your application, because node.js can always request from cache and not worry 
about the speed, instead of maintaining it's own cache layer. 

I'm as happy as the next person on this list to see Riak being used for all 
sorts of uses, but I believe in the right tool for the right job. Unless there 
is something I don't understand, Riak is probably the wrong tool. It will work, 
but there is other software that will work much better. 

I hope this helps, 
Jason Campbell 

- Original Message - 
From: "Edgar Veiga" < edgarmve...@gmail.com > 
To: "Russell Brown" < russell.br...@me.com > 
Cc: "riak-users" < riak-users@lists.basho.com > 
Sent: Friday, 31 January, 2014 3:20:42 AM 
Subject: Re: last_write_wins 



I'll try to explain this the best I can, although it's a simples architecture 
I'm not describing it in my native language :) 


I have a set of node.js workers (64 for now) that serve as a cache/middleware 
layer for a dozen of php applications. Each worker deals with a set of 
documents (it's not a distributed cache system). Each worker updates the 
documents in memory, and tags them as dirty (just like OS file cache), and from 
time to time (for now, it's a 5 seconds window interval), a persister module 
will deal with the persistence of those dirty documents to riak. 
If the document isn't in memory, it will be fetched from riak. 


If you want document X, you need to ask to the corresponding worker dealing 
with it. Two different workers, don't deal with the same document. 
That way we can guarantee that there will be no concurrent writes to riak. 


Best Regards, 







On 30 January 2014 10:46, Russell Brown < russell.br...@me.com > wrote: 







On 30 Jan 2014, at 10:37, Edgar Veiga < edgarmve...@gmail.com > wrote: 



Also, 


Using last_write_wins = true, do I need to always send the vclock while on a 
PUT request? In the official documention it says that riak will look only at 
the timestamp of the requests. 


Ok, from what you’ve said it sounds like you are always wanting to replace what 
is at a key with the new inform

Re: Max/Min Integer CRDTs?

2014-02-08 Thread Jason Campbell
My understanding of what Elias wanted was a counter that simply stored the 
minimum and maximum values it has ever reached; an optional reset would 
probably be nice as well.  It would be quite helpful when dealing with 
statistics counters that can decrement.

Then again, I could be wrong.
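
For what it's worth, a register that only remembers the extremes it has seen 
converges trivially, since min and max are commutative, associative and 
idempotent.  The sketch below is a generic illustration of that idea, not 
anything shipped in Riak.

class MinMaxRegister:
    """Tracks the minimum and maximum values ever observed.

    merge() is just elementwise min/max, so concurrent updates on
    different replicas always converge to the same answer.
    """

    def __init__(self, low=float("inf"), high=float("-inf")):
        self.low, self.high = low, high

    def observe(self, value):
        self.low = min(self.low, value)
        self.high = max(self.high, value)

    def merge(self, other):
        return MinMaxRegister(min(self.low, other.low),
                              max(self.high, other.high))

a, b = MinMaxRegister(), MinMaxRegister()
a.observe(5); a.observe(-3)
b.observe(42)
m = a.merge(b)
assert (m.low, m.high) == (-3, 42)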

- Original Message -
From: "Russell Brown" 
To: "Elias Levy" 
Cc: "riak-users" 
Sent: Sunday, 9 February, 2014 9:53:42 AM
Subject: Re: Max/Min Integer CRDTs?

Hi Elias,

This is a great time for you to ask, if you’re asking what I think you’re 
asking.

On 8 Feb 2014, at 22:35, Elias Levy  wrote:

> Does Basho have any plans for implementing a CRDT that maintains the minimum 
> or maximum value for an integer?  It would come in handy in our application 
> and it would be very simple to implement.

Do you mean some kind of bounded counter that cannot be incremented beyond (say 
1000), or decremented below a certain bound (i.e. non-negative counter?) If so, 
then the plan is yes, but I’m not sure it is simple. If you have a design for 
such a thing please share it. We’re working with a team from Universidade Nova 
de Lisboa as part of the SyncFree project on this, but I’d love to hear your 
ideas for an implementation. If you want to keep it private feel free to email 
me off list.

Cheers

Russell

> 
> Elias Levy 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



Re: Riak as log aggregator

2014-04-23 Thread Jason Campbell
Hi Bryce,

I have code that does something similar to this, and it works well.

In my case, the value is a JSON array, with a JSON object per event.

Siblings are easily resolved by merging the two arrays.

In Riak 2.0, sets using JSON-encoded strings would probably do this
automatically and more cleanly than manually resolving siblings.

I like sorted JSON, but any data format that produces identical strings
would work.  If there is a chance of duplicate submissions into Riak,
you need to ensure the data format always produces identical output to
allow Riak to recognize and eliminate duplicates.
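
To make the "identical strings" point concrete, here is a sketch of one way to 
canonicalise events before appending them, assuming events are plain JSON 
objects: serialising with sorted keys and fixed separators makes a duplicate 
submission byte-identical, so a union-style sibling merge drops it 
automatically.

import json

def canonical(event):
    """Serialise an event so that equal events are byte-identical."""
    return json.dumps(event, sort_keys=True, separators=(",", ":"))

def merge_arrays(*sibling_arrays):
    """Union of event arrays; duplicates collapse because of canonical()."""
    seen = {canonical(e) for events in sibling_arrays for e in events}
    return [json.loads(s) for s in sorted(seen)]

a = [{"session": "abc", "path": "/home", "ts": 1}]
b = [{"ts": 1, "path": "/home", "session": "abc"},   # same event, different key order
     {"session": "abc", "path": "/cart", "ts": 2}]
assert len(merge_arrays(a, b)) == 2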

The other thing I would worry about is how long-lived your sessions are
and how many events they can generate.  Riak starts having performance
issues over a few MB and you should probably consider another data
model at that point (maybe storing references instead of the data
itself).

Good luck with your project,
Jason

- Original Message -
From: "Bryce" 
To: "riak-users" 
Sent: Thursday, 24 April, 2014 1:22:12 PM
Subject: Riak as log aggregator

Hi All,

I'm interested in using Riak for a log aggregation project. These are 
basically apache logs that I would like to correlate together based on 
their session ID's. These session ID's would make sense as the key, but 
its the "value" part of this that confuses me. There will be multiple 
lines within these logs that have the same session ID, thus I will be 
creating siblings. Now, is there a CRDT that will allow me to combine 
all of these siblings into a single value or will I need to write my own 
solution to do so? Any and all pointers are welcomed. Also, if Riak is a 
bad fit for this, please let me know.

Warm regard,
Bryce


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



Re: Riak for streaming video

2014-06-26 Thread Jason Campbell
Riak CS is designed to be the same interface as Amazon S3.  It can be run 
anywhere you can run Riak.

Think of it as a way to run your own S3.

I think the hard limit on Riak objects is 50MB, but the recommended size is 
less than 1MB per object.  The main reason for that is latency.  It's faster to 
get 100x1MB chunks from the entire cluster than it is to get 1x100MB chunk from 
a single server.  It's also easier for disks to pull a 1MB object without 
delaying any other requests than pulling a 100MB object, which may end up 
queuing other requests made after.

There used to be a project called luwak that was built into Riak and handled 
chunking files like this, but it has been EOL'd.  Perhaps someone can shed some 
light on this.  Is RiakCS the final solution for binary storage?  Or is there 
another way to store larger objects in Riak?

RiakCS is wonderful, but it does seem overkill for some uses where you don't 
need to expose an S3 interface, don't need the S3 permission structure, and 
really just want file storage.
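
For completeness, a rough sketch of the do-it-yourself alternative when Riak CS 
is overkill: split the file into roughly 1MB chunks under derived keys, plus a 
small manifest object listing them.  The key layout and chunk size here are 
illustrative choices, not anything prescribed by Riak, and the actual puts and 
gets are left to whichever client is used.

CHUNK_SIZE = 1024 * 1024  # ~1MB, in line with the recommended object size

def chunk_keys(file_key, data):
    """Yield (key, chunk) pairs plus a manifest describing them."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    keyed = [("%s:chunk:%d" % (file_key, n), c) for n, c in enumerate(chunks)]
    manifest = {"file": file_key, "size": len(data), "chunks": len(chunks)}
    return keyed, manifest

# The manifest is stored under file_key; readers fetch chunks
# "<file_key>:chunk:0" .. "<file_key>:chunk:N-1" in parallel and concatenate.
keyed, manifest = chunk_keys("video-1234", b"\x00" * (3 * CHUNK_SIZE + 17))
assert manifest["chunks"] == 4 and len(keyed[-1][1]) == 17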

- Original Message -
From: "Alex De la rosa" 
To: "Hector Castro" 
Cc: "riak-users" 
Sent: Friday, 27 June, 2014 6:55:13 AM
Subject: Re: Riak for streaming video



Hi Hector, 


Thanks for your response :) highly appreciated. I have more questions though. 


Does Riak CS have to be deployed on Amazon S3, or can I host it myself on my 
own managed servers? 


What is the limit for an object in Standard Riak? Large objects should go to 
Riak CS, but what would be a limit for the Standard Riak? I may want to save 
pictures, for example, in binary data... not a RAW version, but like processed 
photo uploads from users... I think Standard Riak can hold that with no 
problem, but it would be good to know the size limit for a key's content. 


Thanks, 
Alex 



On Thu, Jun 26, 2014 at 10:45 PM, Hector Castro < hec...@basho.com > wrote: 


Hey Alex, 

My response are inline below. 

-- 
Hector 



On Thu, Jun 26, 2014 at 4:26 PM, Alex De la rosa 
< alex.rosa@gmail.com > wrote: 
> Hi Hector, 
> 
> I see, I always thought that Riak CS was the same as Riak just that you run 
> it in Amazon S3... they are actually different... 

Correct, but it is important to note that Riak is still at the core of Riak CS. 


> So... if I have a social network and one of the features is that they can 
> share video, I would use a normal Riak cluster for the webapp and data and a 
> Riak CS cluster for the video storage/streaming... am I right? 

That seems like a reasonable conclusion. 

Riak is fundamentally a distributed key/value store for low latency 
access to smaller pieces of data. Riak CS is an S3/Swift 
API-compatible object storage platform with a little higher request 
latency, but also the added ability to house very large objects 
(backups, raw images, video). 


> By the way, couldn't this "range header" be implemented in standard Riak? 
> might be a good thing to have so you don't need 2 clusters for this matter. 

It could some day. Right now, separating the clusters is best because 
tuning a single cluster for both use cases would be difficult. 



> Thanks, 
> Alex 
> 
> 
> On Thu, Jun 26, 2014 at 9:28 PM, Hector Castro < hec...@basho.com > wrote: 
>> 
>> Hi Alex, 
>> 
>> For this type of problem, you may want to look into Riak CS. [0] It is 
>> an object storage platform built on top of Riak. 
>> 
>> When you GET an object from Riak CS, you can specify a Range header to 
>> get the object by its byte offset. [1] 
>> 
>> -- 
>> Hector 
>> 
>> [0] http://docs.basho.com/riakcs/latest/ 
>> [1] 
>> http://docs.basho.com/riakcs/latest/references/apis/storage/s3/RiakCS-GET-Object/#Examples
>>  
>> 
>> On Thu, Jun 26, 2014 at 3:22 PM, Alex De la rosa 
>> < alex.rosa@gmail.com > wrote: 
>> > Hi there, 
>> > 
>> > Imagine that I want to use Riak for a video service like Youtube and I 
>> > save 
>> > the video file in Riak... is there a way to stream the contents of 1 
>> > key? I 
>> > know we can stream keys... but what about its content? 
>> > 
>> > It would be pretty troublesome to have to wait for the full data to be 
>> > downloaded $bucket->get('myvideo') to be able to serve it... as they can 
>> > be 
>> > pretty big files. 
>> > 
>> > Is there any recommendation to store/get big files like videos? or it 
>> > would 
>> > be better to use some other system than Riak for the job? 
>> > 
>> > Thanks, 
>> > Alex 
>> > 
>> > ___ 
>> > riak-users mailing list 
>> > riak-users@lists.basho.com 
>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com 
>> > 
> 
> 


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



Re: Australia data center

2014-07-30 Thread Jason Campbell
Hi Guido

For Sydney hosting, I would recommend either Equinix or Global Switch.  There 
is also another NextDC datacenter in Melbourne.

For larger installations, deal with the datacenters themselves.  If you are 
looking for smaller installations, Servers Australia resells partial rack colo 
space.  https://www.serversaustralia.com.au/colocation-servers-australia.php

I haven't personally used Servers Australia, so I can't vouch for them, but I 
have dealt with both Equinix and Global Switch, and the datacenters themselves 
are very impressive.

Hope this helps,
Jason

- Original Message -
From: "Dave Finster" 
To: "Guido Medina" 
Cc: "riak-users" 
Sent: Thursday, 31 July, 2014 7:01:10 AM
Subject: Re: Australia data center

Hi Guido

How much are you looking to co-locate? Typically the best combo of service 
quality and value are in Sydney as there are more DCs down there (I'm in 
Brisbane) and most providers have their cores there as well. 

We co-Lo a full 42 RU rack in Brisbane for around $2400/month AUD but NextDC 
(in any of their facilities around the country) are closer to $1200/month with 
15A power (if I remember correctly) and you can buy more as you grow. 

NextDC tends to have tons of connectivity options to transit providers. Let me 
know if you'd like more info


Thanks
Dave


> On 31 Jul 2014, at 1:38 am, Guido Medina  wrote:
> 
> Hi,
> 
> Does anyone know and can recommend a good -with price/value in mind- data 
> center in Australia for hosting a Riak cluster? Our main data center is in 
> Germany -http://www.hetzner.de- which is great, stable, fast and cheap but 
> haven't had any luck finding something similar, and it has to be in Australia 
> for data protection laws.
> 
> Best regards,
> 
> Guido.
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



Re: Using UUID as keys is problematic for Riak Search

2014-08-10 Thread Jason Campbell
I like UUIDs for everything as well, although I expected compatibility issues 
with something.  Base64-encoding the binary value is a nice compromise for me, 
and takes 22 characters (if you drop the padding) instead of the usual 36 for 
the hyphenated hex format.  It would still require re-encoding all the keys, 
but it's a partial solution.

From: Eric Redmond
Sent: Monday, 11 August 2014 9:15 AM
To: David James
Cc: riak-users
Subject: Re: Using UUID as keys is problematic for Riak Search

You're correct that Yokozuna only supports UTF-8, because the Solr interface 
only supports UTF-8 (note that the failure happens when attempting to build a 
non-UTF-8 JSON add-document command).  There's not much we can do here at the 
moment, since we've yet to (if ever) support a custom interface to Solr that 
accepts arbitrary binary values.  In the meantime, to use Yokozuna, you'll 
have to encode your keys to UTF-8.

Eric Redmond, Engineer @ Basho

On Sun, Aug 10, 2014 at 4:01 PM, David James wrote:

I'm using UUIDs for keys in Riak -- converted to bytes, not UTF-8 strings. 
(I'd rather spend 16 bytes for each key, not 36.)  As I understand it, 
Yokozuna maps the Riak key to _yz_id.

Here is the suggested schema from the documentation: 



Would you expect this to work with Riak Search?  I would hope so.  (Or must 
keys be UTF-8 strings?)

I get this error, which does not surprise me, given that _yz_id is defined as 
a string:









==> log/error.log <==
2014-08-10 18:24:16.221 [error] <0.610.0>@yz_kv:index:206 failed to index object {<<"test-0001">>,<<94,143,33,35,45,180,78,164,151,237,72,81,56,13,28,250>>} with error {ucs,{bad_utf8_character_code}} because [{xmerl_ucs,from_utf8,1,[{file,"xmerl_ucs.erl"},{line,185}]},{mochijson2,json_encode_string,2,[{file,"src/mochijson2.erl"},{line,186}]},{mochijson2,'-json_encode_proplist/2-fun-0-',3,[{file,"src/mochijson2.erl"},{line,167}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{mochijson2,json_encode_proplist,2,[{file,"src/mochijson2.erl"},{line,170}]},{mochijson2,'-json_encode_proplist/2-fun-0-',3,[{file,"src/mochijson2.erl"},{line,167}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{mochijson2,json_encode_proplist,2,[{file,"src/mochijson2.erl"},{line,170}]}]

I don't think changing the schema.xml type for _yz_id to "solr.UUIDField" is a 
good idea.

What can I do?

Thanks,
David




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
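
The base64 idea at the top of this thread is easy to check: a UUID's 16 raw 
bytes URL-safe-base64 encode to 22 characters once the padding is dropped, and 
the result is plain ASCII that Solr/Yokozuna can index.  A minimal sketch:

import base64
import uuid

def uuid_to_key(u):
    """16 raw UUID bytes -> 22-character URL-safe base64 string."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")

def key_to_uuid(key):
    """Reverse the encoding (re-adding the stripped padding)."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(key + "=="))

u = uuid.uuid4()
key = uuid_to_key(u)
assert len(key) == 22 and len(str(u)) == 36 and key_to_uuid(key) == u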


Bitcask Key Listing

2014-08-19 Thread Jason Campbell
I currently maintain my own indexes for some things, and use natural keys where 
I can, but a question has been nagging me lately.

Why is key listing slow?  Specifically, why is bitcask key listing slow?

One of the biggest issues with bitcask is all keys (including the bucket name 
and some overhead) must fit into RAM.  For large numbers of keys, I understand 
the coordination data transfer will hurt, but shouldn't things like list 
buckets (or listing keys from small buckets) be fast?

Is there a reason this is slow, and is there a plan to fix it?

Thanks,
Jason

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Bitcask Key Listing

2014-08-19 Thread Jason Campbell
Hello Kelly,

Thanks for the detailed response, but I'm still a bit confused.  I understand 
the overhead of the covering set, and that the key listing will only complete as 
fast as the slowest node.  The data transfer shouldn't be much for a small key 
set though.  So assuming we aren't doing something crazy like trying to stream 
billions of keys, it should be roughly the same speed as getting those keys 
from 1/3 of the nodes.

As far as bitcask being unordered, I'm not sure how that is possible.  I 
understand its on-disk format is log-structured, which, although ordered by 
last-modified time, is fairly useless for most purposes.  But in memory it has 
to be able to satisfy requests for a random key name in fairly fixed time to 
meet the latency guarantees that are expected of it.  I would expect it to be 
some kind of search tree, in which case, it could be used to filter partial 
ranges (like a bucket lookup) just as easily as a full key lookup.  It could 
also be a hash of the key, at which point it would be useless for partial 
ranges, but then so would the bitcask memory calculator, as it would be a fixed 
size per record, not variable with the length of the bucket and key.

I don't know Erlang well enough to jump into the bitcask source, and the papers 
I have found explain the keydir and data structures well enough, but not how a 
request is able to look up a key in memory.  Is it using a b-tree?

Thanks,
Jason
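
For context, the public Bitcask paper describes the in-memory keydir as a hash 
table mapping each key to the file id, offset and size of its latest entry, 
which is why point lookups are constant-time but there is no ordering to 
exploit for bucket scans.  A rough sketch of that shape (illustrative, not the 
Erlang implementation):

# Shape of the keydir as described in the Bitcask paper: a hash table
# from key to the location of the most recent entry on disk.
keydir = {}  # key (bytes) -> (file_id, value_size, value_offset, timestamp)

def keydir_put(key, file_id, size, offset, timestamp):
    keydir[key] = (file_id, size, offset, timestamp)

def keydir_lookup(key):
    """O(1) point lookup; a disk read at (file_id, offset) would follow."""
    return keydir.get(key)

def list_bucket(bucket_prefix):
    """No ordering to exploit: every key must be scanned."""
    return [k for k in keydir if k.startswith(bucket_prefix)]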

- Original Message -
From: "Kelly McLaughlin" 
To: "Jason Campbell" , "riak-users" 

Sent: Wednesday, 20 August, 2014 1:26:36 AM
Subject: Re: Bitcask Key Listing

Jason,

There are two aspects to a key listing operation that make it 
expensive relative to normal gets or puts.

The first part is that, due to the way data is distributed in Riak, key 
listing requires a covering set of vnodes to participate in
order to determine the list of keys for a bucket. A minimal covering set 
of vnodes works out to 1/N of the nodes in the cluster, where N
is the n_val of the bucket. By default this is 3 so in the default case 
a key listing request must send a request to and receive
responses from 1/3 of the nodes in the cluster. This incurs network 
traversal overhead as the keys from each vnode are returned
and the speed to completion is limited by the slowest vnode in the 
covering set. This is true regardless of the backend in use.

The second part is specific to bitcask. Bitcask is an unordered backend 
and the consequence of this when doing a key listing is
that all of the keys stored by a vnode that participates in a key 
listing request must be scanned. It doesn't matter if there are
2 keys or 2000 keys for the bucket being queried, they all must be 
scanned. This is a case where all the keys being stored in memory
is beneficial to performance, but as the amount of data stored increases 
so does the expense to scan over it. The leveldb backend is
ordered and we are able to take advantage of that fact to only scan over 
data for the bucket in question, but for bitcask that is
not an option.

At this time there is nothing in the works to specifically improve key 
listing performance. It is certainly something we are aware of,
but at this time there are other things with higher priority.

Hope that helps answer your question.


Kelly


On 08/19/2014 05:17 AM, Jason Campbell wrote:
> I currently maintain my own indexes for some things, and use natural keys 
> where I can, but a question has been nagging me lately.
>
> Why is key listing slow?  Specifically, why is bitcask key listing slow?
>
> One of the biggest issues with bitcask is all keys (including the bucket name 
> and some overhead) must fit into RAM.  For large amounts of keys, I 
> understand the coordination data transfer will hurt, but shouldn't things 
> like list buckets (or listing keys from small buckets) be fast?
>
> Is there a reason this is slow, and is there a plan to fix it?
>
> Thanks,
> Jason
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



Re: Do developers need to sanitize JSON input before sending to Riak Client?

2014-12-12 Thread Jason Campbell
Riak is binary storage; if there is an injection attack, it will be done 
against the Ruby portion, not against Riak.

The example you listed gives a ParserError when trying to load it via JSON, so 
I don’t understand how it could even get far enough in your code to reach Riak, 
unless you are not validating user input at all and just storing it in Riak.  
There is nothing wrong with that, but then anything coming out of Riak should 
be considered untrusted and handled accordingly.

A typical workflow would be something like this:
  1. Load user input (a string) as JSON.  This will provide a Ruby object 
representing the JSON
  2. Validate / business logic / whatever you want
  3. Save the Ruby object (or a changed one) in Riak via the client
  4. The Ruby client will convert it to JSON and store the JSON in Riak
  5. On the next request, the Ruby client will load the JSON from Riak and 
convert it back to a Ruby object
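
Steps 1 and 4 are where the escaping question is settled, and any standard 
JSON library demonstrates it; the sketch below uses Python's json module purely 
for illustration: a quote-laden value stays a single value and never becomes an 
extra field.

import json

# Untrusted value that tries to smuggle in an extra field.
tainted = 'abc123xyz","admin":"true'

# Step 4: serialising a dict escapes the value; no new key appears.
stored = json.dumps({"username": "bob", "password_hash": tainted})
print(stored)
# {"username": "bob", "password_hash": "abc123xyz\",\"admin\":\"true"}

# Steps 1/5: parsing it back yields exactly the original two fields.
assert set(json.loads(stored)) == {"username", "password_hash"}
assert json.loads(stored)["password_hash"] == tainted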

So an “injection” object will either fail at step 1 (like this one) because it’s 
invalid and doesn’t parse, or it will fail at step 2 because you aren’t letting 
users set the admin flag without permission (right?).  I’m not sure how to get 
to step 3 without a seriously flawed JSON library or a failure to check obvious 
things.

Regardless though, the injection attack isn’t hidden at any point.  If it 
parses, it will be a Ruby object with all the properties in plain sight, right 
up until you save it into Riak.

If I misunderstood, I’m sorry.  If so, can you clarify what assumption was 
wrong?

Hopefully this helps a bit,
Jason

> On 12 Dec 2014, at 22:30, J. Austin Hughey  wrote:
> 
>  
> Hi,
> 
> I’m fairly new to Riak, and I haven’t seen a clear answer to this question 
> through Google, so I’m going to ask the list here.  
> 
> Do I need to sanitize user input BEFORE sending to the Riak client, or does 
> the client handle sanitization? (I plan to use the Ruby client, but I may 
> want to use a different language client later on too depending on the 
> project.)
> 
> I can see the potential for injection here being a real issue:  
> 
> {  
> “username”:”bob”,
> “password_hash”:”abc123xyz\””,”admin”:”true"
> }
> 
> …or something like that. You get the idea. I’m just trying to get it right 
> before I start building something that’s just begging to get owned.
> 
> Thanks.
> 
> --  
> J. Austin Hughey
> 
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Do developers need to sanitize JSON input before sending to Riak Client?

2014-12-12 Thread Jason Campbell
Exactly, SQL injection happens because people construct SQL statements 
themselves and aren’t aware of correct escaping edge cases.  Use a library to 
convert between Ruby and JSON, and you’ll be fine on that front.  The same 
applies to SQL really.  SQL injection is impossible if you pass Ruby objects 
directly into an SQL library and let it escape them.

Security as a topic is much too large to cover in a mailing list, but the same 
principle applies.  Use proven libraries, and don’t try to do too much security 
yourself.  It’s hard, and you will most likely get it wrong.  Especially crypto.

Specific to Riak though, ensure you lock it down.  It applies to any database 
really, but if you can, remove any direct internet access from the boxes, 
firewall every port you can.  If you are using protocol buffers, you can 
disable HTTP access as well, it makes it a bit harder for an attacker.  Run 
Riak on a non-standard port, firewall communication between the Riak nodes.  
Riak can also use a TLS certificate to encrypt handoff traffic which can 
prevent data interception between Riak nodes.

That is by no means a complete list, but a starting point at least.  Security 
is a mindset, remove anything you don’t need, because the more tools you give 
an attacker, the worse things get, even if you think they are harmless.

Hope that helps,
Jason

> On 13 Dec 2014, at 02:28, Kyle Marek-Spartz  
> wrote:
> 
> Ruby should "do the right thing" and escape your content appropriately,
> that is, if you have a Ruby hash of:
> 
> {'test' => 'yes", "injection": "oops!"'}
> 
> JSON.dump will give you:
> 
> {"test":"yes\", \"injection\": \"oops!\""}
> 
> 
> J. Austin Hughey writes:
> 
>> Thank you very much for the explanation, Jason. My example attack was 
>> probably a bit “off”, but the basic use case I’m thinking of is an API. Say 
>> I have an API that accepts JSON. The user has “spoofed” the JSON input to 
>> the API, possibly setting an “admin” flag or something (doesn’t matter what, 
>> specifically, just an example of how a property may be set). Assuming no 
>> other business logic validation (which of course I DEFINITELY plan to 
>> implement), I was simply curious if, when sending info directly to Riak as 
>> JSON, if the client did anything to sanitize input from injection.
>> 
>> Think SQL injection, but JSON instead. Data is data, representation 
>> implementation differs, but the same concerns apply. I’m just trying to see 
>> what I need to do to prevent that injection-style attack from happening.
>> 
>> Sounds like, based on your response (thanks again, by the way!), I need to 
>> validate the Ruby object after it’s created and ensure no invalid attributes 
>> or values exist, prior to its being sent to Riak.
>> 
>> Any additional thoughts on other security concerns I should have here? I’m 
>> new to “NoSQL” and I like Riak due to its operational model, so I’d like to 
>> use it more often. I just need to know what I’m doing before I put an app 
>> out into the wild!
>> 
>> Thanks again.
> 
> --
> Kyle Marek-Spartz


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Data modelling questions

2015-02-21 Thread Jason Campbell
I have the same questions as Christopher.

Does this data need to change, or is it write-once?
What information do you have when querying?
 - Will you already have timestamp and msg-id?
 - If not, you may want to consider aggregating everything into a single key.  
This is easier if the data isn’t changing.
What data will you typically be querying?
 - Will you typically be looking for a single element of data, or aggregates 
(graphing or mapping for example)?
 - If aggregates, what fields are you aggregating on (timestamp, geo, location, 
etc) and which will be fixed?

The aggregate question may need a little more explanation, so I will use an 
example.

I have been working on time-series data with my key being: 
<node-id>:<metric-id>:<timestamp>
Node-id and metric-id are fixed, they will never be merged in an aggregate way, 
and I have them before querying.
Timestamp is my aggregate value, I may need a single timestamp, or hundreds of 
thousands of timestamps (to draw a graph).  For this reason, I grouped my 
metrics by 5 minute block instead of one key per timestamp.  I also created 
aggregates with relevant averages and such for 1 hour, 1 day and 1 month to 
reduce the amount of key lookups for large graphs.

So it depends what visualisations you want.  If you are going to be mapping the 
most recent data based on the geo or location, I would include aggregates for 
that.  If you are more interested in timestamp, group by that.  Because Riak 
doesn’t have multi-key consistency though, also choose an canonical source of 
data.  If you store the same data in multiple keys, they will diverge at some 
point.  Decide now which is the real source, and which are derived, it will 
make your life easier when fixing data later.

Also keep in mind typical periods and data size.  There was no point for me to 
create a 1 minute increment since the 5 minute data was an acceptable size.  
Sure it’s a waste to transmit 4 minutes of data I don’t need, but it’s measured 
in milliseconds (mainly unserialising JSON in my app), so it doesn’t matter to 
me and makes larger aggregates much more performant.
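
As a concrete illustration of that key scheme (the format and the 5-minute 
block are the choices described above; the helper names are made up): the 
timestamp is truncated to the start of its block, so every sample in the window 
lands in the same key and an hour's graph needs only 12 gets per metric.

def metric_key(node_id, metric_id, timestamp, block_seconds=300):
    """Build '<node-id>:<metric-id>:<block-start>' for a 5-minute block."""
    block_start = int(timestamp) - int(timestamp) % block_seconds
    return "%s:%s:%d" % (node_id, metric_id, block_start)

def keys_for_range(node_id, metric_id, start, end, block_seconds=300):
    """All block keys needed to draw a graph over [start, end)."""
    first = int(start) - int(start) % block_seconds
    return [metric_key(node_id, metric_id, t)
            for t in range(first, int(end), block_seconds)]

assert metric_key("node1", "cpu", 1424563217) == "node1:cpu:1424563200"
# One hour of data is 12 keys per metric rather than one key per sample.
assert len(keys_for_range("node1", "cpu", 1424563200, 1424566800)) == 12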

> On 22 Feb 2015, at 03:44, Christopher Meiklejohn  
> wrote:
> 
> 
>> On Feb 20, 2015, at 5:35 PM, AM  wrote:
>> 
>> Hi All.
>> 
>> I am currently looking at using Riak as a data store for time series data. 
>> Currently we get about 1.5T of data in JSON format that I intend to persist 
>> in Riak. I am having some difficulty figuring out how to model it such that 
>> I can fulfill the use cases I have been handed.
>> 
>> The data is provided in several types of log formats with some common fields:
>> 
>> - timestamp
>> - geo
>> - s/w build #
>> - location #
>> 
>> -  whole bunch of other key value pairs.
>> 
>> For the most part I will need to provide aggregated views based on geo. 
>> There are some views based on s/w build # and location #. The aggregation 
>> will be on an hourly basis.
>> 
>> The model that I came up with:
>> 
>> [][-]: 
> 
> Hi AM, 
> 
> Additionally, it would be great if you could provide additional information 
> on how you plan on querying both the original and aggregated values.  
> Querying is usually the most difficult part to get right in Riak, and your 
> query pattern will be very important in establishing the best way to lay out 
> this data on disk.
> 
> - Chris
> 
> Christopher Meiklejohn
> Senior Software Engineer
> Basho Technologies, Inc.
> cmeiklej...@basho.com
> 
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Data modelling questions

2015-02-22 Thread Jason Campbell
Coming at this from another angle, if you already have a permanent data store, 
and you are only reporting on each hour at a time, can you run the reports 
based on the log itself?

A lot of Riak’s advantage comes from the stability and availability of data 
storage, but S3 is already doing that for you.  Riak can store the data, but 
I’m not sure what benefit it serves from my understanding of your problem.

Aggregates are usually quite small (even with more advanced things like 
histograms), so it’s relatively easy to parse a log line-by-line and produce 
aggregates in-memory for a report.
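
Something as small as this would cover an hourly count-by-geo report, for
example (a sketch that assumes line-delimited JSON and guesses at field names
based on the common fields you listed; adjust to your real format):

import json
from collections import Counter

def hourly_counts(log_path):
    # Stream the hour's log once and count records per (geo, build).
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)
            counts[(record.get("geo"), record.get("build"))] += 1
    return counts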

Can you give a bit more detail on why you are using Riak?

Hope this helps,
Jason

> On 23 Feb 2015, at 13:03, AM  wrote:
> 
> Hi Jason, Christopher.
> 
> This is supposed to be an append-only time-limited data. I only intend to 
> save about 2 weeks worth of data (which is yet another thing I need to figure 
> out, ie how to vacate older data).
> 
> Re: querying, for the most part the system will be building out hourly 
> reports based on geo, build and location information so I need to have a 
> model that allows me to aggregate by timestamp + [each-of-geo-build-location] 
> or just do it on the fly during ingestion.
> 
> Ingestion is yet another thing where I have some flexibility as it is a batch 
> job, ie log files get dropped on S3 and we get notified (usually on an hourly 
> basis, some logs on a 10-min basis) so I can massage it further but I am 
> concerned that every place where I buffer is another opportunity for losing 
> data and I would like to avoid reprocessing as much as possible.
> 
> Messages will already have the timestamp and msg-id and I will mostly be 
> interested in aggregates. In some very rare cases I expect to be able to 
> simply run map-reduce jobs for custom queries.
> 
> Given that, does my current model look reasonable?
> 
> Thanks.
> AM
> 
> 
> On 2/21/15 6:40 PM, Jason Campbell wrote:
>> I have the same questions as Christopher.
>> 
>> Does this data need to change, or is it write-once?
>> What information do you have when querying?
>>  - Will you already have timestamp and msg-id?
>>  - If not, you may want to consider aggregating everything into a single 
>> key.  This is easier if the data isn’t changing.
>> What data will you typically be querying?
>>  - Will you typically be looking for a single element of data, or aggregates 
>> (graphing or mapping for example)?
>>  - If aggregates, what fields are you aggregating on (timestamp, geo, 
>> location, etc) and which will be fixed?
>> 
>> The aggregate question may need a little more explanation, so I will use an 
>> example.
>> 
>> I have been working on time-series data with my key being: 
>> node-id:metric-id:timestamp
>> Node-id and metric-id are fixed, they will never be merged in an aggregate 
>> way, and I have them before querying.
>> Timestamp is my aggregate value, I may need a single timestamp, or hundreds 
>> of thousands of timestamps (to draw a graph).  For this reason, I grouped my 
>> metrics by 5 minute block instead of one key per timestamp.  I also created 
>> aggregates with relevant averages and such for 1 hour, 1 day and 1 month to 
>> reduce the amount of key lookups for large graphs.
>> 
>> So it depends what visualisations you want.  If you are going to be mapping 
>> the most recent data based on the geo or location, I would include 
>> aggregates for that.  If you are more interested in timestamp, group by 
>> that.  Because Riak doesn’t have multi-key consistency though, also choose 
>> a canonical source of data.  If you store the same data in multiple keys, 
>> they will diverge at some point.  Decide now which is the real source, and 
>> which are derived, it will make your life easier when fixing data later.
>> 
>> Also keep in mind typical periods and data size.  There was no point for me 
>> to create a 1 minute increment since the 5 minute data was an acceptable 
>> size.  Sure it’s a waste to transmit 4 minutes of data I don’t need, but 
>> it’s measured in milliseconds (mainly unserialising JSON in my app), so it 
>> doesn’t matter to me and makes larger aggregates much more performant.
>> 
>>> On 22 Feb 2015, at 03:44, Christopher Meiklejohn  
>>> wrote:
>>> 
>>> 
>>>> On Feb 20, 2015, at 5:35 PM, AM  wrote:
>>>> 
>>>> Hi All.
>>>> 
>>>> I am currently looking at using Riak as a data store for time series data. 
>>>> Currently we get about 1.5T of data in JSON format that I intend to 
>>>> persist in Riak. I am having some difficulty figuring out how to model it 
>>>

Re: Data modelling questions

2015-02-23 Thread Jason Campbell
Thanks for the info.

The model looks reasonable, but something I would worry about is the 
availability of the key data.  For example, the timestamps and msg-ids should 
be known without key-listing Riak (which is always a very slow operation).  
There are several options for this: you can maintain your own index (Riak 
CRDT sets work very well for this), use 2i, or Riak search.
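
To make the 2i option concrete, a minimal sketch with the official Python
client could look like this (bucket and index names are made up, and 2i
requires the LevelDB backend):

import riak

client = riak.RiakClient(pb_port=8087)
bucket = client.bucket('messages')

# Write a message with a secondary index on its timestamp...
msg = bucket.new('some-msg-id', data={'timestamp': 1429579919010, 'geo': 'us-east'})
msg.add_index('timestamp_int', 1429579919010)
msg.store()

# ...then later fetch the keys for a 10-minute window without listing keys.
window = bucket.get_index('timestamp_int', 1429579919010, 1429580519010)
keys = list(window)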

The other thing I’m worried about is something I’ve run into with my data.  If 
you create a key per message as you have indicated, your key size can be very 
small, and you end up aggregating thousands of keys for any reasonable query.  
For pulling large amounts of data out of Riak, try to keep key sizes between 
about 100KB and 1MB.  Riak is still very responsive at those sizes, and there 
isn’t much parsing overhead even if you are only interested in one of the 
messages.  For me, that means grouping data into fixed 5 minute blocks.  It 
will obviously vary depending on message size and number of messages, but I 
wouldn’t go with a key per message unless the messages are >10KB.  Grouping by 
timestamp also gives the advantage that any client can know the keys to query 
in advance since they are fixed.  You said a 10 minute range is ideal, so if 
you can manage to group your data into 10 minute keys, that would likely give 
the best performance when querying.

For grouping data, I would recommend using Riak sets and serialised JSON 
strings.  As long as you don’t have exact duplicate messages, it works very 
well, and allows Riak to resolve conflicts automatically.
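
For example, appending to a 10-minute block with the official Python client
looks roughly like this (assumes a bucket type named 'sets' created with
datatype = set; key and field names are only illustrative):

import json
import riak
from riak.datatypes import Set

client = riak.RiakClient(pb_port=8087)
bucket = client.bucket_type('sets').bucket('messages_10m')

def append_message(block_key, message):
    block = Set(bucket, block_key)
    # Serialise deterministically so the same message never shows up twice
    # as two slightly different set members.
    block.add(json.dumps(message, sort_keys=True))
    block.store()

append_message('geo-us-east:1429579800000',
               {'timestamp': 1429579919010, 'msg-id': 'abc123'})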

As far as those aggregate metrics (for graphing and alerting), I would 
definitely store those in a separate bucket, and group them by 10 minute 
intervals.  The full data keys should only be used for unplanned queries (Riak 
MR jobs), and anything you know you will need should ideally be generated when 
loading the data initially.
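
Writing those roll-ups is then just a plain put into that separate bucket,
keyed by the interval start, something like (again, names are mine):

import riak

client = riak.RiakClient(pb_port=8087)
agg_bucket = client.bucket('aggregates_10m')

def store_aggregate(geo, interval_start_ms, stats):
    # stats is whatever you pre-compute at load time, e.g. {'count': 1234, 'mean': 42.1}
    agg = agg_bucket.new('{0}:{1}'.format(geo, interval_start_ms), data=stats)
    agg.store()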

Hope this helps, let me know if you have any other questions.

Jason

> On 24 Feb 2015, at 05:24, AM  wrote:
> 
> On 2/22/15 6:16 PM, Jason Campbell wrote:
>> Coming at this from another angle, if you already have a permanent data 
>> store, and you are only reporting on each hour at a time, can you run the 
>> reports based on the log itself?
>> A lot of Riak’s advantage comes from the stability and availability of data 
>> storage, but S3 is already doing that for you.  Riak can store the data, but 
>> I’m not sure what benefit it serves from my understanding of your problem.
>> 
>> Aggregates are usually quite small (even with more advanced things like 
>> histograms), so it’s relatively easy to parse a log line-by-line and produce 
>> aggregates in-memory for a report.
>> 
>> Can you give a bit more detail on why you are using Riak?
> 
> For the most part yes, we are using EMR at the moment, but some of the 
> reasons I want to go down that road are:
> 
> - We are not quite 'big data' (by the definition that I can process 60 
> mins of my data on an 8 core 16G machine in under 40 mins) and EMR is 
> actually 'slower' for us, than just running it locally on a large machine. 
> That brings its own stability and maintenance issues for us. It would be much 
> nicer if the data was stored reliably and in a format that was query-able 
> quickly instead of having to reprocess things.
> 
> - The data is compressed and we actually waste quite a bit of time 
> decompressing it for EMR which is yet another issue if we have to re-process 
> due to single machine durability issues.
> 
> - We want to be able to drive graphs and alerts off of the data whose 
> granularity is most likely going to be of the order of 10 mins. These are 
> just counters on a single time dimension so I am assuming that if I get the 
> model right this will be easy. Yes we can do this via EMR but it also 
> requires additional moving parts that we would have to manage.
> 
> - We have certain BI use cases (as yet not clearly defined) that riak MR 
> would be quite useful and faster for us.
> 
> All in all Riak appears to offer the sweet spot of reliability, data 
> management and querying tools such that all we would have to be concerned 
> about is the actual cluster itself.
> 
> Thanks.
> AM
>> Hope this helps,
>> Jason
>> 
>>> On 23 Feb 2015, at 13:03, AM  wrote:
>>> 
>>> Hi Jason, Christopher.
>>> 
>>> This is supposed to be an append-only time-limited data. I only intend to 
>>> save about 2 weeks worth of data (which is yet another thing I need to 
>>> figure out, ie how to vacate older data).
>>> 
>>> Re: querying, for the most part the system will be building out hourly 
>>> reports based on geo, build and location information so I nee

Yokozuna queries slow

2015-04-20 Thread Jason Campbell
Hello,

I'm currently trying to debug slow YZ queries, and I've narrowed down the 
issue, but not sure how to solve it.

First off, we have about 80 million records in Riak (and YZ), but the queries 
return relatively few (a thousand or so at most).  Our query times are anywhere 
from 800ms to 1.5s.

I have been experimenting with queries directly on the Solr node, and it seems 
to be a problem with YZ and the way it does vnode filters.

Here is the same query, emulating YZ first:

{
  "responseHeader":{
"status":0,
"QTime":958,
"params":{
  "q":"timestamp:[1429579919010 TO 1429579921010]",
  "indent":"true",
  "fq":"_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10",
  "rows":"0",
  "wt":"json"}},
  "response":{"numFound":80,"start":0,"docs":[]
  }}

And the same query, but including the vnode filter in the main body instead of 
using a filter query:

{
  "responseHeader":{
"status":0,
"QTime":1,
"params":{
  "q":"timestamp:[1429579919010 TO 1429579921010] AND (_yz_pn:55 OR 
_yz_pn:40 OR _yz_pn:25 OR _yz_pn:10)",
  "indent":"true",
  "rows":"0",
  "wt":"json"}},
  "response":{"numFound":80,"start":0,"docs":[]
  }}

I understand there is a caching benefit to using filter queries, but a 
performance difference of 100x or greater doesn't seem worth it, especially 
with a constant data stream.
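
For reference, the second form is just the query run directly against one 
node's Solr endpoint, along the lines of (host, port and index name here are 
the ones from our cluster; adjust for yours):

curl 'http://10.0.1.234:8093/internal_solr/snapshots_index/select' \
  --data-urlencode 'q=timestamp:[1429579919010 TO 1429579921010] AND (_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10)' \
  --data-urlencode 'rows=0' \
  --data-urlencode 'wt=json'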

Is there a way to make YZ do this, or is the only way to query Solr directly, 
bypassing YZ?  Does anyone have any other suggestions of how to make this 
faster?

The timestamp field is a SolrTrieLongField with default settings if anyone is 
curious.

Thanks,
Jason
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna queries slow

2015-04-21 Thread Jason Campbell
Thanks Zeeshan for the info.

Is there a workaround in the mean time, or is the only option to handle queries 
to the individual nodes ourselves?

Is there a planned timeframe for the 2.0.1 release?

Thanks,
Jason

> On 21 Apr 2015, at 16:13, Zeeshan Lakhani  wrote:
> 
> Hey Jason,
> 
> We’re working on performance issues with YZ filter queries, e.g. 
> https://github.com/basho/yokozuna/issues/392, and coverage plan 
> generation/caching, and our CliServ team has started doing a ton of 
> benchmarks as well.
> 
> You can bypass YZ, but then you’d have to create a way to generate your own 
> coverage plans and other things involving distributed solr that YZ gives you. 
> Nonetheless, we’re actively working on improving these issues you’ve 
> encountered. 
> 
> Zeeshan Lakhani
> programmer | 
> software engineer at @basho | 
> org. member/founder of @papers_we_love | paperswelove.org
> twitter => @zeeshanlakhani
> 
>> On Apr 21, 2015, at 1:06 AM, Jason Campbell  wrote:
>> 
>> Hello,
>> 
>> I'm currently trying to debug slow YZ queries, and I've narrowed down the 
>> issue, but not sure how to solve it.
>> 
>> First off, we have about 80 million records in Riak (and YZ), but the 
>> queries return relatively few (a thousand or so at most).  Our query times 
>> are anywhere from 800ms to 1.5s.
>> 
>> I have been experimenting with queries directly on the Solr node, and it 
>> seems to be a problem with YZ and the way it does vnode filters.
>> 
>> Here is the same query, emulating YZ first:
>> 
>> {
>>  "responseHeader":{
>>"status":0,
>>"QTime":958,
>>"params":{
>>  "q":"timestamp:[1429579919010 TO 1429579921010]",
>>  "indent":"true",
>>  "fq":"_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10",
>>  "rows":"0",
>>  "wt":"json"}},
>>  "response":{"numFound":80,"start":0,"docs":[]
>>  }}
>> 
>> And the same query, but including the vnode filter in the main body instead 
>> of using a filter query:
>> 
>> {
>>  "responseHeader":{
>>"status":0,
>>"QTime":1,
>>"params":{
>>  "q":"timestamp:[1429579919010 TO 1429579921010] AND (_yz_pn:55 OR 
>> _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10)",
>>  "indent":"true",
>>  "rows":"0",
>>  "wt":"json"}},
>>  "response":{"numFound":80,"start":0,"docs":[]
>>  }}
>> 
>> I understand there is a caching benefit to using filter queries, but a 
>> performance difference of 100x or greater doesn't seem worth it, especially 
>> with a constant data stream.
>> 
>> Is there a way to make YZ do this, or is the only way to query Solr 
>> directly, bypassing YZ?  Does anyone have any other suggestions of how to 
>> make this faster?
>> 
>> The timestamp field is a SolrTrieLongField with default settings if anyone 
>> is curious.
>> 
>> Thanks,
>> Jason
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna queries slow

2015-04-21 Thread Jason Campbell
That is really weird, I tried that before posting, yet it seems to fix things 
now.

Oh well, I'll try rolling it out across the cluster, hopefully it works.

Maybe someone with more Solr knowledge can enlighten me though, does disabling 
the filter cache change behaviour as well?

For example, running q=_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10 still 
takes ~1500ms, yet q=timestamp:[1429579919010 TO 1429579921010]&qf=_yz_pn:55 OR 
_yz_pn:40 OR _yz_pn:25 OR _yz_pn:10 takes 0-5ms.

So clearly the query filter isn't being executed in the same way it was before, 
it's not just that it doesn't have to write the result to a cache.

Thanks for the help guys.

> On 22 Apr 2015, at 08:02, Zeeshan Lakhani  wrote:
> 
> Hey Steve,
> 
> Yep, that’s the specific reference point I was referring to in the link I 
> shared. 
> 
> We’re working on cleaning-up some stuff to help perf for large indexes within 
> the yz code itself. We’ve also discussed how to improve configuring specific 
> solrconfig tunable items as a property of the index (not bucket); so, it’s 
> something we’re looking into.
> 
> Thanks for helping out :).
> 
> Zeeshan Lakhani
> programmer | 
> software engineer at @basho | 
> org. member/founder of @papers_we_love | paperswelove.org
> twitter => @zeeshanlakhani
> 
>> On Apr 21, 2015, at 5:41 PM, Steve Garon  wrote:
>> 
>> Zeeshan, 
>> 
>> For that specific case, you guys should add {!cache=false} in front on your 
>> query plan. Therefore, queries on large index won't be slowed down. I'd 
>> really like to see some of the solrconfig.xml config to be exported to the 
>> riak bucket properties. The caching flag could be a property on the bucket. 
>> Same for soft commit timeouts. We had to increase soft commit timeouts to 
>> 10sec instead of the 1sec default. 
>> 
>> 
>> Steve
>> 
>> On 21 April 2015 at 16:02, Zeeshan Lakhani  wrote:
>> Nice Steve.
>> 
>> Zeeshan Lakhani
>> programmer | 
>> software engineer at @basho | 
>> org. member/founder of @papers_we_love | paperswelove.org
>> twitter => @zeeshanlakhani
>> 
>>> On Apr 21, 2015, at 3:57 PM, Steve Garon  wrote:
>>> 
>>> Jason, 
>>> 
>>> Comment the filterCache section in the bucket's solrconfig.xml and 
>>> restart riak. Now your queries will be fast again :-)
>>> 
>>> 
>>> Steve
>>> 
>>> On 21 April 2015 at 04:24, Zeeshan Lakhani  wrote:
>>> No real workaround other than what you described or looking into 
>>> config/fq-no-cache settings as mentioned in 
>>> http://lucidworks.com/blog/advanced-filter-caching-in-solr/ and playing 
>>> around with those.
>>> 
>>> Riak is now at 2.1.0. I hope that one of the next few point releases will 
>>> see the fix. 
>>> 
>>> 
>>> Zeeshan Lakhani
>>> programmer | 
>>> software engineer at @basho | 
>>> org. member/founder of @papers_we_love | paperswelove.org
>>> twitter => @zeeshanlakhani
>>> 
>>>> On Apr 21, 2015, at 4:11 AM, Jason Campbell  wrote:
>>>> 
>>>> Thanks Zeeshan for the info.
>>>> 
>>>> Is there a workaround in the mean time, or is the only option to handle 
>>>> queries to the individual nodes ourselves?
>>>> 
>>>> Is there a planned timeframe for the 2.0.1 release?
>>>> 
>>>> Thanks,
>>>> Jason
>>>> 
>>>>> On 21 Apr 2015, at 16:13, Zeeshan Lakhani  wrote:
>>>>> 
>>>>> Hey Jason,
>>>>> 
>>>>> We’re working on performance issues with YZ filter queries, e.g. 
>>>>> https://github.com/basho/yokozuna/issues/392, and coverage plan 
>>>>> generation/caching, and our CliServ team has started doing a ton of 
>>>>> benchmarks as well.
>>>>> 
>>>>> You can bypass YZ, but then you’d have to create a way to generate your 
>>>>> own coverage plans and other things involving distributed solr that YZ 
>>>>> gives you. Nonetheless, we’re actively working on improving these issues 
>>>>> you’ve encountered. 
>>>>> 
>>>>> Zeeshan Lakhani
>>>>> programmer | 
>>>>> software engineer at @basho | 
>>>>> org. member/founder of @papers_we_love | paperswelove.org
>>>>> twitter => @zeeshanlakhani
>>>>> 
>>>>>> On Apr 21, 2015, at 1:06 AM, Jason Campbell  wrote:
>>>>>> 
>>>

Re: Yokozuna queries slow

2015-04-21 Thread Jason Campbell
solr%2Fsnapshots_index&NOW=1429669081777&wt=javabin&version=2"

This is where I start getting worried, the query is now changed to *:* instead 
of the one I passed to yokozuna, and rows=0.  Not surprisingly, this takes 
almost 2 seconds since it has to count a massive amount of documents.

127.0.0.1 - - [22/Apr/2015:02:18:03 +] "POST 
/internal_solr/snapshots_index/select HTTP/1.1" 200 882 "-" "-" 1.856 
"q=%2A%3A%2A&rows=0&wt=json&shards=10.0.1.234%3A8093%2Finternal_solr%2Fsnapshots_index%2C10.0.1.235%3A8093%2Finternal_solr%2Fsnapshots_index%2C10.0.1.237%3A8093%2Finternal_solr%2Fsnapshots_index%2C10.0.1.238%3A8093%2Finternal_solr%2Fsnapshots_index&10.0.1.234%3A8093=%28_yz_pn%3A51+AND+%28_yz_fpn%3A51%29%29+OR+%28_yz_pn%3A41+AND+%28_yz_fpn%3A41%29%29+OR+%28_yz_pn%3A31+AND+%28_yz_fpn%3A31%29%29+OR+%28_yz_pn%3A21+AND+%28_yz_fpn%3A21%29%29+OR+%28_yz_pn%3A11+AND+%28_yz_fpn%3A11%29%29+OR+_yz_pn%3A1&10.0.1.235%3A8093=%28_yz_pn%3A62+AND+%28_yz_fpn%3A62+OR+_yz_fpn%3A61%29%29+OR+_yz_pn%3A57+OR+_yz_pn%3A47+OR+_yz_pn%3A37+OR+_yz_pn%3A27+OR+_yz_pn%3A17+OR+_yz_pn%3A7&10.0.1.237%3A8093=_yz_pn%3A54+OR+_yz_pn%3A44+OR+_yz_pn%3A34+OR+_yz_pn%3A24+OR+_yz_pn%3A14+OR+_yz_pn%3A4&10.0.1.238%3A8093=_yz_pn%3A60+OR+_yz_pn%3A50+OR+_yz_pn%3A40+OR+_yz_pn%3A30+OR+_yz_pn%3A20+OR+_yz_pn%3A10"

This is a similar query as the previous one, but this one is from localhost 
(not the private IP) and the output format is json, not javabin.  It's also 
missing the usual Solr user agent.  This is still a *:* query requesting 0 rows 
though, and also takes almost 2 seconds.

Now I'm not sure these last 2 queries are actually required to complete the 
yokozuna request, but they are run every time I make a new request.  Although 
each of these queries takes over 1.8 seconds, the yokozuna request completed in 
1.2 seconds.  However, if it was only waiting on the shorter requests, even 
1.2 seconds is much too long.

Can anyone give insight into what yokozuna is doing here and hopefully a way to 
make it as fast as the underlying solr queries (<10ms)?

Still confused,
Jason

> On 22 Apr 2015, at 08:27, Jason Campbell  wrote:
> 
> That is really weird, I tried that before posting, yet it seems to fix things 
> now.
> 
> Oh well, I'll try rolling it out across the cluster, hopefully it works.
> 
> Maybe someone with more Solr knowledge can enlighten me though, does 
> disabling the filter cache change behaviour as well?
> 
> For example, running q=_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10 still 
> takes ~1500ms, yet q=timestamp:[1429579919010 TO 1429579921010]&qf=_yz_pn:55 
> OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10 takes 0-5ms.
> 
> So clearly the query filter isn't being executed in the same way it was 
> before, it's not just that it doesn't have to write the result to a cache.
> 
> Thanks for the help guys.
> 
>> On 22 Apr 2015, at 08:02, Zeeshan Lakhani  wrote:
>> 
>> Hey Steve,
>> 
>> Yep, that’s the specific reference point I was referring to in the link I 
>> shared. 
>> 
>> We’re working on cleaning-up some stuff to help perf for large indexes 
>> within the yz code itself. We’ve also discussed how to improve configuring 
>> specific solrconfig tunable items as a property of the index (not bucket); 
>> so, it’s something we’re looking into.
>> 
>> Thanks for helping out :).
>> 
>> Zeeshan Lakhani
>> programmer | 
>> software engineer at @basho | 
>> org. member/founder of @papers_we_love | paperswelove.org
>> twitter => @zeeshanlakhani
>> 
>>> On Apr 21, 2015, at 5:41 PM, Steve Garon  wrote:
>>> 
>>> Zeeshan, 
>>> 
>>> For that specific case, you guys should add {!cache=false} in front on your 
>>> query plan. Therefore, queries on large index won't be slowed down. I'd 
>>> really like to see some of the solrconfig.xml config to be exported to the 
>>> riak bucket properties. The caching flag could be a property on the bucket. 
>>> Same for soft commit timeouts. We had to increase soft commit timeouts to 
>>> 10sec instead of the 1sec default. 
>>> 
>>> 
>>> Steve
>>> 
>>> On 21 April 2015 at 16:02, Zeeshan Lakhani  wrote:
>>> Nice Steve.
>>> 
>>> Zeeshan Lakhani
>>> programmer | 
>>> software engineer at @basho | 
>>> org. member/founder of @papers_we_love | paperswelove.org
>>> twitter => @zeeshanlakhani
>>> 
>>>> On Apr 21, 2015, at 3:57 PM, Steve Garon  wrote:
>>>> 
>>>> Jason, 
>>>> 
>>>> Comment the filterCache section in the bucket's solrconfig.xml and 
>&g

Re: Yokozuna queries slow

2015-04-22 Thread Jason Campbell
After some comparisons with tcpdump, I think the strange results are related to 
an overly aggressive redirect rule.

Please ignore.  Sorry for the confusion.

> On 22 Apr 2015, at 12:35, Jason Campbell  wrote:
> 
> Sorry, still running into issues.
> 
> So I've disabled cache on all solr nodes, but the yokozuna query times are 
> still high.
> 
> Oddly, running queries against any one node completes in a few ms.  So I 
> injected nginx between riak and solr so I could log queries, and now I'm even 
> more confused.
> 
> There is 5 requests to each solr node for every request to yokozuna.
> 
> These are the nginx logs for one Solr node.  The log format is apache 
> combined, plus the total query time, and a dump of the POST body.
> 
> 10.0.1.238 - - [22/Apr/2015:02:18:01 +] "POST 
> /internal_solr/snapshots_index/select HTTP/1.1" 200 774 "-" 
> "Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0" 0.007 
> "rows=0&10.0.1.237%3A8093=_yz_pn%3A54+OR+_yz_pn%3A44+OR+_yz_pn%3A34+OR+_yz_pn%3A24+OR+_yz_pn%3A14+OR+_yz_pn%3A4&q=*%3A*&10.0.1.234%3A8093=%28_yz_pn%3A51+AND+%28_yz_fpn%3A51%29%29+OR+%28_yz_pn%3A41+AND+%28_yz_fpn%3A41%29%29+OR+%28_yz_pn%3A31+AND+%28_yz_fpn%3A31%29%29+OR+%28_yz_pn%3A21+AND+%28_yz_fpn%3A21%29%29+OR+%28_yz_pn%3A11+AND+%28_yz_fpn%3A11%29%29+OR+_yz_pn%3A1&10.0.1.235%3A8093=%28_yz_pn%3A62+AND+%28_yz_fpn%3A62+OR+_yz_fpn%3A61%29%29+OR+_yz_pn%3A57+OR+_yz_pn%3A47+OR+_yz_pn%3A37+OR+_yz_pn%3A27+OR+_yz_pn%3A17+OR+_yz_pn%3A7&10.0.1.238%3A8093=_yz_pn%3A60+OR+_yz_pn%3A50+OR+_yz_pn%3A40+OR+_yz_pn%3A30+OR+_yz_pn%3A20+OR+_yz_pn%3A10&start=0&fsv=true&fl=_yz_id%2Cscore&distrib=false&isShard=true&shard.url=10.0.1.235%3A8093%2Finternal_solr%2Fsnapshots_index&NOW=1429669081777&wt=javabin&version=2"
> 10.0.1.238 - - [22/Apr/2015:02:18:01 +] "POST 
> /internal_solr/snapshots_index/select HTTP/1.1" 200 774 "-" 
> "Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0" 0.008 
> "rows=0&10.0.1.237%3A8093=_yz_pn%3A54+OR+_yz_pn%3A44+OR+_yz_pn%3A34+OR+_yz_pn%3A24+OR+_yz_pn%3A14+OR+_yz_pn%3A4&q=*%3A*&10.0.1.234%3A8093=%28_yz_pn%3A51+AND+%28_yz_fpn%3A51%29%29+OR+%28_yz_pn%3A41+AND+%28_yz_fpn%3A41%29%29+OR+%28_yz_pn%3A31+AND+%28_yz_fpn%3A31%29%29+OR+%28_yz_pn%3A21+AND+%28_yz_fpn%3A21%29%29+OR+%28_yz_pn%3A11+AND+%28_yz_fpn%3A11%29%29+OR+_yz_pn%3A1&10.0.1.235%3A8093=%28_yz_pn%3A62+AND+%28_yz_fpn%3A62+OR+_yz_fpn%3A61%29%29+OR+_yz_pn%3A57+OR+_yz_pn%3A47+OR+_yz_pn%3A37+OR+_yz_pn%3A27+OR+_yz_pn%3A17+OR+_yz_pn%3A7&10.0.1.238%3A8093=_yz_pn%3A60+OR+_yz_pn%3A50+OR+_yz_pn%3A40+OR+_yz_pn%3A30+OR+_yz_pn%3A20+OR+_yz_pn%3A10&start=0&fsv=true&fl=_yz_id%2Cscore&distrib=false&isShard=true&shard.url=10.0.1.234%3A8093%2Finternal_solr%2Fsnapshots_index&NOW=1429669081777&wt=javabin&version=2"
> 10.0.1.238 - - [22/Apr/2015:02:18:01 +] "POST 
> /internal_solr/snapshots_index/select HTTP/1.1" 200 774 "-" 
> "Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0" 0.001 
> "rows=0&10.0.1.237%3A8093=_yz_pn%3A54+OR+_yz_pn%3A44+OR+_yz_pn%3A34+OR+_yz_pn%3A24+OR+_yz_pn%3A14+OR+_yz_pn%3A4&q=*%3A*&10.0.1.234%3A8093=%28_yz_pn%3A51+AND+%28_yz_fpn%3A51%29%29+OR+%28_yz_pn%3A41+AND+%28_yz_fpn%3A41%29%29+OR+%28_yz_pn%3A31+AND+%28_yz_fpn%3A31%29%29+OR+%28_yz_pn%3A21+AND+%28_yz_fpn%3A21%29%29+OR+%28_yz_pn%3A11+AND+%28_yz_fpn%3A11%29%29+OR+_yz_pn%3A1&10.0.1.235%3A8093=%28_yz_pn%3A62+AND+%28_yz_fpn%3A62+OR+_yz_fpn%3A61%29%29+OR+_yz_pn%3A57+OR+_yz_pn%3A47+OR+_yz_pn%3A37+OR+_yz_pn%3A27+OR+_yz_pn%3A17+OR+_yz_pn%3A7&10.0.1.238%3A8093=_yz_pn%3A60+OR+_yz_pn%3A50+OR+_yz_pn%3A40+OR+_yz_pn%3A30+OR+_yz_pn%3A20+OR+_yz_pn%3A10&start=0&fsv=true&fl=_yz_id%2Cscore&distrib=false&isShard=true&shard.url=10.0.1.237%3A8093%2Finternal_solr%2Fsnapshots_index&NOW=1429669081777&wt=javabin&version=2"
> 
> These queries look fine, they are from localhost, but over the internal IP 
> address, and they all complete in under 10ms.  They return the exact same body 
> length, so I'm confused why it's run 3 times, but not much of a performance 
> concern.
> 
> 10.0.1.238 - - [22/Apr/2015:02:18:03 +] "POST 
> /internal_solr/snapshots_index/select HTTP/1.1" 200 778 "-" 
> "Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0" 1.844 
> "rows=0&10.0.1.237%3A8093=_yz_pn%3A54+OR+_yz_pn%3A44+OR+_yz_pn%3A34+OR+_yz_pn%3A24+OR+_yz_pn%3A14+OR+_yz_pn%3A4&q=*%3A*&10.0.1.234%3A8093=%28_yz_pn%3A51+AND+%28_yz_fpn%3A51%29%29+OR+%28_yz_pn%3A41+AND+%28_yz_fpn%3A41%29%29+OR+%28_yz_pn%3A31+AND+%28_yz_fpn%3A31%29%29+OR+%28_yz_pn%3A21+AND+%28_yz_fpn%3A21%29%29+OR+%28_yz_pn%3A11+AND+%28_yz_fpn%3A11%29%29+OR+_yz_pn%3A1&10

Reindexing solr after backup restore

2015-04-23 Thread Jason Campbell
I've just done a backup and restore of our production Riak cluster, and 
Yokozuna has dropped from around 125 million records to 25 million.  Obviously 
the IPs have changed, and although the Riak cluster is stable, I'm not sure 
Solr handled the transition as nicely.
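
(For the record, I'm getting these counts from a match-all query over the 
search HTTP API and reading numFound, roughly:

curl 'http://localhost:8098/search/query/snapshots_index?wt=json&q=*:*&rows=0'

where snapshots_index and port 8098 are just our index name and HTTP port.)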

Is there a way to force Solr to rebuild the indexes, or at least get back to 
the state it was in before the backup?

Also, is this expected behaviour?

Thanks,
Jason
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Reindexing solr after backup restore

2015-04-23 Thread Jason Campbell
I think I figured it out.

I followed this guide: 
http://docs.basho.com/riak/latest/ops/running/nodes/renaming/#Clusters-from-Backups

The first Riak node (changed with riak-admin reip) kept its Solr index.  
However, the other nodes when joined via riak-admin cluster force-replace, 
dropped their Solr indexes.

Is this expected?  If so, it should really be in the docs, and there should be 
another way to restore a cluster keeping Solr intact.

Also, is there a way to rebuild a Solr index?

Thanks,
Jason

> On 24 Apr 2015, at 15:16, Jason Campbell  wrote:
> 
> I've just done a backup and restore of our production Riak cluster, and 
> Yokozuna has dropped from around 125 million records to 25 million.  Obviously 
> the IPs have changed, and although the Riak cluster is stable, I'm not sure 
> Solr handled the transition as nicely.
> 
> Is there a way to force Solr to rebuild the indexes, or at least get back to 
> the state it was in before the backup?
> 
> Also, is this expected behaviour?
> 
> Thanks,
> Jason
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Reindexing solr after backup restore

2015-04-24 Thread Jason Campbell
Is there a way to do a restore without rebuilding these indexes though?  
Obviously this could take a long time depending on the amount of indexed data 
in the cluster.  It's a fairly big gotcha to say that Yokozuna fixes a lot of 
the data access issues that Riak has, but if you restore from a backup, it 
could be useless for days or weeks.

As far as disk consistency, the nodes were stopped during the snapshot, so I'm 
assuming on-disk it would be consistent within a single node.  And cluster 
wide, I would expect the overall data to fall somewhere between the first and 
last node snapshot.  AAE should still repair the bits left over, but it 
shouldn't have to rebuild the entire Solr index.

So the heart of the question is: can I join a node to a cluster without 
dropping its Solr index?  force-replace obviously doesn't work, so what is the 
harm in running reip on every node instead of just the first?
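
(For clarity, by reip I mean running, on each stopped node, something like:

riak-admin reip riak@<old-ip> riak@<new-ip>

with the old and new node names filled in from our configs.)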

Thanks for the help,
Jason

> On 25 Apr 2015, at 00:36, Zeeshan Lakhani  wrote:
> 
> Hey Jason,
> 
> Here’s a little more discussion on Yokozuna backup strategies: 
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2014-January/014514.html.
> 
> Nonetheless, I wouldn’t say the behavior’s expected, but we’re going to be 
> adding more to the docs on how to rebuild indexes.
> 
> To do so, you could just remove the yz_anti_entropy directory, and make AAE 
> more aggressive, via
> 
> ```
> rpc:multicall([node() | nodes()], application, set_env, [yokozuna, 
> anti_entropy_build_limit, {100, 1000}]).
> rpc:multicall([node() | nodes()], application, set_env, [yokozuna, 
> anti_entropy_concurrency, 4])
> ```
> 
> and the indexes will rebuild. You can try to initialize the building of trees 
> with `yz_entropy_mgr:init([])` via `riak attach`, but a restart would also 
> kick AAE into gear. There’s a bit more related info on this thread: 
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2015-March/016929.html.
> 
> Thanks.
> 
> Zeeshan Lakhani
> programmer | 
> software engineer at @basho | 
> org. member/founder of @papers_we_love | paperswelove.org
> twitter => @zeeshanlakhani
> 
>> On Apr 24, 2015, at 1:34 AM, Jason Campbell  wrote:
>> 
>> I think I figured it out.
>> 
>> I followed this guide: 
>> http://docs.basho.com/riak/latest/ops/running/nodes/renaming/#Clusters-from-Backups
>> 
>> The first Riak node (changed with riak-admin reip) kept its Solr index.  
>> However, the other nodes when joined via riak-admin cluster force-replace, 
>> dropped their Solr indexes.
>> 
>> Is this expected?  If so, it should really be in the docs, and there should 
>> be another way to restore a cluster keeping Solr intact.
>> 
>> Also, is there a way to rebuild a Solr index?
>> 
>> Thanks,
>> Jason
>> 
>>> On 24 Apr 2015, at 15:16, Jason Campbell  wrote:
>>> 
>>> I've just done a backup and restore of our production Riak cluster, and 
>>> Yokozuna has dropped from around 125 million records to 25 million.  
>>> Obviously the IPs have changed, and although the Riak cluster is stable, 
>>> I'm not sure Solr handled the transition as nicely.
>>> 
>>> Is there a way to force Solr to rebuild the indexes, or at least get back 
>>> to the state it was in before the backup?
>>> 
>>> Also, is this expected behaviour?
>>> 
>>> Thanks,
>>> Jason
>>> ___
>>> riak-users mailing list
>>> riak-users@lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> 
>> 
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Reindexing solr after backup restore

2015-04-24 Thread Jason Campbell
This may be a case of force-replace vs replace vs reip.  I'm happy to see if I 
can get new cluster from backup to keep the Solr indexes.

The disk backup was all of /var/lib/riak, so definitely included the YZ indexes 
before the force-replace, and they were kept on the first node that was changed 
with reip.  I stopped each node before the snapshot to ensure consistency.  So 
I would expect the final restored cluster to be somewhere between the first and 
last node snapshot in terms of data, and AAE to repair things to a consistent 
state for that few minute gap.

I'll experiment with different methods of rebuilding the cluster on Monday and 
see if I can get it to keep the Solr indexes.  Maybe moving the YZ indexes out 
of the way during the force-replace, then stopping the node and putting them 
back could help as well.  I'll let you know the results of the experiments 
either way.

Thanks,
Jason

> On 25 Apr 2015, at 09:25, Zeeshan Lakhani  wrote:
> 
> Hey Jason,
> 
> Yeah, nodes can normally be joined without a cluster dropping its Solr Index 
> and AAE normally rebuilds the missing KV bits.
> 
> In the case of restoring from a backup and having missing data, we can only 
> recommend a reindex (the indexes that have the issue) with aggressive AAE 
> settings to speed things up. It can be pretty fast. Recreating indexes are 
> cheap in Yokozuna, but are the `data/yz` directories missing from the nodes 
> that were force-replaced? Unless someone else wants to chime in, I’ll gather 
> more info on what occurred from the reip vs the force-replace. 
> 
> Zeeshan Lakhani
> programmer | 
> software engineer at @basho | 
> org. member/founder of @papers_we_love | paperswelove.org
> twitter => @zeeshanlakhani
> 
>> On Apr 24, 2015, at 7:02 PM, Jason Campbell  wrote:
>> 
>> Is there a way to do a restore without rebuilding these indexes though?  
>> Obviously this could take a long time depending on the amount of indexed 
>> data in the cluster.  It's a fairly big gotcha to say that Yokozuna fixes a 
>> lot of the data access issues that Riak has, but if you restore from a 
>> backup, it could be useless for days or weeks.
>> 
>> As far as disk consistency, the nodes were stopped during the snapshot, so 
>> I'm assuming on-disk it would be consistent within a single node.  And 
>> cluster wide, I would expect the overall data to fall somewhere between the 
>> first and last node snapshot.  AAE should still repair the bits left over, 
>> but it shouldn't have to rebuild the entire Solr index.
>> 
>> So the heart of the question can I join a node to a cluster without dropping 
>> its Solr index?  force-replace obviously doesn't work, what is the harm in 
>> running reip on every node instead of just the first?
>> 
>> Thanks for the help,
>> Jason
>> 
>>> On 25 Apr 2015, at 00:36, Zeeshan Lakhani  wrote:
>>> 
>>> Hey Jason,
>>> 
>>> Here’s a little more discussion on Yokozuna backup strategies: 
>>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2014-January/014514.html.
>>> 
>>> Nonetheless, I wouldn’t say the behavior’s expected, but we’re going to be 
>>> adding more to the docs on how to rebuild indexes.
>>> 
>>> To do so, you could just remove the yz_anti_entropy directory, and make AAE 
>>> more aggressive, via
>>> 
>>> ```
>>> rpc:multicall([node() | nodes()], application, set_env, [yokozuna, 
>>> anti_entropy_build_limit, {100, 1000}]).
>>> rpc:multicall([node() | nodes()], application, set_env, [yokozuna, 
>>> anti_entropy_concurrency, 4])
>>> ```
>>> 
>>> and the indexes will rebuild. You can try to initialize the building of 
>>> trees with `yz_entropy_mgr:init([])` via `riak attach`, but a restart would 
>>> also kick AAE into gear. There’s a bit more related info on this thread: 
>>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2015-March/016929.html.
>>> 
>>> Thanks.
>>> 
>>> Zeeshan Lakhani
>>> programmer | 
>>> software engineer at @basho | 
>>> org. member/founder of @papers_we_love | paperswelove.org
>>> twitter => @zeeshanlakhani
>>> 
>>>> On Apr 24, 2015, at 1:34 AM, Jason Campbell  wrote:
>>>> 
>>>> I think I figured it out.
>>>> 
>>>> I followed this guide: 
>>>> http://docs.basho.com/riak/latest/ops/running/nodes/renaming/#Clusters-from-Backups
>>>> 
>>>> The first Riak node (changed with riak-admin reip) kept its Solr index.  
>>>&

Re: Reindexing solr after backup restore

2015-05-01 Thread Jason Campbell
Hi guys,

We ended up prototyping DynamoDB for a variety of reasons, so I didn't get to 
test this during the workweek, but it's still interesting as I use Riak for my 
personal projects (although not YZ yet).

So here is the rough process I followed (mainly from 
http://docs.basho.com/riak/latest/ops/running/nodes/renaming/#Clusters-from-Backups):

1. Stop Riak process on all nodes (on freshly restored nodes, Riak refuses to 
start)
2. Restore Riak nodes (via any filesystem-level backup)
3. Change the Riak nodename in all configs
4. ReIP a single node
5. Start riak on the reip'd node
6. Bring "down" all nodes that will be replaced

So far the YZ directories are intact on all nodes, so this is where the fun 
starts.  Fetching a YZ doc count returns an error right now (since there is 
only one node up).

7. For each remaining node, remove the /var/lib/riak/ring directory (but not on 
the running node)
8. Start all remaining nodes

After this, we have 5 running Riak nodes, all with full YZ indexes still.  Only 
one node knows it's supposed to be in a cluster, the other 4 are standalone 
nodes right now.

9. Join the nodes to the cluster.

This is where the YZ indexes are removed.  This isn't immediate, and takes a 
few minutes to take effect, but this happens before ANY node is force-replaced, 
and also before the cluster plan is committed.
Another interesting behaviour pops up here.  If you run 'riak-admin cluster 
clear' on the "primary" node, the cluster status on that node updates, but all 
the other nodes still see the full cluster plan.  For a while at least, until 
all the other nodes eventually realise and crash.  If someone wants more detail 
on this crash, let me know what I can include.

10. So with the YZ indexes gone, let's heal the cluster.  force-replace the 
nodes as applicable and commit the new cluster plan.

So we now have a working cluster, with some of the original YZ documents (from 
the starting node).  Querying for all documents for me returns 35-39 million 
docs (depends on how many vnodes hit that working YZ node I think).  So let's 
try to get things restored properly.

11. Stop a node.
12. Replace its /var/lib/yz and yz_anti_entropy directories from the backup.
13. Start the node. (49-53 million docs)
14. Repeat 11-13 for node 3. (73-39 million docs)
15. Repeat 11-13 for node 4. (97-103 million docs)
16. Repeat 11-13 for node 5. (126 million docs +- ~40,000 docs)

So overall, it works really well with some extra hackery.  The docs mentioned 
at the start can be followed fairly closely.  Before step 4 of "Bringing up the 
remaining nodes", the yz directory needs to be moved out of the way, or Riak 
will nuke it on cluster join.  After the cluster is fully reformed and stable, 
adding the yz directories back in seems to work fine (on stopped nodes at 
least), and I would imagine AAE would repair anything that's still inconsistent.

Hope that helps someone else, or gets the docs updated a bit.  It's a fairly 
big gotcha that the YZ indexes get dropped if you don't do things exactly 
right; maybe Riak could do this on its own?  Maybe during a cluster join, it 
could 
stop YZ and move the directory to a backup location, and once the cluster 
join is committed, delete it if it isn't needed (e.g. not a force-replaced 
node).

Either way, it's not too hard to do in a shell script or any type of 
configuration management, and only affects restores.  The backup procedure of 
stop riak node, filesystem snapshot, start riak node works fine.
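
If it helps anyone, the whole dance scripts down to roughly the following 
(untested sketch; paths assume a default package install and the node names 
are placeholders):

#!/bin/sh
RIAK_DATA=/var/lib/riak

# Before joining a restored node to the cluster, move the Solr data aside so
# the join can't drop it.
riak stop
mv $RIAK_DATA/yz $RIAK_DATA/yz.keep
mv $RIAK_DATA/yz_anti_entropy $RIAK_DATA/yz_anti_entropy.keep
riak start

# Rebuild the cluster as per the renaming docs (join, force-replace, commit).
riak-admin cluster join riak@<primary-node>
riak-admin cluster force-replace riak@<old-name> riak@<new-name>
riak-admin cluster plan
riak-admin cluster commit

# Once the ring is stable, stop the node, put the indexes back and let AAE
# repair whatever is left inconsistent.
riak stop
rm -rf $RIAK_DATA/yz $RIAK_DATA/yz_anti_entropy
mv $RIAK_DATA/yz.keep $RIAK_DATA/yz
mv $RIAK_DATA/yz_anti_entropy.keep $RIAK_DATA/yz_anti_entropy
riak start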

I still have the cluster available if there is anything else you want me to 
test.

Jason Campbell


> On 1 May 2015, at 22:51, Matthew Brender  wrote:
> 
> Hi Jason, 
> 
> Did you and Zeeshan have time to follow up on your experiments? I'm curious 
> to hear the conclusion. Please reply to the riak-user thread so others can 
> learn as well! 
> 
> Best, 
> Matt
> 
> Matt Brender | Developer Advocacy Lead
> Basho Technologies
> t: @mjbrender
> 
> 
> On Fri, Apr 24, 2015 at 8:56 PM, Jason Campbell  wrote:
> This may be a case of force-replace vs replace vs reip.  I'm happy to see if 
> I can get new cluster from backup to keep the Solr indexes.
> 
> The disk backup was all of /var/lib/riak, so definitely included the YZ 
> indexes before the force-replace, and they were kept on the first node that 
> was changed with reip.  I stopped each node before the snapshot to ensure 
> consistency.  So I would expect the final restored cluster to be somewhere 
> between the first and last node snapshot in terms of data, and AAE to repair 
> things to a consistent state for that few minute gap.
> 
> I'll experiment with different methods of rebuilding the cluster on Monday 
> and see if I can get it to keep the Solr indexes.  Maybe moving the YZ 
> indexes o