Starting riak with init.d-script on Debian 8 fails

2015-02-23 Thread Karsten Hauser
Hi together,

when I try to start my riak installation with the init-script, I run into the 
following error message:


* root@unity-backend-dev:~# /etc/init.d/riak start

* [] Starting riak (via systemctl): riak.service
Failed to start riak.service: Unit riak.service failed to load: No such file or directory.

* failed!

So "riak.service" seems to be missing, but I don't know where.

My system is "Debian GNU/Linux 8" and I have installed "riak_2.0.4-1_amd64.deb".

"riak start" without init.d-script just works well.

Can somebody please help me with this?

Regards
Karsten
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Starting riak with init.d-script on Debian 8 fails

2015-02-23 Thread Ildar Alishev
Hi Karsten,

If Riak start works fine without init.d then it is great. It means that it is 
working; it is just in another directory.

Ildar.
> On 23 Feb 2015, at 13:12, Karsten Hauser wrote:
> 
> Hi together,
>  
> when I try to start my riak installation with the init-script, I run into the 
> following error message:
>  
> · root@unity-backend-dev:~# /etc/init.d/riak start
> · [] Starting riak (via systemctl): riak.serviceFailed to start 
> riak.service: Unit riak.service failed to load: No such file or directory.
> · failed!
>  
> So “riak.service” seems to be missing, but I don’t know where.
>  
> My system is “Debian GNU/Linux 8” and I have installed 
> “riak_2.0.4-1_amd64.deb”.
>  
> “riak start” without init.d-script just works well.
>  
> Can somebody please help me with this?
>  
> Regards
> Karsten
> ___
> riak-users mailing list
> riak-users@lists.basho.com 
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com 
> 
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Starting riak with init.d-script on Debian 8 fails

2015-02-23 Thread Magnus Kessler
On 23 February 2015 at 10:12, Karsten Hauser  wrote:

>  Hi together,
>
> when I try to start my riak installation with the init-script, I run into
> the following error message:
>
> · root@unity-backend-dev:~# /etc/init.d/riak start
> · [] Starting riak (via systemctl): riak.serviceFailed to
> start riak.service: Unit riak.service failed to load: No such file or
> directory.
> · failed!
>
> So “riak.service” seems to be missing, but I don’t know where.
>
> My system is “Debian GNU/Linux 8” and I have installed
> “riak_2.0.4-1_amd64.deb”.
>
> “riak start” without init.d-script just works well.
>
> Can somebody please help me with this?
>
> Regards
>
> Karsten
>

Hi Karsten,

This is a known issue with Riak on the most recent Debian systems. Debian
has now moved to systemd, and Riak does not yet distribute a systemd
*.service file.
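Until a packaged unit file is available, one possible workaround is a minimal hand-written unit. This is only a sketch: the `/usr/sbin/riak` path and the `riak` user are assumptions based on the Debian package layout, so verify both against your install before using it.

```
# /etc/systemd/system/riak.service  (hypothetical workaround, not shipped by Basho)
[Unit]
Description=Riak distributed database
After=network.target

[Service]
# "riak start" daemonizes, hence Type=forking
Type=forking
User=riak
ExecStart=/usr/sbin/riak start
ExecStop=/usr/sbin/riak stop
# Riak needs a high open-files limit
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
```

After saving the file, run `systemctl daemon-reload` before `systemctl start riak`.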

I will raise this with our engineering team.

Regards,

Magnus

-- 
Magnus Kessler
Client Services Engineer @ Basho
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


RE: Starting riak with init.d-script on Debian 8 fails

2015-02-23 Thread Karsten Hauser
Hi Ildar,

Indeed, but I have integrated Riak into my Puppet configuration to simplify our 
deployment, where the init.d script is called automatically. So just calling 
“riak start” doesn’t work for me.

Best
Karsten


From: Ildar Alishev [mailto:ildaralis...@gmail.com]
Sent: Monday, 23 February 2015 11:14
To: Karsten Hauser
Cc: riak-users@lists.basho.com
Subject: Re: Starting riak with init.d-script on Debian 8 fails

Hi Karsten,

If Riak start works fine without init.d then it is great. It means that it is 
working; it is just in another directory.

Ildar.
On 23 Feb 2015, at 13:12, Karsten Hauser 
<kl.hau...@epages.com> wrote:

Hi together,

when I try to start my riak installation with the init-script, I run into the 
following error message:

• root@unity-backend-dev:~# /etc/init.d/riak start
• [] Starting riak (via systemctl): riak.serviceFailed to start 
riak.service: Unit riak.service failed to load: No such file or directory.
• failed!

So “riak.service” seems to be missing, but I don’t know where.

My system is “Debian GNU/Linux 8” and I have installed “riak_2.0.4-1_amd64.deb”.

“riak start” without init.d-script just works well.

Can somebody please help me with this?

Regards
Karsten
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Data modelling questions

2015-02-23 Thread AM

On 2/22/15 6:16 PM, Jason Campbell wrote:

Coming at this from another angle, if you already have a permanent data store, 
and you are only reporting on each hour at a time, can you run the reports 
based on the log itself?
A lot of Riak’s advantage comes from the stability and availability of data 
storage, but S3 is already doing that for you.  Riak can store the data, but 
I’m not sure what benefit it serves from my understanding of your problem.

Aggregates are usually quite small (even with more advanced things like 
histograms), so it’s relatively easy to parse a log line-by-line and produce 
aggregates in-memory for a report.

Can you give a bit more detail on why you are using Riak?


For the most part yes, we are using EMR at the moment, but some of the 
reasons I want to go down that road are:


- We are not quite 'big data' (by the definition that I can process 
60 mins of my data on an 8-core 16G machine in under 40 mins), and EMR is 
actually 'slower' for us than just running it locally on a large 
machine. That brings its own stability and maintenance issues for us. It 
would be much nicer if the data were stored reliably and in a format 
that was queryable quickly, instead of having to reprocess things.


- The data is compressed and we actually waste quite a bit of time 
decompressing it for EMR which is yet another issue if we have to 
re-process due to single machine durability issues.


- We want to be able to drive graphs and alerts off of the data, whose 
granularity is most likely going to be on the order of 10 mins. These 
are just counters on a single time dimension, so I am assuming that if I 
get the model right this will be easy. Yes, we can do this via EMR, but 
it also requires additional moving parts that we would have to manage.


- We have certain BI use cases (as yet not clearly defined) for which Riak 
MR would be quite useful and faster for us.


All in all, Riak appears to offer the sweet spot of reliability, data 
management and querying tools, such that all we would have to be 
concerned about is the actual cluster itself.
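The 10-minute counter granularity mentioned above can be sketched in a few lines. This is a minimal in-memory illustration only; the `(timestamp, counter_name)` event shape is an assumption, not AM's actual schema.

```python
from collections import Counter

BUCKET_SECONDS = 600  # 10-minute buckets, matching the desired granularity


def aggregate(events):
    """events: iterable of (unix_ts, counter_name) pairs (assumed input shape).

    Rounds each timestamp down to its 10-minute bucket and counts
    occurrences per name, yielding {bucket_start_ts: Counter}.
    """
    buckets = {}
    for ts, name in events:
        bucket = ts - (ts % BUCKET_SECONDS)
        buckets.setdefault(bucket, Counter())[name] += 1
    return buckets
```

Each bucket's Counter is then a natural value to store under a per-interval key for graphing or alerting.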


Thanks.
AM

Hope this helps,
Jason


On 23 Feb 2015, at 13:03, AM  wrote:

Hi Jason, Christopher.

This is supposed to be append-only, time-limited data. I only intend to keep 
about 2 weeks' worth of data (which is yet another thing I need to figure out, 
i.e. how to expire older data).

Re: querying, for the most part the system will be building out hourly reports 
based on geo, build and location information so I need to have a model that 
allows me to aggregate by timestamp + [each-of-geo-build-location] or just do 
it on the fly during ingestion.

Ingestion is yet another thing where I have some flexibility as it is a batch 
job, ie log files get dropped on S3 and we get notified (usually on an hourly 
basis, some logs on a 10-min basis) so I can massage it further but I am 
concerned that every place where I buffer is another opportunity for losing 
data and I would like to avoid reprocessing as much as possible.

Messages will already have the timestamp and msg-id and I will mostly be 
interested in aggregates. In some very rare cases I expect to be able to simply 
run map-reduce jobs for custom queries.

Given that, does my current model look reasonable?

Thanks.
AM


On 2/21/15 6:40 PM, Jason Campbell wrote:

I have the same questions as Christopher.

Does this data need to change, or is it write-once?
What information do you have when querying?
  - Will you already have timestamp and msg-id?
  - If not, you may want to consider aggregating everything into a single key.  
This is easier if the data isn’t changing.
What data will you typically be querying?
  - Will you typically be looking for a single element of data, or aggregates 
(graphing or mapping for example)?
  - If aggregates, what fields are you aggregating on (timestamp, geo, 
location, etc) and which will be fixed?

The aggregate question may need a little more explanation, so I will use an 
example.

I have been working on time-series data with my key being: 
node-id:metric-id:timestamp
Node-id and metric-id are fixed; they will never be merged in an aggregate way, 
and I have them before querying.
Timestamp is my aggregate value, I may need a single timestamp, or hundreds of 
thousands of timestamps (to draw a graph).  For this reason, I grouped my 
metrics by 5 minute block instead of one key per timestamp.  I also created 
aggregates with relevant averages and such for 1 hour, 1 day and 1 month to 
reduce the amount of key lookups for large graphs.
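The 5-minute block grouping described above can be sketched as follows (a minimal sketch; the function and argument names are illustrative, not from Jason's actual code):

```python
BLOCK_SECONDS = 300  # group metrics into fixed 5-minute blocks


def block_key(node_id: str, metric_id: str, ts: int) -> str:
    """Round a Unix timestamp down to its 5-minute block and build the
    node-id:metric-id:timestamp key described above.

    Because blocks are fixed, any client can compute the key to fetch
    in advance, with no key listing required.
    """
    block_start = ts - (ts % BLOCK_SECONDS)
    return f"{node_id}:{metric_id}:{block_start}"
```

All timestamps inside the same 5-minute window map to the same key, which is what keeps the number of key lookups small when drawing a graph.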

So it depends what visualisations you want.  If you are going to be mapping the 
most recent data based on the geo or location, I would include aggregates for 
that.  If you are more interested in timestamp, group by that.  Because Riak 
doesn’t have multi-key consistency, though, also choose a canonical source of 
data.  If you store the same data in multiple keys, they will diverge at some 
point.  Decide now which is the real source, and which are derived.

ACLs not being set correctly for riak-cs

2015-02-23 Thread Shawn Debnath
Hi there,

I can't seem to get ACLs set properly on newly created buckets in 
Riak CS. I am using s3curl to push the payload up via PUT /?acl, and it returns 
200 OK. However, a GET /?acl returns an XML payload with missing IDs. Without 
manually pushing new ACLs, the default ACL correctly gives access to the 
owner, but as soon as I push a custom ACL set, it screws up the grants for both 
the owner and the other users.

NOTE: The keys below are for a private test environment so substitute your 
values accordingly.

Any help appreciated on pointing me to the right direction!

Thanks,
Shawn



Here are the three user IDs, keys and secrets. I want the owner to retain full 
control while I want to grant WRITE privileges to publisher and READ privileges 
to reader.


admin_id: feab26c2fec623a34e7d60e620b42a7786eca3223b5e2faebc5d248a34f3239e
admin_key: 1049V_JJHPH7TO_QPWVC
admin_secret: lMQsnn3Cukk1UR28FAtoZiap9KEOjBRgYKiVVg==
publisher_id: 
5efc8fb59754a6d11eb1a36c501a8ef7b1be44b0300fbe3df354423b7a115ac5
publisher_key: D-YBO-QHCHU9MEHNZR1D
publisher_secret: nin5LA4WHEuJeTuzN-qCWBXsOvTyUbdPuDQ3eg==
reader_id: de6831d6da88df325d474f7f6c1f708596998c54fc0817685f8c67f1d8cab239
reader_key: _QOKYEHYM6S-YDDHGSYF
reader_secret: sFc1HBhjQzfr70Yda-ke257LHkVCPNAN0chs9A==


<AccessControlPolicy xmlns="http://data.basho.com/doc/2012-04-05/">
  <Owner>
    <ID>feab26c2fec623a34e7d60e620b42a7786eca3223b5e2faebc5d248a34f3239e</ID>
  </Owner>
  <AccessControlList>
    <Grant>
      <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:type="CanonicalUser">
        <ID>feab26c2fec623a34e7d60e620b42a7786eca3223b5e2faebc5d248a34f3239e</ID>
      </Grantee>
      <Permission>FULL_CONTROL</Permission>
    </Grant>
    <Grant>
      <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:type="CanonicalUser">
        <ID>5efc8fb59754a6d11eb1a36c501a8ef7b1be44b0300fbe3df354423b7a115ac5</ID>
      </Grantee>
      <Permission>WRITE</Permission>
    </Grant>
    <Grant>
      <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:type="CanonicalUser">
        <ID>de6831d6da88df325d474f7f6c1f708596998c54fc0817685f8c67f1d8cab239</ID>
      </Grantee>
      <Permission>READ</Permission>
    </Grant>
  </AccessControlList>
</AccessControlPolicy>
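A policy of that shape can also be generated programmatically rather than hand-edited, which avoids typos in the long canonical IDs. The following sketch is not from the original message; it only builds the same element structure shown above, using Python's standard library.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
# Register the prefix so the xsi:type attribute serialises as in the payload above.
ET.register_namespace("xsi", XSI)


def build_acl(owner_id, grants):
    """Build an S3-style AccessControlPolicy document.

    grants: list of (canonical_user_id, permission) pairs,
    e.g. [(admin_id, "FULL_CONTROL"), (publisher_id, "WRITE")].
    """
    policy = ET.Element("AccessControlPolicy")
    owner = ET.SubElement(policy, "Owner")
    ET.SubElement(owner, "ID").text = owner_id
    acl = ET.SubElement(policy, "AccessControlList")
    for user_id, perm in grants:
        grant = ET.SubElement(acl, "Grant")
        grantee = ET.SubElement(grant, "Grantee",
                                {"{%s}type" % XSI: "CanonicalUser"})
        ET.SubElement(grantee, "ID").text = user_id
        ET.SubElement(grant, "Permission").text = perm
    return ET.tostring(policy, encoding="unicode")
```

The returned string can then be written to a file and uploaded with `s3curl.pl --put` exactly as in the transcript below.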


$ bin/s3curl.pl --debug --id ${RIAK_ADMIN_KEY} --key ${RIAK_ADMIN_SECRET} --acl 
private -- -s -v -x localhost:50201 -X PUT http://social-media.cs.domain.com/

s3curl: Found the url: host=social-media.cs.domain.com; port=; uri=/; query=;
s3curl: vanity endpoint signing case
s3curl: StringToSign='PUT\n\n\nMon, 23 Feb 2015 20:03:15 
+\nx-amz-acl:private\n/social-media/'
s3curl: signature='v48ovqQBnqfEcBZ7kPedpbs1Xt4='
s3curl: exec curl -H Date: Mon, 23 Feb 2015 20:03:15 + -H Authorization: 
AWS 1049V_JJHPH7TO_QPWVC:v48ovqQBnqfEcBZ7kPedpbs1Xt4= -H x-amz-acl: private -L 
-s -v -x localhost:50201 -X PUT http://social-media.cs.domain.com/
* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 50201 (#0)
> PUT http://social-media.cs.domain.com/ HTTP/1.1
> User-Agent: curl/7.37.1
> Host: social-media.cs.domain.com
> Accept: */*
> Proxy-Connection: Keep-Alive
> Date: Mon, 23 Feb 2015 20:03:15 +
> Authorization: AWS 1049V_JJHPH7TO_QPWVC:v48ovqQBnqfEcBZ7kPedpbs1Xt4=
> x-amz-acl: private
>
< HTTP/1.1 200 OK
* Server Riak CS is not blacklisted
< Server: Riak CS
< Date: Mon, 23 Feb 2015 20:03:16 GMT
< Content-Type: application/xml
< Content-Length: 0
<
* Connection #0 to host localhost left intact



$  bin/s3curl.pl --debug --id ${RIAK_ADMIN_KEY} --key ${RIAK_ADMIN_SECRET} 
--put /tmp/riak-cs-bucket-policy.xml -- -s -v -x localhost:50201 -X PUT 
http://social-media.cs.domain.com/?acl

s3curl: Found the url: host=social-media.cs.domain.com; port=; uri=/; query=acl;
s3curl: vanity endpoint signing case
s3curl: StringToSign='PUT\n\n\nMon, 23 Feb 2015 20:03:21 
+\n/social-media/?acl'
s3curl: signature='QAcPGgB1tZO2+U4M0TvP4Q4uyxQ='
s3curl: exec curl -H Date: Mon, 23 Feb 2015 20:03:21 + -H Authorization: 
AWS 1049V_JJHPH7TO_QPWVC:QAcPGgB1tZO2+U4M0TvP4Q4uyxQ= -L -T 
/tmp/riak-cs-bucket-policy.xml -s -v -x localhost:50201 -X PUT 
http://social-media.cs.domain.com/?acl
* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 50201 (#0)
> PUT http://social-media.cs.domain.com/?acl HTTP/1.1
> User-Agent: curl/7.37.1
> Host: social-media.cs.domain.com
> Accept: */*
> Proxy-Connection: Keep-Alive
> Date: Mon, 23 Feb 2015 20:03:21 +
> Authorization: AWS 1049V_JJHPH7TO_QPWVC:QAcPGgB1tZO2+U4M0TvP4Q4uyxQ=
> Content-Length: 1003
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
* Server Riak CS is not blacklisted
< Server: Riak CS
< Date: Mon, 23 Feb 2015 20:03:21 GMT
< Content-Type: application/xml
< Content-Length: 0
<
* Connection #0 to host localhost left intact



bin/s3curl.pl --debug --id ${RIAK_ADMIN_KEY} --key ${RIAK_ADMIN_SECRET}  -- -s 
-v -x localhost:50201 -X GET http://social-media.cs.domain.com/?acl





<AccessControlPolicy xmlns="http://data.basho.com/doc/2012-04-05/">
  <Owner>
    <ID>feab26c2fec623a34e7d60e620b42a7786eca3223b5e2faebc5d248a34f3239e</ID>
    <DisplayName>riak-cs-admin</DisplayName>
  </Owner>
  <AccessControlList>
    <Grant>
      <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUs

Re: Data modelling questions

2015-02-23 Thread Jason Campbell
Thanks for the info.

The model looks reasonable, but something I would worry about is the 
availability of the key data.  For example, the timestamps and msg-ids should 
be known without key-listing Riak (which is always a very slow operation).  
There are several options for this: you can maintain your own index (Riak 
CRDT sets work very well for this), use 2i, or use Riak Search.

The other thing I’m worried about is something I’ve run into with my data.  If 
you create a key per message as you have indicated, your key size can be very 
small, and you end up aggregating thousands of keys for any reasonable query.  
For pulling large amounts of data out of Riak, try to keep key sizes between 
about 100KB and 1MB.  Riak is still very responsive at those sizes, and there 
isn’t much parsing overhead even if you are only interested in one of the 
messages.  For me, that means grouping data into fixed 5 minute blocks.  It 
will obviously vary depending on message size and number of messages, but I 
wouldn’t go with a key per message unless the messages are >10KB.  Grouping by 
timestamp also gives the advantage that any client can know the keys to query 
in advance since they are fixed.  You said a 10 minute range is ideal, so if 
you can manage to group your data into 10 minute keys, that would likely give 
the best performance when querying.

For grouping data, I would recommend using Riak sets and serialised JSON 
strings.  As long as you don’t have exact duplicate messages, it works very 
well, and allows Riak to resolve conflicts automatically.

As far as those aggregate metrics (for graphing and alerting), I would 
definitely store those in a separate bucket, and group them by 10 minute 
intervals.  The full data keys should only be used for unplanned queries (Riak 
MR jobs), and anything you know you will need should ideally be generated when 
loading the data initially.

Hope this helps, let me know if you have any other questions.

Jason

> On 24 Feb 2015, at 05:24, AM  wrote:
> 
> On 2/22/15 6:16 PM, Jason Campbell wrote:
>> Coming at this from another angle, if you already have a permanent data 
>> store, and you are only reporting on each hour at a time, can you run the 
>> reports based on the log itself?
>> A lot of Riak’s advantage comes from the stability and availability of data 
>> storage, but S3 is already doing that for you.  Riak can store the data, but 
>> I’m not sure what benefit it serves from my understanding of your problem.
>> 
>> Aggregates are usually quite small (even with more advanced things like 
>> histograms), so it’s relatively easy to parse a log line-by-line and produce 
>> aggregates in-memory for a report.
>> 
>> Can you give a bit more detail on why you are using Riak?
> 
> For the most part yes, we are using EMR at the moment, but some of the 
> reasons I want to go down that road are:
> 
> - We are not quite 'big data' (by the definition that I can process 60 
> mins of my data on an 8-core 16G machine in under 40 mins), and EMR is 
> actually 'slower' for us than just running it locally on a large machine. 
> That brings its own stability and maintenance issues for us. It would be much 
> nicer if the data were stored reliably and in a format that was queryable 
> quickly, instead of having to reprocess things.
> 
> - The data is compressed and we actually waste quite a bit of time 
> decompressing it for EMR which is yet another issue if we have to re-process 
> due to single machine durability issues.
> 
> - We want to be able to drive graphs and alerts off of the data, whose 
> granularity is most likely going to be on the order of 10 mins. These are 
> just counters on a single time dimension, so I am assuming that if I get the 
> model right this will be easy. Yes, we can do this via EMR, but it also 
> requires additional moving parts that we would have to manage.
> 
> - We have certain BI use cases (as yet not clearly defined) for which Riak 
> MR would be quite useful and faster for us.
> 
> All in all, Riak appears to offer the sweet spot of reliability, data 
> management and querying tools, such that all we would have to be concerned 
> about is the actual cluster itself.
> 
> Thanks.
> AM
>> Hope this helps,
>> Jason
>> 
>>> On 23 Feb 2015, at 13:03, AM  wrote:
>>> 
>>> Hi Jason, Christopher.
>>> 
>>> This is supposed to be append-only, time-limited data. I only intend to 
>>> keep about 2 weeks' worth of data (which is yet another thing I need to 
>>> figure out, i.e. how to expire older data).
>>> 
>>> Re: querying, for the most part the system will be building out hourly 
>>> reports based on geo, build and location information so I need to have a 
>>> model that allows me to aggregate by timestamp + 
>>> [each-of-geo-build-location] or just do it on the fly during ingestion.
>>> 
>>> Ingestion is yet another thing where I have some flexibility as it is a 
>>> batch job, ie log files get dropped on S3 and we get notified (usually on 
>>> an