Filtering not_found in reduce JS causes SyntaxError

2013-10-20 Thread Matt Black
Hey list,

A script recently introduced to clean up old data by deleting it has caused
one of our old reporting scripts to start failing with “not_found”. I’d
encountered this once before, so I thought the simple addition of a reduce
phase using Riak.filterNotFound would fix it.

However, now I’m receiving the error below, and removing the one-line
addition of query.reduce("Riak.filterNotFound") brings my old “not_found”
error straight back.
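
For reference, the job is assembled roughly like this with the Python
client (a minimal sketch; the map phase here is a stand-in for our real
one):

    import riak

    client = riak.RiakClient()  # connection details omitted

    query = client.add("carts")  # inputs: the bucket the cleanup script prunes
    query.map("function(v) { return [JSON.parse(v.values[0].data)]; }")
    query.reduce("Riak.filterNotFound")  # the one-line addition
    results = query.run()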

Exception:
{"phase":1,"error":"[{<<\"lineno\">>,466},{<<\"message\">>,<<\"SyntaxError: syntax error\">>},{<<\"source\">>,<<\"()\">>}]","input":"{ok,{r_object,<<\"carts\">>,<<\"dd2bcd07fa8019b2d1fc1d4832c41c74\">>,[{r_content,{dict,4,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[[<<\"X-Riak-VTag\">>,52,68,115,107,113,49,105,69,66,109,103,79,106,87,104,75,75,97,53,98,54,65]],[[<<\"index\">>]],[[<<\"X-Riak-Deleted\">>,116,114,117,101]],[[<<\"X-Riak-Last-Modified\">>|{1381,978330,755498}]],[],[]}}},<<>>}],[{<<250,120,75,127,79,209,93,62>>,{6,63516323103}},{<<31,103,165,230,79,209,...>>,...},...],...},...}"}

Any thoughts?

Thanks y'all
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Filtering not_found in reduce JS causes SyntaxError

2013-10-20 Thread Matt Black
BTW, this cluster is still running 1.4.0. If 1.4.2 would fix this issue I
could update.


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Filtering not_found in reduce JS causes SyntaxError

2013-10-20 Thread Matt Black
The plot thickens. Having run the same query a couple more times just now,
I see a different error! (No changes were made to the code.)

Exception: Error processing stream message:
exit:{ucs,{bad_utf8_character_code}}:
    [{xmerl_ucs,from_utf8,1,[{file,"xmerl_ucs.erl"},{line,185}]},
     {mochijson2,json_encode_string,2,[{file,"src/mochijson2.erl"},{line,186}]},
     {mochijson2,'-json_encode_proplist/2-fun-0-',3,[{file,"src/mochijson2.erl"},{line,167}]},
     {lists,foldl,3,[{file,"lists.erl"},{line,1197}]},
     {mochijson2,json_encode_proplist,2,[{file,"src/mochijson2.erl"},{line,170}]},
     {riak_kv_pb_mapred,process_stream,3,[{file,"src/riak_kv_pb_mapred.erl"},{line,115}]},
     {riak_api_pb_server,process_stream,5,[{file,"src/riak_api_pb_server.erl"},{line,246}]},
     {riak_api_pb_server,handle_info,2,[{file,"src/riak_api_pb_server.erl"},{line,129}]}]



Re: Fetch method returns null pointer exception error

2013-10-20 Thread 성동찬_Chan
Here is also a graph of "Queries per second". As you can see at the end of
the graph, "SET" (the red one) reports zero.

[inline image: "Queries per second" graph]


On 17 Oct 2013, at 10:59 AM, 성동찬_Chan <c...@kakao.com> wrote:

Hi Luke,

I'm using the Java client. Also, I don't use HTTP, so it's Protocol
Buffers (see the method below).
http://docs.basho.com/riak/latest/dev/taste-of-riak/java/

Thanks.

On 16 Oct 2013, at 11:46 PM, Luke Bakken <lbak...@basho.com> wrote:

protocol buffers

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



Re: Filtering not_found in reduce JS causes SyntaxError

2013-10-20 Thread Matt Black
I side-stepped this error by adding this little block of code at the top
of my map phase (we were already using it elsewhere in the same project):

if (v.values[0].metadata['X-Riak-Deleted'] !== undefined) {
    return [];
}
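
In context, the check sits at the very top of the map function, before any
real work happens. A minimal sketch (the JS is passed to the Python client
as a string; the name and final return are illustrative stand-ins for our
real map body):

    TOMBSTONE_SAFE_MAP = """
    function(v) {
      // X-Riak-Deleted marks objects that are deleted but not yet reaped.
      if (v.values[0].metadata['X-Riak-Deleted'] !== undefined) {
        return [];  // drop tombstones before they reach any reduce phase
      }
      return [JSON.parse(v.values[0].data)];
    }
    """
    query.map(TOMBSTONE_SAFE_MAP)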

Unfortunately I now have a different problem, which I’ll detail in a
separate thread.


Phase 2 Timeout

2013-10-20 Thread Matt Black
Following on from some earlier errors I was getting, I’m now kind of stuck
between a rock and a hard place.

One of our statistics reports fails with a timeout during a
query.filter_not_found() phase:

Exception: 
{"phase":2,"error":"timeout","input":"[<<\"users\">>,<<\"33782eee0470cac583b136fd063decdc\">>,{struct,[{<<\"brid\">>,<<\"33782eee0470cac583b136fd063decdc\">>},{<<\"field1\">>,<<\"2012-11-19T04:53:00Z\">>},{<<\"field2\">>,..
SNIP 
,...]}]","type":"exit","stack":"[{riak_kv_w_reduce,'-js_runner/1-fun-0-',3,[{file,\"src/riak_kv_w_reduce.erl\"},{line,283}]},{riak_kv_w_reduce,reduce,3,[{file,\"src/riak_kv_w_reduce.erl\"},{line,206}]},{riak_kv_w_reduce,maybe_reduce,2,[{file,\"src/riak_kv_w_reduce.erl\"},{line,157}]},{riak_pipe_vnode_worker,process_input,3,[{file,\"src/riak_pipe_vnode_worker.erl\"},{line,445}]},{riak_pipe_vnode_worker,wait_for_input,2,[{file,\"src/riak_pipe_vnode_worker.erl\"},{line,377}]},{gen_fsm,handle_msg,7,[{file,\"gen_fsm.erl\"},{line,494}]},{proc_lib,...}]"}

This is exactly the same problem discussed way back on this very list:

https://groups.google.com/forum/#!topic/nosql-databases/iHYDyqyidkM

Unfortunately, this time I’m unable to rewrite the query to work a
different way: removing the query.filter_not_found() phase, I receive a
different error, exit:{json_encode, {bad_term, {not_found (which was
covered in more detail in my previous emails).
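
If the phase is genuinely just slow rather than wedged, raising the job
timeout might at least tell me which. A sketch, assuming the Python
client's run() passes a timeout (in milliseconds) through to the job spec:

    # Raise the MapReduce job timeout from the 60s default (value illustrative).
    results = query.run(timeout=300000)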

Any thoughts on how I can attempt to work around this?
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: First stab at sizing a cluster

2013-10-20 Thread Mark Phillips
Hi Nathan,

One alternative to the pure 2i-based solution for this would be time
boxing. Sean referenced it a few months back on the list [1] and it's
worth investigating. There are a few other resources I'm failing to
remember at the moment, but I'll send them along tomorrow if they come
back to me. That said, 2i will most likely work for your queries, too. I
would prototype both and let performance testing be your guide.
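
The gist of time boxing, sketched with the Python client (bucket naming,
key, and index field here are illustrative, not a prescribed scheme):
write each entry into a bucket named for its day, so the retention window
is enforced by pruning whole day-buckets instead of deleting inside live
ones.

    import riak
    from datetime import datetime

    client = riak.RiakClient()

    def day_bucket(ts):
        # One bucket per day; expiring the window = pruning old day-buckets.
        return client.bucket("logs-" + ts.strftime("%Y-%m-%d"))

    b = day_bucket(datetime.utcnow())
    entry = b.new("entry-key-123", data={"msg": "..."})
    entry.add_index("user_bin", "user-42")  # 2i: "all logs for this user"
    entry.store()

"All logs for user X in a date window" then becomes one 2i lookup per
day-bucket in the window.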

On the topic of cluster sizing, it's tough to pin down precisely before
you're up and running. That said, I would start with five of the
SoftLayer Smalls at the very least.

Hope that helps.

Mark
twitter.com/pharkmillups

PS - You might also want to experiment with lower N, R, and W values, as
log data tends to be immutable and you can pick up some performance gains
by cutting down on how many replicas you store and query.
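
For instance, with the Python client (illustrative bucket name and values;
n_val should be set before a bucket holds data, as changing it later is
unsafe):

    b = client.bucket("logs-2013-10-20")
    b.set_property("n_val", 2)         # two replicas instead of the default three
    obj = b.get("entry-key-123", r=1)  # read satisfied by a single replica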

[1] (Dietrich's talk Sean links to is a great resource)
http://riak.markmail.org/search/?q=timebox#query:timebox+page:1+mid:e3a7ivrn5eyw3vtz+state:results

On Sat, Oct 19, 2013 at 11:47 AM, N. Tucker wrote:
> Hi all, I've been experimenting with using riak to index a large
> amount of log data collected from a bunch of different app instances
> across different machines.  I have our app code instrumented such that
> it attaches secondary indexes to log entries based on some interesting
> metadata (for example, the date, the thread id, the hostname, the
> identity of the user on whose behalf we were doing something, if
> appropriate) and then submits them to riak.  So far I have this
> working against a 1-node riak cluster on a very small slice of
> production log data, which obviously doesn't really add much benefit.
> Time to see about scaling it up.
>
> Ultimately, I'd like my database to reflect an N-most-recent-days
> window of our logs, to make querying them easier than grepping
> gigabytes and gigabytes of logs across dozens of machines.  The
> secondary indexes are especially appealing, because the most common
> task is "give me all the logs associated with this user across all
> machines for a given date window".  This seems like a problem riak is
> well suited for, given the appropriate secondary indexes.
>
> Having no riak sizing experience to speak of and no outside guidance,
> my approach was basically going to be to start out with a 3 or 5 node
> cluster of SoftLayer's "small" riak nodes (see
> http://www.softlayer.com/solutions/big-data/riak-hosting ) or
> comparable hardware, then start shoveling data into it and see how
> large a window I can retain (and query against with reasonable
> performance) while still writing at full blast (assuming I can
> actually write full blast to it -- that remains to be seen).
>
> But then I realized there are probably a few people on this list that
> might be able to give me at least a rough recommendation if I can give
> some details on the data load.  The average log entry is around 140
> bytes of message and maybe another 60 bytes of metadata for secondary
> indexes.  We churn out about 400 million of these log entries per day,
> so in the neighborhood of 4500 per second.
>
> Is this something we should be able to handle on a smallish riak
> cluster using a LevelDB backend?  I'm trying to puzzle out just how
> much this scheme will end up costing us.  Also, what would be a good
> approach for pruning items as they get outside the sliding N-day
> window?  TTL? Delete query by date?  Will this be expensive?  I've
> also seen some threads recently about LevelDB never actually shrinking
> when data is deleted.  Is that a problem I'll run into quickly?
>
> Thanks in advance for any guidance you can give.  Even if the advice
> is "give it up, just use , which was designed for exactly this",
> I'm interested in that type of response, too.  Maybe I'm overlooking
> something much easier.  Stuff like splunk is worth consideration,
> although I'm a pretty big believer in reducing dependencies on outside
> services.  I'm also happy to provide more details on our use case if
> what I've provided isn't enough.
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com