About Vector Clocks

2011-01-24 Thread Kresten Krab Thorup
Still trying to grok vector clocks, here's a question.

When riak realizes that two peers successfully updated the same key, those 
two riak_objects will have different (one not deriving from the other) vector 
clocks.  This could happen when, e.g., synchronizing using long-haul replication.

When these are then consolidated into a single multi-valued object, how is the 
vector clock of the resulting object computed?  

In other words, why does it keep the two vclocks, one per value?  

I would think that a riak_object should be structured thus:

object ::= { bucket,  key,  [ vclock x value ] }

But, it is structured like this:

object ::= { bucket,  key,  vclock,  [ value ] }

This structure implies that a new vclock is computed by an agent doing the 
merge [involving a client id of that agent], but what happens if two agents 
independently realize the conflict and try to do the merge?  Will those two 
independent merges have conflicting vector clocks?

So why not keep both vector clocks around, so that independent agents can do 
merges that will then be reconcilable, because no new information is created 
when merging?

Kresten







___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: About Vector Clocks

2011-01-24 Thread Kresten Krab Thorup
OK, found it ... if you want to follow my train of thought, feel free to read 
along ...

It turns out vector clocks can be merged even if one is not derived from the 
other.  A vector clock is a list of {agent-id, time} pairs with no duplicate 
agent-ids in the list.

Merging is done by (sketched below):

- including all clock entries that appear in only one of the clocks, and
- for entries that appear in both, including the "latest" (greater) of 
  the two times.
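
A minimal sketch of that merge rule in plain Erlang (purely illustrative;
Riak's real implementation is its vclock module, whose entries also carry
timestamps):

%% A vector clock as a list of {AgentId, Counter} pairs with no
%% duplicate AgentIds.  merge/2 implements the rule described above.
-module(vclock_sketch).
-export([merge/2, descends/2]).

merge(ClockA, ClockB) ->
    %% union of the entries, keeping the greater counter where an
    %% agent appears in both clocks
    orddict:merge(fun(_AgentId, CounterA, CounterB) ->
                          max(CounterA, CounterB)
                  end,
                  orddict:from_list(ClockA),
                  orddict:from_list(ClockB)).

%% ClockA descends ClockB if A has seen everything B has seen.
descends(ClockA, ClockB) ->
    lists:all(fun({AgentId, CounterB}) ->
                      case lists:keyfind(AgentId, 1, ClockA) of
                          {AgentId, CounterA} -> CounterA >= CounterB;
                          false -> false
                      end
              end, ClockB).

For example, merging [{a,2},{b,1}] with [{a,1},{c,3}] gives
[{a,2},{b,1},{c,3}], which descends both inputs while neither input
descends the other.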

This operation is commutative and associative, and it does not introduce any 
new information, so it can perfectly well be done on two distinct nodes (which 
may realize the conflict independently) and then later reconciled again.

Reconciling the values is a simple operation: compute the union of the 
values (i.e. remove duplicates in the combined set).  Notice that if two agents 
made the same update, the reconciled value is just a single value.

So the reconciled object (vclock x [value]) is in conflict exactly when there 
is more than one value.
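
Continuing the sketch, reconciliation and the conflict test could look
like this (again just an illustration, not riak_object's internals):

%% Reconcile two {VClock, Values} siblings: merge the clocks and take
%% the union of the values (lists:usort removes duplicates).  The
%% result is in conflict exactly when more than one value survives.
reconcile({ClockA, ValuesA}, {ClockB, ValuesB}) ->
    {vclock_sketch:merge(ClockA, ClockB),
     lists:usort(ValuesA ++ ValuesB)}.

in_conflict({_VClock, Values}) ->
    length(Values) > 1.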

When a client sees a multi-valued object, all it knows is that it is in 
conflict.  It can't say which agents caused the conflict, or when in terms of 
vector-clock time; only that it happened "before" the new vector clock time.

It's important to develop patterns or strategies for resolving conflicts, 
because this is one of the issues most developers find scary when they first 
hear of these concepts.

Perhaps what I'd like to know is information like ... 

"Peter updated the object to A at 15:00, and Eve updated it to B at 
14:58"

Such information may be useful to a client in order to handle the conflict.  

If you need such information, you'll have to push it through X-Riak-Meta- 
headers, because that metadata is part of the value.

Hmm, ... cool stuff.  Love learning about this.

Kresten





On Jan 24, 2011, at 11:24 , Kresten Krab Thorup wrote:

> Still trying to grok vector clocks, here's a question.
> 
> When riak realizes that two peers did successfully update the same key, those 
> two riak_object's will have different (one not deriving from the other) 
> vector clocks.  This could happen when e.g. synchronizing using long-haul 
> replication.
> 
> When these are then consolidated into a single multi-valued object, how is 
> the vector clock of the resulting object computed?  
> 
> In other words, why does it keep the two vclocks, one per value?  
> 
> I would think that a riak_object should be structured thus:
> 
>   object ::= { bucket,  key,  [ vclock x value ] }
> 
> But, it is structured like this:
> 
>   object ::= { bucket,  key,  vclock,  [ value ] }
> 
> This structure implies that a new vclock is computed by an agent doing the 
> merge [involving a client id of that agent], but what happens if two agents 
> independently realizes the conflict and tries to do the merge?  Will those 
> two independent merges have conflicting vector clocks?
> 
> So why not keep both vector clocks around, so that independent agents can do 
> merges that will then be conciliable because no new information is created 
> when merging?
> 
> Kresten
> 
> 
> 
> 
> 
> 
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak_kv_multi_backend + custom backend

2011-01-24 Thread Sean Cribbs
Anthony,

Thanks for noticing that documentation error. If you would, please file an 
issue on the wiki's issue tracker: https://github.com/basho/riak_wiki/issues 
and we'll get it corrected.  If you're feeling enterprising, you can also fork, 
fix and send a pull-request for the page.

Sean Cribbs 
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Jan 24, 2011, at 2:43 AM, Anthony Molinaro wrote:

> Hi Sean,
> 
>   Thanks, that seems to work.  Should the wiki be changed?  Currently
> this page
> 
> http://wiki.basho.com/Configuration-Files.html
> 
> has
> 
> multi_backend: list of backends to provide
> Format of each backend specification is {BackendName, BackendModule,
> BackendConfig}, where BackendName is any atom, BackendModule is the name of
> the Erlang module implementing the backend (the same values you would
> provide as "storage_backend" settings), and BackendConfig is
> a parameter that will be passed to the "start/2" function of the backend
> module.
> 
> and it sounds like BackendName has to be a binary.
> 
> Also, it has
> 
> Specify the backend to use for a bucket with
> riak_client:set_bucket(BucketName,[{backend, BackendName}] in Erlang 
> 
> but I'm a little unclear how you would invoke this call, do you attach
> with riak attach, then run the command there?  Because when I try that
> I get
> 
> ** exception error: undefined function riak_client:set_bucket/2
> 
> Seems there might be a riak_client:set_bucket/3, so that documentation may
> be out of date as well.
> 
> Thanks,
> 
> -Anthony
> 
> On Sun, Jan 23, 2011 at 02:50:32PM -0500, Sean Cribbs wrote:
>> Anthony,
>> 
>> This is something I discovered a while back - define your backend names as 
>> binaries and you'll be able to set them properly from the REST interface. 
>> Example:
>> 
>> {storage_backend, riak_kv_multi_backend},
>> {multi_backend_default, <<"bitcask">>},
>> {multi_backend,
>>[ {<<"bitcask">>, riak_kv_bitcask_backend,
>>[{data_root, "/var/lib/riak/bitcask"}]},
>>  {<<"dets">>, riak_kv_dets_backend,
>>[{riak_kv_dets_backend_root, "/var/lib/riak/dets"}]},
>>  {<<"ets">>, riak_kv_ets_backend, []},
>>  {<<"fs">>, riak_kv_fs_backend,
>>[{riak_kv_fs_backend_root, "/var/lib/riak/fs"}]},
>>  {<<"cache">>, riak_kv_cache_backend,
>>[ {riak_kv_cache_backend_memory, 100},
>>  {riak_kv_cache_backend_ttl, 600},
>>  {riak_kv_cache_backend_max_ttl, 3600}
>>]},
>>  {<<"my_backend">>, my_backend, []}
>>]},
>> 
>> Then you can set it like so
>> 
>> curl -X PUT -H "content-type: application/json" -d 
>> '{"props":{"backend":"my_backend"}}' http://127.0.0.1:8098/riak/mybucket
>> 
>> Also note that only allow_mult and n_val are supported bucket properties 
>> from the PB interface (something we'll be fixing soon).
>> 
>> Sean Cribbs 
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>> 
>> On Jan 23, 2011, at 12:37 PM, Anthony Molinaro wrote:
>> 
>>> 
>>> Hi,
>>> 
>>> So I wanted to play around with creating a custom backend, and using
>>> the multi backend, but I am having problems getting anything to work.
>>> 
>>> Here's what I tried so far
>>> 
>>> in app.config
>>> 
>>> {storage_backend, riak_kv_multi_backend},
>>> {multi_backend_default, bitcask},
>>> {multi_backend,
>>>   [ {bitcask, riak_kv_bitcask_backend,
>>>       [{data_root, "/var/lib/riak/bitcask"}]},
>>>     {dets, riak_kv_dets_backend,
>>>       [{riak_kv_dets_backend_root, "/var/lib/riak/dets"}]},
>>>     {ets, riak_kv_ets_backend, []},
>>>     {fs, riak_kv_fs_backend,
>>>       [{riak_kv_fs_backend_root, "/var/lib/riak/fs"}]},
>>>     {cache, riak_kv_cache_backend,
>>>       [ {riak_kv_cache_backend_memory, 100},
>>>         {riak_kv_cache_backend_ttl, 600},
>>>         {riak_kv_cache_backend_max_ttl, 3600}
>>>       ]},
>>>     {my_backend, my_backend, []}
>>>   ]},
>>> 
>>> Then I restart, open a shell and do the following
>>> 
>>> 1> {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).
>>> {ok,<0.68.0>}
>>> 2> riakc_pb_socket:set_bucket(Pid, <<"b">>, [{backend, my_backend}]).
>>> ok
>>> 3> riakc_pb_socket:get_bucket(Pid,<<"b">>).
>>> {ok,[{n_val,3},{allow_mult,false}]}
>>> 
>>> So I didn't see my backend there, thus tried the REST API
>>> 
>>>> curl 'http://127.0.0.1:8098/riak/b' ; echo
>>> {"props":{"name":"b","n_val":3,"allow_mult":false,"last_write_wins":false,"precommit":[],"postcommit":[],"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"old_vclock":86400,"young_vclock":20,"big_vclock":50,"small_vclock":10,"r":"quorum","w":"quorum","dw":"quorum","rw":"quorum"}}
>>> 
>>> It's not there either.  So I try to set it with REST
>>> 
>>>> curl -X PUT -H "Content-Type: application/json" -d 
>>>> '{"props":{"backend":"my_backend"}}' http://127.0.0.1:8098/riak/b
>>> 
>>> Which works, in that now I have
>>> 
>>>> curl 'http://127.0.0.1:8098/riak/b' ;

Re: make doc fails on erlang client

2011-01-24 Thread David Smith
I have submitted a pull request with this fix -- thanks!

https://github.com/basho/riak-erlang-client/pull/13

D.

On Sun, Jan 23, 2011 at 7:45 AM, Jon Brisbin  wrote:
> Tried to do a make doc in a fresh checkout of riak-erlang-client and it
> failed on unmatched specs. Here's a diff of the two tweaks I made to get it
> to build the docs (the specs were missing an extra timeout value):
> index 42576d8..419e595 100644
> --- a/src/riakc_pb_socket.erl
> +++ b/src/riakc_pb_socket.erl
> @@ -488,6 +488,7 @@ mapred_bucket(Pid, Bucket, Query, Timeout) ->
>  %% @spec mapred_bucket(Pid :: pid(),
>  %%                     Bucket :: bucket(),
>  %%                     Query :: [riak_kv_mapred_query:mapred_queryterm()],
> +%%                     TimeoutMillisecs :: integer() | 'infinity',
>  %%                     TimeoutMillisecs :: integer() | 'infinity') ->
>  %%       {ok, {ReqId :: term(), MR_FSM_PID :: pid()}} |
>  %%       {error, Err :: term()}
> @@ -520,6 +521,7 @@ mapred_bucket_stream(Pid, Bucket, Query, ClientPid, Timeout) ->
>  %%                            Bucket :: bucket(),
>  %%                            Query :: [riak_kv_mapred_query:mapred_queryterm()],
>  %%                            ClientPid :: pid(),
> +%%                            TimeoutMillisecs :: integer() | 'infinity',
>  %%                            TimeoutMillisecs :: integer() | 'infinity') ->
>  %%       {ok, {ReqId :: term(), MR_FSM_PID :: pid()}} |
>  %%       {error, Err :: term()}
>
> Thanks!
> Jon Brisbin
>         Web: http://jbrisbin.com/
>     Twitter: @j_brisbin
>       Skype: jon.brisbin
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>



-- 
Dave Smith
Engineering Manager
Basho Technologies, Inc.
diz...@basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


[ANN] Riak Search 0.14 Released

2011-01-24 Thread Rusty Klophaus
Hello Riak Users,

We are excited to announce the release of Riak Search version 0.14!

Pre-built installations and source tarballs are available at:
http://downloads.basho.com/

Release notes are at (also copied below):
https://github.com/basho/riak_search/raw/riak_search-0.14.0/releasenotes/riak_search-0.14.0.txt

Thanks,
Basho

---
Riak Search 0.14.0 Release Notes


The majority of effort during development of Riak Search 0.14 went
toward rewriting the query parsing and planning system. This fixes all
known query planning bugs. We also managed to add quite a few new
features and performance improvements. See the highlights below for
details.

Important Configuration and Interface Changes:

- The system now uses the 'whitespace_analyzer_factory' by
  default. (It previously used the 'default_analyzer_factory', which
  has been renamed to 'standard_analyzer_factory'.)

- Indexing and searching will fail with an error message if the
  analyzer_factory configuration setting is not set at either a schema
  or field level.

- The method signature for custom Erlang and Javascript extractors has
  changed.

Highlights:

- Fixed the query parser to properly respect field-level analyzer
  settings.

- Fixed the query parser to correctly handle escaped special
  characters and terms within single-quotes and double-quotes.

- Fixed the query parser's interpretation of inclusive and exclusive
  ranges, allowing an inclusive range on one side, and an exclusive
  range on the other (mimicking Lucene).

- Fixed the execution engine to significantly speed up proximity
  searches and phrase searches. (678)

- By default new installations use all Erlang-based extractors, and
  the JVM is not started. Setting the analysis_port in etc/app.config
  will cause the JVM to start and allow the use of Java Lucene-based
  analyzers.

- The system now aborts queries that would queue up too many documents
  in a result set. This is controlled by the 'max_search_results' setting
  in riak_search (see the config sketch after this list). Note that this
  only affects the Solr interface; searches through the Riak Client API
  that feed into a Map/Reduce job are still allowed to execute because
  the system streams those results.

- Change handoff of Search data stored in merge_index to be more
  memory efficient.

- Added "*_date", "*_int", "*_text", and "*_txt" dynamic fields to the
  default schema.
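
For reference, a sketch of where those two settings might live in
etc/app.config (the setting names come from the notes above; the values
are illustrative, and the exact config section for analysis_port is an
assumption):

%% etc/app.config fragment (illustrative values only)
{riak_search, [
    %% abort Solr-interface queries that would queue more than
    %% this many documents in a result set:
    {max_search_results, 100000}
    %% setting an analysis port starts the JVM and enables the
    %% Java Lucene-based analyzers (port value assumed):
    %% ,{analysis_port, 6095}
]}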


Improvements


414 - ETS backend now fully functional (415, 795)
592 - Make parser multi-schema aware
783 - Pass Search Props as KeyData to Map/Reduce Query
788 - Add support for indexing Erlang terms / proplists
839 - Create a way to globally clear schema cache
925 - Change search-cmd commands (set_schema, etc.) to use dashes.

--
Fixed Bugs
--

186 - Qilr fails when parsing ISO8601 dates
311 - Qilr does not correctly parse negative numbers
363 - Range queries broken for negative numbers
369 - Range queries broken for ALL integer fields
405 - Update search:index_dir/N to de-index old documents first
411 - Our handling of NOT is different from Solr - "NOT X", "AND NOT
X", "AND (NOT X)"
609 - Calling search:search or search:explain with a binary hangs shell
611 - Error in inclusive/exclusive range building
612 - Single term queries shouldn't include proximity clauses
622 - schema and schemachange test fail after new parser
711 - Update new #range operator to support negative integers
729 - Make Qilr use analyzer specified in schema
732 - Word Position is thrown off by Stopwords
764 - The function search:delete_doc/2 blocks if run after search:index_dir/2
797 - Ranges with quoted terms do not return correct results
801 - Anonymous javascript extractors stored in Bucket/Keys validate
but are not implemented
802 - Schema allows default field that is not defined, but breaks when analyzing
803 - Cannot use search m/r with riak_client:mapred
832 - Query parser fails on escaped special characters
833 - Proximity searching is currently broken for Whitespace Analyzer
836 - Integer padding is ignored for dynamic fields
837 - The parser interprets hyphens as negations (NOT)
840 - JSON and raw extractors assumes a default field of "value"
849 - Default Erlang Analyzer misses 'that' and 'then' as stop words
850 - text_analyzers module is not tail-recursive
864 - Solr output chokes on dates
885 - Coordinating node exits if result set exceeds available memory
886 - Query parser error when searching on terms that contain the @ symbol
935 - Change merge_index fold to be unordered
956 - Error when setting rs_extractfun through Curl/JSON


Known Issues


362 - Sorting broken on negative numbers
399 - Handoff can potentially lead to extraneous postings pointing to
a missing or changed document
790 - Indexing data too quickly can exhaust the ETS table limit
814 - text_analyzer:default_analyzer_factory skips unicode code points
beyond 0x7f
861 - merge_index throws errors when data pa

Re: riak_kv_multi_backend + custom backend

2011-01-24 Thread Anthony Molinaro
Sean,

I'll fork and submit a pull request sometime soon.  Will things continue
to be configured with binaries, or will it switch to atoms at some point?

Also, is there a way to configure specific buckets in the config, or to
specify default bucket properties for a particular bucket type in the config?
I see there is default_bucket_props, and you can have bucket-specific
properties, but what I would want is some way to have a custom bucket type
which would require the hash function to be different.  This doesn't seem like
bucket-specific config (ie, config passed to start/2), but instead feels like
per-bucket-type default config.  I think I can work around it by specifying
it at the same time I specify the custom backend.  Like

curl -X PUT -H "content-type: application/json" -d 
'{"props":{"backend":"my_backend"},"chash_keyfun":{"mod":"my_backend","fun":"my_hash"}}'
 http://127.0.0.1:8098/riak/mybucket

but this feels risky, as it would be easy for someone to miss.

I'd prefer being able to set that in default bucket props for a particular
backend.  That seems cleaner, and less error-prone.

Anyway, if it doesn't work in this way, any reason it couldn't?

-Anthony

On Mon, Jan 24, 2011 at 08:57:26AM -0500, Sean Cribbs wrote:
> Anthony,
> 
> Thanks for noticing that documentation error. If you would, please file an 
> issue on the wiki's issue tracker: https://github.com/basho/riak_wiki/issues 
> and we'll get it corrected.  If you're feeling enterprising, you can also 
> fork, fix and send a pull-request for the page.
> 
> Sean Cribbs 
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
> 
> On Jan 24, 2011, at 2:43 AM, Anthony Molinaro wrote:
> 
> > Hi Sean,
> > 
> >   Thanks, that seems to work.  Should the wiki be changed?  Currently
> > this page
> > 
> > http://wiki.basho.com/Configuration-Files.html
> > 
> > has
> > 
> > multi_backend: list of backends to provide
> > Format of each backend specification is {BackendName, BackendModule,
> > BackendConfig}, where BackendName is any atom, BackendModule is the name of
> > the Erlang module implementing the backend (the same values you would
> > provide as "storage_backend" settings), and BackendConfig is
> > a parameter that will be passed to the "start/2" function of the backend
> > module.
> > 
> > and it sounds like BackendName has to be a binary.
> > 
> > Also, it has
> > 
> > Specify the backend to use for a bucket with
> > riak_client:set_bucket(BucketName,[{backend, BackendName}] in Erlang 
> > 
> > but I'm a little unclear how you would invoke this call, do you attach
> > with riak attach, then run the command there?  Because when I try that
> > I get
> > 
> > ** exception error: undefined function riak_client:set_bucket/2
> > 
> > Seems there might be a riak_client:set_bucket/3, so that documentation may
> > be out of date as well.
> > 
> > Thanks,
> > 
> > -Anthony
> > 
> > On Sun, Jan 23, 2011 at 02:50:32PM -0500, Sean Cribbs wrote:
> >> Anthony,
> >> 
> >> This is something I discovered a while back - define your backend names as 
> >> binaries and you'll be able to set them properly from the REST interface. 
> >> Example:
> >> 
> >> {storage_backend, riak_kv_multi_backend},
> >> {multi_backend_default, <<"bitcask">>},
> >> {multi_backend,
> >>[ {<<"bitcask">>, riak_kv_bitcask_backend,
> >>[{data_root, "/var/lib/riak/bitcask"}]},
> >>  {<<"dets">>, riak_kv_dets_backend,
> >>[{riak_kv_dets_backend_root, "/var/lib/riak/dets"}]},
> >>  {<<"ets">>, riak_kv_ets_backend, []},
> >>  {<<"fs">>, riak_kv_fs_backend,
> >>[{riak_kv_fs_backend_root, "/var/lib/riak/fs"}]},
> >>  {<<"cache">>, riak_kv_cache_backend,
> >>[ {riak_kv_cache_backend_memory, 100},
> >>  {riak_kv_cache_backend_ttl, 600},
> >>  {riak_kv_cache_backend_max_ttl, 3600}
> >>]},
> >>  {<<"my_backend">>, my_backend, []}
> >>]},
> >> 
> >> Then you can set it like so
> >> 
> >> curl -X PUT -H "content-type: application/json" -d 
> >> '{"props":{"backend":"my_backend"}}' http://127.0.0.1:8098/riak/mybucket
> >> 
> >> Also note that only allow_mult and n_val are supported bucket properties 
> >> from the PB interface (something we'll be fixing soon).
> >> 
> >> Sean Cribbs 
> >> Developer Advocate
> >> Basho Technologies, Inc.
> >> http://basho.com/
> >> 
> >> On Jan 23, 2011, at 12:37 PM, Anthony Molinaro wrote:
> >> 
> >>> 
> >>> Hi,
> >>> 
> >>> So I wanted to play around with creating a custom backend, and using
> >>> the multi backend, but I am having problems getting anything to work.
> >>> 
> >>> Here's what I tried so far
> >>> 
> >>> in app.config
> >>> 
> >>> {storage_backend, riak_kv_multi_backend},
> >>> {multi_backend_default, bitcask},
> >>> {multi_backend,
> >>>   [ {bitcask, riak_kv_bitcask_backend,
> >>>       [{data_root, "/var/lib/riak/bitcask"}]},
> >>>     {dets, riak_kv_dets_backend,
> >>>       [{riak_kv_dets_backend_root, "/var/lib/riak/dets"}]},
> >>>     {ets, riak_

Re: Riak Search 0.14 Released

2011-01-24 Thread Scott Gonyea
One concern from me is calling it standard_analyzer_factory... That name is 
semi-in-use by Solr:


http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory


And it does not have the same behavior as the previous default analyzer. 
That'll have a lot of potential to confuse people coming from Solr.  I'd 
suggest calling it something like Generic Analyzer Factory, or at least 
sticking some scary wording around it in the wiki.


Scott
On Monday, January 24, 2011 at 10:53 AM, Rusty Klophaus wrote:

> Hello Riak Users,
> 
> 
> We are excited to announce the release of Riak Search version 0.14!
> 
> Pre-built installations and source tarballs are available at: 
> http://downloads.basho.com/
> 
> Release notes are at (also copied below): 
> https://github.com/basho/riak_search/raw/riak_search-0.14.0/releasenotes/riak_search-0.14.0.txt
> 
> Thanks,
> Basho
> 
> 
> [the full 0.14.0 release notes were quoted here; snipped, see the 
> announcement above]

Re: riak_kv_multi_backend + custom backend

2011-01-24 Thread Sean Cribbs
Anthony,

Default bucket properties are defined in riak_core, not riak_kv, so you can't 
really set the hash function as a result of setting something else.  
Furthermore, you really only have to set these once, so you could configure 
your application to check the bucket props on startup and set them 
appropriately.
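
For what it's worth, a sketch of that startup check (the parameterized
riak_client calls here are assumed from the set_bucket/3 mentioned
earlier in this thread, so treat the exact API as an assumption):

%% Run once at application startup, on or attached to a Riak node:
%% set any bucket props that don't already have the wanted values.
ensure_bucket_props(Bucket, WantedProps) ->
    {ok, C} = riak:local_client(),
    Current = C:get_bucket(Bucket),
    Stale = [{K, V} || {K, V} <- WantedProps,
                       proplists:get_value(K, Current) =/= V],
    case Stale of
        [] -> ok;
        _  -> C:set_bucket(Bucket, Stale)
    end.

%% e.g. ensure_bucket_props(<<"mybucket">>,
%%          [{backend, <<"my_backend">>},
%%           {chash_keyfun, {my_backend, my_hash}}]).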

"backend" is a property only used by the multi_backend (which isn't used that 
frequently), so it seems a little presumptuous to special-case that property 
(turning it into an atom).  There are a few other technical reasons that you 
don't want to arbitrarily turn binaries into atoms.  I agree that it's not 
especially intuitive but I think it's a small point of pain, especially if we 
fix the documentation.

Sean Cribbs 
Developer Advocate
Basho Technologies, Inc.
http://basho.com/

On Jan 24, 2011, at 2:01 PM, Anthony Molinaro wrote:

> Sean,
> 
> I'll fork and submit a pull request sometime soon.  Will things continue
> to be configured with binaries, or will it switch to atoms at some point?
> 
> Also, Is there a way to configure specific buckets in the config, or to
> specify default bucket properties for a particular bucket type in the config?
> I see there is default_bucket_props, and you can have bucket specific
> properties, but what I would want is some way to have a custom bucket type
> which would require the hash function be different.  This doesn't seem like
> bucket specific config (ie, config passed to start/2), but instead feels like
> per bucket type default config.  I think I can work around it by specifyng
> it at the same time I specify the custom backend.  Like
> 
> curl -X PUT -H "content-type: application/json" -d 
> '{"props":{"backend":"my_backend"},"chash_keyfun":{"mod":"my_backend","fun":"my_hash"}}'
>  http://127.0.0.1:8098/riak/mybucket
> 
> but this feels risky, as it would be easy for someone to miss.
> 
> I'd prefer being able to set that in default bucket props for a particular
> backend.  That seems cleaner, and less error-prone.
> 
> Anyway, if it doesn't work in this way, any reason it couldn't?
> 
> -Anthony
> 
> On Mon, Jan 24, 2011 at 08:57:26AM -0500, Sean Cribbs wrote:
>> Anthony,
>> 
>> Thanks for noticing that documentation error. If you would, please file an 
>> issue on the wiki's issue tracker: https://github.com/basho/riak_wiki/issues 
>> and we'll get it corrected.  If you're feeling enterprising, you can also 
>> fork, fix and send a pull-request for the page.
>> 
>> Sean Cribbs 
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>> 
>> On Jan 24, 2011, at 2:43 AM, Anthony Molinaro wrote:
>> 
>>> Hi Sean,
>>> 
>>>  Thanks, that seems to work.  Should the wiki be changed?  Currently
>>> this page
>>> 
>>> http://wiki.basho.com/Configuration-Files.html
>>> 
>>> has
>>> 
>>> multi_backend: list of backends to provide
>>> Format of each backend specification is {BackendName, BackendModule,
>>> BackendConfig}, where BackendName is any atom, BackendModule is the name of
>>> the Erlang module implementing the backend (the same values you would
>>> provide as "storage_backend" settings), and BackendConfig is
>>> a parameter that will be passed to the "start/2" function of the backend
>>> module.
>>> 
>>> and it sounds like BackendName has to be a binary.
>>> 
>>> Also, it has
>>> 
>>> Specify the backend to use for a bucket with
>>> riak_client:set_bucket(BucketName,[{backend, BackendName}] in Erlang 
>>> 
>>> but I'm a little unclear how you would invoke this call, do you attach
>>> with riak attach, then run the command there?  Because when I try that
>>> I get
>>> 
>>> ** exception error: undefined function riak_client:set_bucket/2
>>> 
>>> Seems there might be a riak_client:set_bucket/3, so that documentation may
>>> be out of date as well.
>>> 
>>> Thanks,
>>> 
>>> -Anthony
>>> 
>>> On Sun, Jan 23, 2011 at 02:50:32PM -0500, Sean Cribbs wrote:
>>>> Anthony,
>>>> 
>>>> This is something I discovered a while back - define your backend names as 
>>>> binaries and you'll be able to set them properly from the REST interface. 
>>>> Example:
>>>> 
>>>> {storage_backend, riak_kv_multi_backend},
>>>> {multi_backend_default, <<"bitcask">>},
>>>> {multi_backend,
>>>>   [ {<<"bitcask">>, riak_kv_bitcask_backend,
>>>>       [{data_root, "/var/lib/riak/bitcask"}]},
>>>>     {<<"dets">>, riak_kv_dets_backend,
>>>>       [{riak_kv_dets_backend_root, "/var/lib/riak/dets"}]},
>>>>     {<<"ets">>, riak_kv_ets_backend, []},
>>>>     {<<"fs">>, riak_kv_fs_backend,
>>>>       [{riak_kv_fs_backend_root, "/var/lib/riak/fs"}]},
>>>>     {<<"cache">>, riak_kv_cache_backend,
>>>>       [ {riak_kv_cache_backend_memory, 100},
>>>>         {riak_kv_cache_backend_ttl, 600},
>>>>         {riak_kv_cache_backend_max_ttl, 3600}
>>>>       ]},
>>>>     {<<"my_backend">>, my_backend, []}
>>>>   ]},
>>>> 
>>>> Then you can set it like so
>>>> 
>>>> curl -X PUT -H "content-type: application/json" -d 
>>>> '{"props":{"backend":"my_backend"}}' http://127.0.0.1:8098/r

running erlang map phases via REST API

2011-01-24 Thread Brendan
is it possible to pass erlang anonymous funs via the REST API? i took a
javascript query, replaced "language":"javascript" with
"language":"erlang" and changed the source to an anonymous fun, but i
end up getting an error back from the REST API.  am i doing something
wrong here, or can erlang functions only be called using the "module"
and "function" fields?

the anonymous erlang fun was pulled from the Phase Functions->Map
Function examples section of the http://wiki.basho.com/MapReduce.html
wiki page.

[dev.a]brendan@build01:~/riak$ curl -X PUT -d 'stuff'
http://127.0.0.1:8098/riak/bucket/object
[dev.a]brendan@build01:~/riak$ curl -X GET
http://127.0.0.1:8098/riak/bucket/object; echo
stuff
[dev.a]brendan@build01:~/riak$ cat erl2
{
    "inputs": [
        [
            "bucket",
            "object"
        ]
    ],
    "query": [
        {
            "map": {
                "language": "erlang",
                "source":
                    "fun(Value,_Keydata,_Arg)->[[riak_object:get_value(Value)]] end."
            }
        }
    ]
}
[dev.a]brendan@build01:~/riak$ curl -X POST -H
"content-type:application/json" http://localhost:8098/mapred --data @erl2
500 Internal Server Error

The server encountered an error while processing this request:

{error,badarg,
    [{erlang,binary_to_list,[undefined]},
     {riak_kv_mapred_json,bin_to_atom,1},
     {riak_kv_mapred_json,parse_step,2},
     {riak_kv_mapred_json,parse_query,2},
     {riak_kv_mapred_json,parse_request,1},
     {riak_kv_wm_mapred,verify_body,2},
     {riak_kv_wm_mapred,malformed_request,2},
     {webmachine_resource,resource_call,3}]}

mochiweb+webmachine web server



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: running erlang map phases via REST API

2011-01-24 Thread Dan Reverri
Hi Brendan,

Anonymous Erlang functions are not currently supported in map reduce phases.
Is this a feature the community would find valuable?

Thanks,
Dan

Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
d...@basho.com


On Mon, Jan 24, 2011 at 12:14 PM, Brendan  wrote:

> is it possible to pass erlang anonymous funs via the REST API? i took a
> javascript query, replaced "language":"javascript" with
> "language":"erlang" and changed the source to an anonymous fun, but i
> end up getting an error back from the REST API.  am i doing something
> wrong here, or can erlang functions only be called using the "module"
> and "function" fields?
>
> the anonymous erlang fun was pulled from the Phase Functions->Map
> Function examples section of the http://wiki.basho.com/MapReduce.html
> wiki page.
>
> [dev.a]brendan@build01:~/riak$ curl -X PUT -d 'stuff'
> http://127.0.0.1:8098/riak/bucket/object
> [dev.a]brendan@build01:~/riak$ curl -X GET
> http://127.0.0.1:8098/riak/bucket/object; echo
> stuff
> [dev.a]brendan@build01:~/riak$ cat erl2
> {
>"inputs": [
>[
>"bucket",
>"object"
>]
>],
>"query": [
>{
>"map": {
>"language": "erlang",
>"source":
> "fun(Value,_Keydata,_Arg)->[[riak_object:get_value(Value)]] end."
>}
>}
>]
> }
> [dev.a]brendan@build01:~/riak$ curl -X POST -H
> "content-type:application/json" http://localhost:8098/mapred --data @erl2
> 500 Internal Server Error
>
> The server encountered an error while processing this request:
>
> {error,badarg,
>     [{erlang,binary_to_list,[undefined]},
>      {riak_kv_mapred_json,bin_to_atom,1},
>      {riak_kv_mapred_json,parse_step,2},
>      {riak_kv_mapred_json,parse_query,2},
>      {riak_kv_mapred_json,parse_request,1},
>      {riak_kv_wm_mapred,verify_body,2},
>      {riak_kv_wm_mapred,malformed_request,2},
>      {webmachine_resource,resource_call,3}]}
>
> mochiweb+webmachine web server
>
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: running erlang map phases via REST API

2011-01-24 Thread Eric Moritz
I would find this useful. It's easier to run an anonymous function than to
deploy an erlang module to a cluster.
On Jan 24, 2011 3:34 PM, "Dan Reverri"  wrote:
> Hi Brendan,
>
> Anonymous Erlang functions are not currently supported in map reduce
phases.
> Is this a feature the community would find valuable?
>
> Thanks,
> Dan
>
> Daniel Reverri
> Developer Advocate
> Basho Technologies, Inc.
> d...@basho.com
>
>
> On Mon, Jan 24, 2011 at 12:14 PM, Brendan  wrote:
>
>> is it possible to pass erlang anonymous funs via the REST API? i took a
>> javascript query, replaced "language":"javascript" with
>> "language":"erlang" and changed the source to an anonymous fun, but i
>> end up getting an error back from the REST API.  am i doing something
>> wrong here, or can erlang functions only be called using the "module"
>> and "function" fields?
>>
>> the anonymous erlang fun was pulled from the Phase Functions->Map
>> Function examples section of the http://wiki.basho.com/MapReduce.html
>> wiki page.
>>
>> [dev.a]brendan@build01:~/riak$ curl -X PUT -d 'stuff'
>> http://127.0.0.1:8098/riak/bucket/object
>> [dev.a]brendan@build01:~/riak$ curl -X GET
>> http://127.0.0.1:8098/riak/bucket/object; echo
>> stuff
>> [dev.a]brendan@build01:~/riak$ cat erl2
>> {
>> "inputs": [
>> [
>> "bucket",
>> "object"
>> ]
>> ],
>> "query": [
>> {
>> "map": {
>> "language": "erlang",
>> "source":
>> "fun(Value,_Keydata,_Arg)->[[riak_object:get_value(Value)]] end."
>> }
>> }
>> ]
>> }
>> [dev.a]brendan@build01:~/riak$ curl -X POST -H
>> "content-type:application/json" http://localhost:8098/mapred --data @erl2
>> 500 Internal Server Error
>>
>> The server encountered an error while processing this request:
>>
>> {error,badarg,
>>     [{erlang,binary_to_list,[undefined]},
>>      {riak_kv_mapred_json,bin_to_atom,1},
>>      {riak_kv_mapred_json,parse_step,2},
>>      {riak_kv_mapred_json,parse_query,2},
>>      {riak_kv_mapred_json,parse_request,1},
>>      {riak_kv_wm_mapred,verify_body,2},
>>      {riak_kv_wm_mapred,malformed_request,2},
>>      {webmachine_resource,resource_call,3}]}
>>
>> mochiweb+webmachine web server
>>
>>
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: running erlang map phases via REST API

2011-01-24 Thread Brendan
having an anonymous fun would seem useful on a larger cluster - you
wouldn't need to distribute a new file/module to all machines in the
cluster (as i think you must do now?).

the docs around calling an erlang map phase are a bit confusing
(http://wiki.basho.com/MapReduce.html):
> Function source can be specified directly in the query by using the
> "source" spec field. Function source can also be loaded from a
> pre-stored riak object by providing "bucket" and "key" fields in the
> spec. Erlang map functions can be specified using the "module" and
> "function" fields in the spec.
it wasn't immediately obvious to me that the erlang functions *must* be
specified with module+function, especially since an anonymous fun is
used later on as an example in the same document.
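
For comparison, the module+function form that the interface does accept
looks something like this (a sketch; my_mapred is a hypothetical module
name, and it has to be compiled and on the code path of every node):

%% Named equivalent of the wiki's anonymous fun:
-module(my_mapred).
-export([map_object_value/3]).

map_object_value(RiakObject, _KeyData, _Arg) ->
    [[riak_object:get_value(RiakObject)]].

%% referenced from the JSON query via "module"/"function" instead
%% of "source":
%%   "map": {"language": "erlang",
%%           "module": "my_mapred",
%%           "function": "map_object_value"}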

the wiki doc also doesn't indicate where to put any custom erlang
map/reduce functions... do they go into riak-0.14.0/deps/riak_kv/src/
and then need to be built into the release?



On 11-01-24 12:34 PM, Dan Reverri wrote:
> Hi Brendan,
>
> Anonymous Erlang functions are not currently supported in map reduce
> phases. Is this a feature the community would find valuable?
>
> Thanks,
> Dan
>
> Daniel Reverri
> Developer Advocate
> Basho Technologies, Inc.
> d...@basho.com 
>
>
> On Mon, Jan 24, 2011 at 12:14 PM, Brendan wrote:
>
> is it possible to pass erlang anonymous funs via the REST API? i
> took a
> javascript query, replaced "language":"javascript" with
> "language":"erlang" and changed the source to an anonymous fun, but i
> end up getting an error back from the REST API.  am i doing something
> wrong here, or can erlang functions only be called using the "module"
> and "function" fields?
>
> the anonymous erlang fun was pulled from the Phase Functions->Map
> Function examples section of the http://wiki.basho.com/MapReduce.html
> wiki page.
>
> [dev.a]brendan@build01:~/riak$ curl -X PUT -d 'stuff'
> http://127.0.0.1:8098/riak/bucket/object
> [dev.a]brendan@build01:~/riak$ curl -X GET
> http://127.0.0.1:8098/riak/bucket/object; echo
> stuff
> [dev.a]brendan@build01:~/riak$ cat erl2
> {
>"inputs": [
>[
>"bucket",
>"object"
>]
>],
>"query": [
>{
>"map": {
>"language": "erlang",
>"source":
> "fun(Value,_Keydata,_Arg)->[[riak_object:get_value(Value)]] end."
>}
>}
>]
> }
> [dev.a]brendan@build01:~/riak$ curl -X POST -H
> "content-type:application/json" http://localhost:8098/mapred
> --data @erl2
> 500 Internal Server Error
>
> The server encountered an error while processing this request:
>
> {error,badarg,
>     [{erlang,binary_to_list,[undefined]},
>      {riak_kv_mapred_json,bin_to_atom,1},
>      {riak_kv_mapred_json,parse_step,2},
>      {riak_kv_mapred_json,parse_query,2},
>      {riak_kv_mapred_json,parse_request,1},
>      {riak_kv_wm_mapred,verify_body,2},
>      {riak_kv_wm_mapred,malformed_request,2},
>      {webmachine_resource,resource_call,3}]}
>
> mochiweb+webmachine web server
>
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com 
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


[no subject]

2011-01-24 Thread Sebastjan Trepca
Hey, 


Congrats on new release!


I'm considering porting our search engine to riak-search. We're currently using 
Solr and are quite happy with it except for indexing times where riak could 
probably be much faster.


Our requirements:
- faceting
- float fields
- multi-value fields
- date fields 
- sorting by floats and dates 
- fast updates
- dynamic fields 
- stemmed/unstemmed string fields (useful to get good data from faceting)


AFAIK you support dynamic fields, faceting and date fields. What about other 
points?


Thanks!


Sebastjan


--
http://ly.st



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Re: Search Indexing only specific content-types?

2011-01-24 Thread Mark Phillips
Indeed. contrib.basho.com doesn't have any Search-specific functions
at the moment, but we definitely want to add some if people have
anything to share. I'm sure a pre-commit hook that checks the
content-type of to-be-indexed data would be hugely useful to a lot of
users.
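
Presumably something along these lines would do it (a sketch of the hook
Gordon describes below; riak_search_kv_hook:precommit/1 is assumed to be
the Search hook's entry point, and the content type may arrive as a
string or a binary depending on the client):

%% Precommit hook: hand application/json objects to the Search
%% precommit, pass everything else through untouched.
-module(json_only_search_hook).
-export([precommit/1]).

precommit(Object) ->
    MD = riak_object:get_update_metadata(Object),
    case dict:find(<<"content-type">>, MD) of
        {ok, CT} when CT =:= "application/json";
                      CT =:= <<"application/json">> ->
            riak_search_kv_hook:precommit(Object);
        _ ->
            Object
    end.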

Mark

On Fri, Jan 21, 2011 at 5:06 PM, Eric Moritz  wrote:
> I'm sure the basho folks would love to have that in the function contrib :)
>
> http://contrib.basho.com/
>
> -- Forwarded message --
> From: "Gordon Tillman" 
> Date: Jan 21, 2011 7:55 PM
> Subject: Re: Search Indexing only specific content-types?
> To: "Eric Moritz" 
>
> Thanks for the tip Eric,
> I implemented my own precommit hook that checks the content-type of the
> object being stored.  If it is application/json then it calls the precommit
> function defined by the search module.  If not, it just returns the object.
> Seems to work fine!
> --gordon
> On Jan 21, 2011, at 18:17, Eric Moritz wrote:
>
> This might not be the correct answer but you could write your own version if
> the search post_commit hook.
>
> On Jan 21, 2011 6:46 PM, "Gordon Tillman"  wrote:
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Memory Requirements for Riak Search index (merge_index)

2011-01-24 Thread Rusty Klophaus
Hi Alexander and Gordon,

merge_index is inspired by bitcask, but it's not a strict derivative. It
does *not* keep the keydir structure in memory; however, it keeps some hints
in memory. The memory usage does grow with the number of terms, albeit
slowly, since it is mostly keeping signatures and offset information, and
the hints are compressed.

To get into the details, merge_index maintains one in-memory offset entry
for (approximately) every 64kb block of index. The offset entry consists of:

- A few bytes for bookkeeping.
- 200 bytes of fixed overhead for a bloom filter.
- A short entry for each term header in the block consisting of:
  - Offset and count information for that term.
  - Two highly compressible signatures.

Then the whole structure is compressed. In practice, memory usage has not
been a problem even for large vocabularies.
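
As a rough back-of-envelope from those numbers (before compression, and
with a guessed per-term entry size):

%% One offset entry per ~64kb block: ~200 bytes of bloom filter plus
%% a short entry per term header.  TermEntryBytes is an assumption.
estimate_hint_bytes(IndexBytes, TermsPerBlock) ->
    Blocks = IndexBytes div (64 * 1024),
    TermEntryBytes = 16,
    Blocks * (200 + TermsPerBlock * TermEntryBytes).

%% e.g. a 10 GB index with ~100 term headers per block:
%%   estimate_hint_bytes(10 * 1024 * 1024 * 1024, 100) -> 294912000
%% i.e. on the order of 280 MB before compression.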

Best,
Rusty


On Sat, Jan 22, 2011 at 11:36 AM, Alexander Sicular wrote:

> Without knowing exactly, I'm gonna go with yes. I happen to be under the
> impression that the merge_index_backend is a bitcask derivative. But I would
> love to hear otherwise from someone @basho.
>
> -Alexander Sicular
>
> @siculars
>
> On Jan 22, 2011, at 9:50 AM, Gordon Tillman wrote:
>
> > Greetings All,
> >
> > It is my understanding the only backend that is compatible with Riak
> Search indexes is the merge_index_backend.
> >
> > I am wondering if merge_index backend has a similar memory footprint as
> bitcask; i.e., must the keydir structure for merge_index fit entirely in RAM
> as is the case with bitcask?
> >
> > Many thanks!
> >
> > --gordon
> > ___
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re:

2011-01-24 Thread Rusty Klophaus
Hi Sebastjan,

Thanks! To run down the list:

- *Faceting:* Faceting is not supported at this time.

- *Float Fields / Date Fields: *Riak Search treats all fields as text
(comparisons happen lexicographically), so you would need to encode and/or
pad Date and Float fields, i.e. transform a float of 79.99 (for example) into
a fixed-width, zero-padded string, and transform dates into
"YYYY-MM-DD-HH-MM-SS" in GMT, etc. (A sketch follows at the end of this
list.)

- *Multi-Value Fields: *Multi-valued fields are supported, but are merged
together when outputting in Solr. This may or may not be what your
application expects.

- *Sorting: *Unfortunately, there is currently an open issue that
effectively breaks sorting: https://issues.basho.com/show_bug.cgi?id=867.
Depending on the size of your expected result sets, you may be able to get
by sorting the results in your application rather than in Search.

- *Fast Updates: *Fast updates are absolutely supported! Riak Search was
designed for real-time indexing from the ground up. (ie: No lag between
indexing a document and being able to search on it.)

- *Stemming: *Stemming/unstemming is not supported out of the box. You
could, however, write a custom analyzer to do this or leverage an existing
Lucene analyzer.
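
A sketch of the padding the Float/Date point above refers to (the widths
are arbitrary, and this naive scheme doesn't handle negative numbers):

%% Fixed-width encodings so that lexicographic order matches numeric
%% and chronological order.
pad_float(F) ->
    lists:flatten(io_lib:format("~15.4.0f", [F])).

pad_date({{Y, Mo, D}, {H, Mi, S}}) ->
    lists:flatten(io_lib:format("~4..0B-~2..0B-~2..0B-~2..0B-~2..0B-~2..0B",
                                [Y, Mo, D, H, Mi, S])).

%% pad_float(79.99)                  -> "0000000079.9900"
%% pad_date({{2011,1,24},{21,56,0}}) -> "2011-01-24-21-56-00"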

Best,
Rusty

On Mon, Jan 24, 2011 at 4:17 PM, Sebastjan Trepca  wrote:

>  Hey,
>
> Congrats on new release!
>
> I'm considering porting our search engine to riak-search. We're currently
> using Solr and are quite happy with it except for indexing times where riak
> could probably be much faster.
>
> Our requirements:
>  - faceting
>  - float fields
>  - multi-value fields
>  - date fields
>  - sorting by floats and dates
>  - fast updates
>  - dynamic fields
>  - stemmed/unstemmed string fields (useful to get good data from faceting)
>
> AFAIK you support dynamic fields, faceting and date fields. What about
> other points?
>
> Thanks!
>
> Sebastjan
>
> --
> http://ly.st
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re:

2011-01-24 Thread Sebastjan Trepca
Cool. Thanks for a quick reply.


Any estimates when faceting will be supported?


Sebastjan
On Monday, 24 January 2011 at 21:56, Rusty Klophaus wrote:

> Hi Sebastjan,
> 
> 
> Thanks! To run down the list:
> 
> 
> - Faceting: Faceting is not supported at this time.
> 
> 
> - Float Fields / Date Fields: Riak Search treats all fields as text 
> (comparisons happen lexicographically), so you would need to encode and/or 
> pad Date and Float fields, i.e. transform a float of 79.99 (for example) into 
> a fixed-width, zero-padded string, and transform dates into 
> "YYYY-MM-DD-HH-MM-SS" in GMT, etc.
> 
> 
> - Multi-Value Fields: Multi-valued fields are supported, but are merged 
> together when outputting in Solr. This may or may not be what your 
> application expects.
> 
> 
> - Sorting: Unfortunately, there is currently an open issue that effectively 
> breaks sorting: https://issues.basho.com/show_bug.cgi?id=867. Depending on 
> the size of your expected result sets, you may be able to get by sorting the 
> results in your application rather than in Search.
> 
> 
> - Fast Updates: Fast updates are absolutely supported! Riak Search was 
> designed for real-time indexing from the ground up. (ie: No lag between 
> indexing a document and being able to search on it.)
> 
> 
> - Stemming: Stemming/unstemming is not supported out of the box. You could, 
> however, write a custom analyzer to do this or leverage an existing Lucene 
> analyzer. 
> 
> 
> Best,
> Rusty
> 
> On Mon, Jan 24, 2011 at 4:17 PM, Sebastjan Trepca  wrote:
> 
> > Hey, 
> > 
> > 
> > Congrats on new release!
> > 
> > 
> > I'm considering porting our search engine to riak-search. We're currently 
> > using Solr and are quite happy with it except for indexing times where riak 
> > could probably be much faster.
> > 
> > 
> > Our requirements:
> > - faceting
> > - float fields
> > - multi-value fields
> > - date fields 
> > - sorting by floats and dates 
> > - fast updates
> > - dynamic fields 
> > - stemmed/unstemmed string fields (useful to get good data from faceting)
> > 
> > 
> > AFAIK you support dynamic fields, faceting and date fields. What about 
> > other points?
> > 
> > 
> > Thanks!
> > 
> > 
> > Sebastjan
> > 
> > 
> > --
> > http://ly.st
> > 
> > 
> > 
> > 
> > ___
> >  riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> > 
> > 
> > 
> 
> 
> 
> 
> 
> 




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak_kv_multi_backend + custom backend

2011-01-24 Thread Anthony Molinaro

On Mon, Jan 24, 2011 at 02:15:37PM -0500, Sean Cribbs wrote:
> Default bucket properties are defined in riak_core, not riak_kv, so you can't 
> really set the hash function as a result of setting something else.  
> Furthermore, you really only have to set these once, so you could configure 
> your application to check the bucket props on startup and set them 
> appropriately.

Do you mean in the start/2 of the custom backend, or somewhere else?  I thought
the start/2 function was passed the partition number (which in turn I thought
was a result of calling the hash), but that's just conjecture as I've not had
a chance to trace through the code.  If that is the case, it seems like it would
be too late.  If you are talking about my application talking to riak, I thought
there was no way via the PB client to set these options.  I'd like to avoid
using a mix of pb and HTTP in my application as it just seems to complicate
things.  I'll probably just use curl when I first set things up for now, but
figured I'd put these issues out there in case others have these problems
or the basho developers are bored and looking for things to do ;)
 
> "backend" is a property only used by the multi_backend (which isn't used that 
> frequently), so it seems a little presumptuous to special-case that property 
> (turning it into an atom).  There are a few other technical reasons that you 
> don't want to arbitrarily turn binaries into atoms.  I agree that it's not 
> especially intuitive but I think it's a small point of pain, especially if we 
> fix the documentation.

Oh, I assumed since the documentation said atom it was meant to be that, and
most other parameters are either strings, integers or atoms.  So binaries are
a bit surprising.  But I don't have a strong opinion, so I'll just document
what's working.

-Anthony

-- 

Anthony Molinaro   

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Using Key Filters to fetch object keys efficiently

2011-01-24 Thread Gordon Tillman
Greetings All,

I have a use case for our app where I need to fetch a list of keys that match 
some pattern and was hoping to be able to use key filters for that.

In my test I defined a key filter for the input phase of mapred and then 
defined just a single map phase that returns the object key.  But there is 
considerable overhead with that map phase because (I'm assuming this part) Riak 
has to load each object to provide the necessary inputs to the map 
function.
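
(For reference, the map phase described above is essentially the sketch
below; Riak still has to fetch the whole object just so the fun can read
its key.)

%% Map phase that returns only each input object's key:
fun(RiakObject, _KeyData, _Arg) ->
    [riak_object:key(RiakObject)]
end.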

Is there a way to do this without Riak having to actually load the objects?

Many thanks,

--gordon
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Riak Recap for Jan. 21 - 23

2011-01-24 Thread Mark Phillips
Evening, Afternoon, Morning to All,

Here's a short Recap for the last few days.

Enjoy.

Mark

Community Manager
Basho Technologies
wiki.basho.com
twitter.com/pharkmillups



Riak Recap for Jan. 21 - 23

1) Riak Search was officially released this morning
(http://bit.ly/hGUtbr) and it looks like Timothée Peignier
(@cyberdelia) has already taken care of the Homebrew updates.
Installing the latest Riak Search on you Mac is just a 'brew install
riak-search' away. Thanks, Timothée!

* For those of you who aren't Mac and/or Homebrew users, check out
http://wiki.basho.com/Riak-Search---Installation-and-Setup.html to get
Riak Search up and running on your platform of choice.

2) This one is a few days late but still worth passing along: riak-js,
Riak's node.js client, has some new enhancements in the area of PB
parsing thanks to a sizable chunk of code from Chris Moos.

* All the details are here---> https://github.com/frank06/riak-js

3) Q --- Is it possible to use key filtering in a regexp-like manner? Let's
say I have a set of keys like mykey:tag:anothertag:anothertag. Can I do a
query like mykey:tag:.*?:anothertag? (from hoodoos via #riak)

   A --- Yes. Use the "matches" filter (example below).

* More on Key Filters here ---> http://wiki.basho.com/Key-Filters.html
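
For the question in (3), the filter would look something like this (a
sketch; the exact regex dialect that "matches" accepts is covered on the
Key Filters page above):

%% Erlang form of a "matches" key filter for keys shaped like
%% mykey:tag:anothertag:anothertag:
Filter = [[<<"matches">>, <<"mykey:tag:.*:anothertag">>]].

%% In the JSON map/reduce interface the same filter goes in the
%% inputs, e.g.:
%%   "inputs": {"bucket": "mybucket",
%%              "key_filters": [["matches", "mykey:tag:.*:anothertag"]]}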

4) seancribbs and hoodoos had a quick chat in #riak about backing up a
Riak cluster that's using Bitcask

* Gist here ---> https://gist.github.com/794290

5) Here are a few teasers from @j_brisbin about the latest
Riak-related code he is hacking on (hint: it involves RabbitMQ).

* http://twitter.com/j_brisbin/status/29178611613302784
* http://twitter.com/j_brisbin/status/29555108333223936

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


solr interface faceting

2011-01-24 Thread Matthew Shaw
Hi Guys,

any idea on when faceting via the solr interface will be available?

Cheers,
Matt.
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com