Is it possible/wise to modify riak search index objects?
Hi,

I am using riak to store (relatively large) text files. I store them as normal riak objects where the value is the text of the file. Now I want to index and search them. All is fine: I just enabled the "standard" search pre-commit hook for that bucket and they get indexed nicely.

But there is one tricky requirement. I need to be able to index and search some metadata about these files, for example date of submission, size of file, type (internal business logic) of file, etc.

I have been thinking quite a lot about this recently and asked several times on #riak. I got one answer suggesting that I create a second "metadata" riak object for each file, link it to the "file object" and index it separately. That's not really what I want, because I need to be able to execute "combined" queries, like value:<...> AND date:<... date>.

So, here is the ideal solution that I'm thinking about: it would be great if it were possible to modify the riak search index object. After the file is submitted, and after it's indexed, I could just fetch the index and add some more fields to it.

I see there is a bucket with the search index objects that's automatically created by riak search. So I guess it is indeed possible, though I don't know what to expect. Is it a good idea? If not, what else could I do in order to solve the problem?

Regards,
Metin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Is it possible/wise to modify riak search index objects?
I was thinking about this too, but as I said, these text files are sometimes quite big: sometimes megabytes, rarely tens of megabytes. They are all "write once, read quite a lot". So having them as JSON is probably going to put quite a lot of load onto riak and my application (deserializing a big chunk of JSON on every read). Of course, I might be wrong and will probably have to benchmark it, but I don't feel very comfortable about it. Besides potentially being a performance issue, it also feels quite ugly to me.

Have you done this? How big were the files? How was the performance?

On Sat, Jul 21, 2012 at 7:52 AM, Alexander Sicular wrote:
> Turn your text into a json obj. Maybe something like this:
>
> { size: 100
> Name: bla
> Date: 1/1/2012
> Raw_txt: txt
> }
>
> @siculars
> http://siculars.posterous.com
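A minimal sketch of the JSON-wrapping idea quoted above, written in Python for illustration only. The lower-cased field names, the `type` field, and the YYYYMMDD date format are assumptions that are not in the thread, and the object would presumably have to be stored with a JSON content type for the search hook to index the fields separately (worth verifying against the Riak Search documentation).

```python
import json


def wrap_with_metadata(raw_text, name, date, file_type):
    """Serialize the file text together with its metadata as one JSON document."""
    doc = {
        "size": len(raw_text.encode("utf-8")),  # size in bytes of the raw text
        "name": name,
        "date": date,        # assumed YYYYMMDD so string order matches date order
        "type": file_type,   # hypothetical business-logic type
        "raw_txt": raw_text,
    }
    return json.dumps(doc)


def unwrap(serialized):
    """Return (raw_text, metadata) from a document built by wrap_with_metadata."""
    doc = json.loads(serialized)
    raw_text = doc.pop("raw_txt")
    return raw_text, doc


payload = wrap_with_metadata("hello riak", "bla", "20120101", "report")
text, meta = unwrap(payload)
```

The cost raised in the reply is visible here: every read has to run `json.loads` over the whole payload, including the large `raw_txt` field, so benchmarking with realistic file sizes is the only way to settle the concern.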
Re: Is it possible/wise to modify riak search index objects?
Yes, but that would require me to write a custom search analyzer to parse this, upload Erlang code to riak, etc., right? Or is there something I don't know? Please elaborate.

On Sat, Jul 21, 2012 at 11:06 AM, Alexander Sicular wrote:
> The overhead would be in parsing. But you could skip all that if you
> prepended constant length data to your text. Something like:
>
> Field:Val field:Val text
>
> Where field and Val length are constant.
>
> Maybe like a guid:100
>
> Where that guid is known to you to be the file size.
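The constant-length prefix suggested above could look roughly like the following Python sketch. The field names `size` and `date`, the 10-digit width, and the single-space separator are all assumptions chosen for illustration; the thread only says that field and value lengths should be constant.

```python
SIZE_WIDTH = 10   # hypothetical fixed width for the zero-padded size
DATE_WIDTH = 8    # assumed YYYYMMDD
HEADER_LEN = len("size:") + SIZE_WIDTH + 1 + len("date:") + DATE_WIDTH


def add_prefix(text, size, date):
    """Prepend a fixed-length 'size:... date:...' header to the file text."""
    header = "size:" + str(size).zfill(SIZE_WIDTH) + " date:" + date
    if len(header) != HEADER_LEN:
        raise ValueError("size or date does not fit the fixed-width header")
    return header + " " + text


def strip_prefix(stored):
    """Recover the original text by slicing at a known offset, with no parsing."""
    return stored[HEADER_LEN + 1:]


stored = add_prefix("body text", 100, "20120721")
original = strip_prefix(stored)
```

Because the header length never changes, the application can cut it off with a single slice on read, which is the parsing overhead the suggestion is trying to avoid.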
Re: Is it possible/wise to modify riak search index objects?
Wow, didn't know about that. Thank you!

On Sat, Jul 21, 2012 at 11:29 AM, Alexander Sicular wrote:
> Na. It should index via the standard analyzer, which splits on spaces,
> among other things (check riak handbook - great resource). The "guid:val"
> would index as a string so guid:100 should show up in a search for
> [guid:050 TO guid:250]. Try it out.
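Since the values are described above as being indexed as strings, a range search only behaves numerically when the values are zero-padded to the same width. The Python sketch below illustrates that with plain string comparison; it is not Riak code, and the helper names are hypothetical.

```python
def pad(value, width=3):
    """Zero-pad a number so that string order matches numeric order."""
    return str(value).zfill(width)


def in_term_range(term, low, high):
    """Inclusive range check using string comparison, as a term index would."""
    return low <= term <= high


padded_hit = in_term_range(pad(100), pad(50), pad(250))   # "050" <= "100" <= "250"
unpadded_hit = in_term_range("100", "50", "250")          # "50" sorts after "100"
```

With padding the value 100 falls inside the 050 to 250 range; without it the comparison fails, because "50" sorts after "100" as a string. Whether the analyzer keeps a token such as `guid:100` as a single term is something to check on a test bucket.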
Riak taking up lots memory on a write heavy bucket?
Hi list,

I have a bucket with allow_mult=false, last_write_wins=true. I use it to store text files (up to 1 megabyte of text). Some of these objects are write heavy: the application overwrites them several times a minute (at times).

Now I see my dev riak node (where I'm still developing the feature) is taking up 450M of reserved memory. Is this normal? Is this caused by the bucket/objects in question? Is it dangerous? Will it start eating even more memory, or will it balance itself automatically (which seems likely for now)? How can I diagnose what is taking up this memory? Riaknostic doesn't show anything of particular interest.

Also, what are the correct settings for this kind of bucket? It stores data of no high value (n_val=1) which auto-expires (bitcask). It is also indexed in riak search. (By the way, is the search index entry for an object automatically deleted when that object expires?)
Re: riakc doesn't save metadata properly?
Oh, I see, thanks for this clarification, much appreciated.
I think this should be written somewhere in bold, as 3 days later, I still wasn't able to find anything on google.

On Tue, Oct 23, 2012 at 12:08 AM, David Parfitt wrote:
> Hello Metin -
>
> Sorry for the delay. At the moment, all metadata needs to be stored
> under the <<"X-Riak-Meta">> dict key. The following example
> illustrates how this works:
>
> %% To store metadata:
> Object = riakc_obj:new(<<"groceries">>, <<"mine">>, <<"eggs & bacon">>).
> MetaData = dict:from_list([{<<"X-Riak-Meta">>, [{"Foo", "Bar"}]}]),
> Object2 = riakc_obj:update_metadata(Object, MetaData).
> riakc_pb_socket:put(Pid, Object2).
>
> %% To retrieve metadata:
> {ok, O} = riakc_pb_socket:get(Pid, <<"groceries">>, <<"mine">>).
> {ok, MD} = dict:find(<<"X-Riak-Meta">>, riakc_obj:get_metadata(O)).
>
> We're kicking around ways to improve this in the future.
>
> Cheers -
> Dave
>
> On Fri, Oct 19, 2012 at 8:31 AM, Metin Akat wrote:
> > I am trying to do something like:
> >
> > Meta1 = dict:store(<<"ver">>, Ver, Meta0),
> > Obj1 = riakc_obj:update_metadata(Obj, Meta1),
> >
> > and then save the object. But on a subsequent read, the metadata k/v pair is
> > not there.
> >
> > Is there anything special that needs to be done for this to work, or is this
> > illegal way to use Riak?
Re: riakc doesn't save metadata properly?
Haha, well, the bold part was a bit of a joke, but any docs will be perfect. Thanks :)

On Tue, Oct 23, 2012 at 12:25 AM, David Parfitt wrote:
> Hello Metin -
>
> I'm trying to get that into the docs right now. Will italics do? :-)
> https://github.com/basho/riak-erlang-client/pull/74
>
> Cheers -
> Dave