Is it possible/wise to modify riak search index objects?

2012-07-20 Thread Metin Akat
Hi,

I am using riak to store (relatively large) text files. I store them as
normal riak objects where the value is the text of the file. Now I want to
index and search them. All is fine: I just enabled the "standard" search
pre-commit hook for that bucket and they get indexed nicely. But there is
one tricky requirement. I need to be able to index and search some metadata
about these files. For example date of submission, size of file, type
(internal business logic) of file etc.

I have been thinking quite a lot about this recently and have asked several
times on #riak. One answer suggested that I create a second "metadata"
riak object for each file, link it to the "file object" and index it
separately. That's not really what I want, because I need to be able to
execute "combined" queries, like value:<text> AND date:<date>.

So here is the ideal solution I'm thinking about: it would be great if it
were possible to modify the riak search index object. After the file is
submitted and indexed, I could just fetch the index object and add some
more fields to it.
I see there is a bucket with the search index objects that's automatically
created by riak search, so I guess it is indeed possible, though I don't
know what to expect. Is it a good idea? If not, what else could I do to
solve the problem?

Regards,
Metin
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Is it possible/wise to modify riak search index objects?

2012-07-20 Thread Metin Akat
I was thinking about this too, but as I said, these text files are
sometimes quite big: sometimes megabytes, rarely tens of megabytes. They
are all "write once, read quite a lot". So storing them as JSON is probably
going to put quite a lot of load on riak and my application (deserializing
a big chunk of JSON on every read). Of course, I might be wrong; I'll
probably have to benchmark it, but I don't feel very comfortable about it.
Besides potentially being a performance issue, it also feels quite ugly to
me. Have you done this? How big were the files? How was the performance?

On Sat, Jul 21, 2012 at 7:52 AM, Alexander Sicular wrote:

> Turn your text into a JSON object. Maybe something like this:
>
> { "size": 100,
>   "Name": "bla",
>   "Date": "1/1/2012",
>   "Raw_txt": "txt" }
>
>
> @siculars
> http://siculars.posterous.com
>
> Sent from my iRotaryPhone


Re: Is it possible/wise to modify riak search index objects?

2012-07-21 Thread Metin Akat
Yes, but that would require me to write a custom search analyzer to parse
this, upload Erlang code to riak, etc., right? Or is there something I
don't know? Please elaborate.

On Sat, Jul 21, 2012 at 11:06 AM, Alexander Sicular wrote:

> The overhead would be in parsing. But you could skip all that if you
> prepended constant-length data to your text. Something like:
>
> field:val field:val text
>
> where the field and val lengths are constant.
>
> Maybe like guid:100
>
> where you know that guid stands for the file size.
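Alexander's constant-length prefix can be sketched in Erlang, the language of the client code elsewhere in this archive. This is a minimal sketch under stated assumptions, not a tested recipe: the module name meta_prefix, the 8-byte tags (standing in for the GUID Alexander suggests) and the 10-digit zero-padded values are all hypothetical choices. Each metadata field becomes one fixed-length "tag:value " token in front of the text, so on read the application can drop the header by byte offset instead of parsing the body. How the search analyzer actually tokenizes these tokens is not verified here.

```erlang
%% Hypothetical module: fixed-length metadata prefix in front of a text body.
-module(meta_prefix).
-export([encode/2, decode/2]).

-define(TAG_LEN, 8).    %% assumed tag length (stand-in for a GUID)
-define(WIDTH, 10).     %% assumed width of every zero-padded value
%% One token is: tag, colon, value, trailing space.
-define(TOKEN_LEN, (?TAG_LEN + 1 + ?WIDTH + 1)).

%% Prepend one fixed-length token per {Tag, Value} pair to Text.
encode(Fields, Text) when is_binary(Text) ->
    iolist_to_binary([[token(Tag, Val) || {Tag, Val} <- Fields], Text]).

token(Tag, Val) when byte_size(Tag) =:= ?TAG_LEN, is_integer(Val), Val >= 0 ->
    Digits = integer_to_list(Val),
    true = length(Digits) =< ?WIDTH,          %% crash if the value is too wide
    Padded = lists:duplicate(?WIDTH - length(Digits), $0) ++ Digits,
    list_to_binary([Tag, $:, Padded, $\s]).

%% Split a stored value into its metadata and body. The caller knows how many
%% tokens it wrote, so the header length is constant and the body is never
%% scanned.
decode(NFields, Bin) ->
    HeaderLen = NFields * ?TOKEN_LEN,
    <<Header:HeaderLen/binary, Text/binary>> = Bin,
    {parse_header(Header), Text}.

parse_header(<<>>) ->
    [];
parse_header(<<Tag:?TAG_LEN/binary, ":", Val:?WIDTH/binary, " ", Rest/binary>>) ->
    [{Tag, list_to_integer(binary_to_list(Val))} | parse_header(Rest)].
```

For example, meta_prefix:encode([{<<"metasize">>, 100}], <<"some text">>) yields <<"metasize:0000000100 some text">>, and meta_prefix:decode(1, ...) on that value returns {[{<<"metasize">>, 100}], <<"some text">>}. Fixing both the tag and the value width is what makes the header length a simple multiplication.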


Re: Is it possible/wise to modify riak search index objects?

2012-07-21 Thread Metin Akat
Wow, didn't know about that. Thank you!

On Sat, Jul 21, 2012 at 11:29 AM, Alexander Sicular wrote:

> Nah. It should index via the standard analyzer, which splits on spaces,
> among other things (check the Riak Handbook, a great resource). The
> "guid:val" would index as a string, so guid:100 should show up in a search
> for [guid:050 TO guid:250]. Try it out.
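Alexander's range example only works if the values compare correctly as strings, which is why he insists on constant-length values. A quick Erlang check illustrates this; the module name lex_range is hypothetical, and the assumption is that the index orders string terms byte by byte, the same way Erlang orders binaries.

```erlang
%% Hypothetical helper module: fixed-width values keep string order equal
%% to numeric order.
-module(lex_range).
-export([pad/2, in_range/3]).

%% Zero-pad a non-negative integer to Width digits, returned as a binary.
pad(Val, Width) when is_integer(Val), Val >= 0 ->
    Digits = integer_to_list(Val),
    true = length(Digits) =< Width,           %% crash if the value is too wide
    list_to_binary(lists:duplicate(Width - length(Digits), $0) ++ Digits).

%% Inclusive range test. Erlang orders binaries byte by byte, i.e.
%% lexicographically, like a string-typed index term (assumption above).
in_range(Term, Lo, Hi) when is_binary(Term), is_binary(Lo), is_binary(Hi) ->
    Lo =< Term andalso Term =< Hi.
```

With padding, lex_range:in_range(<<"guid:100">>, <<"guid:050">>, <<"guid:250">>) is true, while the same test with the unpadded lower bound <<"guid:50">> is false, because "5" sorts after "1". lex_range:pad/2 produces values such as <<"0000000050">> that avoid this.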


Riak taking up lots of memory on a write-heavy bucket?

2012-07-26 Thread Metin Akat
Hi list,

I have a bucket with allow_mult=false, last_write_wins=true. I use it to
store text files (up to 1 megabyte of text). Some of these objects are
write-heavy: at times, the application overwrites them several times a
minute. Now I see my dev riak node (where I'm still developing the
feature) is taking up 450 MB of reserved memory. Is this normal? Is it
caused by the bucket/objects in question? Is it dangerous: will it start
eating even more memory, or will it balance itself out (which seems likely
for now)? How can I diagnose what is taking up this memory? Riaknostic
doesn't show anything of particular interest.
Also, what are the correct settings for this kind of bucket? It stores
low-value data (n_val=1) that expires automatically (bitcask). It is also
indexed in riak search. (BTW, is the search index for an object deleted
automatically when that object expires?)
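For reference, the automatic expiry mentioned above is a bitcask backend setting in app.config rather than a bucket property. The excerpt below is illustrative only: the data_root path and the one-day expiry_secs value are assumptions. As far as I know, the setting applies to every bucket stored in that bitcask backend, so expiring a single bucket would need a separate backend (for example via riak_kv_multi_backend).

```erlang
%% app.config (excerpt); values are illustrative only
{bitcask, [
    {data_root, "./data/bitcask"},
    %% Entries older than this many seconds are treated as expired;
    %% -1 disables expiry.
    {expiry_secs, 86400}
]}
```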


Re: riakc doesn't save metadata properly?

2012-10-22 Thread Metin Akat
Oh, I see, thanks for this clarification, much appreciated.
I think this should be written somewhere in bold, as three days later I
still hadn't been able to find anything on Google.

On Tue, Oct 23, 2012 at 12:08 AM, David Parfitt wrote:

> Hello Metin -
>
>   Sorry for the delay. At the moment, all metadata needs to be stored
> under the <<"X-Riak-Meta">> dict key. The following example
> illustrates how this works:
>
> %% To store metadata:
> Object = riakc_obj:new(<<"groceries">>, <<"mine">>, <<"eggs & bacon">>).
> MetaData = dict:from_list([{<<"X-Riak-Meta">>, [{"Foo", "Bar"}]}]).
> Object2 = riakc_obj:update_metadata(Object, MetaData).
> riakc_pb_socket:put(Pid, Object2).
>
> %% To retrieve metadata:
> {ok, O} = riakc_pb_socket:get(Pid, <<"groceries">>, <<"mine">>).
> {ok, MD} = dict:find(<<"X-Riak-Meta">>, riakc_obj:get_metadata(O)).
>
> We're kicking around ways to improve this in the future.
>
> Cheers -
> Dave
>
> On Fri, Oct 19, 2012 at 8:31 AM, Metin Akat wrote:
> > I am trying to do something like:
> >
> > Meta1 = dict:store(<<"ver">>, Ver, Meta0),
> > Obj1 = riakc_obj:update_metadata(Obj, Meta1),
> >
> > and then save the object. But on a subsequent read, the metadata k/v
> > pair is not there.
> >
> > Is there anything special that needs to be done for this to work, or is
> > this an illegal way to use Riak?
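David's rule can be wrapped in a small helper so that code like Metin's dict:store(<<"ver">>, Ver, Meta0) puts the entry where it will be saved. A minimal sketch using only the stdlib dict and lists modules; the module name user_meta is hypothetical, and it assumes, as in David's example, that the value under <<"X-Riak-Meta">> is a list of {Key, Value} pairs.

```erlang
%% Hypothetical helper: keep user metadata under <<"X-Riak-Meta">>, the
%% only metadata key that (per David's reply) riakc persists for user data.
-module(user_meta).
-export([put_entry/3, get_entry/2]).

-define(USERMETA_KEY, <<"X-Riak-Meta">>).

%% Add or replace one {Key, Value} pair inside a metadata dict.
put_entry(Key, Value, MD) ->
    Entries = case dict:find(?USERMETA_KEY, MD) of
                  {ok, L} -> L;
                  error -> []
              end,
    dict:store(?USERMETA_KEY, lists:keystore(Key, 1, Entries, {Key, Value}), MD).

%% Look up one user metadata entry; returns {ok, Value} or error.
get_entry(Key, MD) ->
    case dict:find(?USERMETA_KEY, MD) of
        {ok, L} ->
            case lists:keyfind(Key, 1, L) of
                {Key, V} -> {ok, V};
                false -> error
            end;
        error ->
            error
    end.
```

With riakc, Metin's update could then read Obj1 = riakc_obj:update_metadata(Obj, user_meta:put_entry("ver", Ver, riakc_obj:get_update_metadata(Obj))), assuming the client version in use provides riakc_obj:get_update_metadata/1. David's example uses strings for both key and value; whether other value types survive the protocol encoding is not verified here.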


Re: riakc doesn't save metadata properly?

2012-10-22 Thread Metin Akat
Haha, well, the bold part was a bit of a joke, but any docs will be
perfect. Thanks :)

On Tue, Oct 23, 2012 at 12:25 AM, David Parfitt wrote:

> Hello Metin -
>
> I'm trying to get that into the docs right now. Will italics do? :-)
> https://github.com/basho/riak-erlang-client/pull/74
>
> Cheers -
> Dave