Re: [R] Stringr / Regular Expressions advice

Sarah Goslee Fri, 27 Jun 2014 02:54:07 -0700

Hi,

It's a good idea to copy back to the list, not just to me, to keep the
discussion all in one place.


On Thursday, June 26, 2014, VINCENT DEAN BOYCE <vincentdeanbo...@gmail.com>
wrote:

> Sarah,
>
> Great feedback and direction. Here is the data I am working with*:
>
> > dput(head(data_log, 20))
>
> structure(list(x_reading = c(455L, 451L, 458L, 463L, 462L, 460L,
> 448L, 449L, 450L, 451L, 445L, 440L, 439L, 445L, 448L, 447L, 440L,
> 439L, 440L, 434L), y_reading = c(502L, 503L, 502L, 502L, 495L,
> 505L, 480L, 483L, 489L, 488L, 489L, 456L, 497L, 476L, 470L, 474L,
> 469L, 482L, 484L, 477L), z_reading = c(454L, 454L, 452L, 452L,
> 446L, 459L, 456L, 451L, 451L, 455L, 438L, 462L, 437L, 455L, 470L,
> 455L, 460L, 463L, 458L, 458L)), .Names = c("x_reading", "y_reading",
> "z_reading"), row.names = c(NA, 20L), class = "data.frame")
>
> *however, I am unsure why the letter "L" has been appended to each
> numerical string.
>

It denotes values stored as integers, and is nothing you need to worry
about.
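You can check this in an R session. This is an illustrative sketch and was not part of the original exchange. The L suffix only changes the storage type. Integers compare equal to the same numbers stored as doubles, so a pattern written as c(455, 502, 454) still matches integer data.

```r
# The L suffix creates an integer literal; without it R stores a double
x_int <- 455L
x_dbl <- 455

class(x_int)    # "integer"
class(x_dbl)    # "numeric"

# Comparison works across the two storage types, so a pattern written
# without L (e.g. c(455, 502, 454)) still matches integer readings
x_int == x_dbl  # TRUE
```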


> In any event, as you can see there are three columns of data named
> x_reading, y_reading and z_reading. I would like to detect patterns among
> them.
>
> For instance, let's say the pattern I wish to detect is 455, 502, 454
> across the three columns respectively. As you can see in the data, this is
> found in the first row. This particular string recurs numerous times
> within the dataset, and what I wish to quantify is how many times the
> string 455, 502, 454 appears.
>
> Your thoughts?
>

Did you try the code I provided? It does what I think you're looking for.
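Here is a sketch of that code applied to the 20 rows you posted, with your 455, 502, 454 pattern. The data are rebuilt with data.frame() rather than by pasting the dput() output. The names pattern, matches and n_matches are only illustrative. The vectorized version at the end is an alternative I'm adding here. It skips apply() and tends to be faster as the number of rows grows.

```r
# First 20 rows of data_log, using the values posted with dput() above
data_log <- data.frame(
  x_reading = c(455L, 451L, 458L, 463L, 462L, 460L, 448L, 449L, 450L, 451L,
                445L, 440L, 439L, 445L, 448L, 447L, 440L, 439L, 440L, 434L),
  y_reading = c(502L, 503L, 502L, 502L, 495L, 505L, 480L, 483L, 489L, 488L,
                489L, 456L, 497L, 476L, 470L, 474L, 469L, 482L, 484L, 477L),
  z_reading = c(454L, 454L, 452L, 452L, 446L, 459L, 456L, 451L, 451L, 455L,
                438L, 462L, 437L, 455L, 470L, 455L, 460L, 463L, 458L, 458L)
)

pattern <- c(455, 502, 454)

# Row-wise test with apply(): TRUE where all three readings equal the pattern
matches <- apply(data_log, 1, function(row) all(row == pattern))
which(matches)  # 1   (rows that match)
sum(matches)    # 1   (how many rows match)

# Vectorized alternative: compare each column to its pattern value at once
n_matches <- sum(data_log$x_reading == pattern[1] &
                 data_log$y_reading == pattern[2] &
                 data_log$z_reading == pattern[3])
n_matches       # 1
```

On the full data set, the same two lines give the row numbers and the count for every occurrence of the pattern.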

Sarah


> Many thanks,
>
> Vincent
>
>
> On Thu, Jun 26, 2014 at 4:46 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
>
>> Hi,
>>
>> On Thu, Jun 26, 2014 at 12:17 PM, VINCENT DEAN BOYCE
>> <vincentdeanbo...@gmail.com> wrote:
>> > Hello,
>> >
>> > Using R, I've loaded a .csv file consisting of several hundred rows and
>> > 3 columns of data. The data within maps the output of a triaxial
>> > accelerometer, a sensor which measures an object's acceleration along
>> > the x, y and z axes. The data for each respective column sequentially
>> > oscillates, and ranges numerically from 100 to 500.
>>
>> If your data are numeric, why are you using stringr?
>>
>> It would be easier to provide you with an answer if we knew what your
>> data looked like.
>>
>> dput(head(yourdata, 20))
>>
>> and paste that into your non-HTML email.
>>
>> > I want to create a function that parses the data and detects patterns
>> > across the three columns.
>> >
>> > For instance, I would like to detect instances when the values for the
>> > x, y and z columns equal 150, 200, 300 respectively. Additionally, when a
>> > match is detected, I would like to know how many times the pattern appears.
>>
>> That's easy enough:
>>
>> fakedata <- data.frame(matrix(c(
>> 100, 100, 200,
>> 150, 200, 300,
>> 100, 350, 100,
>> 400, 200, 300,
>> 200, 500, 200,
>> 150, 200, 300,
>> 150, 200, 300),
>> ncol=3, byrow=TRUE))
>>
>> v.to.match <- c(150, 200, 300)
>>
>> # TRUE for each row whose three values all equal v.to.match
>> v.matches <- apply(fakedata, 1, function(x) all(x == v.to.match))
>>
>> # which rows match
>> which(v.matches)
>>
>> # how many rows match
>> sum(v.matches)
>>
>> > I have been successful using str_detect to provide a Boolean; however, it
>> > seems to only work on a single value, i.e., "400", not a range of values,
>> > i.e., "400 - 450". See below:
>>
>> This is where I get confused, and where we need sample data. Are your
>> data numeric, as you state above, or some other format?
>>
>> If your data are character, and like "400 - 450", you can still match
>> them with the code I suggested above.
>>
>> > # this works
>> >> vals <- str_detect(string = data_log$x_reading, pattern = "400")
>> >
>> > # this also works, but doesn't detect the particular range, rather the
>> > # existence of the numbers
>> >> vals <- str_detect(string = data_log$x_reading, pattern = "[400-450]")
>>
>> Are you trying to match any numeric value in the range 400-450? Again,
>> actual data.
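In a regular expression, "[400-450]" is a character class. It matches any single character from the set 4, 0, the range 0-4, 5 and 0, which reduces to the digits 0 through 5. So it is TRUE for any string that contains one of those digits, and it does not test a numeric range at all. The sketch below uses made-up values. It calls base grepl() so nothing needs installing; str_detect() treats this pattern the same way. It then shows the numeric comparison that does test the range.

```r
# Hypothetical readings, stored as character to mimic the str_detect() attempt
vals_chr <- c("430", "460", "999")

# Character class: TRUE if the string contains any digit from 0 to 5
class_hits <- grepl("[400-450]", vals_chr)
class_hits     # TRUE TRUE FALSE ("460" matches even though it is out of range)

# Numeric range test: convert to numbers, then compare against both bounds
vals_num <- as.numeric(vals_chr)
in_range <- vals_num >= 400 & vals_num <= 450
in_range       # TRUE FALSE FALSE
sum(in_range)  # 1
```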
>>
>> > Also, it appears that I can only apply it to a single column, not to all
>> > three columns. However I may be mistaken.
>>
>> You answer your own question unwittingly - apply().
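Here is a sketch of that apply() route for the range question. It uses a small hypothetical data frame laid out like data_log; the values and the lower/upper bounds are made up for illustration. sapply() counts the in-range readings in each column. apply(..., 1, ...) flags the rows where all three readings are in range.

```r
# Hypothetical readings laid out like data_log (three integer columns)
readings_df <- data.frame(
  x_reading = c(430L, 460L, 405L),
  y_reading = c(445L, 401L, 499L),
  z_reading = c(420L, 420L, 410L)
)

lower <- 400
upper <- 450

# Column-wise: how many readings in each column fall inside [lower, upper]
per_column <- sapply(readings_df, function(col) sum(col >= lower & col <= upper))
per_column  # x_reading 2, y_reading 2, z_reading 3

# Row-wise with apply(): TRUE when all three readings of a row are in range
per_row <- apply(readings_df, 1, function(row) all(row >= lower & row <= upper))
per_row     # TRUE FALSE FALSE
```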
>>
>> Sarah
>>
>> --
>> Sarah Goslee
>> http://www.functionaldiversity.org
>>
>
>

-- 
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
