2012/2/27 Petr Savicky [via R] <ml-node+s789695n4423895...@n4.nabble.com>

> On Sun, Feb 26, 2012 at 11:39:01AM -0800, mari681 wrote:
>
> > SORRY!
> >
> > The data in MyTable are tagsets of photos,  like this:
> >
> >       V1         V2       V3      V4      V5       V6        V7   V8
> > 230    green nailpolish   barrym       0       0        0         0    0
> > 231       ny      green brooklyn cleanup   clean  gowanus volunteer  gcc
> > 232    green       saul  lecture       0       0        0         0    0
> > 233    green     colors    cores  market colores marakesh   mercado malu
> > 234       ny      green brooklyn cleanup   clean  gowanus volunteer  gcc
> > 235    green       saul  lecture       0       0        0         0    0
> > 236 portrait        pet    white   green     cat    canon    square  eos
> >
> >                          V9   V10  V11      V12 V13 V14 V15
> > 230                       0     0    0        0   0   0   0
> > 231 gowanuscanalconservancy     0    0        0   0   0   0
> > 232                       0     0    0        0   0   0   0
> > 233               malugreen maroc souk marrocos   0   0   0
> > 234 gowanuscanalconservancy     0    0        0   0   0   0
> > 235                       0     0    0        0   0   0   0
> > 236                      is  eyes mark   taiwan  ii mk2  5d
> >
> >
> > while the data in MyVector are a list of tags (not from any particular
> > column) whose frequency in MyTable has to be computed. Like this:
> >
> > [1] "life"  "wood"  "pink"  "house" "green" "fall"
>
> Hi.
>
> Just to be sure, in all the previous solutions, "malugreen" is not an
> occurrence of "green". Is this correct?
>

correct!!
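
For reference, a minimal sketch of what exact (whole-tag) matching looks like on toy data. The small MyTable below is a made-up stand-in for the real one, and character columns are assumed. This is only an illustration, not necessarily one of the solutions posted earlier in the thread.

```r
# Toy stand-in for MyTable (hypothetical data, character columns assumed).
MyTable <- data.frame(
  V1 = c("green", "ny", "green"),
  V2 = c("nailpolish", "green", "colors"),
  V3 = c("barrym", "brooklyn", "malugreen"),
  stringsAsFactors = FALSE
)
MyVector <- c("life", "wood", "pink", "house", "green", "fall")

# Flatten all columns into one character vector.
all_tags <- unlist(MyTable, use.names = FALSE)

# Restricting the factor levels to MyVector counts exact matches only:
# "malugreen" is not a level, so it becomes NA and table() drops it.
counts <- table(factor(all_tags, levels = MyVector))
print(counts)
```

Tags in MyVector that never occur still appear in the result, with a count of zero.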

>
> > MyTable has 21 million rows and 15 columns, and the data are of type
> > "character"; they are words.
>
> Do you use the argument stringsAsFactors=FALSE, when reading the data
> from a file? Otherwise, character data are converted to a factor.
> The discussed solutions work in both cases, however, if we try to
> prepare simplified data for testing efficiency, we should use the
> same column class as in the real situation.
>

Ok. Thanks!
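
A small sketch of the difference Petr describes, using two inlined rows instead of a real file (the text= argument of read.table is used here only to keep the example self-contained). Note that the default for stringsAsFactors was TRUE at the time of this thread and changed to FALSE in R 4.0.0.

```r
# Two rows from the example table, inlined instead of read from a file.
txt <- "green nailpolish barrym
ny green brooklyn"

as_char   <- read.table(text = txt, stringsAsFactors = FALSE)
as_factor <- read.table(text = txt, stringsAsFactors = TRUE)

# Compare the column classes produced by the two settings.
print(sapply(as_char, class))    # character columns
print(sapply(as_factor, class))  # factor columns
```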


> > When I tried the loop my computer crashed, in the sense that it froze
> > and didn't allow me to do anything. The next morning I forced it off
> > and rebooted.
>
> This does not seem to be a consequence of a computation that is too long.
> A possible cause is excessive memory requirements. How much memory does
> the R process use after loading the data? Try the gc() command after
> loading the data and compare with the amount of memory available. On a
> Linux machine, it is also possible to see the memory usage with the "top"
> command, in the row where R is reported.
>
> Petr.
>
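
A minimal sketch of the memory check suggested above. The data frame x is a small hypothetical stand-in for the loaded table; object.size() is added here as a complement to gc() and was not mentioned in the original message.

```r
# Hypothetical stand-in for the loaded table.
x <- data.frame(V1 = rep(c("green", "ny"), 5000), stringsAsFactors = FALSE)

# Memory taken by this one object.
size_bytes <- object.size(x)
print(size_bytes, units = "Mb")

# gc() triggers a garbage collection and returns a matrix of memory in use,
# with one row for Ncells and one for Vcells.
mem <- gc()
print(mem)
```

Comparing the "used" figures reported by gc() with the machine's physical memory gives a rough idea of how much room is left for intermediate results.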

I should have tried first with a sample of the data, rather than with the
whole table. I'll try again with all your suggestions. Thanks!!!
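
A sketch of what such a trial run could look like, assuming character columns. The table below is randomly generated stand-in data and the sample size of 100 rows is arbitrary.

```r
set.seed(1)

# Hypothetical stand-in for MyTable (1000 rows, 2 character columns).
tags <- c("green", "ny", "pink", "saul", "0")
MyTable <- data.frame(
  V1 = sample(tags, 1000, replace = TRUE),
  V2 = sample(tags, 1000, replace = TRUE),
  stringsAsFactors = FALSE
)
MyVector <- c("life", "wood", "pink", "house", "green", "fall")

# Draw a random subset of rows and time the counting step on it.
n_sample <- 100
idx <- sample(nrow(MyTable), n_sample)
small <- MyTable[idx, , drop = FALSE]

timing <- system.time(
  counts <- table(factor(unlist(small, use.names = FALSE), levels = MyVector))
)
print(timing)
print(counts)
```

Repeating this with increasing n_sample shows how time and memory grow before committing to all 21 million rows.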

marianna


>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


--
View this message in context: 
http://r.789695.n4.nabble.com/loop-for-a-large-database-tp4422052p4424086.html
Sent from the R help mailing list archive at Nabble.com.
