2012/2/27 Petr Savicky [via R] <ml-node+s789695n4423895...@n4.nabble.com>
> On Sun, Feb 26, 2012 at 11:39:01AM -0800, mari681 wrote:
> > SORRY!
> >
> > The data in MyTable are tagsets of photos, like this:
> >
> >     V1       V2         V3       V4      V5      V6       V7        V8
> > 230 green    nailpolish barrym   0       0       0        0         0
> > 231 ny       green      brooklyn cleanup clean   gowanus  volunteer gcc
> > 232 green    saul       lecture  0       0       0        0         0
> > 233 green    colors     cores    market  colores marakesh mercado   malu
> > 234 ny       green      brooklyn cleanup clean   gowanus  volunteer gcc
> > 235 green    saul       lecture  0       0       0        0         0
> > 236 portrait pet        white    green   cat     canon    square    eos
> >
> >     V9                      V10   V11  V12      V13 V14 V15
> > 230 0                       0     0    0        0   0   0
> > 231 gowanuscanalconservancy 0     0    0        0   0   0
> > 232 0                       0     0    0        0   0   0
> > 233 malugreen               maroc souk marrocos 0   0   0
> > 234 gowanuscanalconservancy 0     0    0        0   0   0
> > 235 0                       0     0    0        0   0   0
> > 236 is                      eyes  mark taiwan   ii  mk2 5d
> >
> > while data of MyVector is a list of tags (none of the columns in particular)
> > whose frequency in MyTable has to be computed. Like this:
> >
> > [1] "life"  "wood"  "pink"  "house" "green" "fall"
>
> Hi.
>
> Just to be sure, in all the previous solutions, "malugreen" is not an
> occurrence of "green". Is this correct?

correct!!

> > MyTable has 21 million rows and 15 columns, and the data is "character",
> > they are words.
>
> Do you use the argument stringsAsFactors=FALSE when reading the data
> from a file? Otherwise, character data are converted to a factor.
> The discussed solutions work in both cases; however, if we try to
> prepare simplified data for testing efficiency, we should use the
> same column class as in the real situation.

Ok. Thanks!

> > When I tried the loop my computer crashed in the meaning that it freezed
> > (froze?) and didn't allow me to do anything. The morning after I forced it
> > off and rebooted.
>
> This does not seem to be a consequence of a too long computation.
> A possible cause can be too large memory requirements. How much memory
> does the R process use after loading the data?
> Try the gc() command after loading the data and compare with the amount
> of memory available. On a Linux machine, it is also possible to see the
> memory usage with the "top" command, in the row where R is reported.
>
> Petr.

I should have tried before with a sample of data, rather than with the
whole table.
I'll try again with all your suggestions.
Thanks!!!

marianna

--
View this message in context: http://r.789695.n4.nabble.com/loop-for-a-large-database-tp4422052p4424086.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
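
For readers of the archive, the counting task described above (how often each tag in MyVector occurs anywhere in MyTable, with exact matches only, so "malugreen" does not count as "green") could be sketched roughly as below. This is only a small illustration on made-up data, not necessarily any of the solutions posted earlier in the thread. The toy MyTable and the name counts are assumptions for the example; only MyTable and MyVector come from the original post. The table is processed one column at a time so that all 15 columns are never copied into a single long vector, which may help with the memory problem mentioned above.

```r
# Toy stand-in for the real data (assumption: 4 rows instead of 21 million,
# columns read as character via stringsAsFactors = FALSE).
MyTable <- data.frame(
  V1 = c("green", "ny", "green", "portrait"),
  V2 = c("nailpolish", "green", "saul", "pet"),
  V3 = c("barrym", "brooklyn", "lecture", "green"),
  V4 = c("0", "malugreen", "0", "pink"),
  stringsAsFactors = FALSE
)
MyVector <- c("life", "wood", "pink", "house", "green", "fall")

# One counter per tag; numeric rather than integer to avoid overflow
# worries on very large tables.
counts <- setNames(numeric(length(MyVector)), MyVector)

# Process one column at a time. match() gives the position of each cell
# in MyVector (NA if the cell is not one of the tags); it compares whole
# strings, so "malugreen" is not matched to "green". tabulate() ignores
# the NAs and counts how often each position occurs.
for (col in names(MyTable)) {
  counts <- counts + tabulate(match(MyTable[[col]], MyVector),
                              nbins = length(MyVector))
}

print(counts)
```

As suggested above, it is probably safer to try this first on a sample of rows, for example MyTable[1:100000, ], and to call gc() before and after to see how much memory R is using.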