[R] Optimization problem

2010-06-17 Thread José E. Lozano
Hello, I'm facing an optimization problem that I've already solved, but I'm trying to find other answers to it to improve the solution. Well, to make it short: I have to install a number of devices in a building, and I have to give service to a number of "customers", or rather, to

Re: [R] Optimization problem

2010-06-17 Thread José E. Lozano
> How about smoothing the percentages, and then taking the second derivative to find the inflection point? > > which.max(diff(diff(lowess(percentages)$y))) This solution is what I've been using so far. The only difference is that I am smoothing the 1st derivative, since it's the one I want to be s
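A minimal sketch of the approach quoted above, assuming "percentages" is a numeric vector of coverage as devices are added (the data below are a hypothetical stand-in):

percentages <- 100 * pnorm(seq(-3, 3, length.out = 50))  # hypothetical S-curve
sm <- lowess(percentages)$y               # smoothed coverage values
# Each diff() shortens the vector by one; + 1 re-centers the index
# of the largest second difference on the original scale
which.max(diff(diff(sm))) + 1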

Re: [R] Optimization problem

2010-06-17 Thread José E. Lozano
Hello: > Here is a general smoothing approach using the Gasser-Mueller kernel, > which is implemented in the "lokern" package. The optimal bandwidth for > derivative estimation is automatically chosen using a plug-in approximation. > The code and the results are attached here. Maybe am I
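A hedged sketch of that suggestion, with hypothetical x/y data standing in for the attachment: glkerns() chooses the plug-in bandwidth automatically, and deriv = 1 estimates the first derivative, whose maximum marks the inflection point.

library(lokern)
x <- 1:50
y <- 100 * pnorm(seq(-3, 3, length.out = 50))  # hypothetical coverage curve
fit <- glkerns(x, y, deriv = 1)   # Gasser-Mueller kernel, plug-in bandwidth
fit$x.out[which.max(fit$est)]     # x where the estimated 1st derivative peaks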

Re: [R] Optimization problem

2010-06-18 Thread José E. Lozano
>> How about smoothing the percentages, and then taking the second >> derivative to find the inflection point? >> >> which.max(diff(diff(lowess(percentages)$y))) > > This solution is what I've been using so far. The only difference is that I am smoothing the 1st derivative, since it's > the one

Re: [R] Optimization problem

2010-06-18 Thread José E. Lozano
> I don't see why one would want to pretend that the function is continuous. It isn't. > The x variable (devices) is discrete. > Moreover, the whole solution space is small: the possible solutions are integers in the range of maybe 20-30. Yes, you are right; what I'd like to think is that the outco
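Given that discreteness, a brute-force enumeration is a sketch worth comparing against the smoothing answers; coverage() below is a hypothetical stand-in for the real objective (percentage of customers served by n devices):

coverage <- function(n) 100 * pnorm((n - 25) / 3)  # hypothetical objective
candidates <- 20:30
gains <- diff(sapply(candidates, coverage))  # marginal coverage per added device
# e.g. stop adding devices once the marginal gain drops below 1 point
candidates[-1][gains < 1][1]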

[R] Manage huge database

2008-09-21 Thread José E. Lozano
Hello, Recently I have been trying to open a huge database with no success. It's a 4GB csv plain text file with around 2000 rows and over 500,000 columns/variables. I have tried the SAS System, but it reads only around 5000 columns, no more. R hangs up when opening. Is there any
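One base-R workaround (a sketch, not from the thread): read.csv() skips any column whose colClasses entry is "NULL", so only the wanted variables are ever parsed. The file name and column numbers below are hypothetical.

wanted <- c(1, 17, 42)                  # hypothetical columns of interest
cls <- rep("NULL", 500000)              # skip everything...
cls[wanted] <- NA                       # ...except these (NA = guess the type)
subset <- read.csv("huge.csv", colClasses = cls, nrows = 2000)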

Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
Hello, Yihui > You can treat it as a database and use ODBC to fetch data from the CSV > file using SQL. See the package RODBC for details about database > connections. (I have dealt with similar problems before with RODBC) Thanks for your tip, I have used RODBC before to read data from MSAccess a
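A sketch of Yihui's RODBC route for a CSV, assuming Windows and the Microsoft Text Driver; the directory path and column names are hypothetical. Only the selected columns ever reach R.

library(RODBC)
con <- odbcDriverConnect(
  "Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=C:/data/")
snp <- sqlQuery(con, "SELECT col1, col17 FROM huge.csv")
odbcClose(con)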

Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
> I wouldn't call a 4GB csv text file a 'database'. Obviously, a csv is not a database itself; what I meant (though it seems I was not understood) is that I had a huge database, exported to a csv file by the people who created it (and I don't have any idea of the original format of the database).

Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
> Maybe you've not lurked on R-help for long enough :) Apologies! Probably. > So, how much "design" is in this data? If none, and what you've > basically got is a 2000x500,000 grid of numbers, then maybe a more raw Exactly, raw data, but a little more complex, since all the 500,000 variables are i

Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
> What are you going to do with the data once you have read it in? Are > all the data items numeric? If they are numeric, you would need at > least 8GB to hold one copy and probably a machine with 32GB if you > wanted to do any manipulation on the data. Well, I will use only sets of variables to
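The 8GB figure quoted above checks out: a numeric (double) matrix of that size needs 8 bytes per cell.

2000 * 500000 * 8 / 2^30   # about 7.45 GiB for a single in-memory copy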

Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
> So is each line just ACCGTATAT etc etc? Exactly, A_G, A_A, G_G and such. > If you have fixed width fields in a file, so that every line is the > same length, then you can use random access methods to get to a > particular value - just multiply the line length by the row number you Nice hint
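A sketch of that random-access idea using a binary connection: seek straight to the bytes for one cell instead of parsing the whole file. The field width, the one-byte-newline assumption, and the file name are all hypothetical.

field <- 4                             # e.g. "A_G," including the delimiter
line.len <- 500000 * field + 1         # + 1 assumes Unix newlines
get.cell <- function(con, row, col) {
  seek(con, (row - 1) * line.len + (col - 1) * field)
  readChar(con, field - 1)             # drop the trailing delimiter
}
con <- file("genotypes.csv", "rb")
get.cell(con, 1750, 17)                # hypothetical lookup
close(con)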

Re: [R] Manage huge database

2008-09-22 Thread José E. Lozano
Hello: I've been reading all the replies, and I think I have some good ideas to work on. Right now the code I programmed is running; it has been running as a batch process for 20h now, and it has imported 1750 rows out of 2000. I will read the docs for the Bioconductor package, and I will check the gawk
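One way to try the gawk idea from R (a sketch; the column numbers are hypothetical): pre-filter the columns in gawk and stream the result in through pipe(), so R never sees the other 499,997 columns.

snp <- read.table(pipe("gawk -F, '{print $1, $17, $42}' huge.csv"))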