Luigi, Duncan answered part of your question. My feedback is to consider looking at your data using other tools besides str().
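
To make "other tools" a bit more concrete, here is a minimal sketch of a few base R checks. The data frame is a small made-up stand-in for yours (the `weight` column and all the values are invented for illustration), so read it as an example under those assumptions and not as a recipe:

```r
# Hypothetical stand-in for the real 302 x 626 data frame
df <- data.frame(
  record_id = c(1L, 1L, 2L, 3L),
  v1_medicamento___aceta = c(1L, NA, NA, 0L),
  weight = c(70.5, 81.2, NA, 64.0)
)

# str() prints its summary as a side effect and returns NULL invisibly,
# so wrapping the call in parentheses, as in (str(df)), prints that NULL too.
res <- str(df)
is.null(res)

# Dimensions without printing the whole object
dim(df)

# Storage type of each column
col_types <- sapply(df, typeof)
col_types

# Number of NA values per column
na_counts <- colSums(is.na(df))
na_counts

# Columns that are entirely NA can hint at a parsing problem
all_na <- names(df)[na_counts == nrow(df)]
all_na
```

On your own data you would skip the first statement and run the rest on your real df. With 626 columns, something like `table(col_types)` may be easier to read than the full vector.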

There are ways in base R to get lists of row or column names, or count them, or ask what types they are, and so forth. Printing an entire large object is hard, but printing many subsets can give you a handle on it.

You may also want to use packages in the tidyverse such as dplyr and work with tibbles as a mild variation on a data.frame.

I am not sure what you are hoping to do with str() besides getting the number of rows and columns, but consider:

    dim(df)
    nrow(df)
    ncol(df)

To get names:

    names(df)
    colnames(df)
    rownames(df)

To get many kinds of info about columns in your data.frame, various functional methods like this can be used:

    sapply(df, typeof)

The above will tell you for each column if it is an integer or double or other things.

To do more interesting things there are packages. The psych package, for example, lets you get some metrics about each column:

    psych::describe(df)

And you can use various methods of subsetting to limit what you are looking at and only show or print a manageable amount.

You seem to be asking about sanity checking in your subject line, and that depends on what you want to check. Clearly that can include making sure various columns of data are valid in being of the expected data type, or not having any NA values, or even removing outliers, and so on. Tools are there for much of that, including the few I mention.

Your data may seem huge but I have worked on much larger ones. One suggestion is to consider trimming some of that data before working on it IF some is not needed. Both base R and the tidyverse have lots to offer to do such things.

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Luigi Marongiu
Sent: Thursday, August 5, 2021 9:16 AM
To: r-help <r-help@r-project.org>
Subject: [R] Sanity check in loading large dataframe

Hello,
I am using a large spreadsheet (over 600 variables). I tried `str` to check the dimensions of the spreadsheet and I got
```
> (str(df))
'data.frame': 302 obs. of 626 variables:
 $ record_id : int 1 1 1 1 1 1 1 1 1 1 ...
....
 $ v1_medicamento___aceta : int 1 NA NA NA NA NA NA NA NA NA ...
 [list output truncated]
NULL
```
I understand that `[list output truncated]` means that there are more variables than those allowed by str to be displayed as rows. Thus I increased the row's output with:
```
> (str(df, list.len=1000))
'data.frame': 302 obs. of 626 variables:
 $ record_id : int 1 1 1 1 1 1 1 1 1 1 ...
...
NULL
```

Does `NULL` mean that some of the variables are not closed? (perhaps a missing comma somewhere)
Is there a way to check the sanity of the data and avoid that some separator is not in the right place?
Thank you

--
Best regards,
Luigi

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.