Hi all, I am using the R tabulizer package to extract tables from PDF files. The output is a list of lists of matrices. Besides the tables, the package extracts a lot of extra material that is nearly impossible to clean with regular expressions, so I want to clean it manually. To do so, I need to (1) combine all the lists into a single list or data frame and (2) write that single object to a text file so I can edit it. I could not figure out how to do this.
I tried something like this, but it did not work:

lapply(MyTables, function(x) lapply(x,write.table(file="temp.txt",append = TRUE)))

Any help is greatly appreciated.

Here is my code:

install.packages("rJava"); library(rJava)
install.packages("tabulizer"); library(tabulizer)

MyPath <- "C:/Users/name/Documents/tEMP"

ExtTable <- function (Path,CalOrd){
  FileNames <- dir(Path, pattern =".(pdf|PDF)",full.names = TRUE)
  MyFiles <- lapply(FileNames, function(i) extract_tables(i,method = "stream"))
  if(CalOrd == "Yes"){
    MyOFiles <- gsub("(\\s.*)|(.pdf|.PDF)","",basename(FileNames))
    MyOFiles <- match(MyOFiles,month.name)
    MyNFiles <- MyFiles[order(MyOFiles)]
  } else MyFiles
}

MyTables <- ExtTable(Path=MyPath,CalOrd = "No")

Here is a cleaned portion of the output. The whole output consists of 3 lists, containing 12, 15, and 12 sub-lists, respectively.

[[2]][[2]]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,] "" "Avg." "+_ lo" "n" "Med." "" "Avg." "+_ lo" "n" "Med."
 [2,] "SiOz" "44.0" "1.26" "375" "44.1" "Nb" "4.8" "6.3" "58" "2.7"
 [3,] "T i O 2" "0.09" "0.09" "561" "0.09" "Mo(b)" "50" "30" "3" "35"
 [4,] "A1203" "2.27" "1.10" "375" "2.20" "Ru(b)" "12.4" "4.1" "3" "12"
 [5,] "FeO total" "8.43" "1.14" "375" "8.19" "Pd(b)" "3.9" "2.1" "19" "4.1"
 [6,] "MnO" "0.14" "0.03" "366" "0.14" "Ag(b)" "6.8" "8.3" "17" "4.8"
 [7,] "MgO" "41.4" "3.00" "375" "41.2" "Cd(b)" "41" "14" "16" "37"
 [8,] "CaO" "2.15" "1.11" "374" "2.20" "In(b)" "12" "4" "19" "12"
 [9,] "Na20" "0.24" "0.16" "341" "0.21" "Sn(b)" "54" "31" "6" "36"
[10,] "K20" "0.054" "0.11" "330" "0.028" "Sb(b)" "3.9" "3.9" "11" "3.2"
[11,] "P205" "0.056" "0.11" "233" "0.030" "Te(b)" "11" "4" "18" "10"
[12,] "Total" "98.88" "" "" "98.43" "Cs(b)" "10" "16" "17" "1.5"
[13,] "" "" "" "" "" "Ba" "33" "52" "75" "17"
[14,] "Mg-value" "89.8" "1.1" "375" "90.0" "La" "2.60" "5.70" "208" "0.77"
[15,] "Ca/AI" "1.28" "1.6" "374" "1.35" "Ce" "6.29" "11.7" "197" "2.08"
[16,] "AI/Ti" "22" "29" "361" "22" "Pr" "0.56" "0.87" "40" "0.21"
[17,] "F e / M n" "60" "10" "366" "59" "Nd" "2.67" "4.31" "162" "1.52"
[18,] "" "" "" "" "" "Sm" "0.47" "0.69" "214" "0.25"
[19,] "Li" "1.5" "0.3" "6" "1.5" "Eu" "0.16" "0.21" "201" "0.097"
[20,] "B" "0.53" "0.07" "6" "0.55" "Gd" "0.60" "0.83" "67" "0.31"
[21,] "C" "110" "50" "13" "93" "Tb" "0.070" "0.064" "146" "0.056"
[22,] "F" "88" "71" "15" "100" "Dy" "0.51" "0.35" "58" "0.47"
[23,] "S" "157" "77" "22" "152" "Ho" "0.12" "0.14" "54" "0.090"
[24,] "C1" "53" "45" "15" "75" "Er" "0.30" "0.22" "52" "0.28"
[25,] "Sc" "12.2" "6.4" "220" "12.0" "Tm" "0.038" "0.026" "40" "0.035"
[26,] "V" "56" "21" "132" "53" "Yb" "0.26" "0.14" "201" "0.27"
[27,] "Cr" "2690" "705" "325" "2690" "Lu" "0.043" "0.023" "172" "0.045"
[28,] "Co" "112" "10" "166" "111" "Hf" "0.27" "0.30" "71" "0.17"
[29,] "Ni" "2160" "304" "308" "2140" "Ta" "0.40" "0.51" "38" "0.23"
[30,] "Cu" "11" "9" "94" "9" "W(b)" "7.2" "5.2" "6" "4.0"
[31,] "Zn" "65" "20" "129" "60" "Re(b)" "0.13" "0.11" "18" "0.09"
[32,] "Ga" "2.4" "1.3" "49" "2.4" "Os(b)" "4.0" "1.8" "18" "3.7"
[33,] "Ge" "0.96" "0.19" "19" "0.92" "Ir(b)" "3.7" "0.9" "34" "3.0"
[34,] "As" "0.11" "0.07" "7" "0.10" "Pt(b)" "7" "-" "1" "-"
[35,] "Se" "0.041" "0.056" "18" "0.025" "Au(b)" "0.65" "0.53" "30" "0.5"
[36,] "Br" "0.01" "0.01" "6" "0.01" "Tl(b)" "1.2" "1.0" "13" "0.9"
[37,] "Rb" "1,9" "4.8" "97" "0.38" "Pb" "0.16" "0.11" "17" "0.16"
[38,] "Sr" "49" "60" "110" "20" "Bi(b)" "1.7" "0.7" "13" "1.6"
[39,] "Y" "4.4" "5.5" "86" "3.1" "Th*" "0.71" "1.2" "71" "0.22"
[40,] "Zr" "21" "42" "82" "8.0" "U" "0.12" "0.23" "48" "0.040"

[[2]][[4]]
      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,] "" "Spinel peridotites" "" "Garnet peridotites" "" "Primitive"
 [2,] "" "Avg. Meal." "M-A sp" "M-A gt B-M" "Jordan" "mantle"
 [3,] "SiO 2" "44.0 44.1" "44.15" "44.99 45.00" "45.55" "44.8"
 [4,] "TiO 2" "0.09 0.09" "0.07" "0.06 0.08" "0.11" "0.21"
 [5,] "A1203" "2.27 2.20" "1.96" "1.40 1.31" "1.43" "4.45"
 [6,] "Cr203" "0.39 0.39" "0.44" "0.32 0.38" "0.34" "0.43"
 [7,] "FeOtotal" "8.43 8.19" "8.28" "7.89 6.97" "7.61" "8.40"
 [8,] "Mn O" "0.14 0.14" "0.12" "0.11 0.13" "0.11" "0.14"
 [9,] "MgO" "41.4 41.2" "42.25" "42.60 44.86" "43.55" "37.2"
[10,] "NiO" "0.27 0.27" "0.27" "0.26 0.29" "-" "0.24"
[11,] "CaO" "2.15 2.20" "2.08" "0.82 0.77" "1.05" "3.60"
[12,] "Na 20" "0.24 0.21" "0.18" "0.11 0.09" "0.14" "0.34"
[13,] "K 2 0" "0.054 0.028" "0.05" "0.04 0.10" "0.11" "0.028"
[14,] "P205" "0.056 0.030" "0.02" "- 0.01" "-" "0.022"
[15,] "Total" "99.49 99.05" "99.87" "98.60 100.00" "100.00" "99.86"
[16,] "Mg-value" "89.8 90.0" "90.1" "90.6 92.0" "91.1" "88.8"
[17,] "olivine" "62 63" "67" "65 68" "66" "56 57"
[18,] "opx" "24 24" "22" "28 25" "28" "22 17"
[19,] "cpx" "12 11" "9" "3 2" "3" "19 10"
[20,] "spinel" "2 2" "2" "- -" "-" "3 -"

Here is a portion of the output of str(MyTables):

str(MyTables)
List of 3
 $ :List of 12
  ..$ : chr [1:3, 1:2] "south of the artificial lake Lokka. Intrusive complexes" "of alkaline rocks are found at Sokli (phosphorite-bear-" "ing and a possible Nb-occurrence) in Finland, and at" "(Eriksson, 1992). During this period, Northern Europe" ...
  ..$ : chr [1:55, 1:15] "Element" "Ag" "Al" "Al_XRF" ...
  ..$ : chr [1:56, 1:2] "in the till is mainly of local origin, although some cob-" "bles and boulders may have been transported over sev-" "eral kilometres. The moraine formations in the study" "area are mostly gravelly and sandy tills, locally hum-" ...
  ..$ : chr [1:53, 1:2] "requisites. PCA accounts for maximum variance of all" "variables, while FA is based on the correlation structure" "of the variables. The model of factor analysis allows that" "the common factors do not explain the total variation of" ...
  ..$ : chr [1:54, 1:7] "lished examples of the use of factor analysis, it is neglec-" "ted that regional geochemical (and environmental) data" "almost never follow a normal distribution. Continuing Method" "with factor analysis in such a case must lead to biased" ...
  ..$ : chr [1:16, 1:2] "shows the factor loadings of the different variables" "entering each factor. Names of variables with an abso-" "lute value of the loadings <0.3 are not plotted. Fig. 5" "shows 8 results of factor analyses using a selection of all" ...
  ..$ : chr [1:21, 1:2] "pretable results, notwithstanding the fact that on the" "basis of the foregoing discussion it should probably not" "be used with these data. Do these results warrant the use" "of a quite work-intensive method? Unfortunately not," ...
  ..$ : chr [1:55, 1:8] "" "Ag" "Al" "Al_XRF" ...
  ..$ : chr [1:23, 1:2] "addition, geochemical reasoning (e.g. geochemical asso-" "ciations and/or pathfinder elements for different types of" "ore deposits) was used to select further sub-sets of vari-" "ables. In geochemistry, the selection of elements entered" ...
  ..$ : chr [1:55, 1:2] "Fig. 10C cuts several geological units, and is most likely" "indicative of alteration processes related to a deep-" "seated fault. It was revealed again in a factor analysis" "carried out with all those elements extracted by aqua" ...
  ..$ : chr [1:50, 1:2] "well justified in stating that it is not very scientific to" "play with the selection of elements and number of fac-" "tors extracted until one â\200\230â\200\230findsâ\200\231â\200\231 an â\200\230â\200\230interestingâ\200\231â\200\231 result." "On the other hand, even all the different results pre-" ...
  ..$ : chr [1:24, 1:2] "Niemelä, J., Ekman, I., Lukashov, A. (Eds.), 1993. Quaternary" "Deposits of Finland and Northwestern Part of Russian Fed-" "eration and Their Resources 1:1,000,000. Geological Survey" "of Finland, Espoo, Finland." ...
 $ :List of 15
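For reference, here is a minimal sketch of one way steps (1) and (2) could be done in base R. It assumes MyTables is a list (one entry per PDF) of lists of character matrices, as the str() output suggests. The two small matrices, the "fileN_tableM" labels, and the output file name are made up for illustration and are not part of tabulizer.

```r
# Stand-in for the tabulizer output: a list (one entry per PDF) of lists of
# character matrices. These small matrices are invented for illustration.
MyTables <- list(
  list(matrix(c("SiO2", "44.0", "MgO", "41.4"), ncol = 2, byrow = TRUE)),
  list(matrix(c("a", "b", "c"), ncol = 3),
       matrix(c("x", "y"), ncol = 1))
)

# Step 1: flatten one level only, so all matrices sit in a single list and
# each matrix stays intact.
AllTables <- unlist(MyTables, recursive = FALSE)

# Label every matrix with its file and table index (hypothetical naming).
names(AllTables) <- unlist(lapply(seq_along(MyTables), function(i)
  sprintf("file%d_table%d", i, seq_along(MyTables[[i]]))))

# Step 2: write all matrices to one tab-separated text file, each preceded
# by a header line and followed by a blank line.
OutFile <- file.path(tempdir(), "AllTables.txt")  # hypothetical output path
con <- file(OutFile, open = "wt")
for (nm in names(AllTables)) {
  writeLines(paste("###", nm), con)
  write.table(AllTables[[nm]], file = con, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  writeLines("", con)
}
close(con)
```

The nested lapply() attempt quoted in the post fails because write.table(file = ..., append = TRUE) is evaluated immediately, without a table to write, instead of being passed to lapply() as a function. Writing through a single open connection avoids the need for append = TRUE. The matrices have different numbers of columns, so they are kept in a list rather than forced into one data frame.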