I second Mike's proposal - it works, e.g. https://github.com/HenrikBengtsson/affxparser/blob/5bf1a9162904c56d59c4735a8d7eb427e4f085e4/R/readCcg.R#L535-L583

Here's an outline. Say each row consists of the tuple (iiii=4-byte integer,
ffff=4-byte float, ss=2-byte integer) so that the byte-by-byte content of
your file looks like this:

  iiiiffffss
  iiiiffffss
  iiiiffffss
  ...
  iiiiffffss

Then read this as raw bytes (file_size can also be a very large number in
case it's unknown):

  raw <- readBin(con, what="raw", n=file_size)

Turn it into a (4+4+2)-by-K raw matrix, where K is the number of rows
(records) in the file:

  raw <- matrix(raw, nrow=4+4+2)
  K <- ncol(raw)

so that your raw bytes have the following layout:

  iii ... i
  iii ... i
  iii ... i
  iii ... i
  fff ... f
  fff ... f
  fff ... f
  fff ... f
  sss ... s
  sss ... s

Then extract the three submatrices of interest:

  iiii <- raw[1:4,]
  ffff <- raw[5:8,]
  ss <- raw[9:10,]

Here you can discard raw, i.e. rm(list="raw").

Since R stores matrices in column-by-column order internally, your bytes
are already in the proper order. Finally, re-read these with appropriate
readBin() settings. Note that readBin() reads only one value unless you
pass n, e.g.

  i <- readBin(iiii, what="integer", size=4L, n=K)
  f <- readBin(ffff, what="double", size=4L, n=K)
  s <- readBin(ss, what="integer", size=2L, n=K)

For a big-endian file, such as one written by Java's DataOutputStream, also
pass endian="big" to each of these readBin() calls.

Put into a K-by-3 data.frame:

  data <- data.frame(i=i, f=f, s=s)

/Henrik

On Sun, Sep 18, 2016 at 8:02 AM, Philippe de Rochambeau <phi...@free.fr> wrote:
> I would gladly examine your example, Mike.
> Cheers,
> Philippe
>
>> On 18 Sept 2016 at 16:05, Michael Sumner <mdsum...@gmail.com> wrote:
>>
>>> On Sun, 18 Sep 2016, 19:04 Philippe de Rochambeau <phi...@free.fr> wrote:
>>> Please find below code that attempts to read ints, longs and floats from a
>>> binary file (which is a simplification of my original program).
>>> Please disregard the R inefficiencies, such as using rbind, for now.
>>> I've also included Java code to generate the binary file.
>>> The output shows that, at one point, anInt becomes undefined.
>>> Unfortunately, I couldn't find the correct R function to determine whether
>>> anInt is undefined or not, as is.null, is.nan, and is.infinite don't work.
>>> Any help would be much appreciated.
>>> Many thanks in advance.
>>> Philippe
>>>
>>> ---------------------
>>> [1] "anInt = 1"
>>> [1] "is.null FALSE"
>>> [1] "is.nan FALSE"
>>> [1] "is.infinite FALSE"
>>> [1] "aLong = 2"
>>> [1] "aFloat = 3.44440007209778"
>>> [1] "--------------------------"
>>> [1] "anInt = 2"
>>> [1] "is.null FALSE"
>>> [1] "is.nan FALSE"
>>> [1] "is.infinite FALSE"
>>> [1] "aLong = 22"
>>> [1] "aFloat = 13.4644002914429"
>>> [1] "--------------------------"
>>> [1] "anInt = 3"
>>> [1] "is.null FALSE"
>>> [1] "is.nan FALSE"
>>> [1] "is.infinite FALSE"
>>> [1] "aLong = 55"
>>> [1] "aFloat = 45.4444007873535"
>>> [1] "--------------------------"
>>> [1] "anInt = "
>>> [1] "is.null FALSE"
>>> [1] "is.nan "
>>> [1] "is.infinite "
>>> [1] "aLong = "
>>> [1] "aFloat = "
>>> [1] "--------------------------"
>>>      [,1]      [,2]      [,3]
>>> [1,] 1         2         3.4444
>>> [2,] 2         22        13.4644
>>> [3,] 3         55        45.4444
>>> [4,] Integer,0 Integer,0 Numeric,0
>>>
>>> ---------------------
>>>
>>> readFile <- function(inputPath) {
>>>     URL <- file(inputPath, "rb")
>>>     PLT <- matrix(nrow=0, ncol=3)
>>>     counte <- 0
>>>     max <- 4
>>>     while (counte < max) {
>>>         anInt <- readBin(con=URL, what=integer(), size=4, n=1, endian="big")
>>>         print(paste("anInt =", anInt))
>>>         #if (! (anInt == 0)) { print(paste("empty int")); break }
>>>         print(paste("is.null ", is.null(anInt)))
>>>         print(paste("is.nan ", is.nan(anInt)))
>>>         print(paste("is.infinite ", is.infinite(anInt)))
>>>         aLong <- readBin(URL, integer(), size=8, n=1, endian="big")
>>>         print(paste("aLong =", aLong))
>>>         aFloat <- readBin(URL, numeric(), size=4, n=1, endian="big")
>>>         print(paste("aFloat =", aFloat))
>>>         print("--------------------------")
>>>         PLT <- rbind(PLT, list(anInt, aLong, aFloat))
>>>         counte <- counte + 1
>>>     } # end while
>>>     close(URL)
>>>     PLT
>>> }
>>> fichier <- "/Users/philippe/Desktop/datatests/data0.bin"
>>> PLT2 <- readFile(fichier)
>>> print(PLT2)
>>>
>>> ---------------------
>>>
>>> import java.io.*;
>>>
>>> public class Main {
>>>
>>>     Main() {
>>>         writeData();
>>>     }
>>>
>>>     public static void main(String[] args) {
>>>         new Main();
>>>     }
>>>
>>>     public void writeData() {
>>>
>>>         final String path = "/Users/philippe/Desktop/datatests/data0.bin";
>>>
>>>         DataOutputStream dos;
>>>         try {
>>>             dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(path)));
>>>             // big endian write! ("high byte first"), see
>>>             // https://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html
>>>             dos.writeInt(1);
>>>             dos.writeLong(2L);
>>>             dos.writeFloat(3.4444F);
>>>
>>>             dos.writeInt(2);
>>>             dos.writeLong(22L);
>>>             dos.writeFloat(13.4644F);
>>>
>>>             dos.writeInt(3);
>>>             dos.writeLong(55L);
>>>             dos.writeFloat(45.4444F);
>>>
>>>             dos.close();
>>>         } catch (FileNotFoundException e) {
>>>             e.printStackTrace();
>>>         } catch (IOException ioe) {
>>>             ioe.printStackTrace();
>>>         }
>>>
>>>     }
>>>
>>> }
>>>
>>> ---------------------
>>>
>>> > On 17 Sept 2016 at 20:45, Philippe de Rochambeau <phi...@free.fr> wrote:
>>> >
>>> > Hi Jim,
>>> > this is exactly the answer I was looking for. Many thanks. I didn't know R had a
>>> > pack function, as in Perl.
>>> > To answer your earlier question, I am trying to update legacy code to
>>> > read a binary file of unknown size over a network, slice it up into
>>> > rows each containing an integer, an integer, a long, a short, a float and
>>> > a float, and stuff the rows into a matrix.
>>
>> It's possible to read all rows fast as raw(), then parse in a vectorised way
>> with matrix indexing to group the bytes appropriately. There is an example
>> on the mailing list somewhere, but otherwise I can show an example if that's
>> of interest.
>>
>> Cheers, Mike
>>
>>> > Best regards,
>>> > Philippe
>>> >
>>> >> On 17 Sept 2016 at 20:38, jim holtman <jholt...@gmail.com> wrote:
>>> >>
>>> >> Here is an example of how to do it:
>>> >>
>>> >> x <- 1:10  # integer values
>>> >> xf <- seq(1.0, 2, by = 0.1)  # floating point
>>> >>
>>> >> setwd("d:/temp")
>>> >>
>>> >> # create file to write to
>>> >> output <- file('integer.bin', 'wb')
>>> >> writeBin(x, output)  # write integer
>>> >> writeBin(xf, output)  # write reals
>>> >> close(output)
>>> >>
>>> >>
>>> >> library(pack)
>>> >> library(readr)
>>> >>
>>> >> # read all the data at once
>>> >> allbin <- read_file_raw('integer.bin')
>>> >>
>>> >> # decode the data into a list
>>> >> (result <- unpack("V V V V V V V V V V d d d d d d d d d d", allbin))
>>> >>
>>> >>
>>> >> Jim Holtman
>>> >> Data Munger Guru
>>> >>
>>> >> What is the problem that you are trying to solve?
>>> >> Tell me what you want to do, not how you want to do it.
>>> >>
>>> >> On Sat, Sep 17, 2016 at 11:04 AM, Ismail SEZEN <sezenism...@gmail.com> wrote:
>>> >> I noticed the same issue but didn't care much :)
>>> >>
>>> >> On Sat, Sep 17, 2016, 18:01 jim holtman <jholt...@gmail.com> wrote:
>>> >> Your example was not reproducible. Also how do you "break" out of the
>>> >> "while" loop?
>>> >>
>>> >>
>>> >> Jim Holtman
>>> >> Data Munger Guru
>>> >>
>>> >> What is the problem that you are trying to solve?
>>> >> Tell me what you want to do, not how you want to do it.
>>> >>
>>> >> On Sat, Sep 17, 2016 at 8:05 AM, Philippe de Rochambeau <phi...@free.fr>
>>> >> wrote:
>>> >>
>>> >>> Hello,
>>> >>> the following function, which stores numeric values extracted from a
>>> >>> binary file into an R matrix, is very slow, especially when the said
>>> >>> file is several MB in size.
>>> >>> Should I rewrite the function in inline C or in C/C++ using Rcpp? If the
>>> >>> latter case is true, how do you "readBin" in Rcpp (I'm a total Rcpp
>>> >>> newbie)?
>>> >>> Many thanks.
>>> >>> Best regards,
>>> >>> phiroc
>>> >>>
>>> >>>
>>> >>> -------------
>>> >>>
>>> >>> # inputPath is something like
>>> >>> # http://myintranet/getData?pathToFile=/usr/lib/xxx/yyy/data.bin
>>> >>>
>>> >>> PLTreader <- function(inputPath){
>>> >>>     URL <- file(inputPath, "rb")
>>> >>>     PLT <- matrix(nrow=0, ncol=6)
>>> >>>     compteurDePrints = 0
>>> >>>     compteurDeLignes <- 0
>>> >>>     maxiPrints = 5
>>> >>>     displayData <- FALSE
>>> >>>     while (TRUE) {
>>> >>>         periodIndex <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
>>> >>>         eventId <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
>>> >>>         dword1 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
>>> >>>         dword2 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
>>> >>>         if (dword1 < 0) {
>>> >>>             dword1 = dword1 + 2^32-1;
>>> >>>         }
>>> >>>         eventDate = (dword2*2^32 + dword1)/1000
>>> >>>         repNum <- readBin(URL, integer(), size=2, n=1, endian="little") # short (2 bytes)
>>> >>>         exp <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes, strangely enough, would expect 8)
>>> >>>         loss <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes)
>>> >>>         PLT <- rbind(PLT, c(periodIndex, eventId, eventDate, repNum, exp, loss))
>>> >>>     } # end while
>>> >>>     return(PLT)
>>> >>>     close(URL)
>>> >>> }
>>> >>>
>>> >>> ----------------
>>> >>>
>>> >>> [[alternative HTML version deleted]]
>>> >>>
>>> >>> ______________________________________________
>>> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> >>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Dr. Michael Sumner
>> Software and Database Engineer
>> Australian Antarctic Division
>> 203 Channel Highway
>> Kingston Tasmania 7050 Australia
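
P.S. For anyone who wants to try the raw-matrix approach end to end, here is a
minimal sketch adapted to the record layout in Philippe's Java example (4-byte
int, 8-byte long, 4-byte float, all big-endian). The temporary file, the
variable names, the helper read_be_int(), and the way the 8-byte long is rebuilt
from two 4-byte halves are assumptions made for illustration and are not taken
from the affxparser code. The long reconstruction should only be exact for
magnitudes below 2^53, since the result is stored as a double.

```r
# Write three big-endian records (int32, int64, float32) to a temporary file,
# mirroring the values in the Java example from the thread.
path <- tempfile(fileext = ".bin")
con <- file(path, "wb")
ints   <- c(1L, 2L, 3L)
longs  <- c(2, 22, 55)
floats <- c(3.4444, 13.4644, 45.4444)
for (k in seq_along(ints)) {
  writeBin(ints[k], con, size = 4L, endian = "big")
  # The 8-byte long is written as two 4-byte halves: high word, then low word.
  writeBin(0L, con, size = 4L, endian = "big")
  writeBin(as.integer(longs[k]), con, size = 4L, endian = "big")
  writeBin(floats[k], con, size = 4L, endian = "big")
}
close(con)

# Read the whole file as raw bytes and reshape so that each column is a record.
record_size <- 4L + 8L + 4L
raw_bytes <- readBin(path, what = "raw", n = file.size(path))
stopifnot(length(raw_bytes) %% record_size == 0L)
raw_mat <- matrix(raw_bytes, nrow = record_size)
K <- ncol(raw_mat)

# Hypothetical helper: decode K big-endian 4-byte integers from the given rows.
read_be_int <- function(rows) {
  readBin(as.vector(raw_mat[rows, ]), what = "integer", size = 4L,
          n = K, endian = "big")
}

i  <- read_be_int(1:4)
hi <- read_be_int(5:8)    # signed high word of the long
lo <- read_be_int(9:12)   # low word, read as signed
f  <- readBin(as.vector(raw_mat[13:16, ]), what = "double", size = 4L,
              n = K, endian = "big")

# Reinterpret the low word as unsigned, then combine the halves into a double.
lo_unsigned <- ifelse(lo < 0, lo + 2^32, lo)
l <- hi * 2^32 + lo_unsigned

data <- data.frame(i = i, l = l, f = f)
print(data)
```

Because the record count K comes from the number of bytes actually read, no
end-of-file test is needed inside a loop. A file whose size is not a multiple
of the record size is rejected up front by the stopifnot() check.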