Thanks Henrik, that's it. FWIW, I found this old post too. I am still surprised this doesn't seem to get used a lot(?). It's a "neat trick" for reading row-wise binary data without compiled code.
http://cyclemumner.blogspot.com.au/2010/06/read-las-data-with-r.html?m=1

Also, you should look at Paul Murrell's hexView package and the associated R Journal paper.

Cheers, Mike

On Mon, 19 Sep 2016, 02:20 Henrik Bengtsson <henrik.bengts...@gmail.com> wrote:

> I second Mike's proposal - it works, e.g.
>
> https://github.com/HenrikBengtsson/affxparser/blob/5bf1a9162904c56d59c4735a8d7eb427e4f085e4/R/readCcg.R#L535-L583
>
> Here's an outline. Say each row consists of the tuple (iiii=4-byte
> integer, ffff=4-byte float, ss=2-byte integer) so that the
> byte-by-byte content of your file looks like this:
>
> iiiiffffss
> iiiiffffss
> iiiiffffss
> ...
> iiiiffffss
>
> Then read this as raw bytes (file_size can also be a very large
> number in case it's unknown):
>
> raw <- readBin(con, what="raw", n=file_size)
>
> Turn it into a (4+4+2)-by-K raw matrix:
>
> raw <- matrix(raw, nrow=4+4+2)
>
> so that your raw bytes have the following layout:
>
> iii ... i
> iii ... i
> iii ... i
> iii ... i
> fff ... f
> fff ... f
> fff ... f
> fff ... f
> sss ... s
> sss ... s
>
> Then extract the three submatrices of interest (drop=FALSE keeps
> them as matrices even when K is 1):
>
> iiii <- raw[1:4, , drop=FALSE]
> ffff <- raw[5:8, , drop=FALSE]
> ss <- raw[9:10, , drop=FALSE]
>
> Here you can discard raw, i.e. rm(list="raw").
>
> Since R stores matrices in column-by-column order internally, your
> bytes are already in the proper order. Finally, re-read these with
> appropriate readBin() settings. Note that n must be given, because
> it defaults to 1, e.g.
>
> i <- readBin(iiii, what="integer", size=4L, n=ncol(iiii))
> f <- readBin(ffff, what="double", size=4L, n=ncol(ffff))
> s <- readBin(ss, what="integer", size=2L, n=ncol(ss))
>
> Put into a K-by-3 data.frame:
>
> data <- data.frame(i=i, f=f, s=s)
>
> /Henrik
>
> On Sun, Sep 18, 2016 at 8:02 AM, Philippe de Rochambeau <phi...@free.fr> wrote:
> > I would gladly examine your example, Mike.
> > Cheers,
> > Philippe
> >
> >> On 18 Sep 2016, at 16:05, Michael Sumner <mdsum...@gmail.com> wrote:
> >>
> >>> On Sun, 18 Sep 2016, 19:04 Philippe de Rochambeau <phi...@free.fr> wrote:
> >>> Please find below code that attempts to read ints, longs and floats from a binary file (which is a simplification of my original program).
> >>> Please disregard the R inefficiencies, such as using rbind, for now.
> >>> I’ve also included Java code to generate the binary file.
> >>> The output shows that, at one point, anInt becomes undefined. Unfortunately, I couldn’t find the correct R function to determine whether anInt is undefined or not, as is.null, is.nan, and is.infinite don’t work.
> >>> Any help would be much appreciated.
> >>> Many thanks in advance.
> >>> Philippe
> >>>
> >>> ———————
> >>> [1] "anInt = 1"
> >>> [1] "is.null FALSE"
> >>> [1] "is.nan FALSE"
> >>> [1] "is.infinite FALSE"
> >>> [1] "aLong = 2"
> >>> [1] "aFloat = 3.44440007209778"
> >>> [1] "--------------------------"
> >>> [1] "anInt = 2"
> >>> [1] "is.null FALSE"
> >>> [1] "is.nan FALSE"
> >>> [1] "is.infinite FALSE"
> >>> [1] "aLong = 22"
> >>> [1] "aFloat = 13.4644002914429"
> >>> [1] "--------------------------"
> >>> [1] "anInt = 3"
> >>> [1] "is.null FALSE"
> >>> [1] "is.nan FALSE"
> >>> [1] "is.infinite FALSE"
> >>> [1] "aLong = 55"
> >>> [1] "aFloat = 45.4444007873535"
> >>> [1] "--------------------------"
> >>> [1] "anInt = "
> >>> [1] "is.null FALSE"
> >>> [1] "is.nan "
> >>> [1] "is.infinite "
> >>> [1] "aLong = "
> >>> [1] "aFloat = "
> >>> [1] "--------------------------"
> >>>      [,1]      [,2]      [,3]
> >>> [1,] 1         2         3.4444
> >>> [2,] 2         22        13.4644
> >>> [3,] 3         55        45.4444
> >>> [4,] Integer,0 Integer,0 Numeric,0
> >>>
> >>> -----------
> >>>
> >>> —————————————————————
> >>>
> >>> readFile <- function(inputPath) {
> >>>   URL <- file(inputPath, "rb")
> >>>   PLT <- matrix(nrow=0, ncol=3)
> >>>   counte <- 0
> >>>   max <- 4
> >>>   while (counte < max) {
> >>>     anInt <- readBin(con=URL, what=integer(), size=4, n=1, endian="big")
> >>>     print(paste("anInt =", anInt))
> >>>     #if (! (anInt == 0)) { print(paste("empty int")); break }
> >>>     print(paste("is.null ", is.null(anInt)))
> >>>     print(paste("is.nan ", is.nan(anInt)))
> >>>     print(paste("is.infinite ", is.infinite(anInt)))
> >>>     aLong <- readBin(URL, integer(), size=8, n=1, endian="big")
> >>>     print(paste("aLong =", aLong))
> >>>     aFloat <- readBin(URL, numeric(), size=4, n=1, endian="big")
> >>>     print(paste("aFloat =", aFloat))
> >>>     print("--------------------------")
> >>>     PLT <- rbind(PLT, list(anInt, aLong, aFloat))
> >>>     counte <- counte + 1
> >>>   } # end while
> >>>   close(URL)
> >>>   PLT
> >>> }
> >>> fichier <- "/Users/philippe/Desktop/datatests/data0.bin"
> >>> PLT2 <- readFile(fichier)
> >>> print(PLT2)
> >>> —————————————————————
> >>>
> >>> import java.io.*;
> >>>
> >>> public class Main {
> >>>
> >>>     Main() {
> >>>         writeData();
> >>>     }
> >>>
> >>>     public static void main(String[] args) {
> >>>         new Main();
> >>>     }
> >>>
> >>>     public void writeData() {
> >>>
> >>>         final String path = "/Users/philippe/Desktop/datatests/data0.bin";
> >>>
> >>>         DataOutputStream dos;
> >>>         try {
> >>>             dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(path)));
> >>>             // big endian write! ("high byte first"), see https://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html
> >>>             dos.writeInt(1);
> >>>             dos.writeLong(2L);
> >>>             dos.writeFloat(3.4444F);
> >>>
> >>>             dos.writeInt(2);
> >>>             dos.writeLong(22L);
> >>>             dos.writeFloat(13.4644F);
> >>>
> >>>             dos.writeInt(3);
> >>>             dos.writeLong(55L);
> >>>             dos.writeFloat(45.4444F);
> >>>
> >>>             dos.close();
> >>>         } catch (FileNotFoundException e) {
> >>>             e.printStackTrace();
> >>>         } catch (IOException ioe) {
> >>>             ioe.printStackTrace();
> >>>         }
> >>>
> >>>     }
> >>>
> >>> }
> >>>
> >>> —————————————————————
> >>>
> >>> > On 17 Sep 2016, at 20:45, Philippe de Rochambeau <phi...@free.fr> wrote:
> >>> >
> >>> > Hi Jim,
> >>> > this is exactly the answer I was looking for. Many thanks. I didn’t know R had a pack function, as in PERL.
> >>> > To answer your earlier question, I am trying to update legacy code to read a binary file of unknown size, over a network, slice it up into rows each containing an integer, an integer, a long, a short, a float and a float, and stuff the rows into a matrix.
> >>
> >> It's possible to read all rows fast as raw(), then parse in a vectorised way with matrix indexing to group the bytes appropriately. There is an example on the mailing list somewhere, but otherwise I can show an example if that's of interest.
> >>
> >> Cheers, Mike
> >>
> >>> > Best regards,
> >>> > Philippe
> >>> >
> >>> >> On 17 Sep 2016, at 20:38, jim holtman <jholt...@gmail.com> wrote:
> >>> >>
> >>> >> Here is an example of how to do it:
> >>> >>
> >>> >> x <- 1:10 # integer values
> >>> >> xf <- seq(1.0, 2, by = 0.1) # floating point
> >>> >>
> >>> >> setwd("d:/temp")
> >>> >>
> >>> >> # create file to write to
> >>> >> output <- file('integer.bin', 'wb')
> >>> >> writeBin(x, output) # write integer
> >>> >> writeBin(xf, output) # write reals
> >>> >> close(output)
> >>> >>
> >>> >> library(pack)
> >>> >> library(readr)
> >>> >>
> >>> >> # read all the data at once
> >>> >> allbin <- read_file_raw('integer.bin')
> >>> >>
> >>> >> # decode the data into a list
> >>> >> (result <- unpack("V V V V V V V V V V d d d d d d d d d d", allbin))
> >>> >>
> >>> >> Jim Holtman
> >>> >> Data Munger Guru
> >>> >>
> >>> >> What is the problem that you are trying to solve?
> >>> >> Tell me what you want to do, not how you want to do it.
> >>> >>
> >>> >> On Sat, Sep 17, 2016 at 11:04 AM, Ismail SEZEN <sezenism...@gmail.com> wrote:
> >>> >> I noticed the same issue but didn't care much :)
> >>> >>
> >>> >> On Sat, Sep 17, 2016, 18:01 jim holtman <jholt...@gmail.com> wrote:
> >>> >> Your example was not reproducible. Also how do you "break" out of the
> >>> >> "while" loop?
> >>> >>
> >>> >> Jim Holtman
> >>> >> Data Munger Guru
> >>> >>
> >>> >> What is the problem that you are trying to solve?
> >>> >> Tell me what you want to do, not how you want to do it.
> >>> >>
> >>> >> On Sat, Sep 17, 2016 at 8:05 AM, Philippe de Rochambeau <phi...@free.fr>
> >>> >> wrote:
> >>> >>
> >>> >>> Hello,
> >>> >>> the following function, which stores numeric values extracted from a
> >>> >>> binary file into an R matrix, is very slow, especially when the said file
> >>> >>> is several MB in size.
> >>> >>> Should I rewrite the function in inline C or in C/C++ using Rcpp? If the
> >>> >>> latter case is true, how do you « readBin » in Rcpp (I’m a total Rcpp
> >>> >>> newbie)?
> >>> >>> Many thanks.
> >>> >>> Best regards,
> >>> >>> phiroc
> >>> >>>
> >>> >>> -------------
> >>> >>>
> >>> >>> # inputPath is something like http://myintranet/getData?pathToFile=/usr/lib/xxx/yyy/data.bin
> >>> >>>
> >>> >>> PLTreader <- function(inputPath){
> >>> >>>   URL <- file(inputPath, "rb")
> >>> >>>   PLT <- matrix(nrow=0, ncol=6)
> >>> >>>   compteurDePrints = 0
> >>> >>>   compteurDeLignes <- 0
> >>> >>>   maxiPrints = 5
> >>> >>>   displayData <- FALSE
> >>> >>>   while (TRUE) {
> >>> >>>     periodIndex <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
> >>> >>>     eventId <- readBin(URL, integer(), size=4, n=1, endian="little") # int (4 bytes)
> >>> >>>     dword1 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
> >>> >>>     dword2 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little") # int
> >>> >>>     if (dword1 < 0) {
> >>> >>>       dword1 = dword1 + 2^32-1;
> >>> >>>     }
> >>> >>>     eventDate = (dword2*2^32 + dword1)/1000
> >>> >>>     repNum <- readBin(URL, integer(), size=2, n=1, endian="little") # short (2 bytes)
> >>> >>>     exp <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes, strangely enough, would expect 8)
> >>> >>>     loss <- readBin(URL, numeric(), size=4, n=1, endian="little") # float (4 bytes)
> >>> >>>     PLT <- rbind(PLT, c(periodIndex, eventId, eventDate, repNum, exp, loss))
> >>> >>>   } # end while
> >>> >>>   return(PLT)
> >>> >>>   close(URL)
> >>> >>> }
> >>> >>>
> >>> >>> ----------------
> >>> >>> [[alternative HTML version deleted]]
> >>> >>>
> >>> >>> ______________________________________________
> >>> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >> --
> >> Dr. Michael Sumner
> >> Software and Database Engineer
> >> Australian Antarctic Division
> >> 203 Channel Highway
> >> Kingston Tasmania 7050 Australia
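
For readers following the thread, here is a minimal sketch of the raw-matrix approach from Henrik's outline, applied to the 16-byte big-endian records (int, long, float) that Philippe's Java program writes. The helper name read_records, the temporary test file, and the choice to decode the 8-byte long from a signed high word plus two unsigned 16-bit halves are assumptions made for illustration. None of it is code from any poster, and it has not been run against the original data.

```r
# Hypothetical helper (not from the thread): decode fixed-width big-endian
# records of (int32, int64, float32), i.e. 16 bytes per record.
read_records <- function(path, record_size = 16L) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  stopifnot(length(bytes) %% record_size == 0L)
  k <- length(bytes) %/% record_size
  m <- matrix(bytes, nrow = record_size)  # one record per column

  # Re-read the selected byte rows of all k records in a single call.
  field <- function(rows, what, size, signed = TRUE) {
    readBin(as.vector(m[rows, , drop = FALSE]), what = what, size = size,
            n = k, signed = signed, endian = "big")
  }

  an_int <- field(1:4, "integer", 4L)
  # R has no native 64-bit integer, so the long is rebuilt as a double:
  # signed high 32 bits, then the low 32 bits as two unsigned 16-bit halves.
  # The result is exact only for magnitudes up to 2^53.
  hi <- field(5:8, "integer", 4L)
  lo <- field(9:10, "integer", 2L, signed = FALSE) * 65536 +
        field(11:12, "integer", 2L, signed = FALSE)
  a_float <- field(13:16, "double", 4L)

  data.frame(anInt = an_int, aLong = hi * 2^32 + lo, aFloat = a_float)
}

# Write a small test file with the same layout and values as the Java example.
path <- tempfile(fileext = ".bin")
con <- file(path, "wb")
ints <- c(1L, 2L, 3L)
longs <- c(2L, 22L, 55L)
floats <- c(3.4444, 13.4644, 45.4444)
for (j in seq_along(ints)) {
  writeBin(ints[j], con, size = 4L, endian = "big")
  writeBin(0L, con, size = 4L, endian = "big")        # high word of the long
  writeBin(longs[j], con, size = 4L, endian = "big")  # low word of the long
  writeBin(floats[j], con, size = 4L, endian = "big") # single-precision float
}
close(con)

records <- read_records(path)
print(records)
```

Because the whole file is read once and every field is decoded in one readBin() call, there is no per-record loop and no repeated rbind(). The number of records comes from the file size, so there is no need to detect a failed read (readBin() returns a zero-length vector at end of file, which is why is.null, is.nan and is.infinite did not help; length(anInt) == 0 is the usual check).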