Just to complete this for the record:
Johannes' attachment didn't make it to the list, but
from an off-list conversation, I believe the problem
is not trimming of text (i.e. removing leading and trailing whitespace)
but apparent truncation of the text in an XML node.
The xmlEventParse() function is intended for very large documents
for which memory consumption may be problematic. As a result,
it passes the text in an XML node to the event handler functions in
segments, e.g. of up to approximately 3000 characters.
If one expects all the text for a node to arrive in
the first call to the text handler, this may not be the case
for large strings. Instead, one should concatenate the segments
across calls to the text handler and process the result when the
end of the node is encountered.
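
As a rough illustration of that advice, here is a minimal sketch of handlers that buffer the segments and only join them when the node closes. The names makePeaksCollector() and getResults() are made up for this example and are not part of the XML package; the element name "peaks" is taken from Johannes' script, and the inline test document is invented. State is kept in a closure rather than in .GlobalEnv, which is a design choice of the sketch, not a requirement.

```r
# Hypothetical helper: builds handlers that collect the full text of
# every <peaks> node, however many segments the parser delivers.
makePeaksCollector <- function() {
  inPeaks <- FALSE          # TRUE while inside a <peaks> element
  chunks  <- character()    # segments seen so far for the current node
  results <- character()    # one complete string per finished <peaks> node

  handlers <- list(
    startElement = function(name, attrs) {
      if (name == "peaks") {
        inPeaks <<- TRUE
        chunks  <<- character()   # start a fresh buffer for this node
      }
    },
    text = function(text) {
      # May be called several times for one node, so only store the segment.
      if (inPeaks) chunks <<- c(chunks, text)
    },
    endElement = function(name) {
      if (name == "peaks") {
        # The node is complete only now, so join the segments here.
        results <<- c(results, paste(chunks, collapse = ""))
        inPeaks <<- FALSE
      }
    }
  )

  list(handlers = handlers, getResults = function() results)
}

# Usage with a small invented document (skipped if XML is not installed).
if (requireNamespace("XML", quietly = TRUE)) {
  doc <- "<scan num='1' msLevel='2'><peaks>AAAA BBBB</peaks></scan>"
  collector <- makePeaksCollector()
  XML::xmlEventParse(doc, handlers = collector$handlers,
                     asText = TRUE, trim = FALSE)
  print(nchar(collector$getResults()))
}
```

The same pattern should carry over to the original script: set the flag in startElement, append in the text handler, and move the processing (e.g. the call to MakeSpektrumEntry()) into an endElement handler.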

  D.

Johannes Graumann wrote:
Hi Duncan,

Thanks for your thoughts. "trim=FALSE" does not fix my issues, so I attach pared down versions of my script and data file. Thanks for any further hint.

Joh

Duncan Temple Lang wrote:

Hi Johannes

  I would "guess" that the trimming of the text occurs because
you do not specify trim = FALSE in the call to xmlEventParse().
If you specify this, you might well get the results you expect.
If not, can you post the actual file you are reading so we can
reproduce your results.

   D.

Johannes Graumann wrote:
Hello,

I wrote the function below and have the problem that the "text" bit
returns only a trimmed version (686 chars as far as I can see) of the
content under the "fetchPeaks" condition.
Any hunches why that might be?

Thanks for any pointers, Joh

xmlEventParse(fileName,
    list(
      startElement=function(name, attrs){
        if(name == "scan"){
          if(.GlobalEnv$ms2Scan == TRUE & .GlobalEnv$scanDone == TRUE){
            cat(.GlobalEnv$scanNum,"\n")
            MakeSpektrumEntry()
          }
          .GlobalEnv$scanDone <- FALSE
          .GlobalEnv$fetchPrecMz <- FALSE
          .GlobalEnv$fetchPeaks <- FALSE
          .GlobalEnv$ms2Scan <- FALSE
          if(attrs[["msLevel"]] == "2"){
            .GlobalEnv$ms2Scan <- TRUE
            .GlobalEnv$scanNum <- as.integer(attrs[["num"]])
          }
        } else if(name == "precursorMz" & .GlobalEnv$ms2Scan == TRUE){
          .GlobalEnv$fetchPrecMz <- TRUE
        } else if(name == "peaks" & .GlobalEnv$ms2Scan == TRUE){
          .GlobalEnv$fetchPeaks <- TRUE
        }
      },
      text=function(text){
        if(.GlobalEnv$fetchPrecMz == TRUE){
          .GlobalEnv$precursorMz <- as.numeric(text)
          .GlobalEnv$fetchPrecMz <- FALSE
        }
        if(.GlobalEnv$fetchPeaks == TRUE){
          .GlobalEnv$peaks <- text
          .GlobalEnv$fetchPeaks <- FALSE
          .GlobalEnv$scanDone <- TRUE
        }
      }
    )
  )

sessionInfo()
R version 2.9.0 beta (2009-04-03 r48277)
x86_64-pc-linux-gnu

locale:

LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=en_US.UTF-8;LC_ADDRESS=en_US.UTF-8;LC_TELEPHONE=en_US.UTF-8;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=en_US.UTF-8
attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] caMassClass_1.6 MASS_7.2-46     digest_0.3.1    caTools_1.9
 [5] bitops_1.0-4.1  rpart_3.1-43    nnet_7.2-46     e1071_1.5-19
 [9] class_7.2-46    PROcess_1.19.1  Icens_1.15.2    survival_2.35-4
[13] RCurl_0.94-1    XML_2.3-0       rkward_0.5.0

loaded via a namespace (and not attached):
[1] tools_2.9.0

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented,
minimal, self-contained, reproducible code.