Re: [R] dealing with a messy dataset

2017-10-05 Thread Boris Steipe
Just for the record, and for posterity: this is the wrong way to go about defining a fixed-width format, and the strategy has a significant probability of corrupting data in ways that are hard to spot and hard to debug. If you _have_ the specification, then _use_ the specification. Consider what yo
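
A minimal sketch of what using the specification could look like in base R. The byte ranges, column names, and sample rows below are invented for illustration and are not taken from lvg_table2.dat; with the real file, the ranges would be copied from its byte-by-byte description.

```r
# Hypothetical byte-by-byte description (1-based, inclusive byte ranges).
spec <- data.frame(
  name  = c("id", "ra", "mag"),
  first = c(1, 7, 14),
  last  = c(5, 12, 18),
  stringsAsFactors = FALSE
)

# Width of each field, taken directly from the ranges.
field_w <- spec$last - spec$first + 1

# Bytes that belong to no field (here bytes 6 and 13) are skipped:
# read.fwf() treats a negative width as "ignore this many characters".
gap_w <- c(spec$first[-1] - spec$last[-nrow(spec)] - 1, 0)
w <- as.vector(rbind(field_w, -gap_w))
w <- w[w != 0]

# Two made-up records that follow the hypothetical layout above.
records <- sprintf("%-5s %s %5s",
                   c("AndXV", "M31"),
                   c("000214", "004244"),
                   c("6.44", "-21.4"))
path <- tempfile(fileext = ".txt")
writeLines(records, path)

galaxies <- read.fwf(path, widths = w, col.names = spec$name,
                     colClasses = c("character", "character", "numeric"))
galaxies$id <- trimws(galaxies$id)  # drop the padding inside the id field
print(galaxies)
```

Because the widths are computed from the documented ranges rather than from where values happen to sit in one sample row, a value that is unusually short or long in some other row cannot shift the column boundaries.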

Re: [R] dealing with a messy dataset

2017-10-05 Thread jean-philippe
Dear Jim, Yes, I fixed the problem. Thanks again to all of you for your contributions! This worked:
    start <- c(1, 20, 35, 41, 44, 48, 53, 59, 64, 70, 76, 78, 83, 88, 93, 114, 122, 127)
    data1 <- read_fwf("lvg_table2.txt", skip = 70, fwf_widths(diff(start)))
Well, now I know how to
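
For reference, a small sketch of how diff() turns start columns into widths. The start vector and record length here are hypothetical, not those of lvg_table2.txt. Note that diff() on n values returns n - 1 widths, so the final value has to be one past the end of the last field rather than the start of another field; comparing the summed widths with the line lengths is one cheap way to catch a mismatch.

```r
# Hypothetical layout: three fields starting at columns 1, 7 and 14,
# in records that are 18 characters long.
start   <- c(1, 7, 14)
rec_len <- 18

# diff() of n values yields n - 1 differences, so the position one past
# the end of the record is appended to obtain a width for the last field.
widths <- diff(c(start, rec_len + 1))
print(widths)

# Sanity check against made-up sample records: every line should be
# exactly as long as the widths add up to.
records <- sprintf("%-5s %s %5s",
                   c("AndXV", "M31"),
                   c("000214", "004244"),
                   c("6.44", "-21.4"))
stopifnot(all(nchar(records) == sum(widths)))
```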

Re: [R] dealing with a messy dataset

2017-10-05 Thread jim holtman
You should be able to use that header information to create the correct parameters for the read_fwf function to read in the data. Jim Holtman

Re: [R] dealing with a messy dataset

2017-10-05 Thread Boris Steipe
Since you have an authoritative description of the format, by all means use that - not a guess based on a visual inspection of where data appears in a sample row. B.

Re: [R] dealing with a messy dataset

2017-10-05 Thread jean-philippe
Dear Jim, Thanks for your reply and your suggestion. I forgot to provide the header of the data file; here it is: Byte-by-byte Description of file: lvg_table2.dat ---

Re: [R] dealing with a messy dataset

2017-10-05 Thread jim holtman
It looks like fixed width. I just used the last position of each field to get the size and used the 'readr' package; > input <- "And XVIII 000214.5+450520 0.69 17 9 0.00 -8.7 26.8 6.44 6.78 < 6.65 -44 0.5 MESSIER031 0.6 1.54 + PAndAS-03 000356.4+40531

Re: [R] dealing with a messy dataset

2017-10-05 Thread jean-philippe
Dear Boris, Thanks for your answer! Yes, it seems to be a fixed-width format. I had forgotten about this type of dataset, since I am not used to analyzing and processing such files. Thanks anyway; it seems to fix the problem (I just need to think a bit more about the width of each feature)! Cheers Jean-

Re: [R] dealing with a messy dataset

2017-10-05 Thread Boris Steipe
Is this a fixed-width format? If so, read.fwf() in base R, or read_fwf() in the readr package, will solve the problem. You may need to trim trailing spaces though. B.
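
A minimal sketch of the base R route with explicit trimming, run on made-up records rather than the actual galaxy file. The field widths, column names, and values are assumptions chosen for the example.

```r
# Made-up records: a 10-character name field followed by a 6-character number.
records <- sprintf("%-10s%6.2f", c("And XVIII", "M31"), c(0.69, -21.4))
path <- tempfile(fileext = ".txt")
writeLines(records, path)

# read.fwf() ships with base R (package 'utils').
raw <- read.fwf(path, widths = c(10, 6), col.names = c("name", "value"),
                colClasses = c("character", "numeric"))

# Character fields keep their padding, so trim them explicitly.
is_chr <- vapply(raw, is.character, logical(1))
clean <- raw
clean[is_chr] <- lapply(raw[is_chr], trimws)

print(nchar(raw$name))    # field lengths as read
print(nchar(clean$name))  # field lengths after trimming
```

Trimming only the character columns leaves the numeric columns untouched, which keeps the cleanup step from changing any parsed values.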

[R] dealing with a messy dataset

2017-10-05 Thread jean-philippe
Dear R-users, I am facing a fairly common and basic problem when it comes to dealing with datasets, but I have not found a satisfying answer so far. I have a messy dataset of galaxies like this: And XVIII 000214.5+450520 0.69 17 9 0.00 -8.7 26.8 6.44 6.78 < 6.65 -44 0.5 MESSI