Unfortunately, the data is too large to fit in memory -- I must process
it in a stream.

I will look at some libraries, hoping to find an idiomatic solution. I
am sure that I am not the first one encountering this pattern.

On Thu, Nov 03 2016, Jeffrey Sarnoff wrote:

> or split the string into rows of strings and rows into individual 
> value-keeper strings and put that into a matrix of strings and process the 
> matrix, tracking row and col and checking for "error"
>
> On Thursday, November 3, 2016 at 5:15:06 AM UTC-4, Jeffrey Sarnoff wrote:
>>
>> Or, redefine the question :>
>>
>> If you are not tied to string processing, reading the test_file  as a 
>> string (if it is) and then splitting the string
>> ```julia
>>    rowstrings = map(String, split(test_file, '\n')) # need the map to 
>> avoid SubString results, if it matters
>>    # then split the rows on ';' and convert to ?Float64 with NaN for error 
>> or ?Nullable Ints
>>    # and put the values in a matrix, processing the matrix you have the 
>> rows and cols
>> ```
>>
>>
>> On Thursday, November 3, 2016 at 4:34:53 AM UTC-4, Tamas Papp wrote:
>>>
>>> Jeffrey, 
>>>
>>> Thanks, but my question was about how to have line and column in the 
>>> error message. So I would like to have an error message like this: 
>>>
>>> ERROR: Failed to parse "error" as type Int64 in column 2, line 3. 
>>>
>>> My best idea so far: catch the error at each level, and add i and line 
>>> number. But this requires two try-catch-end blocks with rethrow. 
>>>
>>> Extremely convoluted mess with rethrow here: 
>>> https://gist.github.com/tpapp/6f67ff36a228f47a1792e011d9b0fc13 
>>>
>>> It does what I want, but it is ugly. A simpler solution would be 
>>> appreciated. I am sure I am missing something. 
>>>
>>> Best, 
>>>
>>> Tamas 
>>>
>>> On Thu, Nov 03 2016, Jeffrey Sarnoff wrote: 
>>>
>>> > Tamas, 
>>> > 
>>> > running this 
>>> > 
>>> > 
>>> > 
>>> > typealias AkoString Union{String, SubString{String}} 
>>> > 
>>> > function parsefield{T <: Real, S <: AkoString}(::Type{T}, str::S) 
>>> >     result = T(0) 
>>> >     try 
>>> >         result = parse(T, str) 
>>> >     catch ArgumentError 
>>> >         errormsg = string("Failed to parse \"",str,"\" as type ", T) 
>>> >         throw(ErrorException(errormsg)) 
>>> >     end 
>>> >     return result 
>>> > end 
>>> > 
>>> > function parserow(schema, strings) 
>>> >     # keep i for reporting column, currently not used 
>>> >     [parsefield(T, string) for (i, (T, string)) in 
>>> enumerate(zip(schema, 
>>> > strings))] 
>>> > end 
>>> > 
>>> > function parsefile(io, schema) 
>>> >     line = 1 
>>> >     while !eof(io) 
>>> >         strings = split(chomp(readline(io)), ';') 
>>> >         parserow(schema, strings) 
>>> >         line += 1 # currently not used, use for error reporting 
>>> >     end 
>>> > end 
>>> > 
>>> > test_file = """ 
>>> > 1;2;3 
>>> > 4;5;6 
>>> > 7;8;error 
>>> > """ 
>>> > 
>>> > parsefile(IOBuffer(test_file), fill(Int, 3)) 
>>> > 
>>> > 
>>> > 
>>> > 
>>> > by evaluating parsefile(...), results in 
>>> > 
>>> > 
>>> > 
>>> > julia> parsefile(IOBuffer(test_file), fill(Int, 3)) 
>>> > ERROR: Failed to parse "error" as type Int64 
>>> >  in parsefield(::Type{Int64}, ::SubString{String}) at ./REPL[2]:7 
>>> >  in (::##1#2)(::Tuple{Int64,Tuple{DataType,SubString{String}}}) at 
>>> > ./<missing>:0 
>>> >  in collect_to!(::Array{Int64,1}, 
>>> > 
>>> ::Base.Generator{Enumerate{Base.Zip2{Array{DataType,1},Array{SubString{String},1}}},##1#2},
>>>  
>>>
>>> > ::Int64, ::Tuple{Int64,Tuple{Int64,Int64}}) at ./array.jl:340 
>>> >  in 
>>> > 
>>> collect(::Base.Generator{Enumerate{Base.Zip2{Array{DataType,1},Array{SubString{String},1}}},##1#2})
>>>  
>>>
>>> > at ./array.jl:308 
>>> >  in parsefile(::Base.AbstractIOBuffer{Array{UInt8,1}}, 
>>> ::Array{DataType,1}) 
>>> > at ./REPL[4]:5 
>>> > 
>>> > 
>>> > 
>>> > 
>>> > 
>>> > On Wednesday, November 2, 2016 at 1:01:30 PM UTC-4, Tamas Papp wrote: 
>>> >> 
>>> >> This is a conceptual question. Consider the following (extremely 
>>> >> stylized, but self-contained) code 
>>> >> 
>>> >> parsefield{T <: Real}(::Type{T}, string) = parse(T, string) 
>>> >> 
>>> >> function parserow(schema, strings) 
>>> >>     # keep i for reporting column, currently not used 
>>> >>     [parsefield(T, string) for (i, (T, string)) in 
>>> enumerate(zip(schema, 
>>> >> strings))] 
>>> >> end 
>>> >> 
>>> >> function parsefile(io, schema) 
>>> >>     line = 1 
>>> >>     while !eof(io) 
>>> >>         strings = split(chomp(readline(io)), ';') 
>>> >>         parserow(schema, strings) 
>>> >>         line += 1 # currently not used, use for error reporting 
>>> >>     end 
>>> >> end 
>>> >> 
>>> >> test_file = """ 
>>> >> 1;2;3 
>>> >> 4;5;6 
>>> >> 7;8;error 
>>> >> """ 
>>> >> 
>>> >> parsefile(IOBuffer(test_file), fill(Int, 3)) 
>>> >> 
>>> >> This will fail with an error message 
>>> >> 
>>> >> ERROR: ArgumentError: invalid base 10 digit 'e' in "error" 
>>> >>  in tryparse_internal(::Type{Int64}, ::SubString{String}, ::Int64, 
>>> >> ::Int64, ::Int64 
>>> >> , ::Bool) at ./parse.jl:88 
>>> >>  in parse(::Type{Int64}, ::SubString{String}) at ./parse.jl:152 
>>> >>  in parsefield(::Type{Int64}, ::SubString{String}) at ./REPL[152]:1 
>>> >>  in (::##5#6)(::Tuple{Int64,Tuple{DataType,SubString{String}}}) at 
>>> >> ./<missing>:0 
>>> >>  in collect_to!(::Array{Int64,1}, 
>>> >> ::Base.Generator{Enumerate{Base.Zip2{Array{DataTy 
>>> >> pe,1},Array{SubString{String},1}}},##5#6}, ::Int64, 
>>> >> ::Tuple{Int64,Tuple{Int64,Int64 
>>> >> }}) at ./array.jl:340 
>>> >>  in 
>>> >> 
>>> collect(::Base.Generator{Enumerate{Base.Zip2{Array{DataType,1},Array{SubString{
>>>  
>>>
>>> >> 
>>> >> String},1}}},##5#6}) at ./array.jl:308 
>>> >>  in parsefile(::Base.AbstractIOBuffer{Array{UInt8,1}}, 
>>> >> ::Array{DataType,1}) at ./RE 
>>> >> PL[154]:5 
>>> >> 
>>> >> Instead, I would like to report something like this: 
>>> >> 
>>> >> ERROR: Failed to parse "error" as Int on line 3, column 3. 
>>> >> 
>>> >> What's the idiomatic way of doing this in Julia? My problem is that 
>>> >> parsefield fails without knowing line or column (i in parserow). I 
>>> could 
>>> >> catch and rethrow, constructing an error object gradually. Or I could 
>>> >> pass line and column numbers to parserow and parsefield for error 
>>> >> reporting, but that seems somehow inelegant (I have seen it in code 
>>> >> though). 
>>> >> 
>>> >> Best, 
>>> >> 
>>> >> Tamas 
>>> >> 
>>>
>>

Reply via email to