Re: [julia-users] Question: Forcing readtable to create string type on import

Ralph Smith Thu, 03 Nov 2016 20:30:07 -0700

Unless I misunderstand,

df1 = readtable(file1, eltypes=[String, String, String])

seems to be what you want.

If you're new to Julia, the fact that a "vector of types" really means 
exactly that may be surprising. 
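
A minimal sketch of what that vector looks like on its own, assuming a three-column file as in the call above (the name `column_types` is only for illustration):

```julia
# A plain array literal whose elements are types, one entry per CSV column.
column_types = [String, String, String]

# The element type of this array is DataType, so it is a Vector{DataType},
# which is what the eltypes keyword is documented to expect.
println(typeof(column_types))

# For files with many columns, fill builds the same vector without
# typing each entry by hand.
println(column_types == fill(String, 3))
```

The same vector can then be passed as `eltypes=column_types`.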

Let us hope that the new versions of DataFrames include a parser that 
doesn't treat most 10-digit numbers as Int32 on systems like yours.
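
For what it's worth, the negative values quoted below look like plain 32-bit wraparound. Here is a small sketch using the three account numbers from the original post; the variable names are mine, and I am only assuming that the parser truncated to 32 bits rather than checking for overflow:

```julia
# Account numbers as they appear in the CSV (from the original post).
account_numbers = [8018884596, 8018893530, 8018909633]

# `x % Int32` keeps only the low 32 bits, with no overflow check.
wrapped = [n % Int32 for n in account_numbers]

# wrapped == Int32[-571049996, -571041062, -571024959]
println(wrapped)

# The largest Int32 is 2147483647, well below any of these account numbers.
println(typemax(Int32))
```

Those are the same three values that readtable reported, so reading the column as String (or as Int64) avoids the problem.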

On Wednesday, November 2, 2016 at 4:15:20 PM UTC-4, LeAnthony Mathews wrote:
>
> Spoke too soon.
> Again, I simply want the CSV column that is read in to be a String, not an
> Int32.
>
> Still having issues casting the CSV file back into a DataFrame.
> It's hard to understand why the Julia system is attempting to determine the
> type of the columns when I use readtable and I have no control over this.
>
> Why can I not say:
> df1 = readtable(file1; types=Dict(1=>String)) # assuming your account number is column 1
>
> *Reading the Julia spec-Advanced Options for Reading CSV Files*
> *readtable accepts the following optional keyword arguments:*
>
> *eltypes::Vector{DataType} – Specify the types of all columns. Defaults to 
> [].*
>
>
> *df1 = readtable(file1, Int32::Vector(String))*
>
> I get 
> *ERROR: TypeError: typeassert: expected Array{String,1}, got Type{Int32}*
>
> Is this even an option?  Or how about converting df1_CSV to
> df1_dataframe?
> *df1_dataframe = convert(dataframe, df1_CSV)*
> I ask because CSV.read seems to give more granular control.
>
>
> On Tuesday, November 1, 2016 at 7:28:36 PM UTC-4, LeAnthony Mathews wrote:
>>
>> Great, that worked for forcing the column into a string type.
>> Thanks
>>
>> On Monday, October 31, 2016 at 3:26:14 PM UTC-4, Jacob Quinn wrote:
>>>
>>> You could use CSV.jl: http://juliadata.github.io/CSV.jl/stable/
>>>
>>> In this case, you'd do:
>>>
>>> df1 = CSV.read(file1; types=Dict(1=>String)) # assuming your account number is column 1
>>> df2 = CSV.read(file2; types=Dict(1=>String))
>>>
>>> -Jacob
>>>
>>>
>>> On Mon, Oct 31, 2016 at 12:50 PM, LeAnthony Mathews <leant...@gmail.com> 
>>> wrote:
>>>
>>>> Using v0.5.0
>>>> I have two different 10,000 line CSV files that I am reading into two 
>>>> different dataframe variables using the readtable function.
>>>> Each table has in common a ten digit account_number that I would like 
>>>> to use as an index and join into one master file.
>>>>
>>>> Here is the account number example in the original CSV from file1:
>>>> 8018884596
>>>> 8018893530
>>>> 8018909633
>>>>
>>>> When I do a readtable of this CSV into file1 and then do a
>>>> *typeof(file1[:account_number])* I get:
>>>> *DataArrays.DataArray(Int32,1)*
>>>>  -571049996
>>>>  -571041062
>>>>  -571024959
>>>>
>>>> When I do the same for file2, I get:
>>>> *typeof(file2[:account_number])*
>>>> *DataArrays.DataArray(String,1)*
>>>>
>>>>
>>>> *Question:  *
>>>> My CSV files give no guidance that account_number should be Int32 or 
>>>> string type.  How do I force it to make both account_number elements type 
>>>> String?
>>>>
>>>> I would like this join command to work:
>>>> *new_account_join = join(file1, file2, on = :account_number, kind = :left)*
>>>>
>>>> But I am getting this error:
>>>> *ERROR: TypeError: typeassert: expected Union{Array{Symbol,1},Symbol}, got Array{Array{Symbol,1},1}*
>>>> * in (::Base.#kw##join)(::Array{Any,1}, ::Base.#join, ::DataFrames.DataFrame, ::DataFrames.DataFrame) at .\<missing>:0*
>>>>
>>>>
>>>> Any help would be appreciated.  
>>>>
>>>>
>>>>
>>>
