Unless I misunderstand,

df1 = readtable(file1,eltypes=[String,String,String])

seems to be what you want. If you're new to Julia, the fact that a "vector of types" really means exactly that may be surprising.

Let us hope that the new versions of DataFrames include a parser that doesn't treat most 10-digit numbers as Int32 on systems like yours.

On Wednesday, November 2, 2016 at 4:15:20 PM UTC-4, LeAnthony Mathews wrote:
>
> Spoke too soon.
> Again I simply want the CSV column that is read in to not be an int32, but
> a string.
>
> Still having issues casting the CSV file back into a Dataframe.
> It's hard to understand why the Julia system is attempting to determine the
> type of the columns when I use readtable and I have no control over this.
>
> Why can I not say:
> df1 = readtable(file1; types=Dict(1=>String)) # assuming your account number is column # 1
>
> *Reading the Julia spec-Advanced Options for Reading CSV Files*
> *readtable accepts the following optional keyword arguments:*
>
> *eltypes::Vector{DataType} – Specify the types of all columns. Defaults to
> [].*
>
>
> *df1 = readtable(file1, Int32::Vector(String))*
>
> I get
> *ERROR: TypeError: typeassert: expected Array{String,1}, got Type{Int32}*
>
> Is this even an option? Or how about convert the df1_CSV to
> df1_dataframe?
> *df1_dataframe = convert(dataframe, df1_CSV)*
> Since the CSV .read seems to give more granular control.
>
>
> On Tuesday, November 1, 2016 at 7:28:36 PM UTC-4, LeAnthony Mathews wrote:
>>
>> Great, that worked for forcing the column into a string type.
>> Thanks
>>
>> On Monday, October 31, 2016 at 3:26:14 PM UTC-4, Jacob Quinn wrote:
>>>
>>> You could use CSV.jl: http://juliadata.github.io/CSV.jl/stable/
>>>
>>> In this case, you'd do:
>>>
>>> df1 = CSV.read(file1; types=Dict(1=>String)) # assuming your account number is column # 1
>>> df2 = CSV.read(file2; types=Dict(1=>String))
>>>
>>> -Jacob
>>>
>>>
>>> On Mon, Oct 31, 2016 at 12:50 PM, LeAnthony Mathews <leant...@gmail.com> wrote:
>>>
>>>> Using v0.5.0
>>>> I have two different 10,000 line CSV files that I am reading into two
>>>> different dataframe variables using the readtable function.
>>>> Each table has in common a ten digit account_number that I would like
>>>> to use as an index and join into one master file.
>>>>
>>>> Here is the account number example in the original CSV from file1:
>>>> 8018884596
>>>> 8018893530
>>>> 8018909633
>>>>
>>>> When I do a readtable of this CSV into file1 then do a
>>>> *typeof(file1[:account_number])* I get:
>>>> *DataArrays.DataArray(Int32,1)*
>>>> -571049996
>>>> -571041062
>>>> -571024959
>>>>
>>>> when I do a
>>>> *typeof(file2[:account_number])*
>>>> *DataArrays.DataArray(String,1)*
>>>>
>>>>
>>>> *Question:*
>>>> My CSV files give no guidance that account_number should be Int32 or
>>>> string type. How do I force it to make both account_number elements type
>>>> String?
>>>>
>>>> I would like this join command to work:
>>>> *new_account_join = join(file1, file2, on =:account_number,kind = :left)*
>>>>
>>>> But I am getting this error:
>>>> *ERROR: TypeError: typeassert: expected Union{Array{Symbol,1},Symbol}, got Array{Array{Symbol,1},1}*
>>>> * in (::Base.#kw##join)(::Array{Any,1}, ::Base.#join, ::DataFrames.DataFrame, ::DataFrames.DataFrame) at .\<missing>:0*
>>>>
>>>>
>>>> Any help would be appreciated.
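
For anyone reading this thread later: the negative numbers in the original question look like plain 32-bit wraparound. The sketch below only reproduces the arithmetic in Base Julia; it assumes the old parser kept just the low 32 bits of each value, which is a guess based on the numbers posted and not something the thread confirms. The variable names are made up for illustration.

```julia
# Account numbers from the original CSV; each one needs more than 32 bits.
account_numbers = Int64[8018884596, 8018893530, 8018909633]

# `n % Int32` keeps only the low 32 bits and reinterprets them as a signed
# integer (assumption: this mimics what happened when the column was read).
wrapped = [n % Int32 for n in account_numbers]

println(typemax(Int32))   # 2147483647, smaller than every account number
println(wrapped)          # Int32[-571049996, -571041062, -571024959]

# Treating the identifiers as strings sidesteps the overflow entirely.
as_strings = string.(account_numbers)
println(as_strings)
```

The wrapped values match the ones reported for file1, which is why reading the column as String (as suggested above) is the safer choice for an identifier that is never used in arithmetic.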