I’m late to this party (these days too busy to peruse the list more than 
episodically).

Don’t know if this would be faster or slower or equivalent, but it’s a modular 
function:

function getColumns pData, pColNbrs, pDelim
   -- Extract specified columns from a table in order
   --    pData: a tab-delimited table (delim may be overridden with pDelim)
   --    pColNbrs: A comma separated list of columns or column ranges
   --       to be returned in the order indicated, for example "2,7,5"
   --       or an ascending column range: "3-5"
   --       or a descending column range: "5-3"
   --       or a combination "7,4-5,2,11-9"
   -- based on deleteColumns() by Hugh Senior & Michael Doub et al., use-LC list
   --     adjusted by Peter M. Brigham, pmb...@gmail.com
   -- requires howMany()
   
   if pColNbrs = empty then return pData
   if pDelim = empty then put tab into pDelim
   -- first expand column ranges
   repeat for each item pCol in pColNbrs
      put offset("-",pCol) into dashPos
      if dashPos = 0 then
         put pCol & comma after tColsToReturn
      else if dashPos = 1 then
         -- for column numbers like -1, -2, etc.
         put howMany(pDelim,line 1 of pData) + 1 into nbrItems
         put pCol + 1 + nbrItems & comma after tColsToReturn
      else
         -- a column range
         put char 1 to dashPos-1 of pCol into firstColNbr
         put char dashPos+1 to -1 of pCol into lastColNbr
         if firstColNbr < lastColNbr then
            repeat with i = firstColNbr to lastColNbr
               put i & comma after tColsToReturn
            end repeat
         else
            repeat with i = firstColNbr down to lastColNbr
               put i & comma after tColsToReturn
            end repeat
         end if  
      end if       
   end repeat
   delete char -1 of tColsToReturn
   -- then extract columns in order
   set the columnDelimiter to pDelim
   split pData by column
   repeat for each item n in tColsToReturn
      add 1 to x
      put pData[n] into rData[x]
   end repeat
   combine rData by column
   return rData
end getColumns
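
For the CSV case that started this thread, the call might look something like this (the file path and field name below are just placeholders, and like Mike's item-based loop this treats the data as plain delimited text, so it won't cope with quoted fields containing embedded commas):

   put url ("file:" & tPath) into tData  -- tPath = hypothetical path to one CSV file
   put getColumns(tData, "1,3,15", comma) into tExtracted
   -- or, for tab-delimited data, the delimiter parameter can be omitted:
   put getColumns(fld "myTable", "2,7,5") into tExtracted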

function howMany pStr, pContainer, pCaseSens
   -- how many times pStr occurs in pContainer
   -- note that howmany("xx","xxxxxx") returns 3, not 5
   --     ie,  overlapping matches are not counted
   -- by Peter M. Brigham, pmb...@gmail.com — freeware
   
   if pCaseSens = empty then put false into pCaseSens
   set the casesensitive to pCaseSens
   if pStr is not in pContainer then return 0
   put len(pContainer) into origLength
   replace pStr with char 2 to -1 of pStr in pContainer
   return origLength - len(pContainer)
end howMany
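
To make the counting trick concrete, here is how getColumns uses howMany to get the column count when resolving negative column numbers (the header values are just an illustration):

   put "Id" & tab & "Name" & tab & "Date" & tab & "Summary" into tHeader
   put howMany(tab, tHeader) into tTabCount   -- 3 tabs
   put tTabCount + 1 into tNbrCols            -- 4 columns
   -- so in getColumns, a pColNbrs entry of "-1" resolves to -1 + 1 + 4 = 4, the last column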


> On Sep 20, 2016, at 5:14 PM, Keith Clarke <keith.cla...@me.com> wrote:
> 
> Thanks Mike (& Craig) for confirming that iteration of some sort is the way 
> to go. 
> 
> It’s interesting that the url container can be addressed directly, instead of 
> loading the file - I’d not seen that before. I’m currently pulling each file 
> into a regular variable, so will probably stick with that as only a few 
> (existing) lines of code would be saved and the net overall effect on timing 
> / resources is probably marginal if the file gets loaded behind the scenes 
> anyway.
> 
> As the column names/numbers vary by CSV file name - which I have built-out in 
> a ColumnsOfInterest field - it looks like I’ll also be safe to iterate 
> through these items to build the row extraction line dynamically, too.
> 
> Great, thanks again - I have a direction of travel.
> Best,
> Keith..
> 
>> On 20 Sep 2016, at 21:32, Mike Bonner <bonnm...@gmail.com> wrote:
>> 
>> If a file you can do this..
>> 
>> repeat for each line tline in url "file:yourfilename.csv"
>> put item 1 of tLine & comma & item 3 of tline & comma & item 15 of tLine & return after tExtracted
>> end repeat
>> delete the last char of tExtracted -- remove the extraneous return
>> 
>> Or you can put them into an array or whatever. It should be VERY fast.  If
>> the files are very large, you can instead open the file for read, read a
>> line, extract the data as above, and repeat till the end of the file. I
>> think using repeat for each with a file url loads the whole thing at once
>> then goes line to line, but I'm not sure of that.
>> 
>> On Tue, Sep 20, 2016 at 2:16 PM, Keith Clarke <keith.cla...@me.com> wrote:
>> 
>>> Hi folks,
>>> I’ve a large number of sizeable CSV files from each of which I need to
>>> extract just 2 or 3 specific columns. Creating nested iterations through
>>> every item in every line seems somewhat inefficient - as does loading each
>>> full file into memory - so I feel I must be missing a trick here.
>>> 
>>> Does Livecode support any elegant way of directly manipulating or
>>> ‘querying’ (like SQL) delimited data, to extract numbered (or named)
>>> columns/items - such as 1(Id), 3(Name) & 15(Summary) - from in each line
>>> from a CSV container or direct from a file/URL?
>>> Best,
>>> Keith..


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
