On Jul 26, 2009, at 6:32 AM, Aaron Burghardt wrote:

Neither. You want an array of dictionaries, where each row of the CSV is one dictionary in the array and its values are keyed by column name.

  This is a bit more complicated than that, actually.

There's a bit of a catch-22 here. On the one hand, you have a performance consideration. On the other, you have an ease-of-programming consideration. Using NSDictionary is easier, but it is noticeably slow for moderately sized files and unusably slow for large ones.

If you go the dictionary route, using the keys to identify the "fields" in each row, you're storing *way* more than just the individual field contents. You're storing a copy of your field identifier keys for every field, for every row. Best-case scenario, you're storing a pointer to some object that represents the "column" to which the fields belong, but this defeats the ease of use with bindings, since bindings need string keys. As I mentioned above, with increasingly large files, this dramatically increases your reading/writing time and uses a lot of memory. But at least you get the ability to easily use bindings and to sort, all for free, performance be damned.

If you go another route (an array of arrays of strings), it's far more efficient, but adds a few programming complexities:

1 - How do you sort by a column? There's no key for sort descriptors and sorting via selector provides no way to pass additional information (such as column index or identifier).

2 - To what do you bind? The same limitation that causes concern in problem #1 makes #2 difficult ... and there is little by way of over-the-counter laxative to make #2 less difficult.

3 - If you intend to allow reordering of columns (built-in NSTableView feature) or even adding/removing columns, how do you handle keeping the columns mapped to the correct fields in the row array in the absence of an associative array (dictionary)?

The easiest solution to all three of these problems (in my opinion) is to make a "row" a custom class and add a helper class (we'll call it "ColumnMapper" - one mapper shared among all rows). The row's internal storage can still be an array of strings for low overhead, but the Row class has a trick up its sleeve. It overrides -valueForUndefinedKey: so that it can still look up associative values (like a dictionary) but without storing the keys itself. The keys are stored once, in the ColumnMapper.

When asked for a field value for a column, a Row asks the ColumnMapper for the index (the index in its storage array) of the field the column represents. Likewise for storing a field value. This works because Row doesn't respond to these column IDs as keys, so KVC falls back to -valueForUndefinedKey:. Our Row class overrides that method and relies on the central ColumnMapper to determine where in its internal storage the value for that column ID is located.
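
A minimal sketch of the Row/ColumnMapper idea described above, written in modern Objective-C. The class names come from this message, but every method name (-initWithColumnIDs:, -indexForColumnID:, -initWithFields:mapper:) is made up for illustration and this is not a tested, production implementation:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: one instance shared by every Row.
// Maps a column ID to an index into each row's field array.
@interface ColumnMapper : NSObject
- (instancetype)initWithColumnIDs:(NSArray<NSString *> *)columnIDs;
- (NSUInteger)indexForColumnID:(NSString *)columnID; // NSNotFound if unknown
@end

@implementation ColumnMapper {
    NSDictionary<NSString *, NSNumber *> *_indexes;
}
- (instancetype)initWithColumnIDs:(NSArray<NSString *> *)columnIDs {
    if ((self = [super init])) {
        NSMutableDictionary *indexes = [NSMutableDictionary dictionary];
        [columnIDs enumerateObjectsUsingBlock:^(NSString *columnID, NSUInteger idx, BOOL *stop) {
            indexes[columnID] = @(idx);
        }];
        _indexes = [indexes copy];
    }
    return self;
}
- (NSUInteger)indexForColumnID:(NSString *)columnID {
    NSNumber *index = _indexes[columnID];
    return index ? index.unsignedIntegerValue : NSNotFound;
}
@end

// A row stores only its field strings; the keys live in the shared mapper.
@interface Row : NSObject
- (instancetype)initWithFields:(NSArray<NSString *> *)fields mapper:(ColumnMapper *)mapper;
@end

@implementation Row {
    NSMutableArray<NSString *> *_fields;
    ColumnMapper *_mapper;
}
// Keep KVC from resolving keys such as "fields" to the ivars directly.
+ (BOOL)accessInstanceVariablesDirectly { return NO; }

- (instancetype)initWithFields:(NSArray<NSString *> *)fields mapper:(ColumnMapper *)mapper {
    if ((self = [super init])) {
        _fields = [fields mutableCopy];
        _mapper = mapper;
    }
    return self;
}
// KVC calls this for any key the class has no accessor for, i.e. column IDs.
- (id)valueForUndefinedKey:(NSString *)key {
    NSUInteger index = [_mapper indexForColumnID:key];
    if (index == NSNotFound || index >= _fields.count) {
        return [super valueForUndefinedKey:key]; // raises NSUndefinedKeyException
    }
    return _fields[index];
}
- (void)setValue:(id)value forUndefinedKey:(NSString *)key {
    NSUInteger index = [_mapper indexForColumnID:key];
    if (index == NSNotFound || index >= _fields.count) {
        [super setValue:value forUndefinedKey:key]; // raises NSUndefinedKeyException
        return;
    }
    _fields[index] = value ?: @""; // treat nil as an empty field
}
@end
```

One caveat with this sketch: a column ID that matches an existing NSObject method (for example "description" or "hash") never reaches -valueForUndefinedKey:, so it is probably safer to use generated column IDs such as "col0", "col1" and keep the CSV header text only for display.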

This solves the sorting issue quite nicely too, if you sort using descriptors. Since NSSortDescriptor uses KVC, it "just works". Don't forget to google around for "Finder-like sorting" ... the built-in methods make a mess of alphanumeric strings. I leave implementing that to your imagination ... it's actually really easy if you spend a few minutes with Google.
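
As one possible illustration of sorting by descriptor with Finder-like ordering, the sketch below uses -localizedStandardCompare:, which NSString provides on Mac OS X 10.6 and later. The function name SortedRows is hypothetical, and plain dictionaries stand in for Row objects only to keep the example short; any KVC-compliant row object should work the same way:

```objc
#import <Foundation/Foundation.h>

// Sorts KVC-compliant row objects by one column. The descriptor fetches each
// field via -valueForKey: and compares with -localizedStandardCompare:, so
// embedded numbers sort numerically ("file2" before "file10").
static NSArray *SortedRows(NSArray *rows, NSString *columnID, BOOL ascending) {
    NSSortDescriptor *descriptor =
        [NSSortDescriptor sortDescriptorWithKey:columnID
                                      ascending:ascending
                                       selector:@selector(localizedStandardCompare:)];
    return [rows sortedArrayUsingDescriptors:@[descriptor]];
}
```

On systems older than 10.6 you would need to supply your own comparison method instead, which is what the "Finder-like sorting" search results cover.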

Note also that this approach requires all rows to have the same number of columns/fields. Your parsing logic will have to account for this by either automatically adjusting (fraught with complexities and assumptions) or rejecting the file and informing the user of the first row where trouble begins - i.e., the first row where the number of fields/columns differs from the rest. You really should take the rejection route anyway, since the missing field in a row might be somewhere other than the end ... so what do you do with the remaining fields in the row? They are probably in the wrong column and there's no way to know because of CSV's inherent lack of solid structure.
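
The "reject and report the first bad row" check could look something like the sketch below. It assumes the file has already been parsed into an array of field arrays and that the first row (typically the header) defines the expected field count; the function name FirstInconsistentRow is hypothetical:

```objc
#import <Foundation/Foundation.h>

// Returns the index of the first row whose field count differs from the
// first row's, or NSNotFound if every row has the same number of fields.
static NSUInteger FirstInconsistentRow(NSArray<NSArray<NSString *> *> *rows) {
    if (rows.count == 0) {
        return NSNotFound; // nothing to validate
    }
    NSUInteger expected = rows[0].count;
    for (NSUInteger i = 1; i < rows.count; i++) {
        if (rows[i].count != expected) {
            return i;
        }
    }
    return NSNotFound;
}
```

The returned index is zero-based, so add one (or more, if quoted fields span several lines) before showing a row number to the user.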

The only remaining problem is bindings. If you want to be able to handle any CSV file (i.e., the "fields" are unknown), I'm afraid there's no way to use bindings in IB. You'll have to create the table columns (and bind them) in code once you've parsed your file and determined the number of columns. In this regard, you might find it just as easy (if not easier) to eschew Cocoa Bindings altogether and just use the NSTableDataSource protocol. It gives you more precise control over what to refresh and when. Trust me, this will come up.
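
If you do stay with bindings, creating and binding the columns in code might look roughly like this. It is a sketch under several assumptions: a cell-based NSTableView, an NSArrayController whose content is the array of row objects, and generated column IDs ("col0", "col1", ...) that match whatever keys your rows answer to. The function name RebuildColumns is hypothetical:

```objc
#import <Cocoa/Cocoa.h>

// Replaces the table's columns with one column per CSV header and binds each
// column's value to the controller's arranged objects under a generated key.
static void RebuildColumns(NSTableView *tableView,
                           NSArrayController *controller,
                           NSArray<NSString *> *headers) {
    // Iterate over a copy, since removing columns mutates tableColumns.
    for (NSTableColumn *old in [tableView.tableColumns copy]) {
        [tableView removeTableColumn:old];
    }
    [headers enumerateObjectsUsingBlock:^(NSString *title, NSUInteger idx, BOOL *stop) {
        // Generated IDs avoid dots or spaces from the CSV header ending up
        // in a key path.
        NSString *columnID = [NSString stringWithFormat:@"col%lu", (unsigned long)idx];
        NSTableColumn *column = [[NSTableColumn alloc] initWithIdentifier:columnID];
        [column.headerCell setStringValue:title]; // header shows the CSV title
        [column bind:NSValueBinding
            toObject:controller
         withKeyPath:[@"arrangedObjects." stringByAppendingString:columnID]
             options:nil];
        [tableView addTableColumn:column];
    }];
}
```

Because the identifier stays attached to the NSTableColumn, the user can reorder columns in the table without disturbing which field each column shows.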

Of course, for very large files, both methods will be slow (and memory-intensive), and the problem becomes far more complex because then you need to start considering low-level solutions that don't ignore encodings. The antithesis to this concern is that, as the complexity and size increase, the likelihood that a human will want to see it as a table they will manually manipulate decreases (or at least, the reasonableness of the request does). At that magic tipping point, it's easy to argue that a GUI editor is no longer feasible and most of this problem goes away.

  Good luck and happy coding! :-)

--
I.S.



