On Mon, Apr 13, 2015 at 10:58 AM, Fabien <fabien.mauss...@gmail.com> wrote:
> Now, to my questions:
> 1. Does that seem reasonable?

A big issue is the use of pickle, which:

* Is often suboptimal performance-wise (e.g. you can't load only subsets of
  the data)
* Makes forwards/backwards compatibility very difficult
* Can make Python 2/3 migrations harder
* Creates data files which are difficult to analyze/fix by hand if they get
  broken
* Is schemaless, and can accidentally include irrelevant data you didn't
  mean to store, making all of the above worse.
* Means you have to be very careful who wrote the pickles, or you open a
  remote code execution vulnerability. It's common for people to forget
  that unpickling is unsafe, and get themselves pwned. Security is always
  better if you don't do anything bad in the first place than if you do
  something bad but try to manage the context in which the bad thing is
  done.

Cap'n Proto might be a decent alternative that gives you good performance,
by letting you process only the bits of the file you want to. It is also
not a walking security nightmare.

> 2. Should Watershed be an object or should it be a simple dictionary? I
> thought that an object could be nice, because it could take care of some
> operations such as plotting and logging. Currently I defined a class
> Watershed, but its attributes are defined and filled by A, B and C (this
> seems a bit wrong to me).

It is usually very confusing for attributes to be defined anywhere other
than __init__. It's even more confusing for them to be defined by some
random other function living somewhere else.

> I could give more responsibilities to this class
> but it might become way too big: since the whole purpose of the tool is to
> work on watersheds, making a Watershed class actually sounds like a code
> smell (http://en.wikipedia.org/wiki/God_object)

Whether they are methods or not doesn't make this any more or less of a god
object -- if it stores all this data used by all these different things, it
is already a bit off.

> 3. The operation A opens an external file, reads data out of it and writes
> it in Watershed object. Is it a bad idea to multiprocess this? (I guess it
> is, since the file might be read twice at the same time)

That does sound like a bad idea, for the reason you gave. It might be
possible to read it once, and share it among many processes.

-- Devin
--
https://mail.python.org/mailman/listinfo/python-list
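
P.S. A minimal sketch of the "read it once, and share it" idea, in case it
helps. Everything specific here is an assumption made for illustration: the
external file is taken to be a small CSV with made-up `name` and `area_km2`
columns, and `operation_a` is a hypothetical stand-in for your operation A,
not your actual code. The parent process parses the file a single time and
hands each record to a worker through multiprocessing.Pool, so no worker
ever opens the file.

```python
import csv
import os
import tempfile
from multiprocessing import Pool


def read_once(path):
    """Read the external file a single time, in the parent process."""
    with open(path, newline="") as f:
        return [(row["name"], float(row["area_km2"])) for row in csv.DictReader(f)]


def operation_a(record):
    """Hypothetical per-watershed step; it receives its data as an argument."""
    name, area_km2 = record
    return name, area_km2 * 100.0  # convert km^2 to hectares


def run(path, processes=2):
    records = read_once(path)      # the only place the file is opened
    with Pool(processes) as pool:  # workers only see the parsed records
        return dict(pool.map(operation_a, records))


if __name__ == "__main__":
    # Write a tiny made-up input file so the sketch runs on its own.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, newline=""
    ) as f:
        f.write("name,area_km2\nalpha,12.5\nbeta,3.0\n")
    try:
        print(run(f.name))
    finally:
        os.remove(f.name)
```

Pool.map pickles each record on its way to a worker, so this suits parsed
data of modest size; the pickles here never touch disk and never come from
anyone else, which is a different situation from using pickle as a storage
format.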