Dear Kenneth,
Now I have figured out what goes wrong, but not why.
I am running with Python 3.6.3 compiled with foss/2017b. I have two versions
of TensorFlow 1.4: one built from source using your .eb, and a "binary"
variant that installs from the binary package, just like my 1.2.1
easyconfig.
I am running a cut-down version of a production script written by a student.
He did not focus on optimization; as soon as it ran fast enough on his gaming
PC at home, he concentrated on the science :-) The script is probably not
ideal for timing purposes, as it does some on-the-fly generation of the
training set, but when running on CPUs only that is insignificant. On a GPU
it will probably dominate the runtime; I will address that later.
I run it CPU-only on a compute node with 16 CPU cores. For some reason that I
do not understand at all, the version of TensorFlow built from source decides
to use only two cores (in top I can see the Python process maxing out at
200%), whereas the pre-built version uses the majority of the cores (top
shows it maxing out around 900-1000%). This is the reason for the discrepancy
in run time.
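One cause I have not ruled out (pure speculation on my part, not something I have confirmed) is that the batch system pins the job to a subset of the cores. A quick stdlib check along these lines, run inside the job, would show whether the process is actually allowed to use all 16:

```python
import os
import multiprocessing

# Total logical CPUs on the node:
total = multiprocessing.cpu_count()

# Cores this process may actually run on; a batch system that pins jobs
# (cgroups/cpusets) would show up as a smaller number here.
allowed = len(os.sched_getaffinity(0))

print("total:", total, "allowed:", allowed)
```

(os.sched_getaffinity is Linux-only, which should be fine on our compute nodes.)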
I tried adding

    config = tf.ConfigProto()
    config.intra_op_parallelism_threads = 16
    config.inter_op_parallelism_threads = 16
    with tf.Session(config=config) as sess:
        ...
to the script, but that did not change anything. I have no clue what the
problem is. I guess I will just continue my timing on two cores only, and
worry about this later…
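Another thing I plan to check (again, just a guess) is whether some threading-related environment variable is set in the job environment; an OpenMP- or MKL-backed library could cap the thread count regardless of the ConfigProto settings. Something like this would show the usual suspects:

```python
import os

# Print the common thread-count environment variables; "<unset>" means
# the library falls back to its own default.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    print(var, "=", os.environ.get(var, "<unset>"))
```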
I also tried to use the timing script you recommended, but regardless of which
version of TensorFlow I use it crashes with this error:
File
"/home/niflheim/schiotz/development/benchmarks/scripts/tf_cnn_benchmarks/preprocessing.py",
line 23, in <module>
from tensorflow.contrib.data.python.ops import interleave_ops
ImportError: cannot import name 'interleave_ops'
Best regards
Jakob
> On 8 Jan 2018, at 21:34, Kenneth Hoste <[email protected]> wrote:
>
> On 08/01/2018 21:28, Jakob Schiøtz wrote:
>>> On 8 Jan 2018, at 20:27, Kenneth Hoste <[email protected]> wrote:
>>>
>>> On 08/01/2018 15:48, Jakob Schiøtz wrote:
>>>> Hi Kenneth,
>>>>
>>>> I have now tested your TensorFlow 1.4.0 eb on our machines with a
>>>> real-world script. It works, but it runs three times slower than with the
>>>> prebuilt TensorFlow 1.2.1 :-(
>>>>
>>>> The prebuilt version complains that it was built without AVX2 etc., so I
>>>> do not really understand why the version compiled from source is so much
>>>> slower - assuming of course that there is not a factor-of-three
>>>> performance loss between 1.2.1 and 1.4.0, which seems unlikely.
>>> Wow, that must be wrong somehow...
>>>
>>> Is this on the GPU systems?
>>> You're not comparing a GPU-enabled TF 1.2 with a CPU-only TF 1.4 built with
>>> EB, are you?
>>> If you are, then only a factor of 3 slowdown using only CPUs is actually
>>> quite impressive vs. a GPU-enabled build. ;-)
>> No, I am comparing not-GPU enabled versions running on a machine without a
>> GPU. So that is not the problem.
>>
>> I am running a custom script training one of my students' models. I agree
>> the result is suspicious, and I am rerunning it now (in the queue).
>>
>> I will try the benchmark you mentioned below as well; and report back - but
>> it may be a few days…
>>
>> By the way, could the difference be due to the compiler (Intel versus foss)?
>> That would be an unusually large difference, but my own MD code (ASAP)
>> displays almost a factor two difference.
>
> Which is which? Did you install the binary wheel on top of a Python built
> with foss or Intel?
>
> That could certainly matter, but I would be very surprised if it's more than
> 10-20% to be honest.
>
> I saw 10% performance loss for TF 1.4 built with intel/2017b vs foss/2017b
> (on top of Python 3.6.3) on Haswell (so the foss build was slightly faster).
>
>
> regards,
>
> Kenneth
>
>>
>> Jakob
>>
>>
>>> How are you benchmarking this exactly?
>>> When I was trying with the script from
>>> https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks,
>>> I saw 7x better performance when building TF 1.4.0 from source on Intel
>>> Haswell (no GPU) compared to a conda install (which is basically the same
>>> as using the binary wheel).
>>> On a GPU system (NVIDIA K40) with the TF 1.4.0 binary wheel I saw another
>>> 8x performance increase over the EB-installed-from-source CPU-only TF 1.4.0
>>> installation.
>>>
>>> Here's the command I was running (don't forget to change --device when
>>> running on a GPU system):
>>>
>>> python tf_cnn_benchmarks.py --device cpu --batch_size=32 --model=resnet50
>>> --variable_update=parameter_server --data_format NHWC
>>>
>>>
>>> regards,
>>>
>>> Kenneth
>>>
>>>> Best regards
>>>>
>>>> Jakob
>>>>
>>>>
>>>>> On 5 Jan 2018, at 13:57, Kenneth Hoste <[email protected]> wrote:
>>>>>
>>>>> On 04/01/2018 16:37, Jakob Schiøtz wrote:
>>>>>> Dear Kenneth, Pablo and Maxime,
>>>>>>
>>>>>> Thanks for your feedback. Yes, I will try to see if I can build from
>>>>>> source, but I will focus on the foss toolchain since we use that one for
>>>>>> our Python here (we do not have the Intel MPI license, and the iomkl
>>>>>> toolchain could not build Python last time I tried).
>>>>>>
>>>>>> I assume the reason for building from source is to ensure consistent
>>>>>> library versions etc. If that proves very difficult, could we perhaps
>>>>>> in the interim have builds (with a -bin suffix?) using the prebuilt
>>>>>> wheels?
>>>>> The main reason for building from source is performance and compatibility
>>>>> with the OS.
>>>>>
>>>>> The binary wheels that are available for TensorFlow are not compatible
>>>>> with older OS versions like CentOS 6, as I experienced first-hand when
>>>>> trying to get it to work on an older (GPU) system.
>>>>> Since the compilation from source with CUDA support didn't work yet, I
>>>>> had to resort to injecting a newer glibc version in the 'python' binary,
>>>>> which was not fun (well...).
>>>>>
>>>>> For CPU-only installations, you really have no other option than building
>>>>> from source, since the binary wheels were not built with AVX2
>>>>> instructions for example, which leads to large performance losses (some
>>>>> quick benchmarking showed a 7x increase in performance for TF 1.4 built
>>>>> with foss/2017b over using the binary wheel).
>>>>>
>>>>> For GPU installations, a similar concern arises, although it may be less
>>>>> severe there, depending on what CUDA compute capabilities the binary
>>>>> wheels were built with (I only tested the wheels on old systems with
>>>>> NVIDIA K20x/K40 GPUs, so there I doubt you'll get much performance
>>>>> increase when building from source).
>>>>>
>>>>> If it turns out to be too difficult or time-consuming to get the build
>>>>> from source with CUDA support to work, then we can of course progress
>>>>> with sticking to the binary wheel releases for now, I'm not going to
>>>>> oppose that.
>>>>>
>>>>>
>>>>> regards,
>>>>>
>>>>> Kenneth
>>>>>
>>>>>> Best regards
>>>>>>
>>>>>> Jakob
>>>>>>
>>>>>>
>>>>>>> On 4 Jan 2018, at 15:29, Kenneth Hoste <[email protected]> wrote:
>>>>>>>
>>>>>>> Dear Jakob,
>>>>>>>
>>>>>>> On 04/01/2018 10:23, Jakob Schiøtz wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I made a TensorFlow easyconfig a while ago depending on Python with
>>>>>>>> the foss toolchain; and including a variant with GPU support (PR
>>>>>>>> 4904). The latter has not yet been merged, probably because it is
>>>>>>> annoying to have something that can only be built on a machine with a GPU
>>>>>>>> (it fails the sanity check otherwise, as TensorFlow with GPU support
>>>>>>>> cannot load on a machine without it).
>>>>>>> Not being able to test this on a non-GPU system is a bit unfortunate,
>>>>>>> but that's not a reason that it hasn't been merged yet, that's mostly
>>>>>>> due to a lack of time from my side to get back to it...
>>>>>>>
>>>>>>>> Since I made that PR, two newer releases of TensorFlow have appeared
>>>>>>> (1.3 and 1.4). There are easyconfigs for 1.3 with the Intel
>>>>>>> toolchain. I am considering making easyconfigs for TensorFlow 1.4 with
>>>>>>>> Python-3.6.3-foss-2017b (both with and without GPU support), but first
>>>>>>>> I would like to know if anybody else is doing this - it is my
>>>>>>> impression that somebody who actually knows what they are doing may be
>>>>>>>> working on TensorFlow. :-)
>>>>>>> I have spent quite a bit of time puzzling together an easyblock that
>>>>>>> supports building TensorFlow from source, see [1].
>>>>>>>
>>>>>>> It already works for non-GPU installations (see [2] for example), but
>>>>>>> it's not entirely finished yet because:
>>>>>>>
>>>>>>> * building from source with CUDA support does not work yet, the build
>>>>>>> fails with strange Bazel errors...
>>>>>>>
>>>>>>> * there are some issues when the TensorFlow easyblock is used together
>>>>>>> with --use-ccache and the Intel compilers;
>>>>>>> because two compiler wrappers are used, they end up calling each
>>>>>>> other resulting in a "fork bomb" style situation...
>>>>>>>
>>>>>>> I would really like to get it finished and have easyconfigs available
>>>>>>> for TensorFlow 1.4 and newer where we properly build TensorFlow from
>>>>>>> source rather than using the binary wheels...
>>>>>>>
>>>>>>> Are you up for giving it a try, and maybe helping out with the problems
>>>>>>> mentioned above?
>>>>>>>
>>>>>>>
>>>>>>> regards,
>>>>>>>
>>>>>>> Kenneth
>>>>>>>
>>>>>>>
>>>>>>> [1] https://github.com/easybuilders/easybuild-easyblocks/pull/1287
>>>>>>> [2] https://github.com/easybuilders/easybuild-easyconfigs/pull/5499
>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>> Jakob
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jakob Schiøtz, professor, Ph.D.
>>>>>>>> Department of Physics
>>>>>>>> Technical University of Denmark
>>>>>>>> DK-2800 Kongens Lyngby, Denmark
>>>>>>>> http://www.fysik.dtu.dk/~schiotz/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>
--
Jakob Schiøtz, professor, Ph.D.
Department of Physics
Technical University of Denmark
DK-2800 Kongens Lyngby, Denmark
http://www.fysik.dtu.dk/~schiotz/