hi Andrew,

On Python 2.7 you need to run both

pip install -r requirements.txt
pip install -r requirements-test.txt

It looks like your CMake version is old, so ZSTD was disabled; zstd cannot be built automatically from source for CMake versions less than 3.7. You will have a better time if you use conda to manage your build toolchain, see https://github.com/apache/arrow/blob/master/docs/source/python/development.rst

- Wes

On Tue, Jan 8, 2019 at 6:48 PM Andrew Palumbo <ap....@outlook.com> wrote:
>
> Hello,
> I'm just building arrow from source from a fresh checkout; commit:
>
> 326015cfc66e1f657cdd6811620137e9e277b43d
>
> Everything seems to build against python 2.7:
>
> $ python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --with-plasma --inplace
>
> {...}
> Bundling includes: release/include
> release/gandiva.so
> Cython module gandiva failure permitted
> ('Moving generated C++ source', 'lib.cpp', 'to build path', '/home/apalumbo/repos/arrow/python/pyarrow/lib.cpp')
> ('Moving built C-extension', 'release/lib.so', 'to build path', '/home/apalumbo/repos/arrow/python/pyarrow/lib.so')
> ('Moving generated C++ source', '_csv.cpp', 'to build path', '/home/apalumbo/repos/arrow/python/pyarrow/_csv.cpp')
> ('Moving built C-extension', 'release/_csv.so', 'to build path', '/home/apalumbo/repos/arrow/python/pyarrow/_csv.so')
> release/_cuda.so
> Cython module _cuda failure permitted
> ('Moving generated C++ source', '_parquet.cpp', 'to build path', '/home/apalumbo/repos/arrow/python/pyarrow/_parquet.cpp')
> ('Moving built C-extension', 'release/_parquet.so', 'to build path', '/home/apalumbo/repos/arrow/python/pyarrow/_parquet.so')
> release/_orc.so
> Cython module _orc failure permitted
> ('Moving generated C++ source', '_plasma.cpp', 'to build path', '/home/apalumbo/repos/arrow/python/pyarrow/_plasma.cpp')
> ('Moving built C-extension', 'release/_plasma.so', 'to build path', '/home/apalumbo/repos/arrow/python/pyarrow/_plasma.so')
> {...}
>
> Running the tests, though, I get:
>
> $ py.test pyarrow
>
> ImportError while loading conftest '/home/apalumbo/repos/arrow/python/pyarrow/tests/conftest.py'.
> ../../pyarrow/lib/python2.7/site-packages/six.py:709: in exec_
>     exec("""exec _code_ in _globs_, _locs_""")
> pyarrow/tests/conftest.py:20: in <module>
>     import hypothesis as h
> E   ImportError: No module named hypothesis
>
> After a pip install of `hypothesis` in my venv (Python 2.7), I am able to run the tests.
> Several fail right off the bat; it seems like many of the errors are Pandas-related (see bottom for a partial stack trace).
>
> Switching to a virtualenv running Python 3.5, the build fails:
>
> $ make -j4
>
> {...}
> make[2]: *** [src/arrow/python/CMakeFiles/arrow_python_objlib.dir/benchmark.cc.o] Error 1
> CMakeFiles/Makefile2:1862: recipe for target 'src/arrow/python/CMakeFiles/arrow_python_objlib.dir/all' failed
> make[1]: *** [src/arrow/python/CMakeFiles/arrow_python_objlib.dir/all] Error 2
> make[1]: *** Waiting for unfinished jobs....
> -- glog_ep install command succeeded. See also /home/apalumbo/repos/arrow/cpp/build/glog_ep-prefix/src/glog_ep-stamp/glog_ep-install-*.log
> [ 40%] Building CXX object src/plasma/CMakeFiles/plasma_objlib.dir/common.cc.o
> [ 40%] Completed 'glog_ep'
> [ 40%] Built target glog_ep
> [ 41%] Building CXX object src/plasma/CMakeFiles/plasma_objlib.dir/eviction_policy.cc.o
> [ 41%] Building CXX object src/plasma/CMakeFiles/plasma_objlib.dir/events.cc.o
> [ 42%] Building CXX object src/plasma/CMakeFiles/plasma_objlib.dir/fling.cc.o
> [ 42%] Building CXX object src/plasma/CMakeFiles/plasma_objlib.dir/io.cc.o
> [ 43%] Building CXX object src/plasma/CMakeFiles/plasma_objlib.dir/malloc.cc.o
> [ 43%] Building CXX object src/plasma/CMakeFiles/plasma_objlib.dir/plasma.cc.o
> [ 44%] Building CXX object src/plasma/CMakeFiles/plasma_objlib.dir/protocol.cc.o
> [ 44%] Building C object src/plasma/CMakeFiles/plasma_objlib.dir/thirdparty/ae/ae.c.o
> [ 44%] Built target plasma_objlib
> -- jemalloc_ep build command succeeded. See also /home/apalumbo/repos/arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep-stamp/jemalloc_ep-build-*.log
> [ 45%] Performing install step for 'jemalloc_ep'
> -- jemalloc_ep install command succeeded. See also /home/apalumbo/repos/arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep-stamp/jemalloc_ep-install-*.log
> [ 45%] Completed 'jemalloc_ep'
> [ 45%] Built target jemalloc_ep
> Makefile:138: recipe for target 'all' failed
> make: *** [all] Error 2
>
> Any thoughts? I'm building with the instructions from https://arrow.apache.org/docs/python/development.html#development
>
> Thanks in advance,
>
> Andy
>
>
> Partial stack trace (python 2.7):
>
> $ py.test pyarrow
>
> {...}
>
> [5000 rows x 1 columns]
> schema = None, preserve_index = False, nthreads = 16, columns = None, safe = True
>
>     def dataframe_to_arrays(df, schema, preserve_index, nthreads=1, columns=None,
>                             safe=True):
>         names, column_names, index_columns, index_column_names, \
>             columns_to_convert, convert_types = _get_columns_to_convert(
>                 df, schema, preserve_index, columns
>             )
>
>         # NOTE(wesm): If nthreads=None, then we use a heuristic to decide whether
>         # using a thread pool is worth it. Currently the heuristic is whether the
>         # nrows > 100 * ncols.
>         if nthreads is None:
>             nrows, ncols = len(df), len(df.columns)
>             if nrows > ncols * 100:
>                 nthreads = pa.cpu_count()
>             else:
>                 nthreads = 1
>
>         def convert_column(col, ty):
>             try:
>                 return pa.array(col, type=ty, from_pandas=True, safe=safe)
>             except (pa.ArrowInvalid,
>                     pa.ArrowNotImplementedError,
>                     pa.ArrowTypeError) as e:
>                 e.args += ("Conversion failed for column {0!s} with type {1!s}"
>                            .format(col.name, col.dtype),)
>                 raise e
>
>         if nthreads == 1:
>             arrays = [convert_column(c, t)
>                       for c, t in zip(columns_to_convert, convert_types)]
>         else:
>             from concurrent import futures
> E           ImportError: No module named concurrent
>
> pyarrow/pandas_compat.py:430: ImportError
> ______________________ test_compress_decompress ______________________
>
>     def test_compress_decompress():
>         INPUT_SIZE = 10000
>         test_data = (np.random.randint(0, 255, size=INPUT_SIZE)
>                      .astype(np.uint8)
>                      .tostring())
>         test_buf = pa.py_buffer(test_data)
>
>         codecs = ['lz4', 'snappy', 'gzip', 'zstd', 'brotli']
>         for codec in codecs:
>             compressed_buf = pa.compress(test_buf, codec=codec)
>
> pyarrow/tests/test_io.py:508:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> pyarrow/io.pxi:1340: in pyarrow.lib.compress
>     check_status(CCodec.Create(c_codec, &compressor))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>         raise ArrowNotImplementedError(message)
> E   ArrowNotImplementedError: ZSTD codec support not built
>
> pyarrow/error.pxi:89: ArrowNotImplementedError
> ______________________ test_compressed_roundtrip[zstd] ______________________
>
> compression = 'zstd'
>
>     @pytest.mark.parametrize("compression",
>                              ["bz2", "brotli", "gzip", "lz4", "zstd"])
>     def test_compressed_roundtrip(compression):
>         data = b"some test data\n" * 10 + b"eof\n"
>         raw = pa.BufferOutputStream()
>         try:
>             with pa.CompressedOutputStream(raw, compression) as compressed:
>
> pyarrow/tests/test_io.py:1045:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> pyarrow/io.pxi:1149: in pyarrow.lib.CompressedOutputStream.__init__
>     self._init(stream, compression_type)
> pyarrow/io.pxi:1162: in pyarrow.lib.CompressedOutputStream._init
>     _make_compressed_output_stream(stream.get_output_stream(),
> pyarrow/io.pxi:1087: in pyarrow.lib._make_compressed_output_stream
>     check_status(CCodec.Create(compression_type, &codec))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>         raise ArrowNotImplementedError(message)
> E   ArrowNotImplementedError: ZSTD codec support not built
>
> pyarrow/error.pxi:89: ArrowNotImplementedError
> ______________________ test_pandas_serialize_round_trip_nthreads ______________________
>
>     def test_pandas_serialize_round_trip_nthreads():
>         index = pd.Index([1, 2, 3], name='my_index')
>         columns = ['foo', 'bar']
>         df = pd.DataFrame(
>             {'foo': [1.5, 1.6, 1.7], 'bar': list('abc')},
>             index=index, columns=columns
>         )
>         _check_serialize_pandas_round_trip(df, use_threads=True)
>
> pyarrow/tests/test_ipc.py:536:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> pyarrow/tests/test_ipc.py:514: in _check_serialize_pandas_round_trip
>     buf = pa.serialize_pandas(df, nthreads=2 if use_threads else 1)
> pyarrow/ipc.py:163: in serialize_pandas
>     preserve_index=preserve_index)
> pyarrow/table.pxi:864: in pyarrow.lib.RecordBatch.from_pandas
>     names, arrays, metadata = pdcompat.dataframe_to_arrays(
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> df =           foo bar
>      my_index
>      1         1.5   a
>      2         1.6   b
>      3         1.7   c, schema = None
> preserve_index = True, nthreads = 2, columns = None, safe = True
>
>     def dataframe_to_arrays(df, schema, preserve_index, nthreads=1, columns=None,
>                             safe=True):
>         names, column_names, index_columns, index_column_names, \
>             columns_to_convert, convert_types = _get_columns_to_convert(
>                 df, schema, preserve_index, columns
>             )
>
>         # NOTE(wesm): If nthreads=None, then we use a heuristic to decide whether
>         # using a thread pool is worth it. Currently the heuristic is whether the
>         # nrows > 100 * ncols.
>         if nthreads is None:
>             nrows, ncols = len(df), len(df.columns)
>             if nrows > ncols * 100:
>                 nthreads = pa.cpu_count()
>             else:
>                 nthreads = 1
>
>         def convert_column(col, ty):
>             try:
>                 return pa.array(col, type=ty, from_pandas=True, safe=safe)
>             except (pa.ArrowInvalid,
>                     pa.ArrowNotImplementedError,
>                     pa.ArrowTypeError) as e:
>                 e.args += ("Conversion failed for column {0!s} with type {1!s}"
>                            .format(col.name, col.dtype),)
>                 raise e
>
>         if nthreads == 1:
>             arrays = [convert_column(c, t)
>                       for c, t in zip(columns_to_convert, convert_types)]
>         else:
>             from concurrent import futures
> E           ImportError: No module named concurrent
>
> pyarrow/pandas_compat.py:430: ImportError
> ====================== warnings summary ======================
> pyarrow/tests/test_convert_pandas.py::TestConvertMetadata::test_empty_list_metadata
>   /home/apalumbo/repos/pyarrow/lib/python2.7/site-packages/pandas/core/dtypes/missing.py:431: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
>     if left_value != right_value:
>   /home/apalumbo/repos/pyarrow/lib/python2.7/site-packages/pandas/core/dtypes/missing.py:431: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
>     if left_value != right_value:
>   /home/apalumbo/repos/pyarrow/lib/python2.7/site-packages/pandas/core/dtypes/missing.py:431: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
>     if left_value != right_value:
>
> pyarrow/tests/test_convert_pandas.py::TestListTypes::test_column_of_lists_first_empty
>   /home/apalumbo/repos/pyarrow/lib/python2.7/site-packages/pandas/core/dtypes/missing.py:431: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
>     if left_value != right_value:
>
> pyarrow/tests/test_convert_pandas.py::TestListTypes::test_empty_list_roundtrip
>   /home/apalumbo/repos/pyarrow/lib/python2.7/site-packages/pandas/core/dtypes/missing.py:431: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
>     if left_value != right_value:
>   /home/apalumbo/repos/pyarrow/lib/python2.7/site-packages/pandas/core/dtypes/missing.py:431: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
>     if left_value != right_value:
>   /home/apalumbo/repos/pyarrow/lib/python2.7/site-packages/pandas/core/dtypes/missing.py:431: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
>     if left_value != right_value:
>
> -- Docs: https://docs.pytest.org/en/latest/warnings.html
> ====================== 45 failed, 997 passed, 194 skipped, 3 xfailed, 7 warnings in 33.14 seconds ======================
> (pyarrow)
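Both Python 2.7 ImportErrors in the thread come from missing test-only dependencies: `hypothesis`, and `concurrent.futures`, which on 2.7 is provided by the third-party `futures` backport (hence Wes's advice to install requirements-test.txt as well). A minimal sketch for checking such dependencies up front, before running the test suite (the helper name `missing_modules` is illustrative, not part of Arrow):

```python
import importlib

def missing_modules(names):
    """Return the subset of the given module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# The two modules whose absence produced the failures quoted above.
print(missing_modules(["hypothesis", "concurrent.futures"]))
```

On a correctly provisioned environment this prints an empty list; on the Python 2.7 venv from the thread it would name both missing modules until `pip install -r requirements-test.txt` is run.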