I created https://issues.apache.org/jira/browse/ARROW-7596 and made it a blocker for 0.16.0 so that this does not get lost in the shuffle.
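
For readers skimming the quoted thread below, here is a minimal sketch of the failure mode behind the "assignment destination is read-only" error. It uses plain NumPy only, not pyarrow or pandas. The read-only flag is set by hand as a stand-in for a buffer that the consumer does not own, which is an assumption made for illustration and not a description of how to_pandas works internally.

```python
import numpy as np

# Stand-in for column values backed by memory the consumer does not own.
# Assumption: the read-only flag is set manually here purely for illustration.
values = np.arange(5, dtype="int64")
values.setflags(write=False)

# In-place assignment into a read-only array fails.
try:
    values[0] = 10
    message = None
except ValueError as exc:
    message = str(exc)

# An explicit copy owns its own memory and is writable again.
writable = values.copy()
writable[0] = 10

print(message)
print(writable[0], values[0])
```

The explicit copy trades extra memory for a writable result, which is roughly the trade-off being discussed in the thread.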

On Thu, Jan 16, 2020 at 3:43 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi Joris,
>
> Thanks for investigating this. It seems there were some unintended
> consequences of the zero-copy optimizations from ARROW-3789. Another
> way forward might be to "opt in" to this behavior, or to only do the
> zero copy optimizations when split_blocks=True. What do you think?
>
> - Wes
>
> On Thu, Jan 16, 2020 at 3:42 AM Joris Van den Bossche
> <jorisvandenboss...@gmail.com> wrote:
> >
> > So the spark integration build started to fail, and with the following test
> > error:
> >
> > ======================================================================
> > ERROR: test_toPandas_batch_order (pyspark.sql.tests.test_arrow.EncryptionArrowTests)
> > ----------------------------------------------------------------------
> > Traceback (most recent call last):
> >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 422, in test_toPandas_batch_order
> >     run_test(*case)
> >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 409, in run_test
> >     pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
> >   File "/spark/python/pyspark/sql/tests/test_arrow.py", line 152, in _toPandas_arrow_toggle
> >     pdf_arrow = df.toPandas()
> >   File "/spark/python/pyspark/sql/pandas/conversion.py", line 115, in toPandas
> >     return _check_dataframe_localize_timestamps(pdf, timezone)
> >   File "/spark/python/pyspark/sql/pandas/types.py", line 180, in _check_dataframe_localize_timestamps
> >     pdf[column] = _check_series_localize_timestamps(series, timezone)
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py", line 3487, in __setitem__
> >     self._set_item(key, value)
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/frame.py", line 3565, in _set_item
> >     NDFrame._set_item(self, key, value)
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/generic.py", line 3381, in _set_item
> >     self._data.set(key, value)
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1090, in set
> >     blk.set(blk_locs, value_getitem(val_locs))
> >   File "/opt/conda/envs/arrow/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 380, in set
> >     self.values[locs] = values
> > ValueError: assignment destination is read-only
> >
> >
> > It's from a test that is doing conversions from spark to arrow to pandas
> > (so calling pyarrow.Table.to_pandas here
> > <https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/conversion.py#L111-L115>),
> > and on the resulting DataFrame, it is iterating through all columns,
> > potentially fixing timezones, and writing each column back into the
> > DataFrame (here
> > <https://github.com/apache/spark/blob/018bdcc53c925072b07956de0600452ad255b9c7/python/pyspark/sql/pandas/types.py#L179-L181>
> > ).
> >
> > Since it is giving an error about read-only, it might be related to
> > zero-copy behaviour of to_pandas, and thus might be related to the refactor
> > of the arrow->pandas conversion that landed yesterday (
> > https://github.com/apache/arrow/pull/6067, it says it changed to do
> > zero-copy for 1-column blocks if possible).
> > I am not sure if something should be fixed in pyarrow for this, but the
> > obvious thing that pyspark can do is specify they don't want zero-copy.
> >
> > Joris
> >
> > On Wed, 15 Jan 2020 at 14:32, Crossbow <cross...@ursalabs.org> wrote:
> > >
> > >
> > > Arrow Build Report for Job nightly-2020-01-15-0
> > >
> > > All tasks:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0
> > >
> > > Failed Tasks:
> > > - gandiva-jar-osx:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-gandiva-jar-osx
> > > - test-conda-python-3.7-spark-master:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-spark-master
> > > - wheel-manylinux2014-cp35m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp35m
> > >
> > > Succeeded Tasks:
> > > - centos-6:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-6
> > > - centos-7:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-7
> > > - centos-8:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-centos-8
> > > - conda-linux-gcc-py27:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py27
> > > - conda-linux-gcc-py36:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py36
> > > - conda-linux-gcc-py37:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py37
> > > - conda-linux-gcc-py38:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-linux-gcc-py38
> > > - conda-osx-clang-py27:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py27
> > > - conda-osx-clang-py36:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py36
> > > - conda-osx-clang-py37:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py37
> > > - conda-osx-clang-py38:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-osx-clang-py38
> > > - conda-win-vs2015-py36:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py36
> > > - conda-win-vs2015-py37:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py37
> > > - conda-win-vs2015-py38:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-conda-win-vs2015-py38
> > > - debian-buster:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-debian-buster
> > > - debian-stretch:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-debian-stretch
> > > - gandiva-jar-trusty:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-gandiva-jar-trusty
> > > - homebrew-cpp:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-homebrew-cpp
> > > - macos-r-autobrew:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-macos-r-autobrew
> > > - test-conda-cpp:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-cpp
> > > - test-conda-python-2.7-pandas-latest:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-2.7-pandas-latest
> > > - test-conda-python-2.7:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-2.7
> > > - test-conda-python-3.6:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.6
> > > - test-conda-python-3.7-dask-latest:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-dask-latest
> > > - test-conda-python-3.7-hdfs-2.9.2:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-hdfs-2.9.2
> > > - test-conda-python-3.7-pandas-latest:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-pandas-latest
> > > - test-conda-python-3.7-pandas-master:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-pandas-master
> > > - test-conda-python-3.7-turbodbc-latest:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-turbodbc-latest
> > > - test-conda-python-3.7-turbodbc-master:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7-turbodbc-master
> > > - test-conda-python-3.7:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.7
> > > - test-conda-python-3.8-dask-master:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.8-dask-master
> > > - test-conda-python-3.8-pandas-latest:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-python-3.8-pandas-latest
> > > - test-conda-r-3.6:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-conda-r-3.6
> > > - test-debian-10-cpp:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-10-cpp
> > > - test-debian-10-go-1.12:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-10-go-1.12
> > > - test-debian-10-python-3:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-10-python-3
> > > - test-debian-c-glib:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-c-glib
> > > - test-debian-ruby:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-debian-ruby
> > > - test-fedora-29-cpp:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-fedora-29-cpp
> > > - test-fedora-29-python-3:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-fedora-29-python-3
> > > - test-r-rhub-debian-gcc-devel:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rhub-debian-gcc-devel
> > > - test-r-rhub-ubuntu-gcc-release:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rhub-ubuntu-gcc-release
> > > - test-r-rstudio-r-base-3.6-bionic:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-bionic
> > > - test-r-rstudio-r-base-3.6-centos6:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-centos6
> > > - test-r-rstudio-r-base-3.6-opensuse15:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-opensuse15
> > > - test-r-rstudio-r-base-3.6-opensuse42:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-test-r-rstudio-r-base-3.6-opensuse42
> > > - test-ubuntu-16.04-cpp:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-16.04-cpp
> > > - test-ubuntu-18.04-cpp-cmake32:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp-cmake32
> > > - test-ubuntu-18.04-cpp-release:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp-release
> > > - test-ubuntu-18.04-cpp-static:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp-static
> > > - test-ubuntu-18.04-cpp:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-cpp
> > > - test-ubuntu-18.04-docs:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-docs
> > > - test-ubuntu-18.04-python-3:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-python-3
> > > - test-ubuntu-18.04-r-sanitizer:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-18.04-r-sanitizer
> > > - test-ubuntu-c-glib:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-c-glib
> > > - test-ubuntu-fuzzit-fuzzing:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-fuzzit-fuzzing
> > > - test-ubuntu-fuzzit-regression:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-fuzzit-regression
> > > - test-ubuntu-ruby:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-circle-test-ubuntu-ruby
> > > - ubuntu-bionic:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-ubuntu-bionic
> > > - ubuntu-disco:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-ubuntu-disco
> > > - ubuntu-xenial:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-ubuntu-xenial
> > > - wheel-manylinux1-cp27m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp27m
> > > - wheel-manylinux1-cp27mu:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp27mu
> > > - wheel-manylinux1-cp35m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp35m
> > > - wheel-manylinux1-cp36m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp36m
> > > - wheel-manylinux1-cp37m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp37m
> > > - wheel-manylinux1-cp38:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux1-cp38
> > > - wheel-manylinux2010-cp27m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp27m
> > > - wheel-manylinux2010-cp27mu:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp27mu
> > > - wheel-manylinux2010-cp35m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp35m
> > > - wheel-manylinux2010-cp36m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp36m
> > > - wheel-manylinux2010-cp37m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp37m
> > > - wheel-manylinux2010-cp38:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2010-cp38
> > > - wheel-manylinux2014-cp36m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp36m
> > > - wheel-manylinux2014-cp37m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp37m
> > > - wheel-manylinux2014-cp38:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-azure-wheel-manylinux2014-cp38
> > > - wheel-osx-cp27m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp27m
> > > - wheel-osx-cp35m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp35m
> > > - wheel-osx-cp36m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp36m
> > > - wheel-osx-cp37m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp37m
> > > - wheel-osx-cp38:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-travis-wheel-osx-cp38
> > > - wheel-win-cp36m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-appveyor-wheel-win-cp36m
> > > - wheel-win-cp37m:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-appveyor-wheel-win-cp37m
> > > - wheel-win-cp38:
> > >   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-15-0-appveyor-wheel-win-cp38
> > >