Re: [pyspark/sparksql]: How to overcome redundant/repetitive code? Is a for loop over an sql statement with a variable a bad idea?

2023-01-06 Thread Sean Owen
Right, nothing wrong with a for loop here. Seems like just the right thing.

On Fri, Jan 6, 2023, 3:20 PM Joris Billen wrote:
> Hello Community,
> I am working in pyspark with sparksql and have a very similar very complex
> list of dataframes that I'll have to execute several times for all the
> “

[pyspark/sparksql]: How to overcome redundant/repetitive code? Is a for loop over an sql statement with a variable a bad idea?

2023-01-06 Thread Joris Billen
Hello Community, I am working in PySpark with Spark SQL and have a very similar, very complex list of dataframes that I'll have to execute several times for all the “models” I have. Suppose the code is exactly the same for all models, only the table it reads from and some values in the where statem
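The approach asked about here (one SQL statement, looped over per-model variables) could be sketched roughly as follows. This is only an illustration under stated assumptions: the template, the `score` column, the table names, and the helper names `build_query` and `run_for_models` are all hypothetical and do not come from the original message; the only Spark call used is `spark.sql(...)`.

```python
from typing import Any, Dict, Tuple

# Hypothetical template: only the table name and one WHERE value vary per model.
QUERY_TEMPLATE = "SELECT * FROM {table} WHERE score > {min_score}"


def build_query(table: str, min_score: float) -> str:
    """Fill the shared SQL template for one model."""
    # The table name is spliced into the SQL text, so check it before formatting.
    if not table.replace("_", "").replace(".", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    return QUERY_TEMPLATE.format(table=table, min_score=float(min_score))


def run_for_models(spark: Any, models: Dict[str, Tuple[str, float]]) -> Dict[str, Any]:
    """Run the same statement once per model and keep each result by model name."""
    results = {}
    for name, (table, min_score) in models.items():
        # `spark` is assumed to be an existing SparkSession.
        results[name] = spark.sql(build_query(table, min_score))
    return results
```

With a real session this might be called as `run_for_models(spark, {"model_a": ("table_a", 0.5)})`. Each `spark.sql` call only defines a DataFrame; the work happens when an action such as a write or a count is triggered on it.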

Re: [PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Oliver Ruebenacker
So I think now that my problem is Spark-related after all. It looks like my bootstrap script installs SciPy just fine in a regular environment, but somehow interaction with PySpark breaks it.

On Fri, Jan 6, 2023 at 12:39 PM Bjørn Jørgensen wrote:
> Create a Dockerfile
>
> FROM fedora
>
> RUN sud
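This error generally appears when a compiled extension was built against a different NumPy than the one loaded at runtime, so one possible check is to compare what a plain Python process sees with what the Spark Python workers see. The helper below is a minimal sketch for that comparison; `package_report` is a hypothetical name, not something from the thread, and it relies only on the standard library.

```python
import sys
from importlib import metadata


def package_report(packages=("numpy", "scipy")):
    """Return the interpreter path and the installed version of each package."""
    report = {"python": sys.executable}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed for this interpreter.
            report[name] = None
    return report
```

Calling `package_report()` in a regular Python shell on the node, and again inside a Spark task (for example via `sc.parallelize([0], 1).map(lambda _: package_report()).collect()`, assuming an existing SparkContext named `sc`), would show whether the two environments resolve to different interpreters or different NumPy and SciPy versions.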

Re: [PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Bjørn Jørgensen
Create a Dockerfile

FROM fedora

RUN sudo yum install -y python3-devel
RUN sudo pip3 install -U Cython && \
    sudo pip3 install -U pybind11 && \
    sudo pip3 install -U pythran && \
    sudo pip3 install -U numpy && \
    sudo pip3 install -U scipy

docker build --pull --rm -f "Dockerfile" -t fedoratest:l

Re: [PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Mich Talebzadeh
https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp

Re: [PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Oliver Ruebenacker
Thank you for the link. I already tried most of what was suggested there, but without success.

On Fri, Jan 6, 2023 at 11:35 AM Bjørn Jørgensen wrote:
> https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp
>
> Fri.

Re: [PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Bjørn Jørgensen
https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp

On Fri, Jan 6, 2023 at 16:01, Oliver Ruebenacker <oliv...@broadinstitute.org> wrote:
> Hello,
>
> I'm trying to install SciPy using a bootstrap script and then use it

[PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Oliver Ruebenacker
Hello, I'm trying to install SciPy using a bootstrap script and then use it to calculate a new field in a dataframe, running on AWS EMR. Although the SciPy website states that only NumPy is needed, when I tried to install SciPy using pip, pip kept failing, complaining about missing softw

Re: Spark reading from HBase using hbase-connectors - any benefit from localization?

2023-01-06 Thread Aaron Grubb
Hi Mich,

Thanks a lot for the insight; it was very helpful.

Aaron

On Thu, 2023-01-05 at 23:44 +, Mich Talebzadeh wrote:
> Hi Aaron,
> Thanks for the details. It is a general practice when running Spark on premise to use Hadoop clusters.