from:"Oliver Ruebenacker"

Re: [PySpark SQL] New column with the maximum of multiple terms?

2023-02-24 Thread Oliver Ruebenacker

> On Fri, Feb 24, 2023 at 9:53 AM Oliver Ruebenacker <oliv...@broadins

Re: [PySpark SQL] New column with the maximum of multiple terms?

2023-02-24 Thread Oliver Ruebenacker

>> ``` >>>> Traceback (most recent call last): >>>> File "nearest-gene.py", line 74, in <module> >>>> main() >>>> File "nearest-gene.py", line 62, in main >>>> distances = joined.withColumn("di

Re: [PySpark SQL] New column with the maximum of multiple terms?

2023-02-23 Thread Oliver Ruebenacker

'&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions. ``` On Thu, Feb 23, 2023 at 2:00 PM Sean Owen wrote: > That error sounds like it's from pandas not spark. Are you sure it's this > lin

[PySpark SQL] New column with the maximum of multiple terms?

2023-02-23 Thread Oliver Ruebenacker

'|' for 'or', '~' for 'not' when building DataFrame boolean expressions. ``` How can I do this? Thanks! Best, Oliver -- Oliver Ruebenacker, Ph.D. (he) Senior Software Engineer, Knowledge Portal Network <http://kp4cd.org/>, Flannick Lab <http://www.flannicklab.org/>, Broad Institute <http://www.broadinstitute.org/>

Re: [PySPark] How to check if value of one column is in array of another column

2023-01-18 Thread Oliver Ruebenacker

Arguments must be same type but were: string != >> array; >> >> How do I do this? Thanks! >> >> Best, Oliver

[PySPark] How to check if value of one column is in array of another column

2023-01-17 Thread Oliver Ruebenacker

: pyspark.sql.utils.AnalysisException: cannot resolve '(gene IN (nearest))' due to data type mismatch: Arguments must be same type but were: string != array; How do I do this? Thanks! Best, Oliver

Re: [PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Oliver Ruebenacker

ement already satisfied: numpy<1.27.0,>=1.19.5 in > /usr/local/lib64/python3.11/site-packages (from scipy) (1.24.1) > Installing collected packages: scipy > Successfully installed scipy-1.10.0 > WARNING: Running pip as the 'root' user can result in broken permiss

Re: [PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Oliver Ruebenacker

> > > > > On Fri, 6 Jan 2023 at 16:01, Oliver Ruebenacker < > oliv...@broadinstitute.org> wrote: > >> >> Hello, >> >> I'm trying to install SciPy using a bootstrap script and then use it to >> calculate a new field in a dataframe, runnin

[PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Oliver Ruebenacker

en at this line: *from scipy.stats import norm* I get the following error: *ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject* Any advice on how to proceed? Thanks! Best, Oliver

Re: [PySpark] Getting the best row from each group

2022-12-21 Thread Oliver Ruebenacker


Re: [PySpark] Getting the best row from each group

2022-12-20 Thread Oliver Ruebenacker

ce as it needs to order/sort. > -- > Raghavendra > > > On Mon, Dec 19, 2022 at 8:57 PM Oliver Ruebenacker < > oliv...@broadinstitute.org> wrote: > >> >> Hello, >> >> How can I retain from each group only the row for which one value is

Re: [PySpark] Getting the best row from each group

2022-12-20 Thread Oliver Ruebenacker


Re: [PySpark] Getting the best row from each group

2022-12-19 Thread Oliver Ruebenacker

001 >> UkraineKharkiv140 2 >> USANew York 9001 >> USAMiami 6202 >> >> Which you could further filter in another CTE or subquery where >> PopulationRank = 1. >> >> As I mentioned, I&

Re: [PySpark] Getting the best row from each group

2022-12-19 Thread Oliver Ruebenacker

> a window function? > > On Mon, Dec 19, 2022, 9:45 AM Oliver Ruebenacker < > oliv...@broadinstitute.org> wrote: > >> >> Hello, >> >> Thank you for the response! >> >> I can think of two ways to get the largest city by country, but bot

Re: [PySpark] Getting the best row from each group

2022-12-19 Thread Oliver Ruebenacker


Re: [PySpark] Reader/Writer for bgzipped data

2022-12-06 Thread Oliver Ruebenacker

On Tue, Dec 6, 2022 at 10:47 AM Holden Karau wrote: > Take a look at https://github.com/nielsbasjes/splittablegzip :D > > On Tue, Dec 6, 2022 at 7:46 AM Oliver Ruebenacker < > oliv...@broadinstitute.org> wrote: > >> >> Hello Holden, >> >> T

Re: [PySpark] Reader/Writer for bgzipped data

2022-12-06 Thread Oliver Ruebenacker

Dec 6, 2022 at 1:43 PM Oliver Ruebenacker < > oliv...@broadinstitute.org> wrote: > >> >> Hello Chris, >> >> Yes, you can use gunzip/gzip to uncompress a file created by bgzip, but >> to start reading from somewhere other than the beginning of the file

Re: [PySpark] Reader/Writer for bgzipped data

2022-12-06 Thread Oliver Ruebenacker

To achieve either of those, > it would require writing a custom Hadoop compression codec to integrate > more closely with the data format. > > Chris Nauroth > > > On Mon, Dec 5, 2022 at 2:08 PM Oliver Ruebenacker < > oliv...@broadinstitute.org> wrote: > >>

Re: [PySpark] Reader/Writer for bgzipped data

2022-12-05 Thread Oliver Ruebenacker

codecs like Snappy are > generally preferred for greater efficiency. (Of course, we're not always in > complete control of the data formats we're given, so the support for bz2 is > there.) > > [1] > https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoo

[PySpark] Reader/Writer for bgzipped data

2022-12-02 Thread Oliver Ruebenacker

Hello, Is it possible to read/write a DataFrame from/to a set of bgzipped files? Can it read from/write to AWS S3? Thanks! Best, Oliver

Re: [PySpark] Join using condition where each record may be joined multiple times

2022-11-28 Thread Oliver Ruebenacker

query... > > On 11/27/22 12:30 PM, Oliver Ruebenacker wrote: > > > Hello, > > I have two Dataframes I want to join using a condition such that each > record from each Dataframe may be joined with multiple records from the > other Dataframe. This means the origina

[PySpark] Join using condition where each record may be joined multiple times

2022-11-27 Thread Oliver Ruebenacker

_glob).select('chromosome', 'position', 'reference', 'alt', 'pValue')
print('There is data from ' + str(variants.count()) + ' variants:')
for row in variants.take(42):
    print(row)
cond = (genes.chromosome == variants.chromosome)

Re: [scala-user] ERROR TaskResultGetter: Exception while getting task result java.io.IOException: java.lang.ClassNotFoundException: scala.Some

2016-06-16 Thread Oliver Ruebenacker

a.io/releases/"; > > I am getting TaskResultGetter error with ClassNotFoundException for > scala.Some . > > Can I please get some help how to fix it? > > Thanks, > S. Sarkar

23 matches