[ANNOUNCE] Apache Celeborn 0.5.4 available

2025-03-13 Thread Nicholas
/ Celeborn Resources: - Issue Management: https://issues.apache.org/jira/projects/CELEBORN - Mailing List: d...@celeborn.apache.org Regards, Nicholas Jiang On behalf of the Apache Celeborn community

Re: How can I use pyspark to upsert one row without replacing entire table

2020-08-12 Thread Nicholas Gustafson
The delta docs have examples of upserting: https://docs.delta.io/0.4.0/delta-update.html#upsert-into-a-table-using-merge > On Aug 12, 2020, at 08:31, Siavash Namvar wrote: > >  > Thanks Sean, > > Do you have any URL or reference to help me how to upsert in Spark? I need to > update Sybase db
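The linked Delta docs describe MERGE (upsert) semantics. As a rough illustration of what a merge does — plain Python over lists of dicts, not Delta's actual API — matched keys are updated and unmatched keys are inserted:

```python
def merge_upsert(target, updates, key="id"):
    """Upsert `updates` into `target` (both lists of dicts), matching on `key`.

    Mirrors MERGE semantics: matched rows are updated in place,
    unmatched rows are inserted.
    """
    by_key = {row[key]: dict(row) for row in target}
    for row in updates:
        if row[key] in by_key:
            by_key[row[key]].update(row)   # WHEN MATCHED THEN UPDATE
        else:
            by_key[row[key]] = dict(row)   # WHEN NOT MATCHED THEN INSERT
    return list(by_key.values())

table = [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}]
new = [{"id": 2, "val": "B"}, {"id": 3, "val": "c"}]
merged = merge_upsert(table, new)
```

The row with id 2 is updated rather than duplicated, which is the point of MERGE over a plain append.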

Re: unit testing for spark code

2021-03-22 Thread Nicholas Gustafson
I've found pytest works well if you're using PySpark. Though if you have a lot of tests, running them all can be pretty slow. On Mon, Mar 22, 2021 at 6:32 AM Amit Sharma wrote: > Hi, can we write unit tests for spark code. Is there any specific > framework? > > > Thanks > Amit >

Re: AnalysisException: Trouble using select() to append multiple columns

2021-12-17 Thread Nicholas Gustafson
Since df1 and df2 are different DataFrames, you will need to use a join. For example: df1.join(df2.selectExpr(“Name”, “NumReads as ctrl_2”), on=[“Name”]) > On Dec 17, 2021, at 16:25, Andrew Davidson wrote: > >  > Hi I am a newbie > > I have 16,000 data files, all files have the same number o
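The suggested join pulls a renamed column from df2 into df1 by matching on Name. A dict-based sketch of that equi-join in plain Python (illustrative only, not the PySpark API; column names follow the thread's example):

```python
def append_column_by_join(left, right, on, col, new_name):
    """Equi-join: for each row of `left`, pull `col` from the `right` row
    with the same value of `on`, exposing it under `new_name`."""
    lookup = {row[on]: row[col] for row in right}
    return [dict(row, **{new_name: lookup[row[on]]})
            for row in left if row[on] in lookup]

df1 = [{"Name": "gene1", "NumReads": 10}]
df2 = [{"Name": "gene1", "NumReads": 7}]
joined = append_column_by_join(df1, df2, on="Name",
                               col="NumReads", new_name="ctrl_2")
```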

[ANNOUNCE] Apache Kyuubi (Incubating) released 1.6.0-incubating

2022-09-06 Thread Nicholas Jiang
Hi all, The Apache Kyuubi (Incubating) community is pleased to announce that Apache Kyuubi (Incubating) 1.6.0-incubating has been released! Apache Kyuubi (Incubating) is a distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of multiple compute e

Re: SSH Tunneling issue with Apache Spark

2023-12-06 Thread Nicholas Chammas
nks for the advice Nicholas. > > As mentioned in the original email, I have tried JDBC + SSH Tunnel using > pymysql and sshtunnel and it worked fine. The problem happens only with Spark. > > Thanks, > Venkat > > > > On Wed, Dec 6, 2023 at 10:21 PM Nicholas Chammas <

Re: Validate spark sql

2023-12-24 Thread Nicholas Chammas
This is a user-list question, not a dev-list question. Moving this conversation to the user list and BCC-ing the dev list. Also, this statement > We are not validating against table or column existence. is not correct. When you call spark.sql(…), Spark will lookup the table references and fail

[ANNOUNCE] Apache Celeborn(incubating) 0.3.2 available

2024-01-07 Thread Nicholas Jiang
Page: https://celeborn.apache.org/ Celeborn Resources: - Issue Management: https://issues.apache.org/jira/projects/CELEBORN - Mailing List: d...@celeborn.apache.org Regards, Nicholas Jiang On behalf of the Apache Celeborn(incubating) community

[ANNOUNCE] Apache Celeborn 0.4.1 available

2024-05-22 Thread Nicholas Jiang
Resources: - Issue Management: https://issues.apache.org/jira/projects/CELEBORN - Mailing List: d...@celeborn.apache.org Regards, Nicholas Jiang On behalf of the Apache Celeborn community

Re: Wish for 1.4: upper bound on # tasks in Mesos

2015-05-20 Thread Nicholas Chammas
To put this on the devs' radar, I suggest creating a JIRA for it (and checking first if one already exists). issues.apache.org/jira/ Nick On Tue, May 19, 2015 at 1:34 PM Matei Zaharia wrote: > Yeah, this definitely seems useful there. There might also be some ways to > cap the application in M

Re: Required settings for permanent HDFS Spark on EC2

2015-06-05 Thread Nicholas Chammas
If your problem is that stopping/starting the cluster resets configs, then you may be running into this issue: https://issues.apache.org/jira/browse/SPARK-4977 Nick On Thu, Jun 4, 2015 at 2:46 PM barmaley wrote: > Hi - I'm having similar problem with switching from ephemeral to persistent > HD

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Nicholas Chammas
Yeah, you shouldn't have to rename the columns before joining them. Do you see the same behavior on 1.3 vs 1.4? Nick On Sat, Jun 27, 2015 at 2:51 AM, Axel Dahl wrote: > still feels like a bug to have to create unique names before a join. > > On Fri, Jun 26, 2015 at 9:51 PM, ayan guha wrote: > >> You c

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Nicholas Chammas
only tested on 1.4, but imagine 1.3 is the same or a lot of people's > code would be failing right now. > > On Saturday, June 27, 2015, Nicholas Chammas > wrote: > >> Yeah, you shouldn't have to rename the columns before joining them. >> >> Do you see the s

Re: spark ec2 as non-root / any plan to improve that in the future ?

2015-07-09 Thread Nicholas Chammas
No plans to change that at the moment, but agreed it is against accepted convention. It would be a lot of work to change the tool, change the AMIs, and test everything. My suggestion is not to hold your breath for such a change. spark-ec2, as far as I understand, is not intended for spinning up pe

Re: Does pyspark still lag far behind the Scala API in terms of features

2016-03-02 Thread Nicholas Chammas
> However, I believe, investing (or having some members of your group) learn and invest in Scala is worthwhile for few reasons. One, you will get the performance gain, especially now with Tungsten (not sure how it relates to Python, but some other knowledgeable people on the list, please chime in).

Re: Does pyspark still lag far behind the Scala API in terms of features

2016-03-02 Thread Nicholas Chammas
us almost all the processing comes before there is structure to it. > > > > > > Sent from my Verizon Wireless 4G LTE smartphone > > > ---- Original message > From: Nicholas Chammas > Date: 03/02/2016 5:13 PM (GMT-05:00) > To: Jules Damji , Joshua So

Re: Does pyspark still lag far behind the Scala API in terms of features

2016-03-02 Thread Nicholas Chammas
oblem anyway. > > > > Sent from my Verizon Wireless 4G LTE smartphone > > > Original message > From: Nicholas Chammas > Date: 03/02/2016 5:43 PM (GMT-05:00) > To: Darren Govoni , Jules Damji , > Joshua Sorrell > Cc: user@spark.apache.org >

Re: Reading Back a Cached RDD

2016-03-24 Thread Nicholas Chammas
Isn’t persist() only for reusing an RDD within an active application? Maybe checkpoint() is what you’re looking for instead? ​ On Thu, Mar 24, 2016 at 2:02 PM Afshartous, Nick wrote: > > Hi, > > > After calling RDD.persist(), is then possible to come back later and > access the persisted RDD. >

Re: Spark 1.6.1 packages on S3 corrupt?

2016-04-12 Thread Nicholas Chammas
Yes, this is a known issue. The core devs are already aware of it. [CC dev] FWIW, I believe the Spark 1.6.1 / Hadoop 2.6 package on S3 is not corrupt. It may be the only 1.6.1 package that is not corrupt, though. :/ Nick On Tue, Apr 12, 2016 at 9:00 PM Augustus Hong wrote: > Hi all, > > I'm t

Re: spark-ec2 hitting yum install issues

2016-04-14 Thread Nicholas Chammas
If you log into the cluster and manually try that step does it still fail? Can you yum install anything else? You might want to report this issue directly on the spark-ec2 repo, btw: https://github.com/amplab/spark-ec2 Nick On Thu, Apr 14, 2016 at 9:08 PM sanusha wrote: > > I am using spark-1.

Re: Writing output of key-value Pair RDD

2016-05-04 Thread Nicholas Chammas
You're looking for this discussion: http://stackoverflow.com/q/23995040/877069 Also, a simpler alternative with DataFrames: https://github.com/apache/spark/pull/8375#issuecomment-202458325 On Wed, May 4, 2016 at 4:09 PM Afshartous, Nick wrote: > Hi, > > > Is there any way to write out to S3 the

Re: Unsubscribe - 3rd time

2016-06-29 Thread Nicholas Chammas
> I'm not sure I've ever come across an email list that allows you to unsubscribe by responding to the list with "unsubscribe". Many noreply lists (e.g. companies sending marketing email) actually work that way, which is probably what most people are used to these days. What this list needs is an

Re: spark-2.0 support for spark-ec2 ?

2016-07-27 Thread Nicholas Chammas
Yes, spark-ec2 has been removed from the main project, as called out in the Release Notes: http://spark.apache.org/releases/spark-release-2-0-0.html#removals You can still discuss spark-ec2 here or on Stack Overflow, as before. Bug reports and the like should now go on that AMPLab GitHub project

Re: How to filter based on a constant value

2016-07-30 Thread Nicholas Hakobian
it's returning the content of the first element in the row, in this case the Array (of length 1) of Date types. Nicholas Szandor Hakobian Data Scientist Rally Health nicholas.hakob...@rallyhealth.com M: 510-295-7113 On Sat, Jul 30, 2016 at 11:41 PM, Mich Talebzadeh wrote: > thanks gents. > > I

Re: registering udf to use in spark.sql('select...

2016-08-04 Thread Nicholas Chammas
Have you looked at pyspark.sql.functions.udf and the associated examples? On Thu, Aug 4, 2016 at 9:10 AM, Ben Teeuwen wrote: > Hi, > > I’d like to use a UDF in pyspark 2.0. As in .. > > > def squareIt(x): > return x * x > > # register the function and define return type > …. > > spark.sql(“”"s

Re: registering udf to use in spark.sql('select...

2016-08-04 Thread Nicholas Chammas
to > use instead. > > On Aug 4, 2016, at 3:54 PM, Nicholas Chammas > wrote: > > Have you looked at pyspark.sql.functions.udf and the associated examples? > On Thu, Aug 4, 2016 at 9:10 AM, Ben Teeuwen wrote: > >> Hi, >> >> I’d like to use a UDF in pyspark 2.0. As in

Re: Add column sum as new column in PySpark dataframe

2016-08-05 Thread Nicholas Chammas
I think this is what you need: import pyspark.sql.functions as sqlf df.withColumn('total', sqlf.sum(df.columns)) Nic On Thu, Aug 4, 2016 at 9:41 AM Javier Rey jre...@gmail.com wrote: Hi everybody, > > Sorry, I sent last message it was incomplete this is complete
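What the thread is after — a per-row total across several columns — is a per-row reduce, distinct from an aggregate sum down a single column. A plain-Python sketch of the transformation (illustrative only, not the PySpark API):

```python
def with_total(rows, cols):
    """Add a 'total' field holding the per-row sum of the listed columns."""
    return [dict(row, total=sum(row[c] for c in cols)) for row in rows]

rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
out = with_total(rows, ["a", "b"])
```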

Re: Unsubscribe.

2016-08-10 Thread Nicholas Chammas
Please follow the links here to unsubscribe: http://spark.apache.org/community.html On Tue, Aug 9, 2016 at 3:05 PM Martin Somers wrote: > Unsubscribe. > > Thanks > M >

Re: Unsubscribe

2016-08-10 Thread Nicholas Chammas
Please follow the links here to unsubscribe: http://spark.apache.org/community.html On Tue, Aug 9, 2016 at 3:02 PM Hogancamp, Aaron < aaron.t.hoganc...@leidos.com> wrote: > Unsubscribe. > > > > Thanks, > > > > Aaron Hogancamp > > Data Scientist > > >

Re: UNSUBSCRIBE

2016-08-10 Thread Nicholas Chammas
Please follow the links here to unsubscribe: http://spark.apache.org/community.html On Wed, Aug 10, 2016 at 2:46 AM Martin Somers wrote: > > > -- > M >

Re: UNSUBSCRIBE

2016-08-10 Thread Nicholas Chammas
Please follow the links here to unsubscribe: http://spark.apache.org/community.html On Tue, Aug 9, 2016 at 8:03 PM James Ding wrote: > >

Re: UNSUBSCRIBE

2016-08-10 Thread Nicholas Chammas
Please follow the links here to unsubscribe: http://spark.apache.org/community.html On Tue, Aug 9, 2016 at 5:14 PM abhishek singh wrote: > >

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
+1 Red Hat supports Python 2.6 on RHEL 5 until 2020, but otherwise yes, Python 2.6 is ancient history and the core Python developers stopped supporting it in 2013. RHEL 5 is not a good enough reason to continue support for Python

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
see a reason Spark 2.0 would need to support Python 2.6. At this >> point, Python 3 should be the default that is encouraged. >> Most organizations acknowledge the 2.7 is common, but lagging behind the >> version they should theoretically use. Dropping python 2.6 >> support sounds v

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
i dont like it either, but i cannot change it. >>> >>> we currently don't use pyspark so i have no stake in this, but if we did >>> i can assure you we would not upgrade to spark 2.x if python 2.6 was >>> dropped. no point in developing something that doesn

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
I think all the slaves need the same (or a compatible) version of Python installed since they run Python code in PySpark jobs natively. On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers wrote: > interesting i didnt know that! > > On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas < >

Re: Is spark-ec2 going away?

2016-01-27 Thread Nicholas Chammas
I noticed that in the main branch, the ec2 directory along with the spark-ec2 script is no longer present. It’s been moved out of the main repo to its own location: https://github.com/amplab/spark-ec2/pull/21 Is spark-ec2 going away in the next release? If so, what would be the best alternative a

Re: Is this likely to cause any problems?

2016-02-19 Thread Nicholas Chammas
The docs mention spark-ec2 because it is part of the Spark project. There are many, many alternatives to spark-ec2 out there like EMR, but it's probably not the place of the official docs to promote any one of those third-party solutions. On Fri, Feb 19, 2016 at 11:05 AM James Hammerton wrote: >

Re: Spark UI consuming lots of memory

2015-10-12 Thread Nicholas Pritchard
I set those configurations by passing to spark-submit script: "bin/spark-submit --conf spark.ui.retainedJobs=20 ...". I have verified that these configurations are being passed correctly because they are listed in the environments tab and also by counting the number of job/stages that are listed. T

Re: Spark UI consuming lots of memory

2015-10-12 Thread Nicholas Pritchard
2015 at 8:42 PM, Nicholas Pritchard < nicholas.pritch...@falkonry.com> wrote: > I set those configurations by passing to spark-submit script: > "bin/spark-submit --conf spark.ui.retainedJobs=20 ...". I have verified > that these configurations are being passed correct

Re: stability of Spark 1.4.1 with Python 3 versions

2015-10-14 Thread Nicholas Chammas
The Spark 1.4 release notes say that Python 3 is supported. The 1.4 docs are incorrect, and the 1.5 programming guide has been updated to indicate Python 3 support. On Wed, Oct 14, 2015 at 7:06 AM shoira.mukhsin...@bnpparibasfortis.com <

Re: Spark UI consuming lots of memory

2015-10-15 Thread Nicholas Pritchard
er. >> >> Best Regards, >> Shixiong Zhu >> >> 2015-10-13 11:44 GMT+08:00 Nicholas Pritchard < >> nicholas.pritch...@falkonry.com>: >> >>> As an update, I did try disabling the ui with "spark.ui.enabled=false", >>> but the JobL

Can we add an unsubscribe link in the footer of every email?

2015-10-21 Thread Nicholas Chammas
Every week or so someone emails the list asking to unsubscribe. Of course, that's not the right way to do it. You're supposed to email a different address than this one to unsubscribe, yet this is not in-your-face obvious, so many people miss it. And someon

Re: Sorry, but Nabble and ML suck

2015-10-31 Thread Nicholas Chammas
Nabble is an unofficial archive of this mailing list. I don't know who runs it, but it's not Apache. There are often delays between when things get posted to the list and updated on Nabble, and sometimes things never make it over for whatever reason. This mailing list is, I agree, very 1980s. Unfo

Re: Spark EC2 script on Large clusters

2015-11-05 Thread Nicholas Chammas
Yeah, as Shivaram mentioned, this issue is well-known. It's documented in SPARK-5189 and a bunch of related issues. Unfortunately, it's hard to resolve this issue in spark-ec2 without rewriting large parts of the project. But if you take a crack at

Re: Upgrading Spark in EC2 clusters

2015-11-12 Thread Nicholas Chammas
spark-ec2 does not offer a way to upgrade an existing cluster, and from what I gather, it wasn't intended to be used to manage long-lasting infrastructure. The recommended approach really is to just destroy your existing cluster and launch a new one with the desired configuration. If you want to u

Re: spark-ec2 script to launch cluster running Spark 1.5.2 built with HIVE?

2015-11-23 Thread Nicholas Chammas
Don't the Hadoop builds include Hive already? Like spark-1.5.2-bin-hadoop2.6.tgz? On Mon, Nov 23, 2015 at 7:49 PM Jeff Schecter wrote: > Hi all, > > As far as I can tell, the bundled spark-ec2 script provides no way to > launch a cluster running Spark 1.5.2 pre-built with HIVE. > > That is to sa

Re: Adding more slaves to a running cluster

2015-11-25 Thread Nicholas Chammas
spark-ec2 does not directly support adding instances to an existing cluster, apart from the special case of adding slaves to a cluster with a master but no slaves. There is an open issue to track adding this support, SPARK-2008 , but it doesn't have

Re: Not all workers seem to run in a standalone cluster setup by spark-ec2 script

2015-12-04 Thread Nicholas Chammas
Quick question: Are you processing gzipped files by any chance? It's a common stumbling block people hit. See: http://stackoverflow.com/q/27531816/877069 Nick On Fri, Dec 4, 2015 at 2:28 PM Kyohey Hamaguchi wrote: > Hi, > > I have setup a Spark standalone-cluster, which involves 5 workers, > u

Re: spark spark-ec2 credentials using aws_security_token

2015-07-27 Thread Nicholas Chammas
You refer to `aws_security_token`, but I'm not sure where you're specifying it. Can you elaborate? Is it an environment variable? On Mon, Jul 27, 2015 at 4:21 AM Jan Zikeš wrote: > Hi, > > I would like to ask if it is currently possible to use spark-ec2 script > together with credentials that ar

[survey] [spark-ec2] What do you like/dislike about spark-ec2?

2015-08-17 Thread Nicholas Chammas
Howdy folks! I’m interested in hearing about what people think of spark-ec2 outside of the formal JIRA process. Your answers will all be anonymous and public. If the embedded form below doesn’t work for you, you can use this link to get the s

Re: [survey] [spark-ec2] What do you like/dislike about spark-ec2?

2015-08-20 Thread Nicholas Chammas
Nick On Mon, Aug 17, 2015 at 11:09 AM Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > Howdy folks! > > I’m interested in hearing about what people think of spark-ec2 > <http://spark.apache.org/docs/latest/ec2-scripts.html> outside of the > formal JIRA proce

Re: [survey] [spark-ec2] What do you like/dislike about spark-ec2?

2015-08-25 Thread Nicholas Chammas
Final chance to fill out the survey! http://goo.gl/forms/erct2s6KRR I'm gonna close it to new responses tonight and send out a summary of the results. Nick On Thu, Aug 20, 2015 at 2:08 PM Nicholas Chammas wrote: > I'm planning to close the survey to further responses early next

Re: [survey] [spark-ec2] What do you like/dislike about spark-ec2?

2015-08-28 Thread Nicholas Chammas
days: http://goo.gl/forms/erct2s6KRR As noted before, your results are anonymous and public. Thanks again for participating! I hope this has been useful to the community. Nick On Tue, Aug 25, 2015 at 1:31 PM Nicholas Chammas wrote: > Final chance to fill out the survey! > > http://go

Failing to include multiple JDBC drivers

2015-09-04 Thread Nicholas Connor
sqlIO.checkErrorPacket(MysqlIO.java:910) at com.mysql.jdbc.MysqlIO.secureAuth411(MysqlIO.java:3923) at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1273) *Versions Tested*: Spark 1.3.1 && 1.4.1 What method can I use to load both drivers? Thanks, Nicholas Connor

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
On Fri, Sep 2, 2016 at 3:58 AM Mich Talebzadeh wrote: > I believe as we progress in time Spark is going to move away from Python. If > you look at 2014 Databricks code examples, they were mostly in Python. Now > they are mostly in Scala for a reason. > That's complete nonsense. First off, you c

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 2 September 2016 at 15:35, Nic

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
e it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from >

Finding a Spark Equivalent for Pandas' get_dummies

2016-11-11 Thread Nicholas Sharkey
I have a dataset that I need to convert some of the variables to dummy variables. The get_dummies function in Pandas works perfectly on smaller datasets but since it collects I'll always be bottlenecked by the master node. I've looked at Spark's OHE feature and while that will work in theory I

Re: Finding a Spark Equivalent for Pandas' get_dummies

2016-11-11 Thread Nicholas Sharkey
I did get *some* help from DataBricks in terms of programmatically grabbing the categorical variables but I can't figure out where to go from here: *# Get all string cols/categorical cols* *stringColList = [i[0] for i in df.dtypes if i[1] == 'string']* *# generate OHEs for every col in stringColL
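Pandas' get_dummies expands a categorical column into one 0/1 indicator column per observed category — the same transformation that StringIndexer plus OneHotEncoder performs in a distributed way. A stdlib sketch of what the output looks like (not Spark code):

```python
def get_dummies(rows, col):
    """Expand `col` into 0/1 indicator columns, one per observed category."""
    categories = sorted({row[col] for row in rows})
    out = []
    for row in rows:
        dummies = {f"{col}_{c}": int(row[col] == c) for c in categories}
        new_row = {k: v for k, v in row.items() if k != col}
        new_row.update(dummies)
        out.append(new_row)
    return out

rows = [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}]
encoded = get_dummies(rows, "color")
```

Note that Spark's OneHotEncoder produces a sparse vector per row rather than separate named columns, but the information content is the same.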

Re: Strongly Connected Components

2016-11-13 Thread Nicholas Chammas
FYI: There is a new connected components implementation coming in GraphFrames 0.3. See: https://github.com/graphframes/graphframes/pull/119 Implementation is based on: https://mmds-data.org/presentations/2014/vassilvitskii_mmds14.pdf Nick On Sat, Nov 12, 2016 at 3:01 PM Koert Kuipers wrote: >
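The linked paper is about computing connected components at scale. The invariant — every vertex ends up labeled by a canonical representative of its component — can be sketched with a simple union-find; this is the single-machine analogue, not the GraphFrames implementation:

```python
def connected_components(edges):
    """Label each vertex with the smallest vertex id in its component."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            # union toward the smaller id so labels are deterministic
            lo, hi = min(ra, rb), max(ra, rb)
            parent[hi] = lo
    return {v: find(v) for v in parent}

labels = connected_components([(1, 2), (2, 3), (4, 5)])
```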

Re: Spark ML : One hot Encoding for multiple columns

2016-11-13 Thread Nicholas Sharkey
Amen > On Nov 13, 2016, at 7:55 PM, janardhan shetty wrote: > > These Jiras' are still unresolved: > https://issues.apache.org/jira/browse/SPARK-11215 > > Also there is https://issues.apache.org/jira/browse/SPARK-8418 > >> On Wed, Aug 17, 2016 at 11:15 AM, Nisha Muktewar wrote: >> >> The O

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Wed, Dec 7, 2016 at 10:53 PM Ajith Jose wrote: > >

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 12:12 AM Ajit Jaokar wrote: > > > - > To unsubscribe e-mail: user-unsu

Re: unscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 1:34 AM smith_666 wrote: > > > >

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 6:27 AM Vinicius Barreto < vinicius.s.barr...@gmail.com> wrote: > Unsubscribe > > Em 7 de dez de 2016 17:46, "map reduced" escreveu: > > H

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 12:54 AM Roger Holenweger wrote: > > > - > To unsubscribe e-mail: user

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 12:08 AM Kranthi Gmail wrote: > > > -- > Kranthi > > PS: Sent from mobile, pls excuse the brevity and typos. > > On Dec 7, 2016, at 8:05 P

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 9:42 AM Chen, Yan I wrote: > > > > ___ > > If you received this email

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 12:17 AM Prashant Singh Thakur < prashant.tha...@impetus.co.in> wrote: > > > > > Best Regards, > > Prashant Thakur > > Work : 6046 > > Mobi

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 9:54 AM Kishorkumar Patil wrote: > >

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 7:50 AM Juan Caravaca wrote: > unsubscribe >

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 8:01 AM Niki Pavlopoulou wrote: > unsubscribe >

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 9:46 AM Tao Lu wrote: > >

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 7:46 AM Ramon Rosa da Silva wrote: > > This e-mail message, including any attachments, is for the sole use of the > person to whom it has

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
Yes, sorry about that. I didn't think before responding to all those who asked to unsubscribe. On Thu, Dec 8, 2016 at 10:00 AM Di Zhu wrote: > Could you send to individual privately without cc to all users every time? > > > On 8 Dec 2016, at 3:58 PM, Nicholas Chammas

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
received this email. > > > > > > *From:* Nicholas Chammas [mailto:nicholas.cham...@gmail.com] > *Sent:* Thursday, December 08, 2016 10:02 AM > *To:* Di Zhu > *Cc:* user @spark > *Subject:* Re: unsubscribe > > > > Yes, sorry about that. I didn't think be

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
wrote: > I’m pretty sure I didn’t. > > > > *From:* Nicholas Chammas [mailto:nicholas.cham...@gmail.com] > *Sent:* Thursday, December 08, 2016 10:56 AM > *To:* Chen, Yan I; Di Zhu > > > *Cc:* user @spark > *Subject:* Re: unsubscribe > > > > Oh, hmm... > >

Re: [TorrentBroadcast] Pyspark Application terminated saying "Failed to get broadcast_1_ piece0 of broadcast_1 in Spark 2.0.0"

2016-12-29 Thread Nicholas Hakobian
. Nicholas Szandor Hakobian, Ph.D. Senior Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Thu, Dec 29, 2016 at 4:00 AM, Palash Gupta < spline_pal...@yahoo.com.invalid> wrote: > Hi Marco, > > Thanks for your response. > > Yes I tested it before & am able to l

Re: Best way to process lookup ETL with Dataframes

2016-12-30 Thread Nicholas Hakobian
but its just harder to read. Nicholas Szandor Hakobian, Ph.D. Senior Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Fri, Dec 30, 2016 at 7:46 AM, Sesterhenn, Mike wrote: > Thanks, but is nvl() in Spark 1.5? I can't find it in spark.sql.functions > (http://spark.
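For reference, nvl(value, default) is just a two-argument null-coalesce — the same behavior Spark exposes as coalesce(). In plain Python terms:

```python
def nvl(value, default):
    """Return `default` when `value` is null (None), else `value` itself."""
    return default if value is None else value

filled = [nvl(v, 0) for v in [1, None, 3]]
```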

Re: Best way to process lookup ETL with Dataframes

2016-12-30 Thread Nicholas Hakobian
the least, or 2.0 if possible) is feasible to install. There are lots of performance improvements in those versions, if you have the option of upgrading. -Nick Nicholas Szandor Hakobian, Ph.D. Senior Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Fri, Dec 30, 2016 at 3:35 PM

Re: Custom delimiter file load

2016-12-31 Thread Nicholas Hakobian
(a similar syntax is also available in pySpark). -Nick Nicholas Szandor Hakobian, Ph.D. Senior Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Sat, Dec 31, 2016 at 9:58 AM, A Shaikh wrote: > In Pyspark 2 loading file wtih any delimiter into Dataframe is pre

Re: What is the difference between hive on spark and spark on hive?

2017-01-09 Thread Nicholas Hakobian
functions. There is also the SparkSQL shell and thrift server which provides a SQL only interface, but uses all the native Spark pipeline. Hope this helps! -Nick Nicholas Szandor Hakobian, Ph.D. Senior Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Mon, Jan 9, 2017 at 7:05 AM, 李斌

Re: H2O DataFrame to Spark RDD/DataFrame

2017-01-12 Thread Nicholas Sharkey
Page 33 of the Sparkling Water Booklet: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/SparklingWaterBooklet.pdf df = sqlContext.read.format("h2o").option("key",frame.frame_id).load() df = sqlContext.read.format("h2o").load(frame.frame_id) On Thu, Jan 12, 2017 at 1:17 PM, Md. Rezaul Kar

Re: Re: Re: how to change datatype by useing StructType

2017-01-12 Thread Nicholas Hakobian
Have you tried the native CSV reader (in spark 2) or the Databricks CSV reader (in 1.6). If your format is in a CSV like format it'll load it directly into a DataFrame. Its possible you have some rows where types are inconsistent. Nicholas Szandor Hakobian, Ph.D. Senior Data Scientist

Debugging a PythonException with no details

2017-01-13 Thread Nicholas Chammas
I’m looking for tips on how to debug a PythonException that’s very sparse on details. The full exception is below, but the only interesting bits appear to be the following lines: org.apache.spark.api.python.PythonException: ... py4j.protocol.Py4JError: An error occurred while calling None.org.apac

Re: Debugging a PythonException with no details

2017-01-17 Thread Nicholas Chammas
UDF..Could u share snippet of code you are > running? > Kr > > On 14 Jan 2017 1:40 am, "Nicholas Chammas" > wrote: > > I’m looking for tips on how to debug a PythonException that’s very sparse > on details. The full exception is below, but the only interesting bi

Re: Order of rows not preserved after cache + count + coalesce

2017-02-13 Thread Nicholas Chammas
RDDs and DataFrames do not guarantee any specific ordering of data. They are like tables in a SQL database. The only way to get a guaranteed ordering of rows is to explicitly specify an orderBy() clause in your statement. Any ordering you see otherwise is incidental. ​ On Mon, Feb 13, 2017 at 7:52
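A toy illustration of the point above, in plain Python: if partitions may be concatenated in any order (as after a cache/coalesce), only an explicit sort — the analogue of orderBy() — gives a guaranteed ordering:

```python
import random

def coalesce(partitions, seed=None):
    """Toy stand-in for a shuffle: partitions are concatenated in
    an arbitrary order, so row order is incidental."""
    parts = list(partitions)
    random.Random(seed).shuffle(parts)
    return [row for part in parts for row in part]

partitions = [[1, 2], [3, 4], [5, 6]]
rows = coalesce(partitions, seed=7)   # order of `rows` is not guaranteed
ordered = sorted(rows)                # explicit sort restores a guarantee
```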

Re: New Amazon AMIs for EC2 script

2017-02-23 Thread Nicholas Chammas
spark-ec2 has moved to GitHub and is no longer part of the Spark project. A related issue from the current issue tracker that you may want to follow/comment on is this one: https://github.com/amplab/spark-ec2/issues/74 As I said there, I think requiring custom AMIs is one of the major maintenance

Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Nicholas Chammas
Hmm, so when I submit an application with `spark-submit`, I need to guarantee it resources using YARN queues and not Spark's scheduler pools. Is that correct? When are Spark's scheduler pools relevant/useful in this context? On Wed, Apr 5, 2017 at 3:54 PM Mark Hamstra wrote: > grrr... s/your/yo

Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Nicholas Chammas
ot;spark.scheduler.pool" to something other than the default pool before > a particular Job intended to use that pool is started via that SparkContext. > > On Wed, Apr 5, 2017 at 1:11 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > > Hmm, so when I submit

Re: Spark and Hive connection

2017-04-06 Thread Nicholas Hakobian
for the connection. Hope this helps, Nick Nicholas Szandor Hakobian, Ph.D. Senior Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Wed, Apr 5, 2017 at 10:06 PM, infa elance wrote: > Hi all, > When using spark-shell my understanding is spark connects to hive through >

Re: java.lang.java.lang.UnsupportedOperationException

2017-04-19 Thread Nicholas Hakobian
ormalize52.select([_if_not_in_processing(i) for i in dfTotaleNormalize52.columns]) Otherwise there isn't anything obvious to me as to why it isn't working. If you actually do have pySpark 1.5 and not 1.6 I know it handles UDF registration differently. Hope this helps. Nicholas Szandor Ha

Re: Questions regarding Jobs, Stages and Caching

2017-05-25 Thread Nicholas Hakobian
#pyspark.sql.DataFrameReader.json "If the schema parameter is not specified, this function goes through the input once to determine the input schema." Nicholas Szandor Hakobian, Ph.D. Senior Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Thu, May 25, 2017 at 9:24 AM, Steffen Schm

Re: Trouble with PySpark UDFs and SPARK_HOME only on EMR

2017-06-22 Thread Nicholas Chammas
Here’s a repro for a very similar issue where Spark hangs on the UDF, which I think is related to the SPARK_HOME issue. I posted the repro on the EMR forum , but in case you can’t access it: 1. I’m running EMR 5.6.0, Spark 2.1.1, and

Re: how do you deal with datetime in Spark?

2017-10-03 Thread Nicholas Hakobian
impleDateFormat.html). Nicholas Szandor Hakobian, Ph.D. Staff Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Tue, Oct 3, 2017 at 10:43 AM, Adaryl Wakefield < adaryl.wakefi...@hotmail.com> wrote: > I gave myself a project to start actually writing Spark programs. I’m > us

Re: How to flatten a row in PySpark

2017-10-12 Thread Nicholas Hakobian
Using explode on the 4th column, followed by an explode on the 5th column would produce what you want (you might need to use split on the columns first if they are not already an array). Nicholas Szandor Hakobian, Ph.D. Staff Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Thu
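explode emits one output row per element of an array column, copying the other columns. A stdlib sketch of exploding one delimited-string column (splitting first, as the reply suggests; not the PySpark API):

```python
def explode(rows, col, sep=","):
    """Split `col` on `sep` and emit one copy of the row per element."""
    out = []
    for row in rows:
        for value in row[col].split(sep):
            out.append(dict(row, **{col: value}))
    return out

rows = [{"id": 1, "tags": "a,b"}]
exploded = explode(rows, "tags")
```

Chaining a second explode over another array column then yields the full cross-product of the two arrays per input row.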

Re: NLTK with Spark Streaming

2017-11-28 Thread Nicholas Hakobian
Depending on your needs, its fairly easy to write a lightweight python wrapper around the Databricks spark-corenlp library: https://github.com/databricks/spark-corenlp Nicholas Szandor Hakobian, Ph.D. Staff Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Sun, Nov 26, 2017 at 8

Re: Subqueries

2017-12-29 Thread Nicholas Hakobian
able to squeeze some more performance out of it (depending on the size of the table), by caching it beforehand. Nicholas Szandor Hakobian, Ph.D. Staff Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Fri, Dec 29, 2017 at 1:02 PM, Lalwani, Jayesh < jayesh.lalw...@capitalone.

Re: Spark Dataframe and HIVE

2018-02-09 Thread Nicholas Hakobian
ave a parquet file with a column in date format into a Hive table. In older versions of hive, its parquet reader/writer did not support Date formats (among a couple others). Nicholas Szandor Hakobian, Ph.D. Staff Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Fri, Feb 9, 2018 at 9:
