Re: Help choose a GraphFrames logo

2025-01-15 Thread Denny Lee
Thanks Russell, just wanted to give a shout out that this is really cool :) On Wed, Jan 15, 2025 at 1:13 AM Russell Jurney wrote: > GraphFrames needs a logo, so I created a 99designs contest to create one. > There are six finalists. Please vote for the one you like the most :) > > https://99desi

Re: First Time contribution.

2023-09-17 Thread Denny Lee
Hi Ram, We have some good guidance at https://spark.apache.org/contributing.html HTH! Denny On Sun, Sep 17, 2023 at 17:18 ram manickam wrote: > > > > Hello All, > Recently, joined this community and would like to contribute. Is there a > guideline or recommendation on tasks that can be picked

Re: Slack for PySpark users

2023-04-03 Thread Denny Lee
>>>>>> medium size groups it is good and affordable. Alternatives have been >>>>>>> suggested as well so those who like investigative search can agree and >>>>>>> come >>>>>>> up with a freebie one. >>>>

Re: Slack for PySpark users

2023-03-30 Thread Denny Lee
>>> wrote: >>>>>> >>>>>>> +1 >>>>>>> >>>>>>> + @d...@spark.apache.org >>>>>>> >>>>>>> This is a good idea. The other Apache projects (e.g., Pinot, Druid, >>>

Re: Slack for PySpark users

2023-03-27 Thread Denny Lee
+1 I think this is a great idea! On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon wrote: > Yeah, actually I think we should better have a slack channel so we can > easily discuss with users and developers. > > On Tue, 28 Mar 2023 at 03:08, keen wrote: > >> Hi all, >> I really like *Slack *as commun

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Denny Lee
> disclaimed. The author will in no case be liable for any monetary damages >> arising from such loss, damage or destruction. >> >> >> >> >> On Wed, 15 Mar 2023 at 18:31, Nitin Bhansali >> wrote: >> >> Hello Mich, >> >> My apologies

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Denny Lee
Thanks Mich for tackling this! I encourage everyone to add to the list so we can have a comprehensive list of topics, eh?! On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh wrote: > Hi all, > > Thanks to @Denny Lee to give access to > > https://www.linkedin.com/company/apach

Re: Topics for Spark online classes & webinars

2023-03-14 Thread Denny Lee
rg or request to leverage the original Spark confluence page <https://cwiki.apache.org/confluence/display/SPARK>. WDYT? On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh wrote: > Well that needs to be created first for this purpose. The appropriate name > etc. to be decided. Maybe @Denny

Re: Online classes for spark topics

2023-03-12 Thread Denny Lee
Looks like we have some good topics here - I'm glad to help with setting up the infrastructure to broadcast if it helps? On Thu, Mar 9, 2023 at 6:19 AM neeraj bhadani wrote: > I am happy to be a part of this discussion as well. > > Regards, > Neeraj > > On Wed, 8 Mar 2023 at 22:41, Winston Lai

Re: Online classes for spark topics

2023-03-08 Thread Denny Lee
We used to run Spark webinars on the Apache Spark LinkedIn group but honestly the turnout was pretty low. We dove into various features. If there are particular topics that you would like to discuss during a live session, pleas

Re: Prometheus with spark

2022-10-27 Thread Denny Lee
Hi Raja, A little atypical way to respond to your question - please check out the most recent Spark AMA where we discuss this: https://www.linkedin.com/posts/apachespark_apachespark-ama-committers-activity-6989052811397279744-jpWH?utm_source=share&utm_medium=member_ios HTH! Denny On Tue, Oct 2

Re: Databricks notebook - cluster taking a long time to get created, often timing out

2021-08-17 Thread Denny Lee
Hi Karan, You may want to ping Databricks Help or Forums as this is a Databricks specific question. I'm a little surprised that a Databricks cluster would take a long time to create so it may be best to utilize these foru

Re: Append to an existing Delta Lake using structured streaming

2021-07-21 Thread Denny Lee
Including the Delta Lake Users and Developers DL to help out. Saying this, could you clarify how data is not being added? By any chance do you have any code samples to recreate this? Sent via Superhuman On Wed, Jul 21, 2021 at 2:49 AM, wrote: > Hi

Re: How to unsubscribe

2020-05-06 Thread Denny Lee
Hi Fred, To unsubscribe, could you please email: user-unsubscr...@spark.apache.org (for more information, please refer to https://spark.apache.org/community.html). Thanks! Denny On Wed, May 6, 2020 at 10:12 AM Fred Liu wrote: > Hi guys > > > > -

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Denny Lee
There are a number of really good datasets already available including (but not limited to): - South Korea COVID-19 Dataset - 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE

Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Denny Lee
+1 On Fri, May 31, 2019 at 17:58 Holden Karau wrote: > +1 > > On Fri, May 31, 2019 at 5:41 PM Bryan Cutler wrote: > >> +1 and the draft sounds good >> >> On Thu, May 30, 2019, 11:32 AM Xiangrui Meng wrote: >> >>> Here is the draft announcement: >>> >>> === >>> Plan for dropping Python 2 suppor

Re: Does Pyspark Support Graphx?

2018-02-18 Thread Denny Lee
y, > The pyspark script uses the --packages option to load graphframe library, > what about the SparkLauncher class? > > > > -- Original -- > *From:* Denny Lee > *Date:* Sun,Feb 18,2018 11:07 AM > *To:* 94035420 > *Cc:* user@spark.apa

Re: Does Pyspark Support Graphx?

2018-02-17 Thread Denny Lee
hX vs. GraphFrames? On Sat, Feb 17, 2018 at 8:26 PM xiaobo wrote: > Thanks Denny, will it be supported in the near future? > > > > -- Original ------ > *From:* Denny Lee > *Date:* Sun,Feb 18,2018 11:05 AM > *To:* 94035420 > *Cc:* user@spark.a

Re: Does Pyspark Support Graphx?

2018-02-17 Thread Denny Lee
That’s correct - you can use GraphFrames though as it does support PySpark. On Sat, Feb 17, 2018 at 17:36 94035420 wrote: > I can not find anything for graphx module in the python API document, does > it mean it is not supported yet? >

Re: With 2.2.0 PySpark is now available for pip install from PyPI :)

2017-07-12 Thread Denny Lee
This is amazingly awesome! :) On Wed, Jul 12, 2017 at 13:23 lucas.g...@gmail.com wrote: > That's great! > > > > On 12 July 2017 at 12:41, Felix Cheung wrote: > >> Awesome! Congrats!! >> >> -- >> *From:* holden.ka...@gmail.com on behalf of >> Holden Karau >> *Sent:*

Re: Spark Shell issue on HDInsight

2017-05-14 Thread Denny Lee
> Works for me too, you are a life-saver :) > > But the question: should/how we report this to Azure team? > > On Fri, May 12, 2017 at 10:32 AM, Denny Lee wrote: > >> I was able to repro your issue when I had downloaded the jars via blob >> but when I downloaded them

Re: Spark Shell issue on HDInsight

2017-05-11 Thread Denny Lee
apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185) > at > org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210) > at org.apache.spark.deploy.SparkSubmit$.ma

Re: Spark Shell issue on HDInsight

2017-05-08 Thread Denny Lee
This appears to be an issue with the Spark to DocumentDB connector, specifically version 0.0.1. Could you run the 0.0.3 version of the jar and see if you're still getting the same error? i.e. spark-shell --master yarn --jars azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.ja

Re: Azure Event Hub with Pyspark

2017-04-20 Thread Denny Lee
As well, perhaps another option could be to use the Spark Connector to DocumentDB (https://github.com/Azure/azure-documentdb-spark) if sticking with Scala? On Thu, Apr 20, 2017 at 21:46 Nan Zhu wrote: > DocDB does have a java client? Anything prevent you using that? > > Get Outlook for iOS

Support Stored By Clause

2017-03-27 Thread Denny Lee
Per SPARK-19630, wondering if there are plans to support "STORED BY" clause for Spark 2.x? Thanks!

Re: unsubscribe

2017-01-09 Thread Denny Lee
Please unsubscribe by sending an email to user-unsubscr...@spark.apache.org HTH! On Mon, Jan 9, 2017 4:40 PM, william tellme williamtellme...@gmail.com wrote:

Re: UNSUBSCRIBE

2017-01-09 Thread Denny Lee
Please unsubscribe by sending an email to user-unsubscr...@spark.apache.org HTH! On Mon, Jan 9, 2017 4:41 PM, Chris Murphy - ChrisSMurphy.com cont...@chrissmurphy.com wrote: PLEASE!!

Re: Spark app write too many small parquet files

2016-11-27 Thread Denny Lee
Generally, yes - you should try to have larger data sizes due to the overhead of opening up files. Typical guidance is between 64MB-1GB; personally I usually stick with 128MB-512MB with the default of snappy codec compression with parquet. A good reference is Vida Ha's presentation Data Storage T
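A minimal sketch of the sizing arithmetic behind that guidance, assuming you already know roughly how many bytes the job writes. The helper name `target_file_count` and the 256 MB default are hypothetical, picked from inside the 128MB-512MB range mentioned above.

```python
import math

MB = 1024 * 1024


def target_file_count(total_bytes, target_file_bytes=256 * MB):
    """Return how many output files keep each file near target_file_bytes.

    Both arguments are sizes in bytes. The 256 MB default is only an
    illustrative choice, not a Spark setting.
    """
    if total_bytes < 0 or target_file_bytes <= 0:
        raise ValueError("sizes must be non-negative / positive")
    # Always write at least one file, and round up so no file exceeds the target.
    return max(1, math.ceil(total_bytes / target_file_bytes))


# Roughly 10 GB of output at about 256 MB per file.
print(target_file_count(10 * 1024 * MB))  # prints 40
```

The resulting number is the kind of value one would pass to `repartition` or `coalesce` on the DataFrame before writing. Keep in mind that compressed Parquet size on disk usually differs from in-memory size, so treat the estimate as a starting point.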

Re: hope someone can recommend some books for me,a spark beginner

2016-11-06 Thread Denny Lee
There are a number of great resources to learn Apache Spark - a good starting point is the Apache Spark Documentation at: http://spark.apache.org/documentation.html The two books that immediately come to mind are - Learning Spark: http://shop.oreilly.com/product/mobile/0636920028512.do (there's

Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread Denny Lee
The one you're looking for is the Data Sciences and Engineering with Apache Spark at https://www.edx.org/xseries/data-science-engineering-apacher-sparktm. Note, a great quick start is the Getting Started with Apache Spark on Databricks at https://databricks.com/product/getting-started-guide HTH!

Re: How do I convert a data frame to broadcast variable?

2016-11-03 Thread Denny Lee
If you're able to read the data in as a DataFrame, perhaps you can use a BroadcastHashJoin so that you can join to that table, presuming it's small enough to be distributed? Here's a handy guide on a BroadcastHashJoin: https://docs.cloud.databricks.com/docs/latest/databricks_guide/index.html#04%20S
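To make the idea concrete, here is a plain-Python sketch of what a broadcast hash join does conceptually: the small table becomes an in-memory hash map (in Spark, a copy would be shipped to every executor) and the large table is streamed against it. This is an illustration only, not Spark's implementation; `broadcast_hash_join` and the sample rows are hypothetical.

```python
from collections import defaultdict


def broadcast_hash_join(large_rows, small_rows, key):
    """Inner-join two lists of dicts on `key`, hashing only the small side."""
    # Build phase: index the small table by join key.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Probe phase: stream the large table and look up matches.
    joined = []
    for row in large_rows:
        for match in lookup.get(row[key], []):
            merged = dict(match)
            merged.update(row)  # on name clashes, large-side values win
            joined.append(merged)
    return joined


flights = [
    {"airport": "SEA", "delay": 12},
    {"airport": "SFO", "delay": 3},
    {"airport": "XXX", "delay": 7},  # no match in the small table
]
airports = [
    {"airport": "SEA", "city": "Seattle"},
    {"airport": "SFO", "city": "San Francisco"},
]

for row in broadcast_hash_join(flights, airports, "airport"):
    print(row)
```

The large side is never shuffled or sorted here, which is why the approach only pays off when the small side fits comfortably in memory.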

Re: GraphFrame BFS

2016-11-01 Thread Denny Lee
You should be able to use GraphX or GraphFrames subgraph to build up your subgraph. A good example for GraphFrames can be found at: http://graphframes.github.io/user-guide.html#subgraphs. HTH! On Mon, Oct 10, 2016 at 9:32 PM cashinpj wrote: > Hello, > > I have a set of data representing various ne

Re: Spark GraphFrames

2016-08-02 Thread Denny Lee
Hi Divya, Here's a blog post concerning On-Time Flight Performance with GraphFrames: https://databricks.com/blog/2016/03/16/on-time-flight-performance-with-graphframes-for-apache-spark.html It also includes a Databricks notebook that has the code in it. HTH! Denny On Tue, Aug 2, 2016 at 1:16 A

Re: Meetup in Rome

2016-02-19 Thread Denny Lee
Hey Domenico, Glad to hear that you love Spark and would like to organize a meetup in Rome. We created a Meetup-in-a-box to help with that - check out the post https://databricks.com/blog/2015/11/19/meetup-in-a-box.html. HTH! Denny On Fri, Feb 19, 2016 at 02:38 Domenico Pontari wrote: > > Hi

Re: How to compile Python and use How to compile Python and use spark-submit

2016-01-08 Thread Denny Lee
Per http://spark.apache.org/docs/latest/submitting-applications.html: For Python, you can use the --py-files argument of spark-submit to add .py, .zip or .egg files to be distributed with your application. If you depend on multiple Python files we recommend packaging them into a .zip or .egg. O
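As a rough sketch of the packaging step, the standard-library `zipfile` module can bundle a package directory into a .zip suitable for `--py-files`. The function name `build_py_files_zip` and the `mylib`, `deps.zip`, and `main.py` names are hypothetical.

```python
import os
import zipfile


def build_py_files_zip(package_dir, zip_path):
    """Zip every .py file under package_dir, keeping the package folder
    as the top-level entry so `import <package>` works from the archive.

    Returns the sorted list of archive member names that were written.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full_path = os.path.join(root, name)
                    # Store paths relative to the parent, e.g. mylib/util.py
                    arcname = os.path.relpath(full_path, parent)
                    zf.write(full_path, arcname)
                    added.append(arcname.replace(os.sep, "/"))
    return sorted(added)


# Example call (hypothetical paths):
#   build_py_files_zip("mylib", "deps.zip")
# followed by:
#   spark-submit --py-files deps.zip main.py
```

Only `.py` files are included, so data files or compiled extensions that the package needs would have to be handled separately.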

Re: subscribe

2016-01-08 Thread Denny Lee
To subscribe, please go to http://spark.apache.org/community.html to join the mailing list. On Fri, Jan 8, 2016 at 3:58 AM Jeetendra Gangele wrote: > >

Re: Intercept in Linear Regression

2015-12-15 Thread Denny Lee
If you're using model = LinearRegressionWithSGD.train(parseddata, iterations=100, step=0.01, intercept=True) then to get the intercept, you would use model.intercept More information can be found at: http://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#module-pyspark.mllib.regression

Re: Best practises

2015-11-02 Thread Denny Lee
In addition, you may want to check out Tuning and Debugging in Apache Spark (https://sparkhub.databricks.com/video/tuning-and-debugging-apache-spark/) On Mon, Nov 2, 2015 at 05:27 Stefano Baghino wrote: > There is this interesting book from Databricks: > https://www.gitbook.com/book/databricks/d

Spark Survey Results 2015 are now available

2015-10-05 Thread Denny Lee
Thanks to all of you who provided valuable feedback in our Spark Survey 2015. Because of the survey, we have a better picture of who’s using Spark, how they’re using it, and what they’re using it to build–insights that will guide major updates to the Spark platform as we move into Spark’s next pha

Re: SQL Server to Spark

2015-07-23 Thread Denny Lee
It sort of depends on what you mean by optimized. There is a good thread on the topic at http://search-hadoop.com/m/q3RTtJor7QBnWT42/Spark+and+SQL+server/v=threaded If you have an archival type strategy, you could do daily BCP extracts out to load the data into HDFS / S3 / etc. This would result in minimal impact

Re: Please add the Chicago Spark Users' Group to the community page

2015-07-06 Thread Denny Lee
Hey Dean, Sure, will take care of this. HTH, Denny On Tue, Jul 7, 2015 at 10:07 Dean Wampler wrote: > Here's our home page: http://www.meetup.com/Chicago-Spark-Users/ > > Thanks, > Dean > > Dean Wampler, Ph.D. > Author: Programming Scala, 2nd Edition >

Re: Spark SQL queries hive table, real time ?

2015-07-06 Thread Denny Lee
Within the context of your question, Spark SQL utilizing the Hive context is primarily about very fast queries. If you want to use real-time queries, I would utilize Spark Streaming. A couple of great resources on this topic include Guest Lecture on Spark Streaming in Stanford CME 323: Distribute

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Denny Lee
4g > > /Sim > > Simeon Simeonov, Founder & CTO, Swoop <http://swoop.com/> > @simeons <http://twitter.com/simeons> | blog.simeonov.com | 617.299.6746 > > > From: Yin Huai > Date: Monday, July 6, 2015 at 12:59 AM > To: Simeon Simeonov > Cc:

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Denny Lee
I had run into the same problem where everything was working swimmingly with Spark 1.3.1. When I switched to Spark 1.4, either upgrading to Java 8 (from Java 7) or bumping up the PermGen size solved my issue. HTH! On Mon, Jul 6, 2015 at 8:31 AM Andy Huang wrote: > We have hit the same

Hive Skew flag?

2015-05-15 Thread Denny Lee
Just wondering if we have any timeline on when the hive skew flag will be included within SparkSQL? Thanks! Denny

Re: how to delete data from table in sparksql

2015-05-14 Thread Denny Lee
Delete from table is available as part of Hive 0.14 (reference: Apache Hive > Language Manual DML - Delete) while Spark 1.3 defaults to Hive 0.13. Perhaps rebuild Spark with Hive 0.14 or generate a new

Re: Spark Cluster Setup

2015-04-27 Thread Denny Lee
Similar to what Dean called out, we build Puppet manifests so we could do the automation - it's a bit of work to set up, but well worth the effort. On Fri, Apr 24, 2015 at 11:27 AM Dean Wampler wrote: > It's mostly manual. You could try automating with something like Chef, of > course, but there's

Re: Start ThriftServer Error

2015-04-22 Thread Denny Lee
You may need to specify the hive port itself. For example, my own Thrift start command is in the form: ./sbin/start-thriftserver.sh --master spark://$myserver:7077 --driver-class-path $CLASSPATH --hiveconf hive.server2.thrift.bind.host $myserver --hiveconf hive.server2.thrift.port 1 HTH! O

Re: Skipped Jobs

2015-04-19 Thread Denny Lee
Thanks for the correction Mark :) On Sun, Apr 19, 2015 at 3:45 PM Mark Hamstra wrote: > Almost. Jobs don't get skipped. Stages and Tasks do if the needed > results are already available. > > On Sun, Apr 19, 2015 at 3:18 PM, Denny Lee wrote: > >> The job is skipp

Re: Skipped Jobs

2015-04-19 Thread Denny Lee
The job is skipped because the results are available in memory from a prior run. More info at: http://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3ccakx7bf-u+jc6q_zm7gtsj1mihagd_4up4qxpd9jfdjrfjax...@mail.gmail.com%3E. HTH! On Sun, Apr 19, 2015 at 1:43 PM James King wrote: > In th

Re: Which version of Hive QL is Spark 1.3.0 using?

2015-04-17 Thread Denny Lee
Support for sub queries in predicates hasn't been resolved yet - please refer to SPARK-4226 BTW, Spark 1.3 default bindings to Hive 0.13.1 On Fri, Apr 17, 2015 at 09:18 ARose wrote: > So I'm trying to store the results of a query into a DataFrame, but I get > the > following exception thrown

Re: Microsoft SQL jdbc support from spark sql

2015-04-16 Thread Denny Lee
Bummer - out of curiosity, if you were to use the classpath.first or perhaps copy the jar to the slaves could that actually do the trick? The latter isn't really all that efficient but just curious if that could do the trick. On Thu, Apr 16, 2015 at 7:14 AM ARose wrote: > I take it back. My so

Re: Converting Date pattern in scala code

2015-04-14 Thread Denny Lee
If you're doing this in Scala per se - then you can probably just reference JodaTime or Java Date / Time classes. If you are using SparkSQL, then you can use the various Hive date functions for conversion. On Tue, Apr 14, 2015 at 11:04 AM BASAK, ANANDA wrote: > I need some help to convert the date patte

Re: Which Hive version should be used for Spark 1.3

2015-04-09 Thread Denny Lee
By default Spark 1.3 has bindings to Hive 0.13.1 though you can bind it to Hive 0.12 if you specify it in the profile when building Spark as per https://spark.apache.org/docs/1.3.0/building-spark.html. If you are downloading a pre built version of Spark 1.3 - then by default, it is set to Hive 0.1

Re: SQL can't not create Hive database

2015-04-09 Thread Denny Lee
Can you create the database directly within Hive? If you're getting the same error within Hive, it sounds like a permissions issue as per Bojan. More info can be found at: http://stackoverflow.com/questions/15898211/unable-to-create-database-path-file-user-hive-warehouse-error On Thu, Apr 9, 201

Re: Microsoft SQL jdbc support from spark sql

2015-04-07 Thread Denny Lee
That's correct, at this time MS SQL Server is not supported through the JDBC data source. In my environment, we've been using Hadoop streaming to extract out data from multiple SQL Servers, pushing the data into HDFS, creating the Hive tables and/or converting them into Parquet, and t

Re: Microsoft SQL jdbc support from spark sql

2015-04-06 Thread Denny Lee
At this time, the JDBC Data source is not extensible so it cannot support SQL Server. There were some thoughts - credit to Cheng Lian for this - about making the JDBC data source extensible for third party support possibly via slick. On Mon, Apr 6, 2015 at 10:41 PM bipin wrote: > Hi, I am try

Re: ArrayBuffer within a DataFrame

2015-04-03 Thread Denny Lee
I think something like this would work. You might need to play with the > type. > > df.explode("arrayBufferColumn") { x => x } > > > > On Fri, Apr 3, 2015 at 6:43 AM, Denny Lee wrote: > >> Thanks Dean - fun hack :) >> >> On Fri, Apr 3, 2015

Re: ArrayBuffer within a DataFrame

2015-04-03 Thread Denny Lee
eilly.com/product/0636920033073.do> (O'Reilly) > Typesafe <http://typesafe.com> > @deanwampler <http://twitter.com/deanwampler> > http://polyglotprogramming.com > > On Thu, Apr 2, 2015 at 10:45 PM, Denny Lee wrote: > >> Thanks Michael - that was it! I was

Re: ArrayBuffer within a DataFrame

2015-04-02 Thread Denny Lee
Apr 2, 2015 at 7:10 PM, Denny Lee wrote: > >> Quick question - the output of a dataframe is in the format of: >> >> [2015-04, ArrayBuffer(A, B, C, D)] >> >> and I'd like to return it as: >> >> 2015-04, A >> 2015-04, B >> 2015-04, C >> 2015-04, D >> >> What's the best way to do this? >> >> Thanks in advance! >> >> >> >

ArrayBuffer within a DataFrame

2015-04-02 Thread Denny Lee
Quick question - the output of a dataframe is in the format of: [2015-04, ArrayBuffer(A, B, C, D)] and I'd like to return it as: 2015-04, A 2015-04, B 2015-04, C 2015-04, D What's the best way to do this? Thanks in advance!
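To pin down the reshape being asked for, here is a plain-Python sketch that turns one row holding a sequence into one row per element. It only illustrates the semantics and does not use any Spark API; within Spark, the replies in this thread point to the DataFrame explode approach. The name `explode_rows` is hypothetical.

```python
def explode_rows(rows):
    """Yield one (key, element) pair per element of each row's sequence."""
    for key, values in rows:
        for value in values:
            yield (key, value)


# One input row shaped like [2015-04, ArrayBuffer(A, B, C, D)]
rows = [("2015-04", ["A", "B", "C", "D"])]

for key, value in explode_rows(rows):
    print(f"{key}, {value}")
# prints:
# 2015-04, A
# 2015-04, B
# 2015-04, C
# 2015-04, D
```

Rows whose sequence is empty simply produce no output rows under this definition.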

Re: Creating Partitioned Parquet Tables via SparkSQL

2015-04-01 Thread Denny Lee
Thanks Felix :) On Wed, Apr 1, 2015 at 00:08 Felix Cheung wrote: > This is tracked by these JIRAs.. > > https://issues.apache.org/jira/browse/SPARK-5947 > https://issues.apache.org/jira/browse/SPARK-5948 > > -- > From: denny.g@gmail.com > Date: Wed, 1 Apr 2015 04:

Creating Partitioned Parquet Tables via SparkSQL

2015-03-31 Thread Denny Lee
Creating Parquet tables via .saveAsTable is great but was wondering if there was an equivalent way to create partitioned parquet tables. Thanks!

Re: Anyone has some simple example with spark-sql with spark 1.3

2015-03-30 Thread Denny Lee
Hi Vincent, This may be a case that you're missing a semi-colon after your CREATE TEMPORARY TABLE statement. I ran your original statement (missing the semi-colon) and got the same error as you did. As soon as I added it in, I was good to go again: CREATE TEMPORARY TABLE jsonTable USING org.apa

Re: Hive Table not from from Spark SQL

2015-03-27 Thread Denny Lee
Upon reviewing your other thread, could you confirm that your Hive metastore that you can connect to via Hive is a MySQL database? And to also confirm, when you're running spark-shell and doing a "show tables" statement, you're getting the same error? On Fri, Mar 27, 2015 at 6:08 AM ÐΞ€ρ@Ҝ (๏̯͡๏

Re: spark-sql throws org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException

2015-03-26 Thread Denny Lee
If you're not using MySQL as your metastore for Hive, out of curiosity what are you using? The error you are seeing is common when the correct driver that allows Spark to connect to the Hive metastore isn't there. As well, I noticed that you're using SPARK_CLAS

Re: Handling Big data for interactive BI tools

2015-03-26 Thread Denny Lee
BTW, a tool that I have been using to help do the preaggregation of data using hyperloglog in combination with Spark is atscale (http://atscale.com/). It builds the aggregations and makes use of the speed of SparkSQL - all within the context of a model that is accessible by Tableau or Qlik. On Thu

Re: [SparkSQL] How to calculate stddev on a DataFrame?

2015-03-25 Thread Denny Lee
Perhaps this email reference may be able to help from a DataFrame perspective: http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201503.mbox/%3CCALte62ztepahF=5hk9rcfbnyk4z43wkcq4fkdcbwmgf_3_o...@mail.gmail.com%3E On Wed, Mar 25, 2015 at 7:29 PM Haopu Wang wrote: > Hi, > > > > I ha

Re: Total size of serialized results is bigger than spark.driver.maxResultSize

2015-03-25 Thread Denny Lee
As you noted, you can change the spark.driver.maxResultSize value in your Spark Configurations (https://spark.apache.org/docs/1.2.0/configuration.html). Please reference the Spark Properties section noting that you can modify these properties via the spark-defaults.conf or via SparkConf(). HTH!

Re: Errors in SPARK

2015-03-24 Thread Denny Lee
t; instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient* > > Cheers, > Sandeep.v > > On Wed, Mar 25, 2015 at 11:10 AM, sandeep vura > wrote: > >> No I am just running ./spark-shell command in terminal I will try with >> above command >> >> On Wed,

Re: Errors in SPARK

2015-03-24 Thread Denny Lee
Did you include the connection to a MySQL connector jar so that spark-shell / hive can connect to the metastore? For example, when I run my spark-shell instance in standalone mode, I use: ./spark-shell --master spark://servername:7077 --driver-class-path /lib/mysql-connector-java-5.1.27.jar

Re: Hadoop 2.5 not listed in Spark 1.4 build page

2015-03-24 Thread Denny Lee
Hadoop 2.5 would be referenced via -Dhadoop-2.5 using the profile -Phadoop-2.4. Please note earlier in the link the section: # Apache Hadoop 2.4.X or 2.5.X mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=VERSION -DskipTests clean package Versions of Hadoop after 2.5.X may or may not work with the -Ph

Re: Standalone Scheduler VS YARN Performance

2015-03-24 Thread Denny Lee
By any chance does the issue in this thread look similar: http://apache-spark-developers-list.1001551.n3.nabble.com/Lost-executor-on-YARN-ALS-iterations-td7916.html ? On Tue, Mar 24, 2015 at 5:23 AM Harut Martirosyan < harut.martiros...@gmail.com> wrote: > What is performance overhead caused by YARN

Re: Should I do spark-sql query on HDFS or hive?

2015-03-23 Thread Denny Lee
From the standpoint of Spark SQL accessing the files - when it is hitting Hive, it is in effect hitting HDFS as well. Hive provides a great framework where the table structure is already well defined. But underneath it, Hive is just accessing files from HDFS so you are hitting HDFS either way.

Re: Using a different spark jars than the one on the cluster

2015-03-23 Thread Denny Lee
+1 - I currently am doing what Marcelo is suggesting as I have a CDH 5.2 cluster (with Spark 1.1) and I'm also running Spark 1.3.0+ side-by-side in my cluster. On Wed, Mar 18, 2015 at 1:23 PM Marcelo Vanzin wrote: > Since you're using YARN, you should be able to download a Spark 1.3.0 > tarball

Re: Use pig load function in spark

2015-03-23 Thread Denny Lee
You may be able to utilize Spork (Pig on Apache Spark) as a mechanism to do this: https://github.com/sigmoidanalytics/spork On Mon, Mar 23, 2015 at 2:29 AM Dai, Kevin wrote: > Hi, all > > > > Can spark use pig’s load function to load data? > > > > Best Regards, > > Kevin. >

Re: Spark sql thrift server slower than hive

2015-03-22 Thread Denny Lee
How are you running your spark instance out of curiosity? Via YARN or standalone mode? When connecting Spark thriftserver to the Spark service, have you allocated enough memory and CPU when executing with spark? On Sun, Mar 22, 2015 at 3:39 AM fanooos wrote: > We have cloudera CDH 5.3 installe

Re: takeSample triggers 2 jobs

2015-03-06 Thread Denny Lee
Hi Rares, If you dig into the descriptions for the two jobs, it will probably return something like: Job ID: 1 org.apache.spark.rdd.RDD.takeSample(RDD.scala:447) $line41.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:22) ... Job ID: 0 org.apache.spark.rdd.RDD.takeSample(RDD.scala:428) $line41.$

Re: spark master shut down suddenly

2015-03-04 Thread Denny Lee
It depends on your setup but one of the locations is /var/log/mesos On Wed, Mar 4, 2015 at 19:11 lisendong wrote: > I ‘m sorry, but how to look at the mesos logs? > where are them? > > > > 在 2015年3月4日,下午6:06,Akhil Das 写道: > > > You can check in the mesos logs and see whats really happening. > >

Re: Unable to run hive queries inside spark

2015-02-24 Thread Denny Lee
e > location of default database for the > warehouse > > > Do I need to do anything explicitly other than placing hive-site.xml in > the spark.conf directory ? > > Thanks !! > > > > On Wed, Feb 25, 2015 at 11:42 AM, Denny Lee wrote: > >

Re: Unable to run hive queries inside spark

2015-02-24 Thread Denny Lee
The error message you have is: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/user/hive/warehouse/src is not a directory or unable to create one) Could you verify that you (the user you are running under) has the rights to create th

Re: How to start spark-shell with YARN?

2015-02-24 Thread Denny Lee
It may have to do with the akka heartbeat interval per SPARK-3923 - https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-3923 ? On Tue, Feb 24, 2015 at 16:40 Xi Shen wrote: > Hi Sean, > > I launched the spark-shell on the same machine as I started YARN service. > I don't think port

Re: Use case for data in SQL Server

2015-02-24 Thread Denny Lee
Hi Suhel, My team is currently working with a lot of SQL Server databases as one of our many data sources and ultimately we pull the data into HDFS from SQL Server. As we had a lot of SQL databases to hit, we used the jTDS driver and SQOOP to extract the data out of SQL Server and into HDFS (smal

Re: Spark SQL odbc on Windows

2015-02-23 Thread Denny Lee
imited. And thanks for writing the klout paper!! We were already > using it as a guideline for our tests. > > Best regards, > Francisco > -- > From: Denny Lee > Sent: ‎22/‎02/‎2015 17:56 > To: Ashic Mahtab ; Francisco Orchard ; > Apache Spark

Re: Spark SQL odbc on Windows

2015-02-22 Thread Denny Lee
Back to thrift, there was an earlier thread on this topic at http://mail-archives.apache.org/mod_mbox/spark-user/201411.mbox/%3CCABPQxsvXA-ROPeXN=wjcev_n9gv-drqxujukbp_goutvnyx...@mail.gmail.com%3E that may be useful as well. On Sun Feb 22 2015 at 8:42:29 AM Denny Lee wrote: > Hi Franci

Re: Spark SQL odbc on Windows

2015-02-22 Thread Denny Lee
Hi Francisco, Out of curiosity - why ROLAP mode using multi-dimensional mode (vs tabular) from SSAS to Spark? As a past SSAS guy you've definitely piqued my interest. The one thing that you may run into is that the SQL generated by SSAS can be quite convoluted. When we were doing the same thing t

Re: Spark 1.3 SQL Programming Guide and sql._ / sql.types._

2015-02-20 Thread Denny Lee
> > On Fri, Feb 20, 2015 at 9:55 AM, Denny Lee wrote: > >> Quickly reviewing the latest SQL Programming Guide >> <https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md> >> (in github) I had a couple of quick questions: >> >> 1) Do we need t

Spark 1.3 SQL Programming Guide and sql._ / sql.types._

2015-02-20 Thread Denny Lee
Quickly reviewing the latest SQL Programming Guide (in github) I had a couple of quick questions: 1) Do we need to instantiate the SparkContext as per // sc is an existing SparkContext. val sqlContext = new org.apache.spar

Re: Tableau beta connector

2015-02-05 Thread Denny Lee
t(sc) > > > Do some processing on RDD and persist it on hive using registerTempTable > > and tableau can extract that RDD persisted on hive. > > > Regards, > > Ashutosh > > > -- > *From:* Denny Lee > > *Sent:* Thursday, Fe

Re: Tableau beta connector

2015-02-04 Thread Denny Lee
rrect me if I am wrong. > > > I guess I have to look at how thrift server works. > -- > *From:* Denny Lee > *Sent:* Thursday, February 5, 2015 12:20 PM > *To:* İsmail Keskin; Ashutosh Trivedi (MT2013030) > *Cc:* user@spark.apache.org > *Subjec

Re: Tableau beta connector

2015-02-04 Thread Denny Lee
Some quick context behind how Tableau interacts with Spark / Hive can also be found at https://www.concur.com/blog/en-us/connect-tableau-to-sparksql - it covers how to connect from Tableau to the thrift server before the official Tableau beta connector, but should provide some of the additional conte

Re: Fail to launch spark-shell on windows 2008 R2

2015-02-03 Thread Denny Lee
Hi Ningjun, I have been working with Spark 1.2 on Windows 7 and Windows 2008 R2 (purely for development purposes). I had most recently installed them utilizing Java 1.8, Scala 2.10.4, and Spark 1.2 Precompiled for Hadoop 2.4+. A handy thread concerning the null\bin\winutils issue is addressed in

Re: Spark (SQL) as OLAP engine

2015-02-03 Thread Denny Lee
A great presentation by Evan Chan on utilizing Cassandra as Jonathan noted is at: OLAP with Cassandra and Spark http://www.slideshare.net/EvanChan2/2014-07olapcassspark. On Tue Feb 03 2015 at 10:03:34 AM Jonathan Haddad wrote: > Write out the rdd to a cassandra table. The datastax driver provid

Re: spark-shell can't import the default hive-site.xml options probably.

2015-02-01 Thread Denny Lee
ava : > > > METASTORE_CLIENT_CONNECT_RETRY_DELAY("hive.metastore.client.connect.retry.delay", > "1s", > new TimeValidator(TimeUnit.SECONDS), > "Number of seconds for the client to wait between consecutive > connection attempts"), > > It seems having the 's

Re: spark-shell can't import the default hive-site.xml options probably.

2015-02-01 Thread Denny Lee
I may be missing something here but typically the hive-site.xml configurations do not require you to place "s" within the configuration itself. Both the retry.delay and socket.timeout values are in seconds so you should only need to place the integer value (which is in seconds). On Sun Feb

Spark 1.2 and Mesos 0.21.0 spark.executor.uri issue?

2014-12-30 Thread Denny Lee
I've been working with Spark 1.2 and Mesos 0.21.0 and while I have set the spark.executor.uri within spark-env.sh (and directly within bash as well), the Mesos slaves do not seem to be able to access the spark tgz file via HTTP or HDFS as per the message below. 14/12/30 15:57:35 INFO SparkILoop:

Re: S3 files , Spark job hungsup

2014-12-23 Thread Denny Lee
You should be able to kill the job using the webUI or via spark-class. More info can be found in the thread: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-kill-a-Spark-job-running-in-cluster-mode-td18583.html. HTH! On Tue, Dec 23, 2014 at 4:47 PM, durga wrote: > Hi All , > > It se

Re: Hadoop 2.6 compatibility?

2014-12-19 Thread Denny Lee
Sorry Ted! I saw profile (-P) but missed the -D. My bad! On Fri, Dec 19, 2014 at 16:46 Ted Yu wrote: > Here is the command I used: > > mvn package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 > -Dhadoop.version=2.6.0 -Phive -DskipTests > > FYI > > On Fri, Dec 19, 2014 at 4

Re: Hadoop 2.6 compatibility?

2014-12-19 Thread Denny Lee
To clarify, there isn't a Hadoop 2.6 profile per se but you can build using -Dhadoop.version=2.4 which works with Hadoop 2.6. On Fri, Dec 19, 2014 at 12:55 Ted Yu wrote: > You can use hadoop-2.4 profile and pass -Dhadoop.version=2.6.0 > > Cheers > > On Fri, Dec 19, 2014 at 12:51 PM, sa wrote: >

Re: Spark Shell slowness on Google Cloud

2014-12-17 Thread Denny Lee
u suggest I run to test this? But more importantly, what > information would this give me? > > On Wed, Dec 17, 2014 at 10:46 PM, Denny Lee wrote: >> >> Oh, it makes sense of gsutil scans through this quickly, but I was >> wondering if running a Hadoop job / bdutil would res
