Re: Append to an existing Delta Lake using structured streaming

2021-07-21 Thread Denny Lee
Including the Delta Lake Users and Developers DL to help out. That said, could you clarify how data is not being added? By any chance do you have any code samples to recreate this? Sent via Superhuman On Wed, Jul 21, 2021 at 2:49 AM, wrote: > Hi

Re: Databricks notebook - cluster taking a long time to get created, often timing out

2021-08-17 Thread Denny Lee
Hi Karan, You may want to ping Databricks Help or Forums as this is a Databricks specific question. I'm a little surprised that a Databricks cluster would take a long time to create so it may be best to utilize these foru

Re: Prometheus with spark

2022-10-27 Thread Denny Lee
Hi Raja, A little atypical way to respond to your question - please check out the most recent Spark AMA where we discuss this: https://www.linkedin.com/posts/apachespark_apachespark-ama-committers-activity-6989052811397279744-jpWH?utm_source=share&utm_medium=member_ios HTH! Denny On Tue, Oct 2

Re: Online classes for spark topics

2023-03-08 Thread Denny Lee
We used to run Spark webinars on the Apache Spark LinkedIn group but honestly the turnout was pretty low. We had delved into various features. If there are particular topics that you would like to discuss during a live session, pleas

Re: Online classes for spark topics

2023-03-12 Thread Denny Lee
Looks like we have some good topics here - I'm glad to help with setting up the infrastructure to broadcast if it helps? On Thu, Mar 9, 2023 at 6:19 AM neeraj bhadani wrote: > I am happy to be a part of this discussion as well. > > Regards, > Neeraj > > On Wed, 8 Mar 2023 at 22:41, Winston Lai

Re: Topics for Spark online classes & webinars

2023-03-14 Thread Denny Lee
rg or request to leverage the original Spark confluence page <https://cwiki.apache.org/confluence/display/SPARK>.WDYT? On Mon, Mar 13, 2023 at 9:34 AM Mich Talebzadeh wrote: > Well that needs to be created first for this purpose. The appropriate name > etc. to be decided. Maybe @Denny

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Denny Lee
Thanks Mich for tackling this! I encourage everyone to add to the list so we can have a comprehensive list of topics, eh?! On Wed, Mar 15, 2023 at 10:27 Mich Talebzadeh wrote: > Hi all, > > Thanks to @Denny Lee to give access to > > https://www.linkedin.com/company/apach

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Denny Lee
> disclaimed. The author will in no case be liable for any monetary damages >> arising from such loss, damage or destruction. >> >> >> >> >> On Wed, 15 Mar 2023 at 18:31, Nitin Bhansali >> wrote: >> >> Hello Mich, >> >> My apologies

Re: Slack for PySpark users

2023-03-27 Thread Denny Lee
+1 I think this is a great idea! On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon wrote: > Yeah, actually I think we should better have a slack channel so we can > easily discuss with users and developers. > > On Tue, 28 Mar 2023 at 03:08, keen wrote: > >> Hi all, >> I really like *Slack *as commun

Re: Slack for PySpark users

2023-03-30 Thread Denny Lee
>>> wrote: >>>>>> >>>>>>> +1 >>>>>>> >>>>>>> + @d...@spark.apache.org >>>>>>> >>>>>>> This is a good idea. The other Apache projects (e.g., Pinot, Druid, >>&g

Re: Slack for PySpark users

2023-04-03 Thread Denny Lee
>>>>>> medium size groups it is good and affordable. Alternatives have been >>>>>>> suggested as well so those who like investigative search can agree and >>>>>>> come >>>>>>> up with a freebie one. >>>>

Re: First Time contribution.

2023-09-17 Thread Denny Lee
Hi Ram, We have some good guidance at https://spark.apache.org/contributing.html HTH! Denny On Sun, Sep 17, 2023 at 17:18 ram manickam wrote: > > > > Hello All, > Recently, joined this community and would like to contribute. Is there a > guideline or recommendation on tasks that can be picked

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-05 Thread Denny Lee
I had run into the same problem where everything was working swimmingly with Spark 1.3.1. When I switched to Spark 1.4, either upgrading to Java 8 (from Java 7) or bumping up the PermGenSize solved my issue. HTH! On Mon, Jul 6, 2015 at 8:31 AM Andy Huang wrote: > We have hit the same

Re: 1.4.0 regression: out-of-memory errors on small data

2015-07-06 Thread Denny Lee
4g > > /Sim > > Simeon Simeonov, Founder & CTO, Swoop <http://swoop.com/> > @simeons <http://twitter.com/simeons> | blog.simeonov.com | 617.299.6746 > > > From: Yin Huai > Date: Monday, July 6, 2015 at 12:59 AM > To: Simeon Simeonov > Cc:

Re: Spark SQL queries hive table, real time ?

2015-07-06 Thread Denny Lee
Within the context of your question, Spark SQL utilizing the Hive context is primarily about very fast queries. If you want to use real-time queries, I would utilize Spark Streaming. A couple of great resources on this topic include Guest Lecture on Spark Streaming in Stanford CME 323: Distribute

Re: Please add the Chicago Spark Users' Group to the community page

2015-07-06 Thread Denny Lee
Hey Dean, Sure, will take care of this. HTH, Denny On Tue, Jul 7, 2015 at 10:07 Dean Wampler wrote: > Here's our home page: http://www.meetup.com/Chicago-Spark-Users/ > > Thanks, > Dean > > Dean Wampler, Ph.D. > Author: Programming Scala, 2nd Edition >

Re: Spark GraphFrames

2016-08-02 Thread Denny Lee
Hi Divya, Here's a blog post concerning On-Time Flight Performance with GraphFrames: https://databricks.com/blog/2016/03/16/on-time-flight-performance-with-graphframes-for-apache-spark.html It also includes a Databricks notebook that has the code in it. HTH! Denny On Tue, Aug 2, 2016 at 1:16 A

Re: Intercept in Linear Regression

2015-12-15 Thread Denny Lee
If you're using model = LinearRegressionWithSGD.train(parseddata, iterations=100, step=0.01, intercept=True) then to get the intercept, you would use model.intercept More information can be found at: http://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#module-pyspark.mllib.regression
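The `model.intercept` field referenced above holds the fitted bias term. As a plain-Python illustration (no Spark required, data made up for the example) of what fitting with `intercept=True` means, here is the closed-form simple linear regression that a converged SGD fit would approximate:

```python
# Illustrative only: closed-form simple linear regression with an intercept,
# analogous in spirit to passing intercept=True to LinearRegressionWithSGD.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x  # the bias term that model.intercept exposes

print(slope, intercept)  # 2.0 1.0
```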

Re: subscribe

2016-01-08 Thread Denny Lee
To subscribe, please go to http://spark.apache.org/community.html to join the mailing list. On Fri, Jan 8, 2016 at 3:58 AM Jeetendra Gangele wrote: > >

Re: How to compile Python and use How to compile Python and use spark-submit

2016-01-08 Thread Denny Lee
Per http://spark.apache.org/docs/latest/submitting-applications.html: For Python, you can use the --py-files argument of spark-submit to add .py, .zip or .egg files to be distributed with your application. If you depend on multiple Python files we recommend packaging them into a .zip or .egg. O
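Building the recommended .zip of dependencies can be done with the standard library alone; the module name and contents below are hypothetical, and the resulting archive would then be passed via `spark-submit --py-files deps.zip main.py`:

```python
import os
import tempfile
import zipfile

# Bundle helper modules into a single .zip for spark-submit --py-files.
# File names here are hypothetical.
workdir = tempfile.mkdtemp()
helper_path = os.path.join(workdir, "helpers.py")
with open(helper_path, "w") as f:
    f.write("def greet():\n    return 'hello'\n")

zip_path = os.path.join(workdir, "deps.zip")
with zipfile.ZipFile(zip_path, "w") as z:
    z.write(helper_path, arcname="helpers.py")

# Then submit with:
#   spark-submit --py-files deps.zip main.py
print(zipfile.ZipFile(zip_path).namelist())  # ['helpers.py']
```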

Re: Meetup in Rome

2016-02-19 Thread Denny Lee
Hey Domenico, Glad to hear that you love Spark and would like to organize a meetup in Rome. We created a Meetup-in-a-box to help with that - check out the post https://databricks.com/blog/2015/11/19/meetup-in-a-box.html. HTH! Denny On Fri, Feb 19, 2016 at 02:38 Domenico Pontari wrote: > > Hi

Spark Survey Results 2015 are now available

2015-10-05 Thread Denny Lee
Thanks to all of you who provided valuable feedback in our Spark Survey 2015. Because of the survey, we have a better picture of who’s using Spark, how they’re using it, and what they’re using it to build–insights that will guide major updates to the Spark platform as we move into Spark’s next pha

Re: Best practises

2015-11-02 Thread Denny Lee
In addition, you may want to check out Tuning and Debugging in Apache Spark (https://sparkhub.databricks.com/video/tuning-and-debugging-apache-spark/) On Mon, Nov 2, 2015 at 05:27 Stefano Baghino wrote: > There is this interesting book from Databricks: > https://www.gitbook.com/book/databricks/d

Re: SQL Server to Spark

2015-07-23 Thread Denny Lee
It sort of depends on what you mean by optimized. There is a good thread on the topic at http://search-hadoop.com/m/q3RTtJor7QBnWT42/Spark+and+SQL+server/v=threaded If you have an archival type strategy, you could do daily BCP extracts out to load the data into HDFS / S3 / etc. This would result in minimal impact

Re: GraphFrame BFS

2016-11-01 Thread Denny Lee
You should be able to GraphX or GraphFrames subgraph to build up your subgraph. A good example for GraphFrames can be found at: http://graphframes.github.io/user-guide.html#subgraphs. HTH! On Mon, Oct 10, 2016 at 9:32 PM cashinpj wrote: > Hello, > > I have a set of data representing various ne

Re: How do I convert a data frame to broadcast variable?

2016-11-03 Thread Denny Lee
If you're able to read the data in as a DataFrame, perhaps you can use a BroadcastHashJoin so that way you can join to that table presuming its small enough to distributed? Here's a handy guide on a BroadcastHashJoin: https://docs.cloud.databricks.com/docs/latest/databricks_guide/index.html#04%20S

Re: Newbie question - Best way to bootstrap with Spark

2016-11-06 Thread Denny Lee
The one you're looking for is the Data Science and Engineering with Apache Spark series at https://www.edx.org/xseries/data-science-engineering-apacher-sparktm. Note, a great quick start is the Getting Started with Apache Spark on Databricks at https://databricks.com/product/getting-started-guide HTH!

Re: hope someone can recommend some books for me,a spark beginner

2016-11-06 Thread Denny Lee
There are a number of great resources to learn Apache Spark - a good starting point is the Apache Spark Documentation at: http://spark.apache.org/documentation.html The two books that immediately come to mind are - Learning Spark: http://shop.oreilly.com/product/mobile/0636920028512.do (there's

Re: Spark app write too many small parquet files

2016-11-27 Thread Denny Lee
Generally, yes - you should try to have larger data sizes due to the overhead of opening up files. Typical guidance is between 64MB-1GB; personally I usually stick with 128MB-512MB with the default of snappy codec compression with parquet. A good reference is Vida Ha's presentation Data Storage T
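As a back-of-the-envelope helper for the 128MB-512MB guidance above, one can derive an output partition count from the total data size before writing; the function and numbers are illustrative, not a Spark API:

```python
# Rule-of-thumb sketch: choose an output partition count so each Parquet file
# lands near a target size (e.g., 256MB, within the 128MB-512MB guidance above).
def target_partitions(total_bytes, target_file_bytes=256 * 1024 ** 2):
    return max(1, round(total_bytes / target_file_bytes))

# e.g., a 10GB dataset -> 40 files of roughly 256MB each
print(target_partitions(10 * 1024 ** 3))  # 40
# In Spark the count would then feed something like df.coalesce(n) before the write.
```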

Re: UNSUBSCRIBE

2017-01-09 Thread Denny Lee
Please unsubscribe by sending an email to user-unsubscr...@spark.apache.org HTH! On Mon, Jan 9, 2017 4:41 PM, Chris Murphy - ChrisSMurphy.com cont...@chrissmurphy.com wrote: PLEASE!!

Re: unsubscribe

2017-01-09 Thread Denny Lee
Please unsubscribe by sending an email to user-unsubscr...@spark.apache.org HTH! On Mon, Jan 9, 2017 4:40 PM, william tellme williamtellme...@gmail.com wrote:

Support Stored By Clause

2017-03-27 Thread Denny Lee
Per SPARK-19630, wondering if there are plans to support "STORED BY" clause for Spark 2.x? Thanks!

Re: Azure Event Hub with Pyspark

2017-04-20 Thread Denny Lee
As well, perhaps another option could be to use the Spark Connector to DocumentDB (https://github.com/Azure/azure-documentdb-spark) if sticking with Scala? On Thu, Apr 20, 2017 at 21:46 Nan Zhu wrote: > DocDB does have a java client? Anything prevent you using that? > > Get Outlook for iOS

Re: Spark Shell issue on HDInsight

2017-05-08 Thread Denny Lee
This appears to be an issue with the Spark to DocumentDB connector, specifically version 0.0.1. Could you run the 0.0.3 version of the jar and see if you're still getting the same error? i.e. spark-shell --master yarn --jars azure-documentdb-spark-0.0.3-SNAPSHOT.jar,azure-documentdb-1.10.0.ja

Re: Spark Shell issue on HDInsight

2017-05-11 Thread Denny Lee
apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185) > at > org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210) > at org.apache.spark.deploy.SparkSubmit$.ma

Re: Spark Shell issue on HDInsight

2017-05-14 Thread Denny Lee
gt; Works for me tooyou are a life-saver :) > > But the question: should/how we report this to Azure team? > > On Fri, May 12, 2017 at 10:32 AM, Denny Lee wrote: > >> I was able to repro your issue when I had downloaded the jars via blob >> but when I downloaded them

Re: With 2.2.0 PySpark is now available for pip install from PyPI :)

2017-07-12 Thread Denny Lee
This is amazingly awesome! :) On Wed, Jul 12, 2017 at 13:23 lucas.g...@gmail.com wrote: > That's great! > > > > On 12 July 2017 at 12:41, Felix Cheung wrote: > >> Awesome! Congrats!! >> >> -- >> *From:* holden.ka...@gmail.com on behalf of >> Holden Karau >> *Sent:*

Re: Does Pyspark Support Graphx?

2018-02-17 Thread Denny Lee
That’s correct - you can use GraphFrames though as it does support PySpark. On Sat, Feb 17, 2018 at 17:36 94035420 wrote: > I can not find anything for graphx module in the python API document, does > it mean it is not supported yet? >

Re: Does Pyspark Support Graphx?

2018-02-17 Thread Denny Lee
hX vs. GraphFrames? On Sat, Feb 17, 2018 at 8:26 PM xiaobo wrote: > Thanks Denny, will it be supported in the near future? > > > > -- Original ------ > *From:* Denny Lee > *Date:* Sun,Feb 18,2018 11:05 AM > *To:* 94035420 > *Cc:* user@spark.a

Re: Does Pyspark Support Graphx?

2018-02-18 Thread Denny Lee
y, > The pyspark script uses the --packages option to load graphframe library, > what about the SparkLauncher class? > > > > -- Original -- > *From:* Denny Lee > *Date:* Sun,Feb 18,2018 11:07 AM > *To:* 94035420 > *Cc:* user@spark.apa

Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Denny Lee
+1 On Fri, May 31, 2019 at 17:58 Holden Karau wrote: > +1 > > On Fri, May 31, 2019 at 5:41 PM Bryan Cutler wrote: > >> +1 and the draft sounds good >> >> On Thu, May 30, 2019, 11:32 AM Xiangrui Meng wrote: >> >>> Here is the draft announcement: >>> >>> === >>> Plan for dropping Python 2 suppor

Re: can we all help use our expertise to create an IT solution for Covid-19

2020-03-26 Thread Denny Lee
There are a number of really good datasets already available including (but not limited to): - South Korea COVID-19 Dataset - 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE

Re: How to unsubscribe

2020-05-06 Thread Denny Lee
Hi Fred, To unsubscribe, could you please email: user-unsubscr...@spark.apache.org (for more information, please refer to https://spark.apache.org/community.html). Thanks! Denny On Wed, May 6, 2020 at 10:12 AM Fred Liu wrote: > Hi guys > > > > -

Re: spark-shell can't import the default hive-site.xml options probably.

2015-02-01 Thread Denny Lee
I may be missing something here, but typically the hive-site.xml configurations do not require you to place an "s" within the configuration value itself. Both the retry.delay and socket.timeout values are in seconds, so you should only need to supply the integer value (which is in seconds). On Sun Feb

Re: spark-shell can't import the default hive-site.xml options probably.

2015-02-01 Thread Denny Lee
ava : > > > METASTORE_CLIENT_CONNECT_RETRY_DELAY("hive.metastore.client.connect.retry.delay", > "1s", > new TimeValidator(TimeUnit.SECONDS), > "Number of seconds for the client to wait between consecutive > connection attempts"), > > It seems having the 's

Re: Spark (SQL) as OLAP engine

2015-02-03 Thread Denny Lee
A great presentation by Evan Chan on utilizing Cassandra as Jonathan noted is at: OLAP with Cassandra and Spark http://www.slideshare.net/EvanChan2/2014-07olapcassspark. On Tue Feb 03 2015 at 10:03:34 AM Jonathan Haddad wrote: > Write out the rdd to a cassandra table. The datastax driver provid

Re: Fail to launch spark-shell on windows 2008 R2

2015-02-03 Thread Denny Lee
Hi Ningjun, I have been working with Spark 1.2 on Windows 7 and Windows 2008 R2 (purely for development purposes). I had most recently installed them utilizing Java 1.8, Scala 2.10.4, and Spark 1.2 Precompiled for Hadoop 2.4+. A handy thread concerning the null\bin\winutils issue is addressed in

Re: Tableau beta connector

2015-02-04 Thread Denny Lee
Some quick context behind how Tableau interacts with Spark / Hive can also be found at https://www.concur.com/blog/en-us/connect-tableau-to-sparksql - its for how to connect from Tableau to the thrift server before the official Tableau beta connector but should provide some of the additional conte

Re: Tableau beta connector

2015-02-04 Thread Denny Lee
rrect me if I am wrong. > > > I guess I have to look at how thrift server works. > -- > *From:* Denny Lee > *Sent:* Thursday, February 5, 2015 12:20 PM > *To:* İsmail Keskin; Ashutosh Trivedi (MT2013030) > *Cc:* user@spark.apache.org > *Subjec

Re: Tableau beta connector

2015-02-05 Thread Denny Lee
t(sc) > > > Do some processing on RDD and persist it on hive using registerTempTable > > and tableau can extract that RDD persisted on hive. > > > Regards, > > Ashutosh > > > -- > *From:* Denny Lee > > *Sent:* Thursday, Fe

Spark 1.3 SQL Programming Guide and sql._ / sql.types._

2015-02-20 Thread Denny Lee
Quickly reviewing the latest SQL Programming Guide (in github) I had a couple of quick questions: 1) Do we need to instantiate the SparkContext as per // sc is an existing SparkContext. val sqlContext = new org.apache.spar

Re: Spark 1.3 SQL Programming Guide and sql._ / sql.types._

2015-02-20 Thread Denny Lee
t; > On Fri, Feb 20, 2015 at 9:55 AM, Denny Lee wrote: > >> Quickly reviewing the latest SQL Programming Guide >> <https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md> >> (in github) I had a couple of quick questions: >> >> 1) Do we need t

Re: Spark SQL odbc on Windows

2015-02-22 Thread Denny Lee
Hi Francisco, Out of curiosity - why ROLAP mode using multi-dimensional mode (vs tabular) from SSAS to Spark? As a past SSAS guy you've definitely piqued my interest. The one thing that you may run into is that the SQL generated by SSAS can be quite convoluted. When we were doing the same thing t

Re: Spark SQL odbc on Windows

2015-02-22 Thread Denny Lee
Back to thrift, there was an earlier thread on this topic at http://mail-archives.apache.org/mod_mbox/spark-user/201411.mbox/%3CCABPQxsvXA-ROPeXN=wjcev_n9gv-drqxujukbp_goutvnyx...@mail.gmail.com%3E that may be useful as well. On Sun Feb 22 2015 at 8:42:29 AM Denny Lee wrote: > Hi Franci

Re: Spark SQL odbc on Windows

2015-02-23 Thread Denny Lee
imited. And thanks for writing the klout paper!! We were already > using it as a guideline for our tests. > > Best regards, > Francisco > -- > From: Denny Lee > Sent: ‎22/‎02/‎2015 17:56 > To: Ashic Mahtab ; Francisco Orchard ; > Apache Spark

Re: Use case for data in SQL Server

2015-02-24 Thread Denny Lee
Hi Suhel, My team is currently working with a lot of SQL Server databases as one of our many data sources and ultimately we pull the data into HDFS from SQL Server. As we had a lot of SQL databases to hit, we used the jTDS driver and SQOOP to extract the data out of SQL Server and into HDFS (smal

Re: How to start spark-shell with YARN?

2015-02-24 Thread Denny Lee
It may have to do with the akka heartbeat interval per SPARK-3923 - https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-3923 ? On Tue, Feb 24, 2015 at 16:40 Xi Shen wrote: > Hi Sean, > > I launched the spark-shell on the same machine as I started YARN service. > I don't think port

Re: Unable to run hive queries inside spark

2015-02-24 Thread Denny Lee
The error message you have is: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/user/hive/warehouse/src is not a directory or unable to create one) Could you verify that you (the user you are running under) has the rights to create th

Re: Unable to run hive queries inside spark

2015-02-24 Thread Denny Lee
e > location of default database for the > warehouse > > > Do I need to do anything explicitly other than placing hive-site.xml in > the spark.conf directory ? > > Thanks !! > > > > On Wed, Feb 25, 2015 at 11:42 AM, Denny Lee wrote: > >

Re: spark master shut down suddenly

2015-03-04 Thread Denny Lee
It depends on your setup but one of the locations is /var/log/mesos On Wed, Mar 4, 2015 at 19:11 lisendong wrote: > I ‘m sorry, but how to look at the mesos logs? > where are them? > > > > 在 2015年3月4日,下午6:06,Akhil Das 写道: > > > You can check in the mesos logs and see whats really happening. > >

Re: takeSample triggers 2 jobs

2015-03-06 Thread Denny Lee
Hi Rares, If you dig into the descriptions for the two jobs, it will probably return something like: Job ID: 1 org.apache.spark.rdd.RDD.takeSample(RDD.scala:447) $line41.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:22) ... Job ID: 0 org.apache.spark.rdd.RDD.takeSample(RDD.scala:428) $line41.$
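A plain-Python analogy (not Spark internals) for why `takeSample` surfaces two jobs: one pass to count the dataset, a second to actually draw the sample — the partitioned data below is made up for the example:

```python
import random

# Toy analogy for takeSample's two jobs: the first pass counts the RDD,
# the second actually draws the sample.
partitions = [[1, 2, 3], [4, 5, 6, 7]]

total = sum(len(p) for p in partitions)   # "job 1": count the dataset
flat = [x for p in partitions for x in p]
sample = random.sample(flat, 3)           # "job 2": draw the sample

print(total, len(sample))  # 7 3
```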

Re: Spark sql thrift server slower than hive

2015-03-22 Thread Denny Lee
How are you running your spark instance out of curiosity? Via YARN or standalone mode? When connecting Spark thriftserver to the Spark service, have you allocated enough memory and CPU when executing with spark? On Sun, Mar 22, 2015 at 3:39 AM fanooos wrote: > We have cloudera CDH 5.3 installe

Re: Use pig load function in spark

2015-03-23 Thread Denny Lee
You may be able to utilize Spork (Pig on Apache Spark) as a mechanism to do this: https://github.com/sigmoidanalytics/spork On Mon, Mar 23, 2015 at 2:29 AM Dai, Kevin wrote: > Hi, all > > > > Can spark use pig’s load function to load data? > > > > Best Regards, > > Kevin. >

Re: Using a different spark jars than the one on the cluster

2015-03-23 Thread Denny Lee
+1 - I currently am doing what Marcelo is suggesting as I have a CDH 5.2 cluster (with Spark 1.1) and I'm also running Spark 1.3.0+ side-by-side in my cluster. On Wed, Mar 18, 2015 at 1:23 PM Marcelo Vanzin wrote: > Since you're using YARN, you should be able to download a Spark 1.3.0 > tarball

Re: Should I do spark-sql query on HDFS or hive?

2015-03-23 Thread Denny Lee
From the standpoint of Spark SQL accessing the files - when it is hitting Hive, it is in effect hitting HDFS as well. Hive provides a great framework where the table structure is already well defined. But underneath it, Hive is just accessing files from HDFS so you are hitting HDFS either way.

Re: Standalone Scheduler VS YARN Performance

2015-03-24 Thread Denny Lee
By any chance does this thread address look similar: http://apache-spark-developers-list.1001551.n3.nabble.com/Lost-executor-on-YARN-ALS-iterations-td7916.html ? On Tue, Mar 24, 2015 at 5:23 AM Harut Martirosyan < harut.martiros...@gmail.com> wrote: > What is performance overhead caused by YARN

Re: Hadoop 2.5 not listed in Spark 1.4 build page

2015-03-24 Thread Denny Lee
Hadoop 2.5 would be referenced via -Dhadoop.version=2.5.X using the profile -Phadoop-2.4. Please note earlier in the link the section: # Apache Hadoop 2.4.X or 2.5.X mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=VERSION -DskipTests clean package Versions of Hadoop after 2.5.X may or may not work with the -Ph

Re: Errors in SPARK

2015-03-24 Thread Denny Lee
Did you include the connection to a MySQL connector jar so that way spark-shell / hive can connect to the metastore? For example, when I run my spark-shell instance in standalone mode, I use: ./spark-shell --master spark://servername:7077 --driver-class-path /lib/mysql-connector-java-5.1.27.jar

Re: Errors in SPARK

2015-03-24 Thread Denny Lee
t; instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient* > > Cheers, > Sandeep.v > > On Wed, Mar 25, 2015 at 11:10 AM, sandeep vura > wrote: > >> No I am just running ./spark-shell command in terminal I will try with >> above command >> >> On Wed,

Re: Total size of serialized results is bigger than spark.driver.maxResultSize

2015-03-25 Thread Denny Lee
As you noted, you can change the spark.driver.maxResultSize value in your Spark Configurations (https://spark.apache.org/docs/1.2.0/configuration.html). Please reference the Spark Properties section noting that you can modify these properties via the spark-defaults.conf or via SparkConf(). HTH!

Re: [SparkSQL] How to calculate stddev on a DataFrame?

2015-03-25 Thread Denny Lee
Perhaps this email reference may be able to help from a DataFrame perspective: http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201503.mbox/%3CCALte62ztepahF=5hk9rcfbnyk4z43wkcq4fkdcbwmgf_3_o...@mail.gmail.com%3E On Wed, Mar 25, 2015 at 7:29 PM Haopu Wang wrote: > Hi, > > > > I ha

Re: Handling Big data for interactive BI tools

2015-03-26 Thread Denny Lee
BTW, a tool that I have been using to help do the preaggregation of data using hyperloglog in combination with Spark is atscale (http://atscale.com/). It builds the aggregations and makes use of the speed of SparkSQL - all within the context of a model that is accessible by Tableau or Qlik. On Thu

Re: spark-sql throws org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException

2015-03-26 Thread Denny Lee
If you're not using MySQL as your metastore for Hive, out of curiosity what are you using? The error you are seeing is common when the correct driver to allow Spark to connect to the Hive metastore isn't there. As well, I noticed that you're using SPARK_CLAS

Re: Hive Table not from from Spark SQL

2015-03-27 Thread Denny Lee
Upon reviewing your other thread, could you confirm that your Hive metastore that you can connect to via Hive is a MySQL database? And to also confirm, when you're running spark-shell and doing a "show tables" statement, you're getting the same error? On Fri, Mar 27, 2015 at 6:08 AM ÐΞ€ρ@Ҝ (๏̯͡๏

Re: Anyone has some simple example with spark-sql with spark 1.3

2015-03-30 Thread Denny Lee
Hi Vincent, This may be a case that you're missing a semi-colon after your CREATE TEMPORARY TABLE statement. I ran your original statement (missing the semi-colon) and got the same error as you did. As soon as I added it in, I was good to go again: CREATE TEMPORARY TABLE jsonTable USING org.apa

Creating Partitioned Parquet Tables via SparkSQL

2015-03-31 Thread Denny Lee
Creating Parquet tables via .saveAsTable is great but was wondering if there was an equivalent way to create partitioned parquet tables. Thanks!

Re: Creating Partitioned Parquet Tables via SparkSQL

2015-04-01 Thread Denny Lee
Thanks Felix :) On Wed, Apr 1, 2015 at 00:08 Felix Cheung wrote: > This is tracked by these JIRAs.. > > https://issues.apache.org/jira/browse/SPARK-5947 > https://issues.apache.org/jira/browse/SPARK-5948 > > -- > From: denny.g@gmail.com > Date: Wed, 1 Apr 2015 04:

ArrayBuffer within a DataFrame

2015-04-02 Thread Denny Lee
Quick question - the output of a dataframe is in the format of: [2015-04, ArrayBuffer(A, B, C, D)] and I'd like to return it as: 2015-04, A 2015-04, B 2015-04, C 2015-04, D What's the best way to do this? Thanks in advance!
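The transformation asked for here — one output row per array element — is what Spark's `DataFrame.explode` performs; as a plain-Python sketch of the same flattening (column names hypothetical):

```python
# Plain-Python sketch of exploding [month, ArrayBuffer(...)] rows into
# one (month, item) row per element, mirroring DataFrame.explode.
rows = [("2015-04", ["A", "B", "C", "D"])]
flat = [(month, item) for month, items in rows for item in items]

print(flat)
# [('2015-04', 'A'), ('2015-04', 'B'), ('2015-04', 'C'), ('2015-04', 'D')]
```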

Re: ArrayBuffer within a DataFrame

2015-04-02 Thread Denny Lee
Apr 2, 2015 at 7:10 PM, Denny Lee wrote: > >> Quick question - the output of a dataframe is in the format of: >> >> [2015-04, ArrayBuffer(A, B, C, D)] >> >> and I'd like to return it as: >> >> 2015-04, A >> 2015-04, B >> 2015-04, C >> 2015-04, D >> >> What's the best way to do this? >> >> Thanks in advance! >> >> >> >

Re: ArrayBuffer within a DataFrame

2015-04-03 Thread Denny Lee
eilly.com/product/0636920033073.do> (O'Reilly) > Typesafe <http://typesafe.com> > @deanwampler <http://twitter.com/deanwampler> > http://polyglotprogramming.com > > On Thu, Apr 2, 2015 at 10:45 PM, Denny Lee wrote: > >> Thanks Michael - that was it! I was

Re: ArrayBuffer within a DataFrame

2015-04-03 Thread Denny Lee
I think something like this would work. You might need to play with the > type. > > df.explode("arrayBufferColumn") { x => x } > > > > On Fri, Apr 3, 2015 at 6:43 AM, Denny Lee wrote: > >> Thanks Dean - fun hack :) >> >> On Fri, Apr 3, 2015

Re: Microsoft SQL jdbc support from spark sql

2015-04-06 Thread Denny Lee
At this time, the JDBC data source is not extensible so it cannot support SQL Server. There were some thoughts - credit to Cheng Lian for this - about making the JDBC data source extensible for third-party support, possibly via slick. On Mon, Apr 6, 2015 at 10:41 PM bipin wrote: > Hi, I am try

Re: Microsoft SQL jdbc support from spark sql

2015-04-07 Thread Denny Lee
That's correct - at this time MS SQL Server is not supported through the JDBC data source. In my environment, we've been using Hadoop streaming to extract out data from multiple SQL Servers, pushing the data into HDFS, creating the Hive tables and/or converting them into Parquet, and t

Re: Which Hive version should be used for Spark 1.3

2015-04-09 Thread Denny Lee
By default Spark 1.3 has bindings to Hive 0.13.1 though you can bind it to Hive 0.12 if you specify it in the profile when building Spark as per https://spark.apache.org/docs/1.3.0/building-spark.html. If you are downloading a pre built version of Spark 1.3 - then by default, it is set to Hive 0.1

Re: SQL can't not create Hive database

2015-04-09 Thread Denny Lee
Can you create the database directly within Hive? If you're getting the same error within Hive, it sounds like a permissions issue as per Bojan. More info can be found at: http://stackoverflow.com/questions/15898211/unable-to-create-database-path-file-user-hive-warehouse-error On Thu, Apr 9, 201

Re: Converting Date pattern in scala code

2015-04-14 Thread Denny Lee
If you're doing this in Scala per se - then you can probably just reference JodaTime or the Java Date / Time classes. If you are using SparkSQL, then you can use the various Hive date functions for conversion. On Tue, Apr 14, 2015 at 11:04 AM BASAK, ANANDA wrote: > I need some help to convert the date patte

Re: Microsoft SQL jdbc support from spark sql

2015-04-16 Thread Denny Lee
Bummer - out of curiosity, if you were to use the classpath.first or perhaps copy the jar to the slaves could that actually do the trick? The latter isn't really all that efficient but just curious if that could do the trick. On Thu, Apr 16, 2015 at 7:14 AM ARose wrote: > I take it back. My so

Re: Which version of Hive QL is Spark 1.3.0 using?

2015-04-17 Thread Denny Lee
Support for subqueries in predicates hasn't been resolved yet - please refer to SPARK-4226. BTW, Spark 1.3 defaults to Hive 0.13.1 bindings On Fri, Apr 17, 2015 at 09:18 ARose wrote: > So I'm trying to store the results of a query into a DataFrame, but I get > the > following exception thrown

Re: Skipped Jobs

2015-04-19 Thread Denny Lee
The job is skipped because the results are available in memory from a prior run. More info at: http://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3ccakx7bf-u+jc6q_zm7gtsj1mihagd_4up4qxpd9jfdjrfjax...@mail.gmail.com%3E. HTH! On Sun, Apr 19, 2015 at 1:43 PM James King wrote: > In th
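A hedged, non-Spark illustration of the "already available" behavior: a stage whose output is cached is not recomputed, which the UI surfaces as skipped. The memoization below is a toy analogy, not Spark's scheduler logic:

```python
# Toy memoization sketch (not Spark internals): a "stage" whose result is
# already cached returns immediately instead of recomputing - the second
# run is "skipped".
cache = {}
computed = []

def run_stage(key):
    if key in cache:
        return cache[key]     # result already available -> skipped
    computed.append(key)      # track actual recomputation
    cache[key] = key.upper()
    return cache[key]

run_stage("stage-0")
run_stage("stage-0")          # skipped: served from the cache
print(computed)  # ['stage-0']
```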

Re: Skipped Jobs

2015-04-19 Thread Denny Lee
Thanks for the correction Mark :) On Sun, Apr 19, 2015 at 3:45 PM Mark Hamstra wrote: > Almost. Jobs don't get skipped. Stages and Tasks do if the needed > results are already available. > > On Sun, Apr 19, 2015 at 3:18 PM, Denny Lee wrote: > >> The job is skipp

Re: Start ThriftServer Error

2015-04-22 Thread Denny Lee
You may need to specify the hive port itself. For example, my own Thrift start command is in the form: ./sbin/start-thriftserver.sh --master spark://$myserver:7077 --driver-class-path $CLASSPATH --hiveconf hive.server2.thrift.bind.host $myserver --hiveconf hive.server2.thrift.port 1 HTH! O

Re: Spark Cluster Setup

2015-04-27 Thread Denny Lee
Similar to what Dean called out, we build Puppet manifests so we could do the automation - its a bit of work to setup, but well worth the effort. On Fri, Apr 24, 2015 at 11:27 AM Dean Wampler wrote: > It's mostly manual. You could try automating with something like Chef, of > course, but there's

Re: how to delete data from table in sparksql

2015-05-14 Thread Denny Lee
Delete from table is available as part of Hive 0.14 (reference: Apache Hive > Language Manual DML - Delete) while Spark 1.3 defaults to Hive 0.13. Perhaps rebuild Spark with Hive 0.14 or generate a new

Hive Skew flag?

2015-05-15 Thread Denny Lee
Just wondering if we have any timeline on when the hive skew flag will be included within SparkSQL? Thanks! Denny

Seattle Spark Meetup: Machine Learning Streams with Spark 1.0

2014-06-05 Thread Denny Lee
If you’re in the Seattle area on 6/24, come join us at Madrona Ventures building in downtown Seattle to join the session: Machine Learning Streams with Spark 1.0.   For more information, please check out our meetup event:  http://www.meetup.com/Seattle-Spark-Meetup/events/187375042/ Enjoy! Denn

Re: Run spark unit test on Windows 7

2014-07-02 Thread Denny Lee
By any chance do you have HDP 2.1 installed? you may need to install the utils and update the env variables per http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows > On Jul 2, 2014, at 10:20 AM, Konstantin Kudryavtsev > wrote: > > Hi Andrew, > > it's windows 7

Re: Run spark unit test on Windows 7

2014-07-02 Thread Denny Lee
cular issue. On Wed, Jul 2, 2014 at 12:04 PM, Kostiantyn Kudriavtsev < kudryavtsev.konstan...@gmail.com> wrote: > No, I don’t > > why do I need to have HDP installed? I don’t use Hadoop at all and I’d > like to read data from local filesystem > > On Jul 2, 2014, at 9:10 PM,

Re: Run spark unit test on Windows 7

2014-07-03 Thread Denny Lee
=hdinsight 2) put this file into d:\winutil\bin 3) add in my test: System.setProperty("hadoop.home.dir", "d:\\winutil\\") after that test runs Thank you, Konstantin Kudryavtsev On Wed, Jul 2, 2014 at 10:24 PM, Denny Lee wrote: You don't actually need it per se - its ju

Re: Run spark unit test on Windows 7

2014-07-03 Thread Denny Lee
Thanks! will take a look at this later today. HTH! > On Jul 3, 2014, at 11:09 AM, Kostiantyn Kudriavtsev > wrote: > > Hi Denny, > > just created https://issues.apache.org/jira/browse/SPARK-2356 > >> On Jul 3, 2014, at 7:06 PM, Denny Lee wrote: >> &

Seattle Spark Meetup slides: xPatterns, Fun Things, and Machine Learning Streams - next is Interactive OLAP

2014-07-07 Thread Denny Lee
Apologies for the delay but we’ve had a bunch of great slides and sessions at Seattle Spark Meetup this past couple of months including Claudiu Barbura’s "xPatterns on Spark, Shark, Mesos, and Tachyon"; Paco Nathan’s "Fun Things You Can Do with Spark 1.0”, and "Machine Learning Streams with Spar
