Spark Streaming and database access (e.g. MySQL)

2014-09-06 Thread jchen
Hi, has anyone tried using Spark Streaming with MySQL (or any other database/data store)? I can write to MySQL at the beginning of the driver application. However, when I try to write the result of every streaming processing window to MySQL, it fails with the following error: org.apache.sp
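
The error message is cut off above, but a frequent cause when writing a DStream's output to an external store is capturing a driver-side connection object in a task closure. Below is a minimal sketch, assuming Scala and a hypothetical word_counts table, of one way to write each window's results to MySQL by opening the JDBC connection per partition on the executors:

    import java.sql.DriverManager
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical connection settings; adjust for your environment.
    val jdbcUrl = "jdbc:mysql://dbhost:3306/mydb"
    val (user, password) = ("spark", "secret")

    def saveToMySQL(counts: DStream[(String, Long)]): Unit = {
      counts.foreachRDD { rdd =>
        rdd.foreachPartition { records =>
          // Open the connection on the executor, once per partition,
          // so nothing non-serializable is captured in the task closure.
          val conn = DriverManager.getConnection(jdbcUrl, user, password)
          val stmt = conn.prepareStatement(
            "INSERT INTO word_counts (word, cnt) VALUES (?, ?)")
          try {
            records.foreach { case (word, cnt) =>
              stmt.setString(1, word)
              stmt.setLong(2, cnt)
              stmt.executeUpdate()
            }
          } finally {
            stmt.close()
            conn.close()
          }
        }
      }
    }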

Re: prepending jars to the driver class path for spark-submit on YARN

2014-09-06 Thread Victor Tso-Guillen
I ran into the same issue. What I did was use the Maven Shade plugin to shade my version of the httpcomponents libraries into another package. On Fri, Sep 5, 2014 at 4:33 PM, Penny Espinoza < pesp...@societyconsulting.com> wrote: > Hey - I’m struggling with some dependency issues with > org.apache.http

Re: Support R in Spark

2014-09-06 Thread Christopher Nguyen
Hi Kui, sorry about that. That link you mentioned is probably the one for the products. We don't have one pointing from adatao.com to ddf.io; maybe we'll add it. As for access to the code base itself, I think the team has already created a GitHub repo for it, and should open it up within a few wee

Re: Is there any way to control the parallelism in LogisticRegression

2014-09-06 Thread DB Tsai
Yes. But you need to store the RDD as *serialized* Java objects. See the section on storage levels: http://spark.apache.org/docs/latest/programming-guide.html Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in
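
A minimal sketch of the suggestion, assuming the Scala MLlib API and a hypothetical CSV training file: persist the input RDD with a serialized storage level before training.

    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.storage.StorageLevel

    // Hypothetical training data: label,feature1,feature2,... per line.
    val points = sc.textFile("hdfs:///data/training.csv").map { line =>
      val parts = line.split(',').map(_.toDouble)
      LabeledPoint(parts.head, Vectors.dense(parts.tail))
    }

    // Store the RDD as *serialized* Java objects rather than deserialized ones.
    points.persist(StorageLevel.MEMORY_ONLY_SER)

    val model = LogisticRegressionWithSGD.train(points, 100)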

Q: About scenarios where driver execution flow may block...

2014-09-06 Thread didata
Hello friends: I have a theory question about call blocking in a Spark driver. Consider this (admittedly contrived =:)) snippet to illustrate the question... x = rdd01.reduceByKey() # or maybe some other 'shuffle-requiring transformation'. b = sc.broadcast(x.take(20)) # Or any statement that r
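
A Scala rendering of the snippet (names are hypothetical) may make the blocking points concrete: reduceByKey is a lazy transformation, while take(20) is an action, so the driver blocks on that call until the job finishes before the broadcast is created.

    // Hypothetical key/value RDD standing in for rdd01.
    val rdd01 = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // Lazy: no job runs yet; the shuffle is only planned.
    val x = rdd01.reduceByKey(_ + _)

    // take(20) is an action: a job is submitted and the driver blocks
    // here until the first 20 results have been computed.
    val sample = x.take(20)

    // Only then is the broadcast variable created from the local result.
    val b = sc.broadcast(sample)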

Re: Getting the type of an RDD in spark AND pyspark

2014-09-06 Thread Davies Liu
But you cannot get what you expect in PySpark, because the RDD in Scala is serialized, so it will always be RDD[Array[Byte]], whatever the type of the RDD in Python is. Davies On Sat, Sep 6, 2014 at 4:09 AM, Aaron Davidson wrote: > Pretty easy to do in Scala: > > rdd.elementClassTag.runtimeClas

Re: Spark SQL check if query is completed (pyspark)

2014-09-06 Thread Davies Liu
SQLContext.sql() returns a SchemaRDD; you need to call collect() to pull the data in. On Sat, Sep 6, 2014 at 6:02 AM, jamborta wrote: > Hi, > > I am using Spark SQL to run some administrative queries and joins (e.g. > create table, insert overwrite, etc), where the query does not return
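
The thread is about PySpark, but the Scala API follows the same pattern; a small sketch with a hypothetical query:

    // sql() just returns a SchemaRDD; the query has not executed yet.
    val results = sqlContext.sql("SELECT name, age FROM people WHERE age > 21")

    // collect() runs the job, pulls the rows back to the driver, and any
    // failure in the query surfaces here as an exception.
    val rows = results.collect()
    rows.foreach(println)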

Re: Support R in Spark

2014-09-06 Thread oppokui
Thanks, Christopher. I saw it before; it is amazing. Last time I tried to download it from adatao, but got no response after filling out the form. How can I download it or its source code? What is the license? Kui > On Sep 6, 2014, at 8:08 PM, Christopher Nguyen wrote: > > Hi Kui, > > DDF (open sour

Re: unsubscribe

2014-09-06 Thread Nicholas Chammas
To unsubscribe send an email to user-unsubscr...@spark.apache.org Links to sub/unsub are here: https://spark.apache.org/community.html On Sat, Sep 6, 2014 at 7:52 AM, Derek Schoettle wrote: > Unsubscribe > > On Sep 6, 2014, at 7:48 AM, "Murali Raju" wrote:

Spark SQL check if query is completed (pyspark)

2014-09-06 Thread jamborta
Hi, I am using Spark SQL to run some administrative queries and joins (e.g. create table, insert overwrite, etc), where the query does not return any data. I noticed that if the query fails it prints an error message on the console but does not actually throw an exception (this is Spark 1.0.2). Is

Re: How spark parallelize maps Slices to tasks/executors/workers

2014-09-06 Thread Matthew Farrellee
On 09/04/2014 09:55 PM, Mozumder, Monir wrote: I have a 2-node cluster setup, where each node has 4 cores. MASTER (Worker-on-master) (Worker-on-node1) (slaves(master,node1)) SPARK_WORKER_INSTANCES=1 I am trying to understand Spark's parallelize behavior. The
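
Monir's question is cut off above, but a small sketch of how the slices argument of parallelize maps to partitions (and hence to one task per partition per stage) may help frame it; the numbers are hypothetical:

    val data = 1 to 1000

    // Ask for 8 slices; each slice becomes one partition of the RDD.
    val rdd = sc.parallelize(data, 8)

    println(rdd.partitions.length)   // 8 partitions => 8 tasks per stage

    // How many elements landed in each slice.
    println(rdd.glom().map(_.length).collect().mkString(", "))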

Re: Task not serializable

2014-09-06 Thread Sarath Chandra
Thanks Alok, Sean. As suggested by Sean, I tried a sample program. I wrote a function in which I referenced a class from a third-party library that is not serializable and passed it to my map function. On executing it, I got the same exception. Then I modified the program, removed the function, and writte
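
A sketch of the failing pattern Sarath describes, with a made-up stand-in for the third-party class: an instance created on the driver is referenced from the closure passed to map, so Spark has to serialize it and throws "Task not serializable".

    // Stand-in for a third-party class that does not implement Serializable.
    class LegacyParser {
      def parse(line: String): Int = line.length
    }

    val parser = new LegacyParser()   // created on the driver

    val lines = sc.textFile("hdfs:///data/input.txt")

    // The closure captures 'parser', so Spark must serialize it to ship the
    // task to executors, and this fails with "Task not serializable".
    val lengths = lines.map(line => parser.parse(line))
    lengths.count()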

Re: Support R in Spark

2014-09-06 Thread Christopher Nguyen
Hi Kui, DDF (open sourced) also aims to do something similar, adding RDBMS idioms, and is already implemented on top of Spark. One philosophy is that the DDF API aggressively hides the notion of parallel datasets, exposing only (mutable) tables to users, on which they can apply R and other famili

Re: unsubscribe

2014-09-06 Thread Derek Schoettle
Unsubscribe > On Sep 6, 2014, at 7:48 AM, "Murali Raju" wrote:

unsubscribe

2014-09-06 Thread Murali Raju

Re: Getting the type of an RDD in spark AND pyspark

2014-09-06 Thread Aaron Davidson
Pretty easy to do in Scala: rdd.elementClassTag.runtimeClass You can access this method from Python as well by using the internal _jrdd. It would look something like this (warning, I have not tested it): rdd._jrdd.classTag().runtimeClass() (The method name is "classTag" for JavaRDDLike, and "ele

Re: Task not serializable

2014-09-06 Thread Sean Owen
I disagree that the generally right change is to try to make the classes serializable. Usually, classes that are not serializable are not supposed to be serialized. You're using them in a way that's causing them to be serialized, and that's probably not desired. For example, this is wrong: val fo
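
Sean's counter-example is cut off above; one common alternative, reusing the hypothetical LegacyParser from the earlier sketch, is to create the non-serializable object inside mapPartitions so it lives only on the executors and never needs to be serialized:

    val lines = sc.textFile("hdfs:///data/input.txt")

    // One LegacyParser per partition, constructed on the executor, so it is
    // never shipped inside the task closure.
    val lengths = lines.mapPartitions { iter =>
      val parser = new LegacyParser()
      iter.map(line => parser.parse(line))
    }
    lengths.count()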

Re: question on replicate() in blockManager.scala

2014-09-06 Thread Aaron Davidson
Looks like that's BlockManagerWorker.syncPutBlock(), which is in an if check, perhaps obscuring its existence. On Fri, Sep 5, 2014 at 2:19 AM, rapelly kartheek wrote: > Hi, > > var cachedPeers: Seq[BlockManagerId] = null > private def replicate(blockId: String, data: ByteBuffer, level: > Stor

Re: error: type mismatch while Union

2014-09-06 Thread Aaron Davidson
Are you doing this from the spark-shell? You're probably running into https://issues.apache.org/jira/browse/SPARK-1199 which should be fixed in 1.1. On Sat, Sep 6, 2014 at 3:03 AM, Dhimant wrote: > I am using Spark version 1.0.2 > > > > > -- > View this message in context: > http://apache-spark

Re: How to change the values in Array of Bytes

2014-09-06 Thread Aaron Davidson
More of a Scala question than Spark, but "apply" here can be written with just parentheses like this: val array = Array.fill[Byte](10)(0) if (array(index) == 0) { array(index) = 1 } The second use of "array(index)" (the assignment) is actually not calling apply, but "update". It's a Scala-ism that's us

Re: error: type mismatch while Union

2014-09-06 Thread Dhimant
I am using Spark version 1.0.2 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/error-type-mismatch-while-Union-tp13547p13618.html Sent from the Apache Spark User List mailing list archive at Nabble.com. -

Re: Support R in Spark

2014-09-06 Thread oppokui
Cool! It is very good news. Can't wait for it. Kui > On Sep 5, 2014, at 1:58 AM, Shivaram Venkataraman > wrote: > > Thanks Kui. SparkR is a pretty young project, but there are a bunch of > things we are working on. One of the main features is to expose a data > frame API (https://sparkr.atl

How to change the values in Array of Bytes

2014-09-06 Thread Deep Pradhan
Hi, I have an array of bytes and I have filled the array with 0 in all the postitions. *var Array = Array.fill[Byte](10)(0)* Now, if certain conditions are satisfied, I want to change some elements of the array to 1 instead of 0. If I run, *if (Array.apply(index)==0) Array.apply(index) = 1* it