Question about TTL with TorrentBroadcastFactory in Spark-1.2.0

2014-12-21 Thread 顾亮亮
Hi All, I am facing a problem when using TTL with TorrentBroadcastFactory in Spark-1.2.0. My code is as follows: val conf = new SparkConf(). setAppName("TTL_Broadcast_vars"). setMaster("local"). //set("spark.broadcast.factory", "org.apache.spark.broadcast.HttpBroadcastFa

Python: Streaming Question

2014-12-21 Thread Samarth Mailinglist
I’m trying to run the stateful network word count at https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/stateful_network_wordcount.py using the command: ./bin/spark-submit examples/src/main/python/streaming/stateful_network_wordcount.py localhost I am also running
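The stateful word count example keeps a running count per key via an update function passed to updateStateByKey (the Python script above also expects a port argument after the hostname). Below is a minimal pure-Scala sketch of that update logic, runnable without a Spark Streaming context; the function name is illustrative, not taken from the linked example.

```scala
// Per-key update logic in the shape updateStateByKey expects:
// merge the counts arriving in this micro-batch with the running total.
def updateCount(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
  Some(newValues.sum + runningCount.getOrElse(0))

// Simulate two micro-batches for one word:
val afterBatch1 = updateCount(Seq(1, 1), None)     // Some(2)
val afterBatch2 = updateCount(Seq(1), afterBatch1) // Some(3)
```

Returning None instead of Some would drop the key from the state, which is how stateful streams age out entries.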

Re: locality sensitive hashing for spark

2014-12-21 Thread Nick Pentreath
Looks interesting, thanks for sharing. Does it support cosine similarity? I only saw Jaccard mentioned from a quick glance. — Sent from Mailbox On Mon, Dec 22, 2014 at 4:12 AM, morr0723 wrote: > I've pushed out an implementation of locality sensitive hashing for spark. > LSH has a number of
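The distinction behind the question: minhash-based LSH approximates Jaccard similarity over sets, while cosine similarity over vectors needs a different LSH family (e.g. random hyperplanes). A plain-Scala sketch of the two measures, for illustration only and not taken from the spark-hash repo:

```scala
// Jaccard similarity over sets: |A ∩ B| / |A ∪ B|.
def jaccard[A](a: Set[A], b: Set[A]): Double =
  if (a.isEmpty && b.isEmpty) 1.0
  else (a intersect b).size.toDouble / (a union b).size

// Cosine similarity over dense vectors: dot(a, b) / (|a| * |b|).
def cosine(a: Seq[Double], b: Seq[Double]): Double = {
  val dot   = a.zip(b).map { case (x, y) => x * y }.sum
  val normA = math.sqrt(a.map(x => x * x).sum)
  val normB = math.sqrt(b.map(x => x * x).sum)
  dot / (normA * normB)
}
```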

S3 files, Spark job hangs up

2014-12-21 Thread durga
Hi All, I am facing a strange issue sporadically: occasionally my Spark job hangs while reading S3 files. It does not throw an exception or make any progress; it just hangs there. Is this a known issue? Please let me know how I could solve it. Thanks, -D -- View this messag

Re: java.sql.SQLException: No suitable driver found

2014-12-21 Thread durga
One more question: how would I submit additional jars to a spark-submit job? I used the --jars option, but it seems it is not working, as explained earlier. Thanks for the help, -D -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-sql-SQLException-No-suitable

Re: java.sql.SQLException: No suitable driver found

2014-12-21 Thread durga
Hi All, I tried to build a combined jar in a shell script. It works when I use spark-shell, but with spark-submit it is the same issue. Help is highly appreciated. Thanks -D -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-sql-SQLException-No-sui

Parquet schema changes

2014-12-21 Thread Adam Gilmore
Hi all, I understand that Parquet allows for schema versioning automatically in the format; however, I'm not sure whether Spark supports this. I'm saving a SchemaRDD to a Parquet file, registering it as a table, then doing an insertInto with a SchemaRDD with an extra column. The second SchemaRDD

locality sensitive hashing for spark

2014-12-21 Thread morr0723
I've pushed out an implementation of locality sensitive hashing for spark. LSH has a number of use cases, the most prominent being when the features are not based in Euclidean space. Code, documentation, and a small exemplar dataset are available on github: https://github.com/mrsqueeze/spark-hash Feel f
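For readers new to the technique, here is a tiny minhash sketch of the idea behind LSH for Jaccard similarity: each of k seeded hash functions keeps the minimum hash over a set's elements, and the fraction of matching minima between two signatures estimates their Jaccard similarity. This is illustrative only, not code from the spark-hash repository.

```scala
import scala.util.hashing.MurmurHash3

// Build a minhash signature: one minimum per seeded hash function.
def minhashSignature(items: Set[String], numHashes: Int): Seq[Int] =
  (0 until numHashes).map { seed =>
    items.map(x => MurmurHash3.stringHash(x, seed)).min
  }

// Fraction of positions where two signatures agree approximates Jaccard.
def estimatedJaccard(sigA: Seq[Int], sigB: Seq[Int]): Double =
  sigA.zip(sigB).count { case (x, y) => x == y }.toDouble / sigA.size
```

In a full LSH scheme the signature is then split into bands that are hashed into buckets, so that only items colliding in some bucket are compared pairwise.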

Issue with Parquet on Spark 1.2 and Amazon EMR

2014-12-21 Thread Adam Gilmore
Hi all, I've just launched a new Amazon EMR cluster and used the script at: s3://support.elasticmapreduce/spark/install-spark to install Spark (this script was upgraded to support 1.2). I know there are tools to launch a Spark cluster in EC2, but I want to use EMR. Everything installs fine; ho

Re: Find the file info of when load the data into RDD

2014-12-21 Thread Anwar Rizal
Yeah..., but apparently mapPartitionsWithInputSplit is tagged as DeveloperApi. Because of that, I'm not sure that it's a good idea to use the function. For this problem, I had to create a subclass of HadoopRDD and use mapPartitions instead. Is there any reason w

Re: java.sql.SQLException: No suitable driver found

2014-12-21 Thread Michael Armbrust
With JDBC you often need to load the class so it can register the driver at the beginning of your program. Usually this is something like: Class.forName("com.mysql.jdbc.Driver"); On Fri, Dec 19, 2014 at 3:47 PM, durga wrote: > Hi I am facing an issue with mysql jars with spark-submit. > > I a
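Since the usual root cause of "No suitable driver found" is that the driver jar never reached the classpath (e.g. --jars not taking effect), a quick probe can confirm whether the class is visible before DriverManager is involved. This is a hypothetical helper, not part of any Spark or JDBC API:

```scala
// Hypothetical helper: check whether a JDBC driver class is visible on the
// current classpath. If this returns false for your driver (for example
// "com.mysql.jdbc.Driver"), the jar was not shipped and DriverManager
// cannot register it, regardless of any Class.forName call.
def driverOnClasspath(className: String): Boolean =
  try { Class.forName(className); true }
  catch { case _: ClassNotFoundException => false }
```

Calling this from both the driver program and inside a task helps distinguish a missing jar on the submitting machine from one missing on the executors.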

Re: Find the file info of when load the data into RDD

2014-12-21 Thread Shuai Zheng
I just found a possible answer: http://themodernlife.github.io/scala/spark/hadoop/hdfs/2014/09/28/spark-input-filename/ Will give it a try. Although it is a bit troublesome, if it works it will give me what I want. Sorry to bother everyone here. Regards, Shuai On Sun, Dec 21, 2014 at 4:43 P

Find the file info of when load the data into RDD

2014-12-21 Thread Shuai Zheng
Hi All, When I try to load a folder into RDDs, is there any way to find the input file name of a particular partition, so I can track which file each partition came from? In Hadoop, I can find this information through the code: FileSplit fileSplit = (FileSplit) context.getInputSplit(); String strFil

need help with simple http request mapper

2014-12-21 Thread kmatzen
I have what I think is a pretty simple task, one that works pretty well with Celery. I wanted to see how easy it was to configure for Spark, since I already run a Mesos cluster for something else. But I had a pretty hard time getting Spark configured so that i

Re: Network file input cannot be recognized?

2014-12-21 Thread Akhil Das
Did you try? sc.textFile("file:networklocation\\README.md") Thanks Best Regards On Sun, Dec 21, 2014 at 5:34 PM, Shuai Zheng wrote: > Hi, > > I am running a code which takes a network file (not HDFS) location as > input. But sc.textFile("networklocation\\README.md") can't recognize > t

Re: Passing Spark Configuration from Driver (Master) to all of the Slave nodes

2014-12-21 Thread Shuai Zheng
Agree. I did similar things last week. The only issue is creating a subclass of Configuration to implement the Serializable interface. Demi's solution is a bit overkill for this simple requirement. On Tuesday, December 16, 2014, Gerard Maas wrote: > Hi Demi, > > Thanks for sharing. > > What we usual
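The pattern Shuai describes is wrapping a non-serializable object (Hadoop's Configuration) in a Serializable holder so it can ship with closures. A generic sketch of that pattern using plain Java serialization; the wrapped type here is a simple stand-in, not Hadoop's Configuration:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hold the settings as serializable data (a Map here) and rebuild the
// heavyweight object from it on each executor. Stand-in type only.
class SerializableSettings(val entries: Map[String, String]) extends Serializable

// Round-trip through Java serialization, as Spark does when shipping
// closures to executors.
def roundTrip[A <: Serializable](obj: A): A = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(obj); out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  in.readObject().asInstanceOf[A]
}
```

Spark itself also offers SerializableWritable for this purpose, which wraps Hadoop Writables (Configuration included) directly.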

Re: spark-repl_1.2.0 was not uploaded to central maven repository.

2014-12-21 Thread andy petrella
Actually yes, things like interactive notebooks f.i. On Sun Dec 21 2014 at 11:35:18 AM Sean Owen wrote: > I'm only speculating, but I wonder if it was on purpose? would people > ever build an app against the REPL? > > On Sun, Dec 21, 2014 at 5:50 AM, Peng Cheng wrote: > > Everything else is the

Re: does spark sql support columnar compression with encoding when caching tables

2014-12-21 Thread Cheng Lian
Would like to add that the compression schemes built into the in-memory columnar storage only support primitive columns (int, string, etc.); complex types like array, map and struct are not supported. On 12/20/14 6:17 AM, Sadhan Sood wrote: Hey Michael, Thank you for clarifying that. Is tachyon the ri

Re: integrating long-running Spark jobs with Thriftserver

2014-12-21 Thread Cheng Lian
Hi Schweichler, This is an interesting and practical question. I'm not familiar with how Tableau works, but would like to share some thoughts. In general, big data analytics frameworks like MR and Spark tend to perform immutable functional transformations over immutable data. Whilst in your

Re: SparkSQL 1.2.1-snapshot Left Join problem

2014-12-21 Thread Cheng Lian
Could you please file a JIRA together with the Git commit you're using? Thanks! On 12/18/14 2:32 AM, Hao Ren wrote: Hi, When running SparkSQL branch 1.2.1 on EC2 standalone cluster, the following query does not work: create table debug as select v1.* from t1 as v1 left join t2 as v2 on v1.sku

Network file input cannot be recognized?

2014-12-21 Thread Shuai Zheng
Hi, I am running code which takes a network file (not HDFS) location as input. But sc.textFile("networklocation\\README.md") can't recognize a network location starting with "\\" as a valid location, because it only accepts HDFS- and local-style file name formats? Anyone has an idea how I can use
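sc.textFile expects a URI-style path, so Windows UNC shares written with backslashes usually need rewriting into a file: URI with forward slashes. A hypothetical normalizer sketching that transformation (whether the share is actually reachable from every executor is a separate concern):

```scala
// Hypothetical helper: turn a UNC path such as \\server\share\README.md
// into a file: URI that URI-based APIs like sc.textFile can parse.
def toFileUri(path: String): String = {
  val forward = path.replace('\\', '/') // \\server\share -> //server/share
  if (forward.startsWith("//")) "file:" + forward   // UNC host becomes the URI authority
  else "file://" + forward                          // plain absolute path
}
```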

Re: Spark SQL DSL for joins?

2014-12-21 Thread Cheng Lian
On 12/17/14 1:43 PM, Jerry Raj wrote: Hi, I'm using the Scala DSL for Spark SQL, but I'm not able to do joins. I have two tables (backed by Parquet files) and I need to do a join across them using a common field (user_id). This works fine using standard SQL but not using the language-integrat

Re: Sharing sqlContext between Akka router and "routee" actors ...

2014-12-21 Thread Cheng Lian
SQLContext is thread-safe, so this should generally be safe; otherwise it would be a bug. However, please note that if you're using HiveContext (which is usually recommended), all the routee actors share a single Hive session. This means that if one routee executes "use db1", then the current databa

Re: Querying registered RDD (AsTable) using JDBC

2014-12-21 Thread Cheng Lian
Evert - Thanks for the instructions, this is generally useful in other scenarios, but I think this isn’t what Shahab needs, because saveAsTable actually saves the contents of the SchemaRDD into Hive. Shahab - As Michael has answered in another thread, you may try HiveThriftServer2.startWith

Re: spark-sql with join terribly slow.

2014-12-21 Thread Cheng Lian
Hari, Thanks for the details and sorry for the late reply. Currently Spark SQL doesn’t enable broadcast join optimization for left outer joins, so shuffles are required to perform this query. I made a rather artificial test to show the physical plan of your query: == Physical Plan == HashOu
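A common workaround when the planner won't broadcast is to materialize the small side as a map and perform the left outer join map-side, so no shuffle of the large side is needed. Sketched below with plain Scala collections; in real Spark code the small map would be wrapped with sc.broadcast and the outer map would run over the large RDD. Names are illustrative.

```scala
// Map-side left outer join: every row of the large side looks the key up
// in the (broadcast) small-side map locally. Keys absent from the small
// side yield None, preserving left-outer semantics.
def mapSideLeftOuterJoin[K, V, W](large: Seq[(K, V)],
                                  small: Map[K, W]): Seq[(K, (V, Option[W]))] =
  large.map { case (k, v) => (k, (v, small.get(k))) }
```

This trades the shuffle for a full copy of the small table on every executor, so it only pays off when that table genuinely fits in memory.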

Re: spark-repl_1.2.0 was not uploaded to central maven repository.

2014-12-21 Thread Sean Owen
I'm only speculating, but I wonder if it was on purpose? would people ever build an app against the REPL? On Sun, Dec 21, 2014 at 5:50 AM, Peng Cheng wrote: > Everything else is there except spark-repl. Can someone check that out this > weekend? > > > > -- > View this message in context: > http:

Re: How to deploy my java code which invokes Spark in Tomcat?

2014-12-21 Thread Akhil Das
If you are getting a ClassNotFound error, then you should use the --jars option (of spark-submit) to submit those jars. Thanks Best Regards On Sun, Dec 21, 2014 at 10:01 AM, Tao Lu wrote: > Hi, Guys, > > I have some code which runs well using the spark-submit command. > > $SPARK_HOME/bin/spark-submit --class c