Re: SparkSQL + Tableau Connector

2015-02-11 Thread Silvio Fiorito
Hey Todd, I don’t have an app to test against the thrift server. Are you able to define custom SQL without using Tableau’s schema query? I guess it’s not possible to just use SparkSQL temp tables; you may have to use permanent Hive tables that are actually in the metastore so Tableau can discover them.

RE: Easy way to "partition" an RDD into chunks like Guava's Iterables.partition

2015-02-11 Thread Yang, Yuhao
Check spark/mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala. It can be used through sliding(windowSize: Int) in spark/mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala. Yuhao From: Mark Hamstra [mailto:m...@clearstorydata.com] Sent: Thursday, February 12, 2015 7:0
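As a hedged illustration (not from the thread, and assuming a SparkContext sc as in the shell), sliding() becomes available on an ordinary RDD via the implicit conversions in RDDFunctions:

import org.apache.spark.mllib.rdd.RDDFunctions._

val rdd = sc.parallelize(1 to 10)
// sliding(3) yields overlapping windows: Array(1,2,3), Array(2,3,4), ...
// note these are windows, not the disjoint chunks Guava's Iterables.partition
// produces -- for true chunking you would still need to step over the result
val windows = rdd.sliding(3).collect()
windows.foreach(w => println(w.mkString(",")))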

feeding DataFrames into predictive algorithms

2015-02-11 Thread Sandy Ryza
Hey All, I've been playing around with the new DataFrame and ML pipelines APIs and am having trouble accomplishing what seems like it should be a fairly basic task. I have a DataFrame where each column is a Double. I'd like to turn this into a DataFrame with a features column and a label column that

Re: Strongly Typed SQL in Spark

2015-02-11 Thread jay vyas
Ah, nevermind, I just saw http://spark.apache.org/docs/1.2.0/sql-programming-guide.html (language integrated queries), which looks quite similar to what I was thinking about. I'll give that a whirl... On Wed, Feb 11, 2015 at 7:40 PM, jay vyas wrote: > Hi spark. Is there anything in the works for

Re: feeding DataFrames into predictive algorithms

2015-02-11 Thread Michael Armbrust
It sounds like you probably want to do a standard Spark map that results in a tuple with the structure you are looking for. You can then just assign names to turn it back into a DataFrame. Assuming the first column is your label and the rest are features, you can do something like this: val df =

Re: feeding DataFrames into predictive algorithms

2015-02-11 Thread Patrick Wendell
I think there is a minor error here in that the first example needs a "tail" after the seq: df.map { row => (row.getDouble(0), row.toSeq.tail.map(_.asInstanceOf[Double])) }.toDataFrame("label", "features") On Wed, Feb 11, 2015 at 7:46 PM, Michael Armbrust wrote: > It sounds like you probably want
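Assembled into a fuller sketch of the thread's corrected snippet (toDataFrame is the pre-1.3 name, later renamed toDF; the implicits import is an assumption for the tuple-to-DataFrame conversion):

import sqlContext.implicits._

val training = df.map { row =>
  // .tail drops the label column; without it the label would leak into the features
  (row.getDouble(0), row.toSeq.tail.map(_.asInstanceOf[Double]))
}.toDataFrame("label", "features")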

Re: Spark SQL - Point lookup optimisation in SchemaRDD?

2015-02-11 Thread nitin
I was able to resolve this use case (thanks Cheng Lian), where I wanted to launch an executor on just the specific partition while also getting the batch pruning optimisations of Spark SQL, by doing the following: val query = sql("SELECT * FROM cachedTable WHERE key = 1") val plannedRDD = query.queryExecution
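A hedged sketch of the full pattern, assuming Spark 1.2 internals (queryExecution.toRdd and SparkContext.runJob are expert-level APIs, and the partition index here is illustrative -- you would derive it from how `key` is partitioned):

val query = sql("SELECT * FROM cachedTable WHERE key = 1")
val plannedRDD = query.queryExecution.toRdd  // planned RDD[Row], pruning applied

val targetPartition = 0  // hypothetical: the one partition that can hold key = 1
val rows = sc.runJob(
  plannedRDD,
  (it: Iterator[org.apache.spark.sql.catalyst.expressions.Row]) => it.toArray,
  Seq(targetPartition),
  allowLocal = false)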

Re: how to debug this kind of error, e.g. "lost executor"?

2015-02-11 Thread Praveen Garg
Try increasing the value of spark.yarn.executor.memoryOverhead. Its default value is 384 MB in Spark 1.1. This error generally comes when your process usage exceeds your max allocation. Use the following property to increase the memory overhead. From: Yifan LI <iamyifa...@gmail.com> Date: Friday,
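For example, as a minimal sketch (the 1024 MB value is illustrative, not a recommendation -- size it to your observed off-heap usage; the same property can also be passed via --conf on spark-submit):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("my-yarn-app")  // hypothetical app name
  // extra off-heap headroom YARN grants on top of the executor heap
  .set("spark.yarn.executor.memoryOverhead", "1024")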

Re: Datastore HDFS vs Cassandra

2015-02-11 Thread Mike Trienis
Thanks everyone for your responses. I'll definitely think carefully about the data models, querying patterns and fragmentation side-effects. Cheers, Mike. On Wed, Feb 11, 2015 at 1:14 AM, Franc Carter wrote: > > I forgot to mention that if you do decide to use Cassandra I'd highly > recommend j

how to avoid Spark and Hive log from Application log

2015-02-11 Thread sachin Singh
Hi, please can somebody help me avoid Spark and Hive logs in my application log? Both Spark and Hive use a log4j properties file. I have configured the log4j.properties file for my application as below, but it is still printing Spark and Hive console logging as well. Please suggest; it's urgent for me,
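One hedged workaround (not from the thread): alongside log4j.properties entries such as log4j.logger.org.apache.spark=WARN, the levels can be forced programmatically, which wins even when a bundled log4j.properties is picked up ahead of your own file:

import org.apache.log4j.{Level, Logger}

// silence Spark and Hive internals while keeping the application's own loggers intact
Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
Logger.getLogger("org.apache.hadoop.hive").setLevel(Level.WARN)
Logger.getLogger("org.apache.hive").setLevel(Level.WARN)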

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-02-11 Thread fightf...@163.com
Hi, I still have no adequate solution for this issue. Any applicable analysis rules or hints would be appreciated. Thanks, Sun. fightf...@163.com From: fightf...@163.com Date: 2015-02-09 11:56 To: user; dev Subject: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

PySpark 1.2 Hadoop version mismatch

2015-02-11 Thread Michael Nazario
Hi Spark users, I seem to be having this consistent error which I have been trying to reproduce and narrow down the problem. I've been running a PySpark application on Spark 1.2 reading avro files from Hadoop. I was consistently seeing the following error: py4j.protocol.Py4JJavaError: An error

RE: PySpark 1.2 Hadoop version mismatch

2015-02-11 Thread Michael Nazario
I also forgot some other information. I have made this error go away by making my PySpark application use spark-1.1.1-bin-cdh4 for the driver while communicating with a Spark 1.2 master and worker. It's not a good workaround, so I would like the driver to also be Spark 1.2. Michael

Unable to query hive tables from spark

2015-02-11 Thread kundan kumar
I want to create/access Hive tables from Spark. I have placed hive-site.xml inside the spark/conf directory. Even so, it creates a local metastore in the directory where I run the spark shell and exits with an error. I am getting this error when I try to create a new Hive table. Even

Extract hour from Timestamp in Spark SQL

2015-02-11 Thread Wush Wu
Dear all, I am new to Spark SQL and have no experience with Hive. I tried to use the built-in Hive function to extract the hour from a timestamp in Spark SQL, but got: "java.util.NoSuchElementException: key not found: hour". How should I extract the hour from a timestamp? And I am very confused about
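A hedged sketch of one likely cause and fix: hour() is a Hive UDF, so it resolves under HiveContext, while a plain SQLContext in 1.2 has no built-in hour -- consistent with the "key not found: hour" error. Table and column names below are hypothetical:

import org.apache.spark.sql.hive.HiveContext

val hiveCtx = new HiveContext(sc)
hiveCtx.sql("SELECT hour(event_time) FROM events").collect().foreach(println)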

Re: Strongly Typed SQL in Spark

2015-02-11 Thread Felix C
As far as I can tell from my tests, language integrated query in Spark isn't type safe, i.e. query.where('cost == "foo") would compile and return nothing. If you want type safety, perhaps you want to map the SchemaRDD to an RDD of Product (your type, not scala.Product). --- Original Message --- From: "jay
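A minimal sketch of that workaround (table and fields hypothetical): map each row into a case class, after which field names and types are checked at compile time:

case class Sale(item: String, cost: Double)

val typed = sqlContext.sql("SELECT item, cost FROM sales")
  .map(r => Sale(r.getString(0), r.getDouble(1)))

// typed.filter(_.cost < "foo") no longer compiles; comparisons must be against Double:
val cheap = typed.filter(_.cost < 10.0)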

Spark SQL release

2015-02-11 Thread Agarwal, Shagun
Looks like the latest Spark SQL (1.2.1) release is still alpha. Any idea when a stable release is planned? Thanks Shagun

obtain cluster assignment in K-means

2015-02-11 Thread Shi Yu
Hi there, I am new to Spark. When training a K-means model using the following code, how do I obtain the cluster assignment in the next step? val clusters = KMeans.train(parsedData, numClusters, numIterations) I searched around many examples but they mostly calculate the WSSSE. I am still
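A minimal sketch (not from the thread) of the next step: KMeansModel.predict maps each vector to its cluster index:

val clusters = KMeans.train(parsedData, numClusters, numIterations)

// pair each point with the id of the cluster it was assigned to
val assignments = parsedData.map(v => (clusters.predict(v), v))
assignments.take(5).foreach(println)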

Re: iteratively modifying an RDD

2015-02-11 Thread Rok Roskar
Yes, sorry I wasn't clear -- I still have to trigger the calculation of the RDD at the end of each iteration. Otherwise all of the lookup tables are shipped to the cluster at the same time, resulting in memory errors. Therefore this becomes several map jobs instead of one, and each consecutive map
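A hedged sketch of that pattern (names hypothetical): broadcast one lookup table per iteration and force the map with an action so each table ships -- and can be released -- before the next one:

var rdd = initialRdd
for (table <- lookupTables) {
  val bcast = sc.broadcast(table)
  rdd = rdd.map(x => bcast.value.getOrElse(x, x)).persist()
  rdd.count()       // trigger this map job now rather than one giant lineage
  bcast.unpersist() // drop the table from executors before the next iteration
}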

Re: Can spark job server be used to visualize streaming data?

2015-02-11 Thread Felix C
What kind of data do you have? Kafka is a popular source to use with Spark Streaming, but Spark Streaming also supports reading from a file; this is called a basic source: https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers --- Original Message --- From: "

Re: PySpark 1.2 Hadoop version mismatch

2015-02-11 Thread Akhil Das
Did you have a look at http://spark.apache.org/docs/1.2.0/building-spark.html? I think you can simply download the source and build it for your Hadoop version: mvn -Dhadoop.version=2.0.0-mr1-cdh4.7.0 -DskipTests clean package Thanks Best Regards On Thu, Feb 12, 2015 at 11:45 AM, Michael Nazario

Re: Can spark job server be used to visualize streaming data?

2015-02-11 Thread Su She
Hello Felix, I am already streaming in very simple data using Kafka (few messages / second, each record only has 3 columns...really simple, but looking to scale once I connect everything). I am processing it in Spark Streaming and am currently writing word counts to hdfs. So the part where I am co

Streaming scheduling delay

2015-02-11 Thread Tim Smith
On Spark 1.2 (have been seeing this behaviour since 1.0), I have a streaming app that consumes data from Kafka and writes it back to Kafka (different topic). My big problem has been Total Delay. While execution time is usually https://github.com/apache/spark/blob/master/core/src/main/scala/org/apac

Re: Can't access remote Hive table from spark

2015-02-11 Thread guxiaobo1982
Hi Zhan, Yes, I found there is an hdfs account, which was created by Ambari, but what's the password for this account? How can I log in under this account? Can I just change the password for the hdfs account? Regards, -- Original -- From: "Zhan Zhang" Send

Re: How to do broadcast join in SparkSQL

2015-02-11 Thread Dima Zhiyanov
Thank you! The Hive solution seemed more like a workaround. I was wondering if native Spark SQL support for computing statistics for Parquet files would be available. Dima Sent from my iPhone > On Feb 11, 2015, at 3:34 PM, Ted Yu wrote: > > See earlier thread: > http://search-hadoop.com/
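A hedged sketch of the Hive-side workaround being discussed (table name hypothetical): populate size statistics so the planner can see the table fits under the broadcast threshold:

val hiveCtx = new org.apache.spark.sql.hive.HiveContext(sc)

// noscan derives size statistics from file metadata without reading the data
hiveCtx.sql("ANALYZE TABLE small_dim COMPUTE STATISTICS noscan")

// tables below this byte size are planned as broadcast joins (10 MB is the default)
hiveCtx.setConf("spark.sql.autoBroadcastJoinThreshold", (10 * 1024 * 1024).toString)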

Re: Streaming scheduling delay

2015-02-11 Thread Tim Smith
Just read the thread "Are these numbers abnormal for spark streaming?" and I think I am seeing similar results - that is - increasing the window seems to be the trick here. I will have to monitor for a few hours/days before I can conclude (there are so many knobs/dials). On Wed, Feb 11, 2015 at
