Hi,
I am trying to write a String that is not an RDD to HDFS. The data is a
variable in Spark scheduler code. None of the Spark file operations work
because my data is not an RDD.
So I tried using SparkContext.parallelize(data), but it throws an error:
[error]
/home/karthik/spark-1.0.0/core/s
Created JIRA for this: https://issues.apache.org/jira/browse/SPARK-3915
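For reference, a minimal sketch of two ways to get a plain String onto HDFS
without already having an RDD, assuming a live SparkContext named sc; the
paths and variable names are hypothetical, and the second form skips RDDs
entirely and uses the Hadoop FileSystem API:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val data = "some scheduler state"            // the non-RDD String to persist

// 1) Wrap the value in a one-element RDD and use the normal RDD writer.
sc.parallelize(Seq(data)).saveAsTextFile("hdfs:///tmp/scheduler-state")

// 2) Write it directly with the Hadoop FileSystem API (no RDD involved).
val fs  = FileSystem.get(new Configuration())
val out = fs.create(new Path("hdfs:///tmp/scheduler-state.txt"))
try out.write(data.getBytes("UTF-8")) finally out.close()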
On Sat, Oct 11, 2014 at 12:40 PM, Evan Samanas wrote:
> It's true that it is an implementation detail, but it's a very important one
> to document because it has the possibility of changing results depending on
> when I use t
Because of how closures work in Scala, there is no support for nested
map/RDD-based operations. Specifically, if you have
Context a {
  Context b {
  }
}
Operations within context b, when distributed across nodes, will no longer
have visibility of variables specific to context a because that
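As a small illustration of that limitation (assuming a SparkContext named sc;
the data is made up):

val outer = sc.parallelize(1 to 10)
val inner = sc.parallelize(1 to 5)

// This fails at runtime, because the closure shipped to executors captures
// `inner`, which only has meaning on the driver:
// outer.map(x => inner.map(y => x * y).sum).collect()

// Typical workaround: materialize (or broadcast) the small side on the driver
// so the inner loop runs over a plain local collection.
val small  = inner.collect()
val result = outer.map(x => small.map(y => x * y).sum).collect()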
Very cool Denny, thanks for sharing this!
Matei
On Oct 11, 2014, at 9:46 AM, Denny Lee wrote:
> https://www.concur.com/blog/en-us/connect-tableau-to-sparksql
>
> If you're wondering how to connect Tableau to SparkSQL, here are the steps.
>
>
>
> Enjoy!
>
It's true that it is an implementation detail, but it's a very important
one to document because it can change results depending on when I use take
or collect. The issue I was running into was
when the executor had a different operating system than the driver, and I
was using
Hi Spark!
I don't yet understand the semantics of RDDs in a streaming context very
well.
Are there any examples in the docs of how to implement custom
InputDStreams, with corresponding Receivers?
I've hacked together a custom stream, which is being opened and is
consuming data internal
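In case it's useful, a rough sketch of a custom Receiver along the lines of the
Receiver API; the class name and the fake data it produces are made up:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class SimpleStringReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  override def onStart(): Unit = {
    // Start a background thread that produces records and hands them to Spark.
    new Thread("SimpleStringReceiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          store("some record")   // push one record into the stream
          Thread.sleep(1000)
        }
      }
    }.start()
  }

  override def onStop(): Unit = {
    // The loop above checks isStopped(), so there is nothing extra to clean up.
  }
}

// Usage, given a StreamingContext ssc:
// val lines = ssc.receiverStream(new SimpleStringReceiver())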
I tried it even without the “T”, and it still returns an empty result:
scala> val sRdd = sqlContext.sql("select a from x where ts >= '2012-01-01
00:00:00';")
sRdd: org.apache.spark.sql.SchemaRDD =
SchemaRDD[35] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [a#0]
ExistingR
Hi,
My Spark version is 1.1.0 and Hive is 0.12.0. I need to use more than one
subquery in my Spark SQL; below are my sample table structures and a SQL
statement that contains more than one subquery.
Question 1: How to load a Hive table into Scala/Spark?
Question 2: How to implement a SQL_WITH_MORE_THAN_O
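A minimal sketch for both questions, assuming Spark 1.1 built with Hive support
and a hypothetical orders table; Hive 0.12 has no WITH (CTE) clause, so the
subquery is nested in the FROM clause instead:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Question 1: load an existing Hive table as a SchemaRDD.
val orders = hiveContext.sql("SELECT * FROM orders")
orders.take(5).foreach(println)

// Question 2: a query with a subquery, nested in the FROM clause.
val totals = hiveContext.sql(
  """SELECT t.customer_id, t.total
    |FROM (SELECT customer_id, SUM(amount) AS total
    |      FROM orders
    |      GROUP BY customer_id) t
    |WHERE t.total > 1000""".stripMargin)
totals.collect().foreach(println)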
I found this on computer where I built Spark:
$ jar tvf
/homes/hortonzy/.m2/repository//org/spark-project/hive/hive-exec/0.13.1/hive-exec-0.13.1.jar
| grep ParquetHiveSerDe
2228 Mon Jun 02 12:50:16 UTC 2014
org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe$1.class
1442 Mon Jun 02 12:
-- Forwarded message --
From: Sadhan Sood
Date: Sat, Oct 11, 2014 at 10:26 AM
Subject: Re: how to find the sources for spark-project
To: Stephen Boesch
Thanks, I still didn't find it - is it under some particular branch? More
specifically, I am looking to modify the file: Parqu
Thank you, Sean. I'll try to do it externally as you suggested; however, can
you give me some hints on how to do that? In particular, where can I find
the 1.2 implementation you just mentioned? Thanks!
On Wed, Oct 8, 2014 at 12:58 PM, Sean Owen wrote:
> Plain old SVMs don't produce an estimat
Yes, of course. If your number is "123456", then this takes 4 bytes as
an int. But as a String in a 64-bit JVM you have an 8-byte reference,
4 bytes of object overhead, a 4-byte char count, and six 2-byte chars.
Maybe more that I'm not thinking of.
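Putting those figures together (the exact layout varies with JVM version and
settings such as compressed oops):

val asInt    = 4                  // a primitive int
val asString = 8 + 4 + 4 + 6 * 2  // reference + object overhead + char count + chars = 28 bytes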
On Sat, Oct 11, 2014 at 6:29 AM, Liam Clarke-Hutchinson
w
Hmm, the details of the error didn't show in your mail...
On 10/10/14 12:25 AM, sadhan wrote:
We have a Hive deployment on which we tried running spark-sql. When we try
to run DESCRIBE on some of the tables, spark-sql fails with
this:
while it works for some of the other tables. Confused and
How was the table created? Would you mind sharing the related code? It
seems that the underlying type of the customer_id field is actually
long, but the schema says it's integer; basically, it's a type mismatch
error.
The first query succeeds because SchemaRDD.count() is translated to
somethi
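For illustration, one hypothetical way such a mismatch can arise with the
programmatic schema API (the table and field names are made up, and sqlContext
is assumed to be a SQLContext in scope):

import org.apache.spark.sql._

val rows   = sc.parallelize(Seq(Row(1L)))                    // values are actually Long
val schema = StructType(Seq(StructField("customer_id", IntegerType, true)))
val table  = sqlContext.applySchema(rows, schema)            // but the schema claims Int
table.registerTempTable("x_copy")

sqlContext.sql("SELECT COUNT(*) FROM x_copy").collect()      // fine: never reads the column
sqlContext.sql("SELECT customer_id FROM x_copy").collect()   // fails with a cast/type error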
I suspect you do not actually need to change the number of partitions
dynamically.
Do you just have groupings of data to process? Use an RDD of (K, V) pairs
and operations like groupByKey. If you really have only 1000 unique keys, yes, only
half of the 2000 workers would get data in a phase that groups by
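A minimal sketch of that (K, V) + groupByKey pattern; the input path and the
key extraction are hypothetical:

val records = sc.textFile("hdfs:///data/input")
val keyed   = records.map(line => (line.split(",")(0), line))   // key on the first field
val grouped = keyed.groupByKey(1000)                            // ~1000 distinct keys
grouped.mapValues(_.size).collect().foreach(println)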