Cycling prior bits:
http://search-hadoop.com/m/q3RTto4sby1Cd2rt&subj=Re+Unit+test+with+sqlContext
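If rolling your own, plain ScalaTest with a local-mode SparkContext covers batch and SQL code; libraries such as spark-testing-base package the same pattern. A minimal sketch (not taken from the thread above; the suite and test names are made up):
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.max
import org.scalatest.{BeforeAndAfterAll, FunSuite}
class ExampleBatchSuite extends FunSuite with BeforeAndAfterAll {
  @transient private var sc: SparkContext = _
  @transient private var sqlContext: SQLContext = _
  override def beforeAll(): Unit = {
    // local[2] keeps the test self-contained; no cluster needed
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("unit-test"))
    sqlContext = new SQLContext(sc)
  }
  override def afterAll(): Unit = {
    if (sc != null) sc.stop()
  }
  test("max id over a small DataFrame") {
    val df = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "name")
    assert(df.agg(max(df("id"))).collect()(0).getInt(0) === 2)
  }
}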
On Wed, Mar 2, 2016 at 9:54 AM, SRK wrote:
> Hi,
>
> What is a good unit testing framework for Spark batch/streaming jobs? I
> have
> core spark, spark sql with dataframes and streaming api getting
RDDOperationScope is in the spark-core_2.1x jar file.
7148 Mon Feb 29 09:21:32 PST 2016
org/apache/spark/rdd/RDDOperationScope.class
Can you check whether the spark-core jar is on the classpath?
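One quick way to verify from spark-shell (a sketch; the jar path in the comment is only illustrative):
// throws ClassNotFoundException if spark-core is not on the classpath
scala> val cls = Class.forName("org.apache.spark.rdd.RDDOperationScope")
// shows which jar the class was loaded from, e.g. .../spark-core_2.10-1.6.0.jar
scala> cls.getProtectionDomain.getCodeSource.getLocation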
FYI
On Mon, Feb 29, 2016 at 1:40 PM, Taylor, Ronald C
wrote:
> Hi Jules, folks,
>
>
>
> I have tried “h
The default value for spark.shuffle.reduceLocality.enabled is true.
To reduce surprise to users of 1.5 and earlier releases, should the default
value be set to false?
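For anyone who wants the 1.5-and-earlier behaviour now, the flag can be turned off per application; a sketch (the app name is a placeholder):
val conf = new org.apache.spark.SparkConf()
  .setAppName("my-app")
  .set("spark.shuffle.reduceLocality.enabled", "false")
val sc = new org.apache.spark.SparkContext(conf)
// or equivalently: spark-submit --conf spark.shuffle.reduceLocality.enabled=false ...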
On Mon, Feb 29, 2016 at 5:38 AM, Lior Chaga wrote:
> Hi Koret,
> Try spark.shuffle.reduceLocality.enabled=false
> This is an un
Is there a particular reason you cannot use a temporary table?
Thanks
On Sat, Feb 27, 2016 at 10:59 AM, Ashok Kumar wrote:
> Thank you sir.
>
> Can one do this sorting without using a temporary table, if possible?
>
> Best
>
>
> On Saturday, 27 February 2016, 18:50, Yin
scala> Seq((1, "b", "test"), (2, "a", "foo")).toDF("id", "a",
"b").registerTempTable("test")
scala> val df = sql("SELECT struct(id, b, a) from test order by b")
df: org.apache.spark.sql.DataFrame = [struct(id, b, a): struct]
scala> df.show
+----------------+
|struct(id, b, a)|
+----------------+
Is this what you are looking for?
scala> Seq((2, "a", "test"), (2, "b", "foo")).toDF("id", "a",
"b").registerTempTable("test")
scala> val df = sql("SELECT struct(id, b, a) from test")
df: org.apache.spark.sql.DataFrame = [struct(id, b, a): struct]
scala> df.show
+----------------+
|struct(id, b, a)|
+----------------+
Please see
[SPARK-13465] Add a task failure listener to TaskContext
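A rough sketch of using the two listeners inside mapPartitions (assumes Spark 2.0+, where the failure listener from SPARK-13465 is available; the println calls stand in for the watermark bookkeeping):
import org.apache.spark.{SparkContext, TaskContext}
val sc = SparkContext.getOrCreate()
val rdd = sc.parallelize(1 to 100, 4)
rdd.mapPartitions { iter =>
  val tc = TaskContext.get()
  // runs when the task finishes, whether it succeeded or failed
  tc.addTaskCompletionListener { _ =>
    println(s"partition ${tc.partitionId()} done")
  }
  // runs only when the task fails, with the causing Throwable
  tc.addTaskFailureListener { (_, error) =>
    println(s"partition ${tc.partitionId()} failed: ${error.getMessage}")
  }
  iter
}.count()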
On Sat, Dec 19, 2015 at 3:44 PM, Neelesh wrote:
> Hi,
> I'm trying to build automatic Kafka watermark handling in my stream apps
> by overriding the KafkaRDDIterator, and adding a TaskCompletionListener and
> updating watermar
I tried the following:
scala> Seq((2, "a", "test"), (2, "b", "foo")).toDF("id", "a",
"b").registerTempTable("test")
scala> val df = sql("SELECT maxRow.* FROM (SELECT max(struct(id, b, a)) as
maxRow FROM test) a")
df: org.apache.spark.sql.DataFrame = [id: int, b: string ... 1 more field]
scala> d
Have you read this?
https://spark.apache.org/docs/latest/running-on-mesos.html
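The short version of what that page describes, as a sketch (the master host/port and the executor URI are placeholders to replace with your own):
val conf = new org.apache.spark.SparkConf()
  .setMaster("mesos://mesos-master.example.com:5050")
  .setAppName("spark-on-mesos-check")
  // where Mesos executors can fetch a Spark distribution from
  .set("spark.executor.uri", "hdfs://namenode/path/to/spark-1.6.0-bin-hadoop2.6.tgz")
val sc = new org.apache.spark.SparkContext(conf)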
On Fri, Feb 26, 2016 at 11:03 AM, Ashish Soni wrote:
> Hi All ,
>
> Is there any proper documentation on how to run Spark on Mesos? I have been
> trying for the last few days and am not able to make it work.
>
> Please help
Since collect is involved, the approach would be slower compared to the SQL
Mich gave in his first email.
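For comparison, one collect-free formulation (just a sketch, not necessarily the SQL referred to above; it assumes the same DataFrame d with an integer id column):
import org.apache.spark.sql.functions.max
// compute max(id) as a one-row DataFrame and join against it, all in a single plan
val maxDf = d.agg(max(d("id")).as("maxId"))
val rowsWithMaxId = d.join(maxDf, d("id") === maxDf("maxId"))
rowsWithMaxId.show()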
On Fri, Feb 26, 2016 at 1:42 AM, Michał Zieliński <
zielinski.mich...@gmail.com> wrote:
> You need to collect the value.
>
> val m: Int = d.agg(max($"id")).collect.apply(0).getInt(0)
> d.filt
The header of DirectOutputCommitter.scala says Databricks.
Did you get it from Databricks?
On Thu, Feb 25, 2016 at 3:01 PM, Teng Qiu wrote:
> interested in this topic as well; why is the DirectFileOutputCommitter not
> included?
>
> we added it in our fork, under
> core/src/main/scala/org/apache
Which release of Hadoop are you using?
Can you share a bit about the logic of your job?
Pastebinning the relevant portion of the logs would give us more clues.
Thanks
On Thu, Feb 25, 2016 at 8:54 AM, unk1102 wrote:
> Hi, I have a Spark job which I run on YARN and sometimes it behaves in a weird
> manner
Which Spark / Hadoop release are you running?
Thanks
On Thu, Feb 25, 2016 at 4:28 AM, Jan Štěrba wrote:
> Hello,
>
> I have quite a weird behaviour that I can't quite wrap my head around.
> I am running Spark on a Hadoop YARN cluster. I have Spark configured
> in such a way that it utilizes al
See slides starting with slide #25 of
http://www.slideshare.net/cloudera/top-5-mistakes-to-avoid-when-writing-apache-spark-applications
FYI
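If it is the per-block 2 GB limit that is being hit, the usual workaround is more, smaller partitions plus a serialized storage level; a sketch where bigRdd and the partition count are placeholders to tune:
import org.apache.spark.storage.StorageLevel
val repartitioned = bigRdd.repartition(400) // keep each partition well under 2 GB
repartitioned.persist(StorageLevel.MEMORY_AND_DISK_SER)
repartitioned.count() // materializes the cache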
On Wed, Feb 24, 2016 at 7:25 PM, xiazhuchang wrote:
> When caching data to memory, the code DiskStore$getBytes will be called. If
> the data is big, the
However, when the number of choices gets big, the notation quoted below becomes
cumbersome; see the isin sketch after the quoted snippet.
On Wed, Feb 24, 2016 at 3:41 PM, Mich Talebzadeh <
mich.talebza...@cloudtechnologypartners.co.uk> wrote:
> You can use operators here.
>
> t.filter($"column1" === 1 || $"column1" === 2)
>
>
>
>
>
> On 24/02/
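For reference, a sketch of the shorter form alluded to above (assumes Spark 1.5+, where Column.isin is available, and the same DataFrame t and column1 as in the quoted snippet):
t.filter(t("column1").isin(1, 2, 3, 4, 5))
// or, when the accepted values already live in a collection:
val choices = Seq(1, 2, 3, 4, 5)
t.filter(t("column1").isin(choices: _*))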
Is the following what you were looking for?
sqlContext.sql("""
CREATE TEMPORARY TABLE partitionedParquet
USING org.apache.spark.sql.parquet
OPTIONS (
path '/tmp/partitioned'
)""")
table("partitionedParquet").explain(true)
On Wed, Feb 24, 2016 at 1:16 AM, Ashok Kuma
Hi, Sa:
Have you asked on the spark-cassandra-connector mailing list?
It seems you would get a better response there.
Cheers