Cycling prior bits:
http://search-hadoop.com/m/q3RTto4sby1Cd2rt&subj=Re+Unit+test+with+sqlContext
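If rolling your own, plain ScalaTest with a local-mode SparkContext covers batch and SQL code; libraries such as spark-testing-base package the same pattern. A minimal sketch (not taken from the thread above; the suite and test names are made up):
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.max
import org.scalatest.{BeforeAndAfterAll, FunSuite}
class ExampleBatchSuite extends FunSuite with BeforeAndAfterAll {
  @transient private var sc: SparkContext = _
  @transient private var sqlContext: SQLContext = _
  override def beforeAll(): Unit = {
    // local[2] keeps the test self-contained; no cluster needed
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("unit-test"))
    sqlContext = new SQLContext(sc)
  }
  override def afterAll(): Unit = {
    if (sc != null) sc.stop()
  }
  test("max id over a small DataFrame") {
    val df = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "name")
    assert(df.agg(max(df("id"))).collect()(0).getInt(0) === 2)
  }
}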
On Wed, Mar 2, 2016 at 9:54 AM, SRK wrote:
> Hi,
>
> What is a good unit testing framework for Spark batch/streaming jobs? I
> have
> core spark, spark sql with dataframes and streaming api getting
RDDOperationScope is in the spark-core_2.1x jar file.
7148 Mon Feb 29 09:21:32 PST 2016
org/apache/spark/rdd/RDDOperationScope.class
Can you check whether the spark-core jar is on the classpath?
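One quick way to verify from spark-shell (a sketch; the jar path in the comment is only illustrative):
// throws ClassNotFoundException if spark-core is not on the classpath
scala> val cls = Class.forName("org.apache.spark.rdd.RDDOperationScope")
// shows which jar the class was loaded from, e.g. .../spark-core_2.10-1.6.0.jar
scala> cls.getProtectionDomain.getCodeSource.getLocation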
FYI
On Mon, Feb 29, 2016 at 1:40 PM, Taylor, Ronald C
wrote:
> Hi Jules, folks,
>
>
>
> I have tried “h
The default value for spark.shuffle.reduceLocality.enabled is true.
To reduce surprise to users of 1.5 and earlier releases, should the default
value be set to false?
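For anyone who wants the 1.5-and-earlier behaviour now, the flag can be turned off per application; a sketch (the app name is a placeholder):
val conf = new org.apache.spark.SparkConf()
  .setAppName("my-app")
  .set("spark.shuffle.reduceLocality.enabled", "false")
val sc = new org.apache.spark.SparkContext(conf)
// or equivalently: spark-submit --conf spark.shuffle.reduceLocality.enabled=false ...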
On Mon, Feb 29, 2016 at 5:38 AM, Lior Chaga wrote:
> Hi Koret,
> Try spark.shuffle.reduceLocality.enabled=false
> This is an un
Is there a particular reason you cannot use a temporary table?
Thanks
On Sat, Feb 27, 2016 at 10:59 AM, Ashok Kumar wrote:
> Thank you sir.
>
> Can one do this sorting without using a temporary table, if possible?
>
> Best
>
>
> On Saturday, 27 February 2016, 18:50, Yin
scala> Seq((1, "b", "test"), (2, "a", "foo")).toDF("id", "a",
"b").registerTempTable("test")
scala> val df = sql("SELECT struct(id, b, a) from test order by b")
df: org.apache.spark.sql.DataFrame = [struct(id, b, a): struct]
scala> df.show
+----------------+
|struct(id, b, a)|
+----------------+
Is this what you are looking for?
scala> Seq((2, "a", "test"), (2, "b", "foo")).toDF("id", "a",
"b").registerTempTable("test")
scala> val df = sql("SELECT struct(id, b, a) from test")
df: org.apache.spark.sql.DataFrame = [struct(id, b, a): struct]
scala> df.show
+----------------+
|struct(id, b, a)|
+----------------+
Please see
[SPARK-13465] Add a task failure listener to TaskContext
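A rough sketch of using the two listeners inside mapPartitions (assumes Spark 2.0+, where the failure listener from SPARK-13465 is available; the println calls stand in for the watermark bookkeeping):
import org.apache.spark.{SparkContext, TaskContext}
val sc = SparkContext.getOrCreate()
val rdd = sc.parallelize(1 to 100, 4)
rdd.mapPartitions { iter =>
  val tc = TaskContext.get()
  // runs when the task finishes, whether it succeeded or failed
  tc.addTaskCompletionListener { _ =>
    println(s"partition ${tc.partitionId()} done")
  }
  // runs only when the task fails, with the causing Throwable
  tc.addTaskFailureListener { (_, error) =>
    println(s"partition ${tc.partitionId()} failed: ${error.getMessage}")
  }
  iter
}.count()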
On Sat, Dec 19, 2015 at 3:44 PM, Neelesh wrote:
> Hi,
> I'm trying to build automatic Kafka watermark handling in my stream apps
> by overriding the KafkaRDDIterator, and adding a TaskCompletionListener and
> updating watermar
I tried the following:
scala> Seq((2, "a", "test"), (2, "b", "foo")).toDF("id", "a",
"b").registerTempTable("test")
scala> val df = sql("SELECT maxRow.* FROM (SELECT max(struct(id, b, a)) as
maxRow FROM test) a")
df: org.apache.spark.sql.DataFrame = [id: int, b: string ... 1 more field]
scala> d
Have you read this?
https://spark.apache.org/docs/latest/running-on-mesos.html
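The short version of what that page describes, as a sketch (the master host/port and the executor URI are placeholders to replace with your own):
val conf = new org.apache.spark.SparkConf()
  .setMaster("mesos://mesos-master.example.com:5050")
  .setAppName("spark-on-mesos-check")
  // where Mesos executors can fetch a Spark distribution from
  .set("spark.executor.uri", "hdfs://namenode/path/to/spark-1.6.0-bin-hadoop2.6.tgz")
val sc = new org.apache.spark.SparkContext(conf)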
On Fri, Feb 26, 2016 at 11:03 AM, Ashish Soni wrote:
> Hi All ,
>
> Is there any proper documentation on how to run Spark on Mesos? I have been
> trying for the last few days and am not able to make it work.
>
> Please help
Since collect is involved, the approach would be slower compared to the SQL
Mich gave in his first email.
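For comparison, one collect-free formulation (just a sketch, not necessarily the SQL referred to above; it assumes the same DataFrame d with an integer id column):
import org.apache.spark.sql.functions.max
// compute max(id) as a one-row DataFrame and join against it, all in a single plan
val maxDf = d.agg(max(d("id")).as("maxId"))
val rowsWithMaxId = d.join(maxDf, d("id") === maxDf("maxId"))
rowsWithMaxId.show()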
On Fri, Feb 26, 2016 at 1:42 AM, Michał Zieliński <
zielinski.mich...@gmail.com> wrote:
> You need to collect the value.
>
> val m: Int = d.agg(max($"id")).collect.apply(0).getInt(0)
> d.filt
The header of DirectOutputCommitter.scala says Databricks.
Did you get it from Databricks?
On Thu, Feb 25, 2016 at 3:01 PM, Teng Qiu wrote:
> interested in this topic as well; why is the DirectFileOutputCommitter not
> included?
>
> we added it in our fork, under
> core/src/main/scala/org/apache
Which release of Hadoop are you using?
Can you share a bit about the logic of your job?
Pastebinning the relevant portion of the logs would give us more clues.
Thanks
On Thu, Feb 25, 2016 at 8:54 AM, unk1102 wrote:
> Hi, I have a Spark job which I run on YARN and sometimes it behaves in a weird
> manner
Which Spark / Hadoop release are you running?
Thanks
On Thu, Feb 25, 2016 at 4:28 AM, Jan Štěrba wrote:
> Hello,
>
> I have quite a weird behaviour that I can't quite wrap my head around.
> I am running Spark on a Hadoop YARN cluster. I have Spark configured
> in such a way that it utilizes al
See slides starting with slide #25 of
http://www.slideshare.net/cloudera/top-5-mistakes-to-avoid-when-writing-apache-spark-applications
FYI
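If it is the per-block 2 GB limit that is being hit, the usual workaround is more, smaller partitions plus a serialized storage level; a sketch where bigRdd and the partition count are placeholders to tune:
import org.apache.spark.storage.StorageLevel
val repartitioned = bigRdd.repartition(400) // keep each partition well under 2 GB
repartitioned.persist(StorageLevel.MEMORY_AND_DISK_SER)
repartitioned.count() // materializes the cache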
On Wed, Feb 24, 2016 at 7:25 PM, xiazhuchang wrote:
> When caching data to memory, the code DiskStore$getBytes will be called. If
> the data is big, the
However, when the number of choices gets big, the notation quoted below becomes
cumbersome; see the isin sketch after the quoted snippet.
On Wed, Feb 24, 2016 at 3:41 PM, Mich Talebzadeh <
mich.talebza...@cloudtechnologypartners.co.uk> wrote:
> You can use operators here.
>
> t.filter($"column1" === 1 || $"column1" === 2)
>
>
>
>
>
> On 24/02/
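For reference, a sketch of the shorter form alluded to above (assumes Spark 1.5+, where Column.isin is available, and the same DataFrame t and column1 as in the quoted snippet):
t.filter(t("column1").isin(1, 2, 3, 4, 5))
// or, when the accepted values already live in a collection:
val choices = Seq(1, 2, 3, 4, 5)
t.filter(t("column1").isin(choices: _*))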
Is the following what you were looking for?
sqlContext.sql("""
CREATE TEMPORARY TABLE partitionedParquet
USING org.apache.spark.sql.parquet
OPTIONS (
path '/tmp/partitioned'
)""")
table("partitionedParquet").explain(true)
On Wed, Feb 24, 2016 at 1:16 AM, Ashok Kuma
Hi, Sa:
Have you asked on the spark-cassandra-connector mailing list?
It seems you would get a better response there.
Cheers