Hi,
Sorry to insist, but does anyone have any thoughts on this? Or at least, can
someone point me to documentation for DStream.compute() so I can understand
when I should return None for a batch?
Thanks
Juan
2015-09-14 20:51 GMT+02:00 Juan Rodríguez Hortalá <
juan.rodriguez.hort...@gmail.com>:
> Hi,
>
It is exactly the issue here, isn't it?
We are using memory / N, where N should be the maximum number of active
tasks. In the current master, we use the number of cores to approximate the
number of tasks -- but it turned out to be a bad approximation in tests
because it is set to 32 to increase concurrency.
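To make the heuristic concrete, here is a rough sketch of the idea
(illustrative only, not the exact code in master; the bounds and safety
factor are made up):

  // Per-task page size is derived from execution memory divided by N, where N
  // is approximated by the number of cores.
  def approximatePageSize(maxExecutionMemory: Long, numCores: Int): Long = {
    val minPageSize = 1L << 20    // 1 MB lower bound (assumed)
    val maxPageSize = 64L << 20   // 64 MB upper bound (assumed)
    val safetyFactor = 16L        // assumed head-room for several pages per task
    val perTask = maxExecutionMemory / math.max(1, numCores) / safetyFactor
    math.min(maxPageSize, math.max(minPageSize, perTask))
  }

When the real number of concurrently active tasks differs from the core count
used as the divisor, each task's memory budget is over- or under-estimated
accordingly.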
Oops... I meant to say "The page size calculation is NOT the issue here"
On 16 September 2015 at 06:46, Pete Robbins wrote:
> The page size calculation is the issue here as there is plenty of free
> memory, although there is maybe a fair bit of wasted space in some pages.
> It is that when we ha
The page size calculation is the issue here, as there is plenty of free
memory, although there is maybe a fair bit of wasted space in some pages. The
problem is that when we have a lot of tasks, each is only allowed to use 1/n
of the available memory, and several of the tasks bump into that limit. With
task
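Roughly, the cap I am describing behaves like this (a made-up sketch with
made-up names, not Spark's actual memory manager): with N active tasks each
task can hold at most poolSize / N, so once many tasks are active each one
hits its ceiling even though the pool as a whole still has free memory.

  class PerTaskMemoryPool(poolSize: Long) {
    private val grantedPerTask = scala.collection.mutable.Map.empty[Long, Long]

    // Returns how many bytes the task actually gets, capped at its 1/N share.
    def tryToAcquire(taskId: Long, bytesRequested: Long): Long = synchronized {
      grantedPerTask.getOrElseUpdate(taskId, 0L)   // register the task as active
      val numActiveTasks = grantedPerTask.size
      val cap = poolSize / numActiveTasks          // the 1/N ceiling per task
      val alreadyGranted = grantedPerTask(taskId)
      val grant = math.max(0L, math.min(bytesRequested, cap - alreadyGranted))
      grantedPerTask(taskId) = alreadyGranted + grant
      grant
    }
  }

For example, with a 512 MB pool and 30 active tasks, each task tops out at
about 17 MB regardless of how much of the pool is actually in use.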
On Tue, Sep 15, 2015 at 1:46 PM, Renyi Xiong wrote:
> Can anybody help me understand why pyspark streaming uses a py4j callback
> to execute Python code, while pyspark batch uses worker.py?
There are two kinds of callbacks in pyspark streaming:
1) one operates on RDDs: it takes an RDD and returns a new RDD
Can anybody help me understand why pyspark streaming uses a py4j callback to
execute Python code, while pyspark batch uses worker.py?
Regarding pyspark streaming, is the py4j callback only used for DStreams,
while worker.py is still used for RDDs?
thanks,
Renyi.
I see. We're having problems with code like this (forgive my noob Scala):

  import org.apache.spark.sql.functions.callUDF
  import org.apache.spark.sql.types.BooleanType
  import sqlContext.implicits._   // for toDF and $"..." (spark-shell sqlContext)

  val df = Seq(("moose", "ice"), (null, "fire")).toDF("animals", "elements")
  df
    .filter($"animals".rlike(".*"))
    .filter(callUDF({ (value: String) => value.length > 2 }, BooleanType, $"animals"))
    .collect()
This code throws a NullPointerException.
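A guard like the following (my own sketch, not an official fix) avoids the
NPE by handling the null "animals" value explicitly; the helper name is made
up:

  import org.apache.spark.sql.functions.udf

  // Null-safe version of the predicate.
  val longerThanTwo = udf { value: String => value != null && value.length > 2 }

  df.filter($"animals".isNotNull)      // or rely on the null check inside the UDF
    .filter(longerThanTwo($"animals"))
    .collect()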
Hi Ravi
This does look like a bug. I have created a JIRA to track it here:
https://issues.apache.org/jira/browse/SPARK-10623
Ram
On Tue, Sep 15, 2015 at 10:47 AM, Ram Sriharsha
wrote:
> Hi Ravi
>
> Can you share more details? What Spark version are you running?
>
> Ram
>
> On Tue, Sep 15, 20
Maybe we can change the heuristic in the memory calculation to use
SparkContext.defaultParallelism when running in local mode.
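Something along these lines (just the idea, not a patch; numUsableCores
stands in for whatever the current calculation uses, and sc is the live
SparkContext):

  // In local mode defaultParallelism reflects local[N], which tracks the real
  // maximum number of concurrently active tasks better than the physical core count.
  val estimatedMaxTasks =
    if (sc.isLocal) sc.defaultParallelism
    else numUsableCores          // hypothetical existing value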
On Tue, Sep 15, 2015 at 10:28 AM, Pete Robbins wrote:
> Yes and at least there is an override by setting spark.sql.test.master to
> local[8] , in fact local[16] worked on my 8 c
Hi Ravi
Can you share more details? What Spark version are you running?
Ram
On Tue, Sep 15, 2015 at 10:32 AM, Ravi Ravi
wrote:
> Turning on predicate pushdown for ORC datasources results in a
> NoSuchElementException:
>
> scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15")
Turning on predicate pushdown for ORC datasources results in a
NoSuchElementException:
scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15")
df: org.apache.spark.sql.DataFrame = [name: string]
scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
scala> df.explain
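For now, leaving push-down off is a workaround on my side (not a fix); it
presumably avoids the exception because the default setting (false) never
enters the push-down code path and the predicate is evaluated after the scan:

scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false")
scala> df.explain   // the age < 15 filter should stay in the Spark plan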
Yes, and at least there is an override by setting spark.sql.test.master to
local[8]; in fact local[16] worked on my 8-core box.
I'm happy to use this as a workaround, but the hard-coded 32 will fail when
running build/tests on a clean checkout if you only have 8 cores.
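For example, something along these lines before TestHive is first touched
(exactly how the property gets passed to the test JVM is an assumption on my
part):

  // Equivalent to passing -Dspark.sql.test.master=local[8] to the test JVM;
  // TestHive reads the property via System.getProperty with local[32] as default.
  System.setProperty("spark.sql.test.master", "local[8]")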
On 15 September 2015 at 17:40, M
That test explicitly sets the number of executor cores to 32.
object TestHive
  extends TestHiveContext(
    new SparkContext(
      System.getProperty("spark.sql.test.master", "local[32]"),
On Mon, Sep 14, 2015 at 11:22 PM, Reynold Xin wrote:
> Yea I think this is where the heuristics is faili
Hi Yin, I posted here because I think it's a bug.
So it will return null, and I can get a NullPointerException, as I was
getting. Is this really the expected behavior? I've never seen something
return null in other Scala tools that I have used.
Regards,
2015-09-14 18:54 GMT-03:00 Yin Huai :
> btw, move
This is the culprit:
https://issues.apache.org/jira/browse/SPARK-8406
"2. Make `TestHive` use a local mode `SparkContext` with 32 threads to
increase parallelism
The major reason for this is that the original parallelism of 2 is too low to
reproduce the data loss issue. Also, higher concurrency
OK, so it looks like the max number of active tasks reaches 30. I'm not
setting anything, as this is a clean environment with a clean Spark code
checkout. I'll dig further to see why so many tasks are active.
Cheers,
On 15 September 2015 at 07:22, Reynold Xin wrote:
> Yea I think this is where the he