Yes, that is the option I took while implementing this under Spark 1.4. But
every time there is a major update in Spark, I need to re-copy the needed
parts, which is very time consuming.
The reason is that InferSchema and JacksonParser use many more Spark
internal methods, which makes this very
Hi all,
I am currently working on SPARK-15880[^1] and also have some interest
in SPARK-7244[^2] and SPARK-7257[^3]. In fact, SPARK-7244 and SPARK-7257
are of some importance in the graph analysis field.
Could you make them an exception? Since I am working on graph analysis, I
would like to take them.
If need
+1, we should just fix the error to explain why months aren't allowed and
suggest that you manually specify some number of days.
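For illustration, a minimal sketch of that workaround (the DataFrame and
its timestamp column are assumed names), since window() does accept
day-based durations:

    import org.apache.spark.sql.functions.{col, window}

    // Sketch: approximate "1 month" as a fixed 30 days, which window() accepts
    val counts = df.groupBy(window(col("timestamp"), "30 days")).count()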
On Wed, Jan 18, 2017 at 9:52 AM, Maciej Szymkiewicz
wrote:
> Thanks for the response, Burak.
>
> As any sane person, I try to steer away from the objects which have both
Hm. Unless I am also totally missing or forgetting something, I think
you're right. The equivalent in PairRDDFunctions.scala operates on a
function from T to TraversableOnce[U], and a TraversableOnce is most like
java.util.Iterator.
You can work around it by wrapping it in a faked IteratorIterable
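A minimal sketch of such a wrapper (only the name comes from the message
above; the implementation is assumed):

    import java.util.{Iterator => JIterator}

    // One-shot Iterable: iterator() just returns the wrapped iterator,
    // so it is only safe to traverse once
    class IteratorIterable[T](it: JIterator[T]) extends java.lang.Iterable[T] {
      override def iterator(): JIterator[T] = it
    }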
In Spark 2 + Java + RDD API, the use of iterables was replaced with
iterators. I just encountered an inconsistency in `flatMapValues` that may
be a bug:
`flatMapValues` (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala#L677)
takes a `Flat
On Wed, Jan 18, 2017 at 1:29 AM, Jacek Laskowski wrote:
> I'm trying to get the gist of clientMode input parameter for
> RpcEnv.create [1]. It is disabled (i.e. false) by default.
"clientMode" means whether the RpcEnv only opens external connections
(client) or also accepts incoming connections.
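For reference, a sketch of how the flag is passed, assuming the internal
RpcEnv.create overload in Spark 2.1 (RpcEnv and SecurityManager are
private[spark], so this only compiles inside Spark itself):

    import org.apache.spark.{SecurityManager, SparkConf}
    import org.apache.spark.rpc.RpcEnv

    val conf = new SparkConf()
    // clientMode = true: outbound connections only, no server is started;
    // clientMode = false (default): also binds a port and accepts connections
    val rpcEnv = RpcEnv.create(
      "demo", "localhost", 0, conf, new SecurityManager(conf), clientMode = true)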
On Wed, Jan 18, 2017 at 6:16 AM, Steve Loughran wrote:
> it's failing on the dependency check as the dependencies have changed.
> that's what it's meant to do. should I explicitly be changing the values so
> that the build doesn't notice the change?
Yes. There's no automated way to do that, intentionally.
Thanks for the response, Burak.
As any sane person, I try to steer away from the objects which have both
calendar and unsafe in their fully qualified names, but if there is no
bigger picture I missed here I would go with 1 as well. And of course
fix the error message. I understand this has been intro
That is internal, but the amount of code is not a lot. Can you just copy
the relevant classes over to your project?
On Wed, Jan 18, 2017 at 5:52 AM Brian Hong
wrote:
> I work for a mobile game company. I'm solving a simple question: "Can we
> efficiently/cheaply query for the log of a particular
Personally I'd love to see some kind of pluggability, configurability in the
JSON schema parsing, maybe as an option in the DataFrameReader. Perhaps you can
propose an API?
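One possible shape for such an API, purely as a hypothetical sketch (the
"schemaInferrer" option and the inferrer class below do not exist today):

    // HYPOTHETICAL: a pluggable schema-inference hook on DataFrameReader
    val df = spark.read
      .option("schemaInferrer", "com.example.MyJsonSchemaInferrer")
      .json("s3://bucket/logs/")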
> On Jan 18, 2017, at 5:51 AM, Brian Hong wrote:
>
> I work for a mobile game company. I'm solving a simple question: "Ca
Based on what you've described, I think you should be able to use Spark's
parquet reader plus partition pruning in 2.1.
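For illustration, a minimal sketch of that approach (bucket, layout, and
column names assumed):

    import org.apache.spark.sql.functions.col

    // Layout assumed: s3://bucket/logs/date=2017-01-18/part-*.parquet
    val logs = spark.read.parquet("s3://bucket/logs")
    // Filtering on the partition column prunes to a single date directory
    logs.filter(col("date") === "2017-01-18" && col("user_id") === "u123").show()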
> On Jan 17, 2017, at 10:44 PM, Raju Bairishetti wrote:
>
> Thanks for the detailed explanation. Is it completely fixed in spark-2.1.0?
>
> We are giving very high memory t
Hi Maciej,
I believe it would be useful to either fix the documentation or fix the
implementation. I'll leave it to the community to comment on. The code
right now disallows intervals provided in months and years, because they
are not a "consistently" fixed amount of time. A month can be 28, 29, 30,
or 31 days long.
Hello, fellow Apache enthusiast. Thanks for your participation in, and
interest in, the projects of the Apache Software Foundation.
I wanted to remind you that the Call For Papers (CFP) for ApacheCon
North America, and Apache: Big Data North America, closes in less than a
month. If you've been puttin
Hi Marco,
What kind of scheduler are you using on your cluster? Yarn?
Also, are you running in client mode or cluster mode on the cluster?
Daniel
On Wed, Jan 18, 2017 at 3:22 PM, marco rocchi <
rocchi.1407...@studenti.uniroma1.it> wrote:
> I have a spark code that works well over a sample of d
I have Spark code that works well over a sample of data in local mode,
but when I run the same code on a cluster with the entire dataset I
receive a "GC overhead limit exceeded" error.
Is it possible to submit the code here and get some hints in order
to solve my problem?
Thanks a lot for the attention
On 18 Jan 2017, at 11:18, Sean Owen <so...@cloudera.com> wrote:
It still doesn't pass tests -- I'd usually not look until that point.
it's failing on the dependency check as the dependencies have changed. that's
what it's meant to do. should I explicitly be changing the values so that t
I work for a mobile game company. I'm solving a simple question: "Can we
efficiently/cheaply query for the log of a particular user within given
date period?"
I've created a special JSON text-based file format that has these traits:
- Snappy compressed, saved in AWS S3
- Partitioned by date, i.e.
It still doesn't pass tests -- I'd usually not look until that point.
On Wed, Jan 18, 2017 at 11:10 AM Steve Loughran
wrote:
> I've had a PR outstanding on spark/object store integration, works for
> both maven and sbt builds
>
> https://issues.apache.org/jira/browse/SPARK-7481
> https://github.
I've had a PR outstanding on spark/object store integration, works for both
maven and sbt builds
https://issues.apache.org/jira/browse/SPARK-7481
https://github.com/apache/spark/pull/12004
Can I get someone to review this, as it appears to have been overlooked
amongst all the PRs?
thanks
-steve
Hi,
I'm trying to get the gist of clientMode input parameter for
RpcEnv.create [1]. It is disabled (i.e. false) by default.
I've managed to find out that, in the "general" case, it's enabled for
executors and disabled for the driver.
(it's also used for Spark Standalone's master and workers but
Hi zhenhua,
Thanks for the idea.
Actually, I think we can completely avoid shuffling the data in a limit
operation, whether it is LocalLimit or GlobalLimit.
wangzhenhua (G) wrote
> How about this:
> 1. we can make LocalLimit shuffle to mutiple partitions, i.e. create a new
> partitioner to unifor
How about this:
1. we can make LocalLimit shuffle to multiple partitions, i.e. create a new
partitioner to uniformly dispatch the data
    import org.apache.spark.Partitioner

    class LimitUniformPartitioner(partitions: Int) extends Partitioner {
      def numPartitions: Int = partitions
      var num = 0
      // body completed here as an assumption: plain round-robin dispatch
      def getPartition(key: Any): Int = { num = (num + 1) % partitions; num }
    }
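A possible usage sketch (the pair RDD name is assumed): redistribute the
locally-limited rows before applying the global limit:

    // Hypothetical usage: spread rows of a pair RDD over 10 partitions
    val spread = pairRdd.partitionBy(new LimitUniformPartitioner(10))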
Hi,
Can I ask for some clarifications regarding intended behavior of window
/ TimeWindow?
PySpark documentation states that "Windows in the order of months are
not supported". This is further confirmed by the checks in
TimeWindow.getIntervalInMicroseconds (https://git.io/vMP5l).
Surprisingly enough
On Wed, Jan 18, 2017 at 8:57 AM, Jacek Laskowski wrote:
> p.s. How to know when the deprecation was introduced? The last change
> is for executor blacklisting so git blame does not show what I want :(
> Any ideas?
Figured that out myself!
$ git log --topo-order --graph -u -L 641,641:core/src/ma
Hi,
Given [1]:
> DeprecatedConfig("spark.rpc", "2.0", "Not used any more.")
I believe the comment in [2]:
> A RpcEnv implementation must have a [[RpcEnvFactory]] implementation with an
> empty constructor so that it can be created via Reflection.
Correct? Deserves a pull request to get rid of