Hi all,
I was wondering, with Spark 2.3 approaching, if there's any way we "regular"
users can help advance any of the JIRAs that could have made it into Spark 2.3 but
are now likely to miss it, as the pull requests are awaiting detailed review.
For example:
https://issues.apache.org/jira/browse/SPA
This is more of a question for the Spark users list, but if you look at
FoxyProxy and SSH tunnels it'll get you going.
These instructions from AWS for accessing EMR are a good start
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html
ek for rc.
On Fri, Sep 16, 2016 at 11:16 AM, Ewan Leith <ewan.le...@realitymine.com> wrote:
Hi all,
Apologies if I've missed anything, but are we likely to see a 2.0.1 bug fix
release, or does a jump to 2.1.0 with additional features seem more probable?
The issues for 2.0.1 seem pretty much done here
https://issues.apache.org/jira/browse/SPARK/fixforversion/12336857/?selectedTab=com
I think this is more suited to the user mailing list than the dev one, but this
almost always means you need to repartition your data into smaller partitions,
as one of the partitions is over 2GB.
When you create your dataset, put something like .repartition(1000) at the end
of the command creating it.
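For reference, a minimal sketch of that suggestion (spark-shell style, where
spark is the predefined SparkSession; the paths and the figure of 1000
partitions are made-up examples, not from the original message):

// Repartition a Dataset when it is created so that no single partition
// grows past the 2GB limit; tune the partition count to your data volume.
val ds = spark.read
  .json("s3://my-bucket/large-input/")     // hypothetical source path
  .repartition(1000)                       // spread rows across more, smaller partitions

ds.write.parquet("s3://my-bucket/output/") // hypothetical output path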
n that.
I will document this as a known issue in the release notes. We have other bugs
that we have fixed since RC5, and we can fix those together in 2.0.1.
On July 22, 2016 at 10:24:32 PM, Ewan Leith (ewan.le...@realitymine.com) wrote:
I think this new issue in JIRA blocks the release unfortunately?
https://issues.apache.org/jira/browse/SPARK-16664 - Persist call on data frames
with more than 200 columns is wiping out the data
Otherwise there'll need to be a 2.0.1 pretty much right after?
Thanks,
Ewan
On 23 Jul 2016 03:46, Xia
When you create a dataframe using the sqlContext.read.schema() API, if you pass
in a schema that's compatible with some of the records but incompatible with
others, it seems you can't do a .select on the problematic columns; instead you
get an AnalysisException error.
I know loading the wrong
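As a hedged illustration of the pattern being described (the schema, column
names, and file path below are invented, and this is not a verified
reproduction of the report):

import org.apache.spark.sql.types._

// An explicit schema that matches some JSON records but clashes with others,
// e.g. records where "payload" is a struct rather than a string.
val schema = StructType(Seq(
  StructField("id", LongType, nullable = true),
  StructField("payload", StringType, nullable = true)))

val df = sqlContext.read.schema(schema).json("/path/to/mixed_records.json")

// Selecting the column whose type conflicts with part of the data is where
// the AnalysisException reportedly appears:
df.select("payload").show()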
Hi Brandon,
It's relatively straightforward to try out different type options for this in
the spark-shell: try pasting the attached code into spark-shell before you make
a normal Postgres JDBC connection.
You can then experiment with the mappings without recompiling Spark or anything
like that.
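The attachment itself isn't preserved in this archive, but the general approach
is to register a custom JdbcDialect from the spark-shell, along the lines of
the hedged sketch below (the dialect name and the json-to-string mapping are
illustrative only):

import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types._

// A custom dialect that overrides how selected Postgres column types map
// to Catalyst types, without recompiling Spark.
object ExperimentalPostgresDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:postgresql")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    if (sqlType == Types.OTHER && typeName == "json") Some(StringType) else None
}

// Paste into spark-shell before opening the normal JDBC connection:
JdbcDialects.registerDialect(ExperimentalPostgresDialect)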
llable = true)
|-- long: string (nullable = true)
|-- null: string (nullable = true)
|-- string: string (nullable = true)
Thanks,
Ewan
From: Yin Huai [mailto:yh...@databricks.com]
Sent: 01 October 2015 23:54
To: Ewan Leith
Cc: r...@databricks.com; dev@spark.apache.org
Subject: Re: Dataframe neste
Thanks Yin, I'll put together a JIRA and a PR tomorrow.
Ewan
-- Original message--
From: Yin Huai
Date: Mon, 5 Oct 2015 17:39
To: Ewan Leith;
Cc: dev@spark.apache.org;
Subject: Re: Dataframe nested schema inference from Json without type conflicts
Hello Ewan,
Adding a
it currently works,
does anyone think a pull request would plausibly get into the Spark main
codebase?
Thanks,
Ewan
From: Ewan Leith [mailto:ewan.le...@realitymine.com]
Sent: 02 October 2015 01:57
To: yh...@databricks.com
Cc: r...@databricks.com; dev@spark.apache.org
Subject: Re: Dataframe
Exactly, that's a much better way to put it.
Thanks,
Ewan
-- Original message--
From: Yin Huai
Date: Thu, 1 Oct 2015 23:54
To: Ewan Leith;
Cc: r...@databricks.com; dev@spark.apache.org;
Subject: Re: Dataframe nested schema inference from Json without type conflicts
Hi Ewan,
hat
we'll probably have to adopt if we can't come up with a way to keep the
inference working.
Thanks,
Ewan
-- Original message--
From: Reynold Xin
Date: Thu, 1 Oct 2015 22:12
To: Ewan Leith;
Cc: dev@spark.apache.org;
Subject: Re: Dataframe nested schema inference fr
Hi all,
We really like the ability to infer a schema from JSON contained in an RDD, but
when we're using Spark Streaming on small batches of data, we sometimes find
that Spark infers a more specific type than it should use; for example, if the
JSON in that small batch only contains integer value
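A hedged sketch of what that looks like in practice (spark-shell style, with sc
and sqlContext predefined; the field names and records are made up):

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// A small batch in which "id" happens to contain only integers:
val smallBatch = sc.parallelize(Seq("""{"id": 1}""", """{"id": 2}"""))
sqlContext.read.json(smallBatch).printSchema()
// |-- id: long (nullable = true)   <- inferred from this batch alone; a later
//                                      batch containing {"id": "abc"} won't fit

// One workaround: supply an explicit schema (or read every primitive as a
// string) so the type stays stable across batches.
val stringSchema = StructType(Seq(StructField("id", StringType, nullable = true)))
sqlContext.read.schema(stringSchema).json(smallBatch).printSchema()
// |-- id: string (nullable = true)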