Well, this sounds like a lot for “only” 17 billion. However, you can limit the
resources of the job, so it does not need to take all of them (it might take a
little bit longer).
Alternatively, did you try to use the HBase tables directly in Hive as external
tables and do a simple CTAS? Works better if Hive is on
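For what it's worth, a minimal sketch of that route, with every database/table
name made up, assuming the HBase table has already been mapped into the Hive
metastore as an external table via the HBase storage handler and that
hive-hbase-handler is on the classpath:

import org.apache.spark.sql.SparkSession

// Sketch only -- all names below are hypothetical.
// Assumes src.hbase_table_a was created beforehand in Hive with
// STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'.
val spark = SparkSession.builder()
  .appName("hbase-to-parquet-ctas")
  .enableHiveSupport()   // needed so the Hive metastore tables are visible
  .getOrCreate()

// Simple CTAS that materializes the HBase-backed table as Parquet.
spark.sql(
  """CREATE TABLE analytics.table_a_parquet
    |STORED AS PARQUET
    |AS SELECT * FROM src.hbase_table_a""".stripMargin)

STORED AS PARQUET is Hive-style DDL, so depending on the Spark and Hive
versions it may be simpler to run the same statement directly in beeline, which
is closer to what is being suggested here anyway.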
Jorn,
This is kind of a one-time load from historical data to the analytical Hive
engine. The Hive version is 1.2.1 and the Spark version is 2.0.1, on the MapR
distribution. Writing every table to Parquet and reading it back could be very
time consuming; currently the entire job can take ~8 hours on 8 nodes with
100 GB of RAM.
Hi,
Do you have a more detailed log/error message?
Also, can you please provide us with details on the tables (number of rows,
columns, size, etc.)?
Is this just a one-time thing or something regular?
If it is a one-time thing, then I would tend more towards putting each table in
HDFS (Parquet or ORC) and
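That staging route could look roughly like the sketch below. It is heavily
hedged: dfA, dfB and dfC are assumed to already be DataFrames read from HBase
through whatever connector is in use, `spark` is the existing SparkSession, and
the HDFS path is made up:

import org.apache.spark.sql.DataFrame

// Hypothetical staging location on HDFS.
val stagingDir = "hdfs:///staging/hbase_snapshot"

// 1. Dump each HBase-sourced DataFrame to Parquet exactly once.
//    dfA, dfB, dfC: DataFrames already loaded from HBase (assumption).
def stage(df: DataFrame, name: String): Unit =
  df.write.mode("overwrite").parquet(s"$stagingDir/$name")

stage(dfA, "table_a")
stage(dfB, "table_b")
stage(dfC, "table_c")

// 2. Do the join from the columnar copies, not from HBase.
val a = spark.read.parquet(s"$stagingDir/table_a")
val b = spark.read.parquet(s"$stagingDir/table_b")
val c = spark.read.parquet(s"$stagingDir/table_c")

The point of the detour is that the expensive HBase scan happens exactly once,
and any retried stages of the join read the cheap Parquet copies instead.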
Hello Spark Developers,
I have 3 tables that I am reading from HBase and want to do a join
transformation and save the result to a Hive Parquet external table. Currently
my join is failing with a container failed error.
1. Read table A from HBase with ~17 billion records.
2. Repartition on the primary key of table A
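For reference, a rough Spark/Scala sketch of that kind of pipeline; every
table, column and path name is hypothetical, the join keys are guesses, and
tableA/tableB/tableC are assumed to already be DataFrames read from HBase:

import org.apache.spark.sql.functions.col

// Sketch only. tableA is the ~17B-row table; "pk" / "a_pk" are made-up keys.
val numPartitions = 4000                                // tune to the cluster
val a = tableA.repartition(numPartitions, col("pk"))    // repartition on the primary key

// Join the three inputs on their keys.
val joined = a
  .join(tableB, a("pk") === tableB("a_pk"))
  .join(tableC, a("pk") === tableC("a_pk"))

// Write Parquet under the location the Hive external table points at.
joined.write
  .mode("overwrite")
  .parquet("hdfs:///warehouse/external/joined_result")  // made-up location

When containers get killed during a shuffle of this size, bumping the partition
count (and/or the executor memory overhead) is usually the first knob to try.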
I agree, except in this case we probably want some of the fixes that are
going into the maintenance release to be present in the new feature release
(like the CRAN issue).
On Thu, Nov 2, 2017 at 12:12 PM, Reynold Xin wrote:
> Why tie a maintenance release to a feature release? They are supposed
Why tie a maintenance release to a feature release? They are supposed to be
independent and we should be able to make a lot of maintenance releases as
needed.
On Thu, Nov 2, 2017 at 7:13 PM Sean Owen wrote:
> The feature freeze is "mid November" :
> http://spark.apache.org/versioning-policy.html
The feature freeze is "mid November" :
http://spark.apache.org/versioning-policy.html
Let's say... Nov 15? Anybody have a better date?
Although it'd be nice to get 2.2.1 out sooner rather than later in any event,
and it kind of makes sense to get it out first, they need not go in order. It
just might be dist
I’m fine with picking a feature freeze, although then we should branch
close to that point. Is there interest in still seeing 2.3 try and go out
around the nominal schedule?
Personally, from a release standpoint, I’d rather see 2.2.1 go out first
so we don’t end up with 2.3 potentially going out
I think it would be great to set a feature freeze date for 2.3.0 first, as a
minor release. There are a few new things that would be good to have, and then
we will likely need time to stabilize before cutting RCs.
From: Holden Karau
Sent: Thursday, November 2, 201
Hi Dev
Spark build is failing in Jenkins
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83353/consoleFull
Python versions prior to 2.7 are not supported.
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
ERROR: Step 'Publish JUnit
If it’s desired I’d be happy to start on 2.3 once 2.2.1 is finished.
On Thu, Nov 2, 2017 at 10:24 AM Felix Cheung
wrote:
> For the 2.2.1, we are still working through a few bugs. Hopefully it won't
> be long.
>
>
> --
> From: Kevin Grealish
> Sent: Thursday, Nove
For the 2.2.1, we are still working through a few bugs. Hopefully it won't be
long.
From: Kevin Grealish
Sent: Thursday, November 2, 2017 9:51:56 AM
To: Felix Cheung; Sean Owen; Holden Karau
Cc: dev@spark.apache.org
Subject: RE: Kicking off the process around Sp
Any update on expected 2.2.1 (or 2.3.0) release process?
From: Felix Cheung [mailto:felixcheun...@hotmail.com]
Sent: Thursday, October 26, 2017 10:04 AM
To: Sean Owen ; Holden Karau
Cc: dev@spark.apache.org
Subject: Re: Kicking off the process around Spark 2.2.1
Yes! I can take on RM for 2.2.1.
+0 simply because I don't feel I know enough to have an opinion. I have no
reason to doubt the change though, from a skim through the doc.
On Wed, Nov 1, 2017 at 3:37 PM Reynold Xin wrote:
> Earlier I sent out a discussion thread for CP in Structured Streaming:
>
> https://issues.apache.org/jira