Hello,
I have a two-stage processing pipeline:
1. A Spark Streaming job receives data from Kafka and saves it to
partitioned ORC.
2. A Spark ETL job runs once per day and compacts each
partition (I have two variables for partitioning:
dt=20180529/location=mumbai) (merge small files to bigg
Thanks for your feedback!
Working with Yuming to reduce the stability and quality risks. Will keep
you posted when the proposal is ready.
Cheers,
Xiao
On Wed, Jan 16, 2019 at 9:27 AM Ryan Blue wrote:
> +1 for what Marcelo and Hyukjin said.
>
> In particular, I agree that we can't expect Hive to release a
+1 for what Marcelo and Hyukjin said.
In particular, I agree that we can't expect Hive to release a version that
is now more than 3 years old just to solve a problem for Spark. Maybe that
would have been a reasonable ask instead of publishing a fork years ago,
but I think this is now Spark's probl
I'm thinking of mechanisms like:
https://github.com/apache/spark/blob/c5daccb1dafca528ccb4be65d63c943bf9a7b0f2/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala#L99
On Wed, Jan 16, 2019 at 9:46 AM Ilya Matiach wrote:
>
> Hi Sean and Jatin,
> Could you point to some examples of load()
Hi Sean and Jatin,
Could you point to some examples of load() methods that use the Spark version
vs the model version (or the columns available)?
I see only cases where we use the Spark version (e.g.
https://github.com/apache/spark/blob/c04ad17ccf14a07ffdb2bf637124492a341075f2/mllib/src/main/scala/
I know some implementations of model save/load in MLlib use an
explicit version 1.0, 2.0, 3.0 mechanism. I've also seen that some
just decide based on the version of Spark that wrote the model.
Is one or the other preferred?
See https://github.com/apache/spark/pull/23549#discussion_r248318392
for
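
To make the two options in the question above concrete, here is a minimal sketch in plain Python. It is not MLlib code: the Metadata record, the two loaders, the "weights"/"intercept" layout, and the assumption that the layout changed in Spark 2.4 are all hypothetical and exist only to contrast dispatching on an explicit format version with dispatching on the Spark version that wrote the model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Metadata:
    """Hypothetical metadata record stored next to a saved model."""
    class_name: str
    spark_version: str        # Spark version that wrote the model, e.g. "2.3.2"
    format_version: str = ""  # explicit format version, e.g. "1.0"; empty if absent


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.4.0' or '3.0.0-SNAPSHOT'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def load_v1(data: dict) -> dict:
    # Hypothetical older layout: only "weights" was saved, so the
    # intercept falls back to a default.
    return {"weights": list(data["weights"]), "intercept": 0.0}


def load_v2(data: dict) -> dict:
    # Hypothetical newer layout: an "intercept" field was added.
    return {"weights": list(data["weights"]),
            "intercept": float(data["intercept"])}


# Option 1: dispatch on an explicit format version written at save time.
LOADERS_BY_FORMAT: Dict[str, Callable[[dict], dict]] = {
    "1.0": load_v1,
    "2.0": load_v2,
}


def load_by_format_version(meta: Metadata, data: dict) -> dict:
    try:
        loader = LOADERS_BY_FORMAT[meta.format_version]
    except KeyError:
        raise ValueError(f"Unsupported format version: {meta.format_version!r}")
    return loader(data)


# Option 2: dispatch on the Spark version that wrote the model.
# Assumption for illustration only: the layout changed in Spark 2.4.
def load_by_spark_version(meta: Metadata, data: dict) -> dict:
    if parse_major_minor(meta.spark_version) < (2, 4):
        return load_v1(data)
    return load_v2(data)
```

In this sketch, option 1 needs a new table entry only when the on-disk layout changes, while option 2 needs no extra metadata field but ties the loader to knowing which release changed the layout.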
Hi, all
I took some time to check the recent Jenkins test failures in branch-2.3
(see https://github.com/apache/spark/pull/23507 for details).
I'm re-publishing a candidate now, so I think I'll start a first vote for
v2.3.3-rc1 in a few days,
once the Jenkins tests have been checked.
Best,
Takeshi
On S