Hi All,
We have set up a 2-node cluster (NODE-DSRV05 and NODE-DSRV02); each node
has 32 GB of RAM, 1 TB of disk, and 8 CPU cores. We have set up HDFS with
2 TB capacity and a block size of 256 MB. When we try to process a 1 GB
file on Spark, we see the following exception:
14/11/
It shows a NullPointerException; your data could be corrupted. Try putting a
try/catch inside the operation you are doing. Are you running the
worker process on the master node as well? If not, then only one node will be
doing the processing. If yes, then try setting the level of parallelism and
numb
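The try/catch suggestion above can be sketched in plain Scala (the sample
lines and the parse logic here are made up for illustration; in a Spark job
the same pattern would sit inside a map/flatMap over the RDD):

```scala
import scala.util.Try

// Hypothetical records: one line is corrupt and fails to parse.
val lines = List("1,foo", "2,bar", "corrupt", "3,baz")

// Wrap the risky parse in Try so a single bad record does not
// throw and kill the whole operation; failed parses become None
// and are dropped by flatMap.
val parsed = lines.flatMap { line =>
  Try {
    val Array(id, name) = line.split(",")
    (id.toInt, name)
  }.toOption
}
// parsed == List((1,"foo"), (2,"bar"), (3,"baz"))
```

On an RDD the same body would be passed to `rdd.flatMap`, so corrupt records
are skipped instead of surfacing as an exception on the workers.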
Hi,
MapReduce has the feature of skipping bad records. Is there any equivalent
in Spark? Should I use the filter API to do this?
Thanks,
Qiuzhuang
Hi Qiuzhuang - you have two options:
1) Within the map step, define a validation function that will be executed
on every record.
2) Use the filter function to create a filtered dataset prior to
processing.
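A minimal sketch of both options in plain Scala (`isValid` and the sample
records are hypothetical; on an RDD you would pass the same function to
`rdd.filter`):

```scala
// Hypothetical sample data: one record is malformed.
val records = List("1,foo", "bad record", "2,bar")

// Option 1: a validation function that can be checked for every record
// inside the map step.
def isValid(line: String): Boolean = {
  val fields = line.split(",")
  fields.length == 2 && fields(0).forall(_.isDigit)
}

// Option 2: filter the dataset before processing.
// With Spark this would be rdd.filter(isValid) instead of List#filter.
val clean = records.filter(isValid)
// clean == List("1,foo", "2,bar")
```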
On 11/14/14, 10:28 AM, "Qiuzhuang Lian" wrote:
>Hi,
>
>MapReduce has the feature of skippi
I noticed Spark 1.2.0-SNAPSHOT still has 2.4.x in the pom. Since 2.5.x is
the current stable Hadoop 2.x, would it make sense for us to update the
poms?
I don't think it's necessary. You're looking at the hadoop-2.4
profile, which works with anything >= 2.4. AFAIK there is no further
specialization needed beyond that. The profile sets hadoop.version to
2.4.0 by default, but this can be overridden.
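As a concrete illustration of that override (the hadoop-2.4 profile name and
the hadoop.version property come from the message above; the remaining flags
are the usual Spark build options):

```shell
# Build against Hadoop 2.5.1 by overriding the hadoop-2.4 profile's
# default version (2.4.0):
mvn -Phadoop-2.4 -Dhadoop.version=2.5.1 -DskipTests clean package
```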
On Fri, Nov 14, 2014 at 3:43 PM, Corey Nolet wrot
In the past, I've built it by providing -Dhadoop.version=2.5.1 exactly like
you've mentioned. What prompted me to write this email was that I did not
see any documentation that told me Hadoop 2.5.1 was officially supported by
Spark (i.e. the community has been using it, any bugs are being fixed, etc.).
You're the second person to request this today. Planning to include this in my
PR for SPARK-4338.
-Sandy
> On Nov 14, 2014, at 8:48 AM, Corey Nolet wrote:
>
> In the past, I've built it by providing -Dhadoop.version=2.5.1 exactly like
> you've mentioned. What prompted me to write this email wa
Yeah, I think someone even just suggested that today in a separate
thread? Couldn't hurt to just add an example.
On Fri, Nov 14, 2014 at 4:48 PM, Corey Nolet wrote:
> In the past, I've built it by providing -Dhadoop.version=2.5.1 exactly like
> you've mentioned. What prompted me to write this emai
Hi all, since the vote ends on a Sunday, please let me know if you would
like to extend the deadline to allow more time for testing.
2014-11-13 12:10 GMT-08:00 Sean Owen :
> Ah right. This is because I'm running Java 8. This was fixed in
> SPARK-3329 (
> https://github.com/apache/spark/commit/2b7
+1
Tested on Mac OS X, and verified that the sort-based shuffle bug is fixed.
Matei
> On Nov 14, 2014, at 10:45 AM, Andrew Or wrote:
>
> Hi all, since the vote ends on a Sunday, please let me know if you would
> like to extend the deadline to allow more time for testing.
>
> 2014-11-13 12:10 GMT-
A recent patch broke clean builds for me, I am trying to see how
widespread this issue is and whether we need to revert the patch.
The error I've seen is this when building the examples project:
spark-examples_2.10: Could not resolve dependencies for project
org.apache.spark:spark-examples_2.10:j
A workaround for this issue is identified here:
http://dbknickerbocker.blogspot.com/2013/04/simple-fix-to-missing-toolsjar-in-jdk.html
However, if this affects more users I'd prefer to just fix it properly
in our build.
On Fri, Nov 14, 2014 at 12:17 PM, Patrick Wendell wrote:
> A recent patch bro
Seems like a comment on that page mentions a fix, though it would add yet
another profile: it tells mvn that if it is an Apple JDK, it should use
classes.jar as the tools.jar as well, since the Apple-packaged JDK 6 bundled
them together.
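A sketch of what such a profile might look like (the activation path, the
profile id, and the property name are assumptions based on the Apple JDK 6
layout, not taken from the linked comment):

```xml
<!-- Hypothetical profile: on an Apple JDK 6, where the tools.jar classes
     live in classes.jar, point the tools.jar property there instead. -->
<profile>
  <id>apple-jdk-classes-jar</id>
  <activation>
    <file>
      <exists>${java.home}/../Classes/classes.jar</exists>
    </file>
  </activation>
  <properties>
    <tools.jar>${java.home}/../Classes/classes.jar</tools.jar>
  </properties>
</profile>
```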
Link: http://permalink.gmane.org/gmane.comp.java
I think in this case we can probably just drop that dependency, so
there is a simpler fix. But mostly I'm curious whether anyone else has
observed this.
On Fri, Nov 14, 2014 at 12:24 PM, Hari Shreedharan
wrote:
> Seems like a comment on that page mentions a fix, which would add yet
> another prof
+0
I expect to start testing on Monday but won't have enough results to change
my vote from +0
until Monday night or Tuesday morning.
Thanks,
Zach
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-1-1-RC1-tp9311p9370.html
+1
Tested HiveThriftServer2 against Hive 0.12.0 on Mac OS X. Known issues
are fixed. Hive version inspection works as expected.
On 11/15/14 8:25 AM, Zach Fry wrote:
>+0
>I expect to start testing on Monday but won't have enough results to change
>my vote from +0
>until Monday night or Tuesday mo