We are hitting the same issue on Spark 1.6.1 with Tungsten enabled, Kryo
enabled & sort-based shuffle.
Did you find a resolution?
On Sat, Apr 9, 2016 at 6:31 AM, Ted Yu wrote:
> Not much.
>
> So no chance of different snappy version ?
>
> On Fri, Apr 8, 2016 at 1:26 PM, Nicolas Tilmans
> wrote
Hi all,
I'm doing some simple column transformations (e.g. trimming strings) on a
DataFrame using UDFs. The DataFrame is in Avro format and is being loaded off
HDFS. The job has about 16,000 parts/tasks.
About halfway through, the job fails with this message:
org.apache.spark.SparkException: Job ab
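For reference, the transformation itself is roughly of this shape (a minimal sketch only; df and the "product_name" column are placeholders, not the poster's actual names):

  import org.apache.spark.sql.functions.{col, udf}

  // Null-safe trimming UDF applied to a single string column.
  val trimUdf = udf((s: String) => if (s == null) null else s.trim)
  val cleaned = df.withColumn("product_name", trimUdf(col("product_name")))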
me an array of doubles with 3 fields: the prediction,
the class A probability and the class B probability. How could I turn those
into 3 columns from my expression? Clearly .withColumn only expects 1
column back.
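One approach that is often suggested is to return a struct (e.g. a case class) from the UDF and then select its fields out into separate columns. A rough sketch under that assumption; Scores, the scoring logic and the "feature" column are invented for illustration:

  import org.apache.spark.sql.functions.{col, udf}

  // Hypothetical result type; a case class return value becomes a struct column.
  case class Scores(prediction: Double, probA: Double, probB: Double)
  val scoreUdf = udf((feature: Double) =>
    Scores(if (feature > 0.5) 1.0 else 0.0, 1.0 - feature, feature))

  val scored   = df.withColumn("scores", scoreUdf(col("feature")))
  val expanded = scored.select(
    col("scores.prediction").as("prediction"),
    col("scores.probA").as("probA"),
    col("scores.probB").as("probB"))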
On Tue, Sep 8, 2015 at 6:21 PM, Night Wolf wrote:
> Sorry for the spam - I had some
t 5:47 PM, Night Wolf wrote:
> So basically I need something like
>
> df.withColumn("score", new Column(new Expression {
> ...
>
> def eval(input: Row = null): EvaluatedType = myModel.score(input)
> ...
>
> }))
>
> But I can't do this, so how can I
e value or some struct...
On Tue, Sep 8, 2015 at 5:33 PM, Night Wolf wrote:
> Not sure how that would work. Really I want to tack on an extra column
> onto the DF with a UDF that can take a Row object.
>
> On Tue, Sep 8, 2015 at 1:54 AM, Jörn Franke wrote:
>
>> Can you use a m
rs are Comma-separated...
>
> Le lun. 7 sept. 2015 à 8:35, Night Wolf a écrit :
>
>> Is it possible to have a UDF which takes a variable number of arguments?
>>
>> e.g. df.select(myUdf($"*")) fails with
>>
>> org.apache.spark.sql.AnalysisException: u
Is it possible to have a UDF which takes a variable number of arguments?
e.g. df.select(myUdf($"*")) fails with
org.apache.spark.sql.AnalysisException: unresolved operator 'Project
[scalaUDF(*) AS scalaUDF(*)#26];
What I would like to do is pass in a generic data frame which can then be
passed t
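myUdf($"*") doesn't resolve, but one common workaround is to pack the columns into an array column explicitly and hand that to the UDF. A sketch, assuming every column can be cast to a string (combineAll and the "|" separator are invented):

  import org.apache.spark.sql.functions.{array, col, udf}

  // UDF over all columns of the frame, received as a single Seq[String].
  val combineAll = udf((values: Seq[String]) => values.mkString("|"))
  val allCols    = array(df.columns.map(c => col(c).cast("string")): _*)
  df.select(combineAll(allCols).as("combined"))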
Hey all,
I'm trying to do some stuff with a YAML file in the Spark driver using the
SnakeYAML library in Scala.
When I put the snakeyaml v1.14 jar on the SPARK_DIST_CLASSPATH and try to
de-serialize some objects from YAML into classes in my app JAR on the
driver (only the driver), I get the exception
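If the failure is SnakeYAML not being able to see the classes in the app JAR (a common symptom when the snakeyaml jar sits on the distribution classpath), one frequently suggested workaround is to construct the Yaml instance with the classloader that loaded the application classes. A sketch under that assumption; MyAppConfig and yamlText are placeholders:

  import org.yaml.snakeyaml.Yaml
  import org.yaml.snakeyaml.constructor.CustomClassLoaderConstructor

  // Hand SnakeYAML the classloader that loaded the app JAR so it can resolve app classes.
  val yaml   = new Yaml(new CustomClassLoaderConstructor(classOf[MyAppConfig].getClassLoader))
  val parsed = yaml.loadAs(yamlText, classOf[MyAppConfig])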
Hi guys,
I'm trying to do a cross join (cartesian product) with 3 tables stored as
parquet. Each table has 1 column, a long key.
Table A has 60,000 keys with 1000 partitions
Table B has 1000 keys with 1 partition
Table C has 4 keys with 1 partition
The output should be 240 million row combinations
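For reference, the shape of the job being described is roughly this (a sketch only; the paths are placeholders):

  // Three single-column parquet tables, cross-joined.
  val a = sqlContext.read.parquet("/data/tableA")   // 60,000 keys, 1000 partitions
  val b = sqlContext.read.parquet("/data/tableB")   // 1,000 keys, 1 partition
  val c = sqlContext.read.parquet("/data/tableC")   // 4 keys, 1 partition

  // A join with no condition is a cartesian product: 60,000 * 1,000 * 4 = 240,000,000 rows.
  val product = a.join(b).join(c)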
ive
> tasks or regular tasks (the first attempt of the task)? Is this error
> deterministic (can you reproduce every time you run this command)?
>
> Thanks,
>
> Yin
>
> On Mon, Jun 15, 2015 at 8:59 PM, Night Wolf
> wrote:
>
>> Looking at the logs of the execut
: Running task 11093.0 in stage 0.0
(TID 9552)
15/06/16 13:43:22 INFO executor.CoarseGrainedExecutorBackend: Got assigned
task 9553
15/06/16 13:43:22 INFO executor.Executor: Running task 10323.1 in stage 0.0
(TID 9553)
On Tue, Jun 16, 2015 at 1:47 PM, Night Wolf wrote:
> Hi guys,
>
> Using
Hi guys,
Using Spark 1.4, I'm trying to save a DataFrame as a table, a really simple
test, but I'm getting a bunch of NPEs.
The code I'm running is very simple:
qc.read.parquet("/user/sparkuser/data/staged/item_sales_basket_id.parquet").write.format("parquet").saveAsTable("is_20150617_test2")
Logs
How far did you get?
On Tue, Jun 2, 2015 at 4:02 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> We use Scoobi + MR to perform joins and we particularly use blockJoin()
> API of scoobi
>
>
> /** Perform an equijoin with another distributed list where this list is
> considerably smaller
> * than the right (but too la
ain?
>
> spark.sql.hive.metastore.sharedPrefixes
> com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,com.mapr.fs.shim.LibraryLoader,com.mapr.security.JNISecurity,com.mapr.fs.jni
>
> https://issues.apache.org/jira/browse/SPARK-7819 has more context about
> it.
>
> On Wed, Jun 3, 2015 at 9:38 PM, Nig
Hi all,
Trying out Spark 1.4 RC4 on MapR4/Hadoop 2.5.1 running in yarn-client mode with
Hive support.
*Build command;*
./make-distribution.sh --name mapr4.0.2_yarn_j6_2.10 --tgz -Pyarn -Pmapr4
-Phadoop-2.4 -Pmapr4 -Phive -Phadoop-provided
-Dhadoop.version=2.5.1-mapr-1501 -Dyarn.version=2.5.1-mapr
1.4; it also has been
> working fine for me.
>
> Are you sure you're using exactly the same Hadoop libraries (since you're
> building with -Phadoop-provided) and Hadoop configuration in both cases?
>
> On Tue, Jun 2, 2015 at 5:29 PM, Night Wolf wrote:
>
>> Hi a
tderr)
15/06/03 10:34:26 INFO impl.ContainerManagementProtocolProxy: Opening proxy
: qtausc-pphd0177.hadoop.local:40237
15/06/03 10:34:31 INFO impl.AMRMClientImpl: Received new token for :
qtausc-pphd0132.hadoop.local:44108
15/06/03 10:34:31 INFO yarn.YarnAllocator: Received 1 containers from YARN
Hi all,
Trying out Spark 1.4 on MapR Hadoop 2.5.1 running in yarn-client mode.
Seems the application master doesn't work anymore; I get a 500/connection
refused even when I hit the IP/port of the Spark UI directly. The logs
don't show much.
I built Spark with Java 6, Hive & Scala 2.10 and 2.11. I'v
Hi all,
I have a job that, for every row, creates about 20 new objects (i.e. RDD of
100 rows in = RDD 2000 rows out). The reason for this is each row is tagged
with a list of the 'buckets' or 'windows' it belongs to.
The actual data is about 10 billion rows. Each executor has 60GB of memory.
Cur
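A rough sketch of the kind of row expansion being described, purely for illustration; Event, events (an RDD[Event]) and the windowing rule are invented, not the poster's actual code:

  // Each input row is tagged with every window it belongs to, ~20 output rows per input row.
  case class Event(id: Long, timestamp: Long)

  def windowsFor(e: Event): Seq[Long] = {
    val firstWindow = e.timestamp / 3600000L     // hourly windows, as an example
    firstWindow until firstWindow + 20           // each event falls into ~20 windows
  }

  val tagged = events.flatMap { e =>
    windowsFor(e).map(window => (window, e))
  }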
Hi guys,
If I load a DataFrame via a SQL context with a SORT BY in the query and
then repartition the DataFrame, will it keep the sort order in each
partition?
I want to repartition because I'm going to run a map that generates lots of
data internally, so to avoid Out Of Memory errors I n
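As far as I know, repartition() shuffles the data, so the per-partition ordering from SORT BY is generally not preserved. On releases that have DataFrame.sortWithinPartitions (1.6+), one option is to re-sort after repartitioning. A sketch with a placeholder query and key column:

  // The sort has to be re-applied after the shuffle introduced by repartition().
  val df       = sqlContext.sql("SELECT key, value FROM some_table SORT BY key")
  val resorted = df.repartition(2000).sortWithinPartitions("key")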
I'm seeing a similar thing with a slightly different stack trace. Ideas?
org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:150)
org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
org.apache.spark.util.collection.E
Seeing similar issues, did you find a solution? One would be to increase
the number of partitions if you're doing lots of object creation.
On Thu, Feb 12, 2015 at 7:26 PM, fightf...@163.com
wrote:
> Hi, patrick
>
> Really glad to get your reply.
> Yes, we are doing group by operations for our wo
What was the answer? Was it only setting spark.sql.shuffle.partitions?
On Thu, Apr 30, 2015 at 12:14 PM, Ulanov, Alexander wrote:
> After day of debugging (actually, more), I can answer my question:
>
> The problem is that the default value 200 of
> “spark.sql.shuffle.partitions” is too small f
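For anyone landing on this thread, the setting in question can be raised from its default of 200 like so (2000 is just an example value):

  // Raise the shuffle partition count for SQL/DataFrame shuffles.
  sqlContext.setConf("spark.sql.shuffle.partitions", "2000")
  // or at submit time: spark-submit --conf spark.sql.shuffle.partitions=2000 ...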
was experimenting with the Row class in
> Python and apparently partitionBy automatically takes the first column as the key.
> However, I am not sure how you can access a part of an object without
> deserializing it (either explicitly or Spark doing it for you)
>
> On Wed, May 6, 2015 at 7:14 PM,
Hi,
If I have an RDD[MyClass] and I want to partition it by the hash code of
MyClass for performance reasons, is there any way to do this without
converting it into a PairRDD (RDD[(K,V)]) and calling partitionBy?
Mapping it to a tuple2 seems like a waste of space/computation.
It looks like the P
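As far as I know there is no partitionBy on a plain RDD, so the usual route is keyBy followed by partitionBy. A sketch; MyClass, rdd and the partition count are placeholders:

  import org.apache.spark.HashPartitioner

  // Key by hash code, then hash-partition the pair RDD.
  val keyed       = rdd.keyBy((x: MyClass) => x.hashCode)        // RDD[(Int, MyClass)]
  val partitioned = keyed.partitionBy(new HashPartitioner(200))
  val values      = partitioned.values  // back to RDD[MyClass]; note the partitioner info is dropped here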
Thanks Andrew. What version of HS2 is the SparkSQL thrift server using?
What would be involved in updating? Is it a simple case of bumping the
dependency version in one of the project POMs?
Cheers,
~N
On Sat, May 2, 2015 at 11:38 AM, Andrew Lee wrote:
> Hi N,
>
> See: https://issues.apache.org/jir
Hi guys,
Trying to use the SparkSQL Thrift server with the Hive metastore. It seems that
Hive metastore impersonation works fine (when running Hive tasks). However, when
spinning up the SparkSQL Thrift server, impersonation doesn't seem to work...
What settings do I need to enable impersonation?
I've copied the sa
luster, into a common
> location.
>
> On Thu, Apr 23, 2015 at 6:38 PM, Night Wolf
> wrote:
> > Hi guys,
> >
> > Having a problem building a DataFrame in Spark SQL from a JDBC data source
> when
> > running with --master yarn-client and adding the JDBC driver JAR
Hi guys,
Having a problem building a DataFrame in Spark SQL from a JDBC data source
when running with --master yarn-client and adding the JDBC driver JAR with
--jars. If I run with a local[*] master all works fine.
./bin/spark-shell --jars /tmp/libs/mysql-jdbc.jar --master yarn-client
sqlContext.lo
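The load being attempted presumably looks something like this (a sketch using the Spark 1.3-style API; the URL, table name and driver class are placeholders). On YARN the driver jar often also needs to be visible via spark.driver.extraClassPath / spark.executor.extraClassPath, not just --jars:

  // JDBC source loaded through the generic load() API.
  val jdbcDf = sqlContext.load("jdbc", Map(
    "url"     -> "jdbc:mysql://dbhost:3306/mydb?user=me&password=secret",
    "dbtable" -> "my_table",
    "driver"  -> "com.mysql.jdbc.Driver"))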
Hey,
Trying to build Spark 1.3 with Scala 2.11 supporting yarn & hive (with
thrift server).
Running;
*mvn -e -DskipTests -Pscala-2.11 -Dscala-2.11 -Pyarn -Pmapr4 -Phive
-Phive-thriftserver clean install*
The build fails with;
INFO] Compiling 9 Scala sources to
/var/lib/jenkins/workspace/cse-Ap
Was a solution ever found for this? I'm trying to run some test cases with sbt
test which use Spark SQL, and in the Spark 1.3.0 release with Scala 2.11.6 I get
this error. Setting fork := true in sbt seems to work, but it's a less than
ideal workaround.
On Tue, Mar 17, 2015 at 9:37 PM, Eric Charles wrote:
Tried with that. No luck. Same error on the sbt-interface jar. I can see Maven
downloaded that jar into my .m2 cache.
On Friday, March 6, 2015, 鹰 <980548...@qq.com> wrote:
> try it with mvn -DskipTests -Pscala-2.11 clean install package
Hey,
Trying to build latest spark 1.3 with Maven using
-DskipTests clean install package
But I'm getting errors with zinc; in the logs I see:
[INFO]
*--- scala-maven-plugin:3.2.0:compile (scala-compile-first) @
spark-network-common_2.11 --- *
...
[error] Required file not found: sbt-interface
Hey guys,
Trying to build Spark 1.3 for Scala 2.11.
I'm running with the following Maven command:
-DskipTests -Dscala-2.11 clean install package
*Exception*:
[ERROR] Failed to execute goal on project spark-core_2.10: Could not
resolve dependencies for project
org.apache.spark:spark-core_2.10:
to Spark SQL and is used by default
>> when you run .cache on a SchemaRDD or CACHE TABLE.
>>
>> I'd also look at parquet which is more efficient and handles nested data
>> better.
>>
>> On Fri, Feb 13, 2015 at 7:36 AM, Night Wolf wrote:
>>
Hi all,
I'd like to build/use column oriented RDDs in some of my Spark code. A
normal Spark RDD is stored as row-oriented objects, if I understand
correctly.
I'd like to leverage some of the advantages of a columnar memory format.
Shark (used to) and SparkSQL uses a columnar storage format using pr
Did you find a workaround for this?
Could it be classpath ordering? I would expect the "file://..." protocol
to work when you have the MapR jars on the classpath...?
On Tue, Jan 20, 2015 at 4:36 AM, Ted Yu wrote:
> Your classpath has some MapR jar.
>
> Is that intentional ?
>
> Cheers
>
> On M
Hi,
I just built Spark 1.3 master using maven via make-distribution.sh;
./make-distribution.sh --name mapr3 --skip-java-test --tgz -Pmapr3 -Phive
-Phive-thriftserver -Phive-0.12.0
When trying to start the standalone spark master on a cluster I get the
following stack trace;
15/02/04 08:53:56 I
In Spark SQL we have Row objects which contain a list of fields that make
up a row. A Row has ordinal accessors such as .getInt(0) or getString(2).
Say ordinal 0 = ID and ordinal 1 = Name. It becomes hard to remember which
ordinal is which, making the code confusing.
Say for example I have the follo
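One way to avoid the ordinals, assuming a reasonably recent Spark where Row.getAs accepts a field name (the "id" and "name" columns below are hypothetical):

  // Rows coming out of a DataFrame carry a schema, so fields can be looked up by name.
  df.collect().foreach { row =>
    val id   = row.getAs[Int]("id")        // instead of row.getInt(0)
    val name = row.getAs[String]("name")   // instead of row.getString(1)
    println(s"$id: $name")
  }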
Hi all,
I'd like to leverage some of the fast Spark collection implementations in
my own code.
Particularly for doing things like distinct counts in a mapPartitions
loop.
Are there any plans to make the org.apache.spark.util.collection
implementations public? Is there any other library out ther
test" as Intellij won't provide the "provided" scope
> libraries when running code in "main" source (but it will for sources under
> "test").
>
> With this config you can "sbt assembly" in order to get the fat jar
> without Spar
Hi,
I'm trying to load up an SBT project in IntelliJ 14 (Windows) running a 1.7
JDK and SBT 0.13.5 - I seem to be getting errors with the project.
The build.sbt file is super simple;
name := "scala-spark-test1"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "s
Hi,
Just to give some context: we are using the Hive metastore with CSV & Parquet
files as part of our ETL pipeline. We query these with SparkSQL to do
some downstream work.
I'm curious what's the best way to go about testing Hive & SparkSQL? I'm
using 1.1.0.
I see that the LocalHiveContext has bee