Hello,
I'm curious if there is an estimate when 3.5.2 for Spark Core will be released.
There are several bug and security vulnerability fixes in the dependencies we
are excited to receive!
If anyone has any insights, that would be greatly appreciated. Thanks!
- Paul
- Paul Brebner and Roger Abelenda
Hi,
This is currently my column definition:

Employee ID | Name    | Client  | Project Team | 01/01/2022 | 02/01/2022 | 03/01/2022 | 04/01/2022 | 05/01/2022
12345       | Dummy x | Dummy a | abc team a   | OFF        | WO         | WH         | WH         | WH
As you can see, the outer columns are just dates.
ct has no attribute 'read_excel'. Can
you advise?
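Since the full error is cut off above, this is only a hedged guess at the intent: plain PySpark has no read_excel, but the pandas API on Spark (Spark 3.2+) does. The file name and sheet below are placeholders.
```
import pyspark.pandas as ps

# Requires the openpyxl package available to Spark; "timesheet.xlsx" is a placeholder.
pdf = ps.read_excel("timesheet.xlsx", sheet_name=0)
sdf = pdf.to_spark()    # convert to a regular Spark DataFrame if needed
sdf.printSchema()
```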
JOHN PAUL JAYME
Data Engineer
m. +639055716384 w. www.tdcx.com
Winner of over 350 Industry Awards
Hi Rajat,
I have been facing a similar problem recently and was able to solve it by moving
the UDF implementation into a dedicated class instead of having it implemented in
the driver class/object.
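A minimal PySpark-flavoured sketch of the same idea (the original context may well be Scala; all names below are invented): keep the UDF in its own module so the closure does not drag driver-side state into serialization.
```
# udf_helpers.py -- hypothetical standalone module
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def normalize(value):
    # Plain function with no references to objects defined in the driver class.
    return value.strip().lower() if value is not None else None

normalize_udf = udf(normalize, StringType())
```
In the driver you would then `from udf_helpers import normalize_udf` and apply it with `df.withColumn("name_norm", normalize_udf("name"))`.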
Regards,
Paul.
On Tuesday 20 September 2022 10:11:31 (+02:00), rajat kumar wrote:
Hi Alton, it'
produce without reproducer and even couldn't reproduce even
> they spent their time. A memory leak issue is not really easy to reproduce,
> unless it leaks objects unconditionally.
>
> - Jungtaek Lim (HeartSaVioR)
>
> On Sun, Oct 20, 2019 at 7:18 PM Paul Wais wrote
over time. Those were very different jobs, but perhaps this issue is specific
to local mode?
To emphasize: I did try to `del` the PySpark objects and run the Python garbage
collector. That didn't help at all.
pyspark 2.4.4 on java 1.8 on ubuntu bionic (tensorflow docker image)
12-core i7
Cheers,
-Paul
I will be streaming data and am trying to understand how to get rid of old
data from a stream so it does not become too large. I will stream in one
large table of buying data and join that to another table of different
data. I need the last 14 days from the second table. I will not need data
that is
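A minimal Structured Streaming sketch of one way to bound that state, assuming two already-parsed streaming DataFrames `buying` and `lookup`, each with an event-time column `ts` and a join key `id` (all names here are invented). Watermarks tell Spark when it may drop old rows from the join state, so the second table never keeps much more than 14 days of data.
```
from pyspark.sql import functions as F

buying_w = buying.withWatermark("ts", "1 hour").alias("b")
lookup_w = lookup.withWatermark("ts", "14 days").alias("l")

joined = buying_w.join(
    lookup_w,
    F.expr("""
        b.id = l.id AND
        l.ts BETWEEN b.ts - INTERVAL 14 DAYS AND b.ts
    """),
)
```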
I would like to see the full error. However, S3 can give misleading
messages if you don't have the correct permissions.
On Tue, Apr 24, 2018, 2:28 PM Marco Mistroni wrote:
> HI all
> i am using the following code for persisting data into S3 (aws keys are
> already stored in the environment vari
(please notice this question was previously posted to
https://stackoverflow.com/questions/49943655/spark-schedules-single-task-although-rdd-has-48-partitions)
We are running Spark 2.3 / Python 3.5.2. For a job we run following code
(please notice that the input txt files are just a simplified exa
, I know ADL is not heavily used at this time so I wonder if
anyone is seeing this with S3 as well? Maybe not since S3 permissions are
always reported as world-readable (I think) which causes
checkAccessPermission()
to succeed.
Any thoughts or feedback appreciated.
--
Thanks,
Paul
currently streaming apps running on EMR.
Paul Corley | Principal Data Engineer
IgnitionOne | Marketing Technology. Simplified.
Office: 1545 Peachtree St NE | Suite 500 | Atlanta, GA | 30309
Direct: 702.336.0094
Email: paul.cor...@ignitionone.com
You say you did the Maven package step, but did you also do a Maven install and
define your local Maven repository in SBT?
-Paul
Sent from my iPhone
> On Oct 11, 2017, at 5:48 PM, Stephen Boesch wrote:
>
> When attempting to run any example program w/ Intellij I am running into
> guava versi
You would set the Kafka topic as your data source and write a custom output to
Cassandra; everything would, or could, be contained within your stream.
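A rough sketch of that shape using today's Structured Streaming foreachBatch sink and the Spark Cassandra connector (the broker, topic, keyspace, and table names are invented, and the connector must be on the classpath):
```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-cassandra").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(key AS STRING) AS id",
                      "CAST(value AS STRING) AS payload"))

def write_to_cassandra(batch_df, batch_id):
    # foreachBatch hands over a normal batch DataFrame per micro-batch.
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(keyspace="demo", table="events")
     .mode("append")
     .save())

query = events.writeStream.foreachBatch(write_to_cassandra).start()
query.awaitTermination()
```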
-Paul
Sent from my iPhone
> On Sep 8, 2017, at 2:52 PM, kant kodali wrote:
>
> How can I use one SparkSession to tal
ntually throws a java OOM error.
Additionally each cycle through this step takes successively longer.
Hopefully someone can lend some insight as to what is actually taking place in
this step and how to alleviate it.
Thanks,
Paul Corley | Principal Data Engineer
as to be split
up, right?
We ended up using a single machine with a single thread to do the
splitting. I just want to make sure I am not missing something obvious.
Thanks!
--
Paul Henry Tremblay
Attunix
the number of
partitions, but get the same error each time.
In contrast, if I run a simple:
rdd = sc.textFile("s3://paulhtremblay/noaa_tmp/")
rdd.count()
The job finishes in 15 minutes, even with just 3 nodes.
Thanks
--
Paul Henry Tremblay
Robert Half Technology
ira/browse/SPARK-13330
>
>
>
> Holden Karau wrote on Wednesday, 5 April 2017 at 12:03 AM:
>
>> Which version of Spark is this (or is it a dev build)? We've recently
>> made some improvements with PYTHONHASHSEED propagation.
>>
>> On Tue, Apr 4, 2017 at 7:49 AM Eike von Seg
So that means I have to pass that bash variable to the EMR clusters when I
spin them up, not afterwards. I'll give that a go.
Thanks!
Henry
On Tue, Apr 4, 2017 at 7:49 AM, Eike von Seggern
wrote:
> 2017-04-01 21:54 GMT+02:00 Paul Tremblay :
>
>> When I try to do a groupBy
--
Paul Henry Tremblay
Robert Half Technology
--
Paul Henry Tremblay
Robert Half Technology
d run the history server like:
> ```
> cd /usr/local/src/spark-1.6.1-bin-hadoop2.6
> sbin/start-history-server.sh
> ```
> and then open http://localhost:18080
>
>
>
>
> On Thu, Mar 30, 2017 at 8:45 PM, Paul Tremblay
> wrote:
>
>> I am looking for tips on
I get the same error:
Exception: Randomness of hash of string should be disabled via
PYTHONHASHSEED
Anyone know how to fix this problem in python 3.4?
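For reference, a hedged sketch of one commonly suggested fix: pin PYTHONHASHSEED for the executor Python processes so string hashing is deterministic across workers (the value 0 is arbitrary but must be fixed). On YARN/EMR the variable may also need to be set in the cluster configuration when the cluster is spun up, as discussed elsewhere in this thread.
```
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("hashseed-demo")
        .set("spark.executorEnv.PYTHONHASHSEED", "0"))
sc = SparkContext(conf=conf)
```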
Thanks
Henry
--
Paul Henry Tremblay
Robert Half Technology
evaluate such things as how many tasks were completed, how many executors
were used, etc. I currently save my logs to S3.
Thanks!
Henry
--
Paul Henry Tremblay
Robert Half Technology
work as well:
http://michaelryanbell.com/processing-whole-files-spark-s3.html
Jon
On Mon, Feb 6, 2017 at 6:38 PM, Paul Tremblay wrote:
I've actually been able to trace the problem to the files being
read in. If I change to a different d
chine
On Feb 4, 2017 16:25, "Paul Tremblay" wrote:
I am using pyspark 2.1 and am wondering how to convert a flat
file, with one record per row, into a columnar format.
Here is an example of the data:
u'WARC/1.0
I've actually been able to trace the problem to the files being read in.
If I change to a different directory, then I don't get the error. Is one
of the executors running out of memory?
On 02/06/2017 02:35 PM, Paul Tremblay wrote:
When I try to create an rdd using wholeTextFiles
When I try to create an rdd using wholeTextFiles, I get an
incomprehensible error. But when I use the same path with sc.textFile, I
get no error.
I am using pyspark with spark 2.1.
in_path =
's3://commoncrawl/crawl-data/CC-MAIN-2016-50/segments/1480698542939.6/warc/
rdd = sc.wholeTextFiles(
I am using pyspark 2.1 and am wondering how to convert a flat file, with
one record per row, into a columnar format.
Here is an example of the data:
u'WARC/1.0',
u'WARC-Type: warcinfo',
u'WARC-Date: 2016-12-08T13:00:23Z',
u'WARC-Record-ID: ',
u'Content-Length: 344',
u'Content-Type: applicati
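Not the author's code, just a sketch of one way to pivot such line-oriented "Header: value" records into columns, assuming each record starts at a "WARC/1.0" line and only a fixed set of headers is wanted (the path and field list are illustrative). Records that straddle partition boundaries would be split incorrectly; wholeTextFiles or a custom input format is more robust for that.
```
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.appName("warc-to-columns").getOrCreate()
lines = spark.sparkContext.textFile("s3://some-bucket/warc-sample/")  # placeholder path

FIELDS = ["WARC-Type", "WARC-Date", "WARC-Record-ID", "Content-Length", "Content-Type"]

def to_rows(partition):
    record = {}
    for line in partition:
        line = line.strip()
        if line == "WARC/1.0":            # start of a new record
            if record:
                yield Row(**{f.replace("-", "_"): record.get(f) for f in FIELDS})
            record = {}
        elif ": " in line:
            key, value = line.split(": ", 1)
            record[key] = value
    if record:
        yield Row(**{f.replace("-", "_"): record.get(f) for f in FIELDS})

df = spark.createDataFrame(lines.mapPartitions(to_rows))
df.printSchema()
```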
Not sure what you mean by "a consistency layer on top." Any explanation would
be greatly appreciated!
Paul
_
Paul Tremblay
Analytics Specialist
THE BOSTON CONSULTING GROUP
Tel.
This seems to have done the trick, although I am not positive. If I have time,
I'll test spinning up a cluster with and without consistent view to pin point
the error.
_____
Paul Tremblay
Anal
.
_
Paul Tremblay
Analytics Specialist
THE BOSTON CONSULTING GROUP
Tel. + ▪ Mobile +
_
From: Eric Dain [mailto:ericdai...@gmail.com]
Sent: Wednesday, January 25, 2017 11:14 PM
To
I am using an EMR cluster, and the latest version offered is 2.02. The link
below indicates that that user had the same problem, which seems unresolved.
Thanks
Paul
_
Paul Tremblay
Analytics
tries
to write multiple times and causes the error. The suggestion is to turn off
speculation, but I believe speculation is turned off by default in pyspark.
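Indeed, spark.speculation defaults to false; a minimal sketch of setting it explicitly, just to rule it out:
```
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("no-speculation")
        .set("spark.speculation", "false"))   # explicit, even though false is the default
sc = SparkContext(conf=conf)
```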
Thanks!
Paul
_
Paul Tremblay
Analytic
considered a bug/enhancement?
Regards,
Paul
attributes is
significant.
Is there any way to cause the Encoder().schema() method to return the array
of StructFields in the original definition order of the Bean.class?
Regards,
Paul
--
Paul Leclercq | Data engineer
paul.lecle...@tabmo.io | http://www.tabmo.fr/
Topic}/{partitionId}
{newOffset}
Source: https://metabroadcast.com/blog/resetting-kafka-offsets
2016-02-22 11:55 GMT+01:00 Paul Leclercq :
> Thanks for your quick answer.
>
> If I set "auto.offset.reset" to "smallest" as for KafkaParams like this
>
ffset.reset" through parameter
> "kafkaParams" which is provided in some other overloaded APIs of
> createStream.
>
> By default Kafka will pick data from latest offset unless you explicitly
> set it, this is the behavior Kafka, not Spark.
>
> Thanks
>
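A hedged sketch of what the quoted advice looks like with the legacy DStream receiver API (pyspark.streaming.kafka, removed in Spark 3.x); the broker, group, and topic names are placeholders:
```
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

ssc = StreamingContext(sc, 10)   # assumes an existing SparkContext `sc`, 10s batches

stream = KafkaUtils.createStream(
    ssc,
    zkQuorum="zookeeper:2181",
    groupId="my-consumer-group",
    topics={"my-topic": 1},
    kafkaParams={"auto.offset.reset": "smallest"},  # read from the earliest offset
)
```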
t.reset
> to "earliest" for the new consumer in 0.9 and "smallest" for the old
> consumer.
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whydoesmyconsumernevergetanydata?
Thanks
--
Paul Leclercq
ER"
> -Dspark.deploy.zookeeper.url="ZOOKEEPER_IP:2181"
> -Dspark.deploy.zookeeper.dir="/spark"'
A good way to check that everything went OK is to look at the /spark folder on the
ZooKeeper server. I could not find it on my server.
Thanks for reading,
Paul
2016-01-19 22:12 GMT+01
or time 144894989 ms
> 2015-12-01 06:04:55,064 [JobGenerator] INFO (Logging.scala:59) - Added
> jobs for time 1448949895000 ms
> 2015-12-01 06:05:00,125 [JobGenerator] INFO (Logging.scala:59) - Added
> jobs for time 144894990 ms
>
>
> Thanks
> LCassa
>
--
Paul Leclercq | Data engineer
paul.lecle...@tabmo.io | http://www.tabmo.fr/
ill be unpredictable (some partition may use cache, some
> may not be able to use the cache).
>
> On Wed, Sep 16, 2015 at 1:06 PM, Paul Weiss
> wrote:
>
>> Hi,
>>
>> What is the behavior when calling rdd.unpersist() from a different thread
>> while another thre
has been called?
thanks,
-paul
Maybe you forgot to close a reader or writer object.
On 29 July 2015 at 18:04:59 CEST, saif.a.ell...@wellsfargo.com wrote:
>Thank you both, I will take a look, but
>
>
>1. For high-shuffle tasks, is this right for the system to have
>the size and thresholds high? I hope there is no bad con
Hey,
I have quite a few jobs appearing in the web-ui with the description "run at
ThreadPoolExecutor.java:1142".
Are these generated by SparkSQL internally?
There are so many that they cause a RejectedExecutionException when the
thread-pool runs out of space for them.
RejectedExecutionExceptio
Sorry, that should be shortest path, and diameter of the graph.
I shouldn't write emails before I get my morning coffee...
> On 06 Jul 2015, at 09:09, Jan-Paul Bultmann wrote:
>
> I would guess the opposite is true for highly iterative benchmarks (common in
> graph processing
I would guess the opposite is true for highly iterative benchmarks (common in
graph processing and data-science).
Spark has a pretty large overhead per iteration, more optimisations and
planning only makes this worse.
Sure, people have implemented things like Dijkstra's algorithm in Spark
(a problem
tayed the same though.
But I didn’t run that many iterations due to the problem :).
> As a workaround, you can break the iterations into smaller ones and trigger
> them manually in sequence.
You mean `write`-ing them to disk after each iteration?
Thanks :), Jan
> -Original Message
org.apache.spark.sql.DataFrame.persist(StorageLevel) DataFrame.scala:1320
^
|
Application logic.
|
Could someone confirm my suspicion?
And does somebody know why it’s called while caching, and why it walks the
entire tree including cached results?
Cheers, Jan-Paul
t.scala:53)
at
mgm.tp.bigdata.ma_spark.SparkMain.main(SparkMain.java:38)
What did I do wrong?
Best regards,
paul
Hey,
Is there a way to do a distinct operation on each partition only?
My program generates quite a few duplicate tuples and it would be nice to
remove some of these as an optimisation
without having to reshuffle the data.
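A small sketch of that per-partition de-duplication (not from the original mail), using mapPartitions so no shuffle is triggered; a later distinct() would still be needed for global uniqueness:
```
def local_distinct(partition):
    seen = set()
    for item in partition:
        if item not in seen:
            seen.add(item)
            yield item

deduped = rdd.mapPartitions(local_distinct)   # assumes an existing RDD `rdd`
```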
I’ve also noticed that plans generated with an unique transformation have
It’s probably not advisable to use 1 though since it will break when `df = df2`,
which can easily happen when you’ve written a function that does such a join
internally.
This could be solved by an identity like function that returns the dataframe
unchanged but with a different identity.
`.as` wo
Hey,
What is the recommended way to create literal columns in Java?
Scala has the `lit` function from `org.apache.spark.sql.functions`.
Can it be called from Java as well?
Cheers, Jan
So... one solution would be to use a non-Jurassic version of Jackson. 2.6
will drop before too long, and 3.0 is in longer-term planning. The 1.x
series is long deprecated.
If you're genuinely stuck with something ancient, then you need to include
the JAR that contains the class, and 1.9.13 does
of this ByteBuffer API is possible and leverage it.
Cheers,
-Paul
On 1/11/15 9:51 PM, Paul Wais wrote:
>>
>>
>> Dear List,
>>
>> What are common approaches for addressing over a union of tables / RDDs?
>> E.g. suppose I have a collection of log files in HDFS, one log file per day,
>> and I want to compute the sum of some fi
To force one instance per executor, you could explicitly subclass
FlatMapFunction and have it lazy-create your parser in the subclass
constructor. You might also want to try RDD#mapPartitions() (instead of
RDD#flatMap()) if you want one instance per partition. This approach worked
well for me when
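A hedged PySpark analogue of that suggestion (the parser module below is hypothetical): build one parser per partition inside mapPartitions rather than one per record.
```
def parse_partition(records):
    from my_native_lib import Parser   # hypothetical heavyweight parser
    parser = Parser()                  # constructed once per partition
    for record in records:
        yield parser.parse(record)

parsed = raw_rdd.mapPartitions(parse_partition)   # assumes an existing RDD `raw_rdd`
```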
interactive
querying).
* Related question: are there plans to use Parquet Index Pages to make
Spark SQL faster? E.g. log indices over date ranges would be relevant here.
All the best,
-Paul
I would suggest checking out disk IO on the nodes in your cluster and then
reading up on the limiting behaviors that accompany different kinds of EC2
storage. Depending on how things are configured for your nodes, you may
have a local storage configuration that provides "bursty" IOPS where you
get
t find out how to do it.
I use spark 1.2.0rc1 with hadoop 2.4 and Riak CS (instead of S3) if
that matters. The s3n:// protocol with the same settings works.
Thanks.
--
Paul
Unfortunately, unless you impose restrictions on the XML file (e.g., where
namespaces are declared, whether entity replacement is used, etc.), you
really can't parse only a piece of it even if you have start/end elements
grouped together. If you want to deal effectively (and scalably) with
large X
More thoughts. I took a deeper look at BlockManager, RDD, and friends.
Suppose one wanted to get native code access to un-deserialized blocks.
This task looks very hard. An RDD behaves much like a Scala iterator of
deserialized values, and interop with BlockManager is all on deserialized
data.
h JNA.
Is there a way to expose raw, in-memory partition/block data to native code?
Has anybody else attacked this problem a different way?
All the best,
-Paul
2)
Thanks,
Kevin Paul
s also taking memory.
>
> On Oct 30, 2014 6:43 PM, "Paul Wais" wrote:
>>
>> Dear Spark List,
>>
>> I have a Spark app that runs native code inside map functions. I've
>> noticed that the native code sometimes sets errno to ENOMEM indicating
freeMemory()
shows gigabytes free and the native code needs only megabytes. Does Spark limit
the /native/ heap size somehow? I am poking through the executor code now but
don't see anything obvious.
Best Regards,
-Paul Wais
Hi all, I tried to use the function SchemaRDD.where() but got some error:
val people = sqlCtx.sql("select * from people")
people.where('age === 10)
:27: error: value === is not a member of Symbol
Where did I go wrong?
Thanks,
Kevin Paul
red to set the config using HiveContext's setConf function?
Regards,
Kelvin Paul
Thanks Michael, your patch works for me :)
Regards,
Kelvin Paul
On Fri, Oct 3, 2014 at 3:52 PM, Michael Armbrust
wrote:
> Are you running master? There was briefly a regression here that is
> hopefully fixed by spark#2635 <https://github.com/apache/spark/pull/2635>.
>
> On F
Looks like an OOM issue? Have you tried persisting your RDDs to allow
disk writes?
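A minimal sketch of that suggestion, assuming two existing RDDs; MEMORY_AND_DISK lets partitions spill to disk instead of blowing the heap:
```
from pyspark import StorageLevel

joined = big_rdd.join(other_rdd)                  # assumes existing RDDs
joined.persist(StorageLevel.MEMORY_AND_DISK)
joined.count()                                    # materialize once, reuse afterwards
```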
I've seen a lot of similar crashes in a Spark app that reads from HDFS
and does joins. I.e. I've seen "java.io.IOException: Filesystem
closed," "Executor lost," "FetchFailed," etc etc with
non-deterministic crashe
execute(NativeCommand.scala:38)
at
org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360)
at
org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)
Thanks,
Kelvin Paul
the spark-env.sh but it does not
seem to stop the dynamic port behavior. I have included the startup output
when running spark-shell from the edge server in a different dmz and then from
a node in the cluster. Any help greatly appreciated.
Paul Magid
Toyota Motor Sales IS Enterprise
Derp, one caveat to my "solution": I guess Spark doesn't use Kryo for
Function serde :(
On Fri, Sep 19, 2014 at 12:44 AM, Paul Wais wrote:
> Well it looks like this is indeed a protobuf issue. Poked a little more
> with Kryo. Since protobuf messages are serializable
Well it looks like this is indeed a protobuf issue. Poked a little more
with Kryo. Since protobuf messages are serializable, I tried just making
Kryo use the JavaSerializer for my messages. The resulting stack trace
made it look like protobuf GeneratedMessageLite is actually using the
classloade
es the problem):
https://github.com/apache/spark/blob/2f9b2bd7844ee8393dc9c319f4fefedf95f5e460/core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala#L74
If uber.jar is on the classpath, then the root classloader would have
the code, hence why --driver-class-path fixes the bug.
On Thu, Sep 18, 201
Hmm, would using Kryo help me here?
On Thursday, September 18, 2014, Paul Wais wrote:
> Ah, can one NOT create an RDD of any arbitrary Serializable type? It
> looks like I might be getting bitten by the same
> "java.io.ObjectInputStream uses root class loader only"
d3259.html
* https://github.com/apache/spark/pull/181
*
http://mail-archives.apache.org/mod_mbox/spark-user/201311.mbox/%3c7f6aa9e820f55d4a96946a87e086ef4a4bcdf...@eagh-erfpmbx41.erf.thomson.com%3E
* https://groups.google.com/forum/#!topic/spark-users/Q66UOeA2u-I
On Thu, Sep 18, 2014 at 4:51 PM,
ache.org/repos/asf/hadoop/common/branches/branch-2.3.0/hadoop-project/pom.xml
On Thu, Sep 18, 2014 at 1:06 AM, Paul Wais wrote:
> Dear List,
>
> I'm writing an application where I have RDDs of protobuf messages.
> When I run the app via bin/spar-submit with --master local
>
identical keys in the input
tuples.)
SPARK-2926 Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle
The Exception is included below.
Paul Magid
Toyota Motor Sales IS Enterprise Architecture (EA)
Architect I R&D
Ph: 310-468-9091 (X69091)
PCN 1C2970, Mail Drop PN12
Excep
is there a document that lists current Spark SQL
limitations/issues?
Paul Magid
Toyota Motor Sales IS Enterprise Architecture (EA)
Architect I R&D
Ph: 310-468-9091 (X69091)
PCN 1C2970, Mail Drop PN12
Successful Re
ark://my.master:7077 )
? I've tried poking through the shell scripts and SparkSubmit.scala
and unfortunately I haven't been able to grok exactly what Spark is
doing with the remote/local JVMs.
Cheers,
-Paul
Thanks Tim, this is super helpful!
Question about jars and spark-submit: why do you provide
myawesomeapp.jar as the program jar but then include other jars via
the --jars argument? Have you tried building one uber jar with all
dependencies and just sending that to Spark as your app jar?
Also, h
mib13.cloudfront.net/spark-1.1.0-bin-hadoop2.3.tgz
pom.xml snippets: https://gist.github.com/ypwais/ff188611d4806aa05ed9
[1]
http://stackoverflow.com/questions/24747037/how-to-define-a-dependency-scope-in-maven-to-include-a-library-in-compile-run
Thanks everybody!!
-Paul
On Tue, Sep 16, 2014 at 3:
eSize=512m"
mvn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package
and hadoop 2.3 / cdh5 from
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.0.tar.gz
On Mon, Sep 15, 2014 at 6:49 PM, Christian Chua wrote:
> Hi Paul.
>
> I would recommend building you
distro of hadoop is used at Data Bricks? Are there distros of
Spark 1.1 and hadoop that should work together out-of-the-box?
(Previously I had Spark 1.0.0 and Hadoop 2.3 working fine..)
Thanks for any help anybody can give me here!
-Paul
--
NewHadoopRDD.
I am sure there is some way to use it with convenience methods like
SparkContext.textFile; you could probably set the property
"mapreduce.input.fileinputformat.split.maxsize".
Regards,
Paul Hamilton
From: Chen Song
Date: Friday, August 8, 2014 at 9:13 PM
In this case, any file larger than 256,000,000 bytes is split. If you don't
explicitly set it, the limit is infinite, which leads to the behavior you are
seeing where there is one split per file.
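A rough modern-PySpark sketch of that setting (the original thread predates SparkSession; the 256 MB figure and the path are illustrative). spark.hadoop.* entries are forwarded into the underlying Hadoop Configuration:
```
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("split-size-demo")
         .config("spark.hadoop.mapreduce.input.fileinputformat.split.maxsize",
                 str(256 * 1000 * 1000))
         .getOrCreate())

rdd = spark.sparkContext.textFile("s3://some-bucket/large-files/")  # placeholder path
print(rdd.getNumPartitions())
```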
Regards,
Paul Hamilton
ld and pass tests on Jenkins.
>>
>> You shouldn't expect new features to be added to stable code in
>> maintenance releases (e.g. 1.0.1).
>>
>> AFAIK, we're still on track with Spark 1.1.0 development, which means that
>> it should be released sometime in
gards,
-Paul Wais
We use Luigi for this purpose. (Our pipelines are typically on AWS (no
EMR) backed by S3 and using combinations of Python jobs, non-Spark
Java/Scala, and Spark. We run Spark jobs by connecting drivers/clients to
the master, and those are what is invoked from Luigi.)
—
p...@mult.ifario.us | Multi
Hi, Mans --
Both of those versions of Jackson are pretty ancient. Do you know which of
the Spark dependencies is pulling them in? It would be good for us (the
Jackson, Woodstox, etc., folks) to see if we can get people to upgrade to
more recent versions of Jackson.
-- Paul
—
p
Hi, Robert --
I wonder if this is an instance of SPARK-2075:
https://issues.apache.org/jira/browse/SPARK-2075
-- Paul
—
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
On Wed, Jun 25, 2014 at 6:28 AM, Robert James
wrote:
> On 6/24/14, Robert James wrote:
> > My
jar not reporting the files. Also, the classes do
get correctly packaged into the uberjar:
unzip -l /target/[deleted]-driver.jar | grep 'rdd/RDD' | grep 'saveAs'
1519 06-08-14 12:05
org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
1560 06-08-14 12:05
org/ap
entirely different* artifacts
(spark-core-h1, spark-core-h2).
Logged as SPARK-2075 <https://issues.apache.org/jira/browse/SPARK-2075>.
Cheers.
-- Paul
—
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
On Fri, Jun 6, 2014 at 2:45 AM, HenriV wrote:
> I
Hi, Adrian --
If my memory serves, you need 1.7.7 of the various slf4j modules to avoid
that issue.
Best.
-- Paul
—
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
On Mon, May 12, 2014 at 7:51 AM, Adrian Mocanu wrote:
> Hey guys,
>
> I've asked before, in Spa
sted:
2014050917: 7
2014050918: 42
Persisted:
2014050917: 7
2014050918: 12
Any idea what could account for the differences? BTW I am using Spark
0.9.1.
Thanks,
Paul
Hi, Laurent --
That's the way we package our Spark jobs (i.e., with Maven). You'll need
something like this:
https://gist.github.com/prb/d776a47bd164f704eecb
That packages separate driver (which you can run with java -jar ...) and
worker JAR files.
Cheers.
-- Paul
—
p...@mult