> On Tue, May 24, 2016 at 1:00 PM, Adrien Mogenet <
> adrien.moge...@contentsquare.com> wrote:
>
>> Hi,
>>
>> I'm wondering how Spark sets the "index" of a task?
>> I'm asking this question because we have a job that constantly fails, and
we're trying to go further in our understanding of how Spark behaves.
We're using Spark 1.5.2, Scala 2.11, on top of Hadoop 2.6.0.
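For what it's worth, the task "index" shown in the UI is the task's position
within its stage's task set, which normally matches the partition index. A
minimal sketch of inspecting it from inside a job (Scala; the RDD name is
hypothetical):

  import org.apache.spark.TaskContext

  rdd.mapPartitionsWithIndex { (partitionIdx, iter) =>
    // partitionIdx is the partition number; TaskContext exposes it too
    val ctx = TaskContext.get()
    println(s"partition=$partitionIdx ctxPartition=${ctx.partitionId()} attempt=${ctx.attemptNumber()}")
    iter
  }.count()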
--
*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.moge...@contentsquare.com
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris
>> (CoarseGrainedSchedulerBackend.scala:283)
>> at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:180)
>> at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:439)
>> at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1439)
>> at org.apache.spark.SparkContext$$anonfun$stop$7.apply$mcV$sp(SparkContext.scala:1724)
>> at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185)
>> at org.apache.spark.SparkContext.stop(SparkContext.scala:1723)
>> at org.apache.spark.SparkContext$$anonfun$3.apply$mcV$sp(SparkContext.scala:587)
>> at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
>> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>> at scala.util.Try$.apply(Try.scala:161)
>> at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
>> at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
>> at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>> Caused by: akka.pattern.AskTimeoutException:
>> Recipient[Actor[akka://sparkDriver/user/CoarseGrainedScheduler#1432624242]] had already been terminated.
>> at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
>> at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:307)
>> ... 23 more
>>
>>
>> --
>> Donald Drake
>> Drake Consulting
>> http://www.drakeconsulting.com/
>> https://twitter.com/dondrake
>> 800-733-2143
>>
>
--
*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.moge...@contentsquare.com
(+33)6.59.16.64.22
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris
wiki.
>
> On Wed, Dec 2, 2015 at 10:53 AM, Adrien Mogenet wrote:
> > Hi folks,
> >
> > You're probably busy, but any update on this? :)
> >
> > On 16 November 2015 at 16:04, Adrien Mogenet <
> > adrien.moge...@contentsquare.com> wrote:
> >>
> >> Name: Content Square
> >> URL: http://www.contentsquare.com
> >>
> >> Description:
> >> We use Spark to regularly read raw data, convert them into Parquet, and
> >> process them to create advanced analytics dashboards: aggregation,
> >> sampling, statistics computations, anomaly detection, machine learning.
--
*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.moge...@contentsquare.com
(+33)6.59.16.64.22
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris
saveAsHadoopFile?
>
> Cheng
>
>
> On 9/8/15 2:34 PM, Adrien Mogenet wrote:
>
> Hi there,
>
>> We've spent several hours trying to split our input data into several
>> Parquet files (or several folders, i.e.
> /datasink/output-parquets//foobar.parquet), based on a
> low
th Parquet files.
The only working solution so far is to persist the RDD and then loop over
it N times to write N files. That doesn't seem acceptable...
Do you guys have any suggestions for such an operation?
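For reference, a minimal sketch of one way to avoid the persist-and-loop
approach: since Spark 1.4, DataFrameWriter.partitionBy can write one folder
per key value in a single pass (the column name and paths below are
hypothetical):

  import org.apache.spark.sql.SQLContext

  val sqlContext = new SQLContext(sc)
  // assume rows carrying a low-cardinality column "key"
  val df = sqlContext.read.parquet("/datasink/input")
  // produces /datasink/output-parquets/key=<value>/part-*.parquet
  df.write.partitionBy("key").parquet("/datasink/output-parquets")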
--
*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.moge...@contentsquar
>>3. average/maximum record size
>>4. cache configuration
>>5. shuffle configuration
>>6. serialization
>>7. etc?
>>
>> Any general best practices?
>>
>> Thanks!
>>
>> Romi K.
>>
>
>
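For what it's worth, a minimal sketch of the kind of cache/shuffle/
serialization settings the list above is asking about (Spark 1.5-era
configuration keys; the values are illustrative assumptions, not tuned
recommendations):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("tuning-sketch")
    // serialization: Kryo is usually faster and more compact than Java
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // shuffle configuration: compress shuffle outputs (default is true)
    .set("spark.shuffle.compress", "true")
    // cache configuration: heap fraction for cached blocks (pre-1.6 key)
    .set("spark.storage.memoryFraction", "0.5")
  val sc = new SparkContext(conf)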
--
*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.moge...@contentsquare.com
(+33)6.59.16.64.22
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris
hash value and add it as a separate column,
>> but it doesn't sound right to me. Are there any other ways I can try?
>>
>> Regards,
>> --
>> Kohki Nishio
>>
>
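For reference, a minimal sketch of the "hash value as a separate column"
approach being described, assuming a DataFrame df with a string column
"value" (all names here are hypothetical):

  import org.apache.spark.sql.functions.udf

  // Spark 1.5 has no built-in hash() column function, so use a UDF
  val hashUdf = udf((s: String) => s.hashCode)
  val withHash = df.withColumn("value_hash", hashUdf(df("value")))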
--
*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.moge...@contentsquare.com
(+33)6.59.16.64.22
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris
is totally
non-deterministic and I can't reproduce it, probably due to the
asynchronous nature of shutdown and my lack of understanding of how/when
stop() is supposed to be called.
Any idea?
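For reference, a minimal sketch of stopping the context explicitly rather
than relying on the JVM shutdown hook (which is where the stack trace above
runs, via SparkShutdownHookManager); this is an assumption about a possible
mitigation, not a confirmed fix:

  val sc = new SparkContext(conf)
  try {
    // ... job logic ...
  } finally {
    // stop deterministically; SparkContext.stop() is idempotent, so the
    // later shutdown-hook invocation finds the context already stopped
    sc.stop()
  }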
Best,
--
*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.moge...@contentsquare.com
(+33)6.59.16.64.22
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris