>>>
>>> On 4 July 2016 at 21:43, Mich Talebzadeh
>>> wrote:
>>>
>>>> well this will be apparent from the Environment tab of the GUI. It will
>>>> show how the job is actually running.
>>>>
>>
>>>>> est but could be a
>>>>> DecisionTree just for the sake of simplicity.
>>>>>
>>>>> But when I submit the Spark application to the cluster via spark-submit,
>>>>> it runs out of memory. Even though the executors are
>>>>> "taken"/created in the cluster, they are essentially doing nothing (poor
>>>>> CPU and memory utilization) while the master seems to do all the work,
>>>>> which finally results in an OOM.
>>>>>
>>>>> My submission is the following:
>>>>> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
>>>>> SparkPOC.jar 10 4.3
>>>>>
>>>>> I am submitting from the master node.
>>>>>
>>>>> By default it is running in client mode, in which the driver process is
>>>>> attached to spark-shell.
>>>>>
>>>>> Do I need to set up some settings to make the MLlib algorithms parallelized
>>>>> and distributed as well, or is it all driven by the parallelism factor set on
>>>>> the DataFrame with the input data?
>>>>>
>>>>> Essentially it seems that all the work is done on the master and the rest
>>>>> is idle.
>>>>> Any hints what to check?
>>>>>
>>>>> Thx
>>>>> Jakub
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Jakub Stransky
>>> cz.linkedin.com/in/jakubstransky
>>>
>>>
>>
>
>
> --
> Jakub Stransky
> cz.linkedin.com/in/jakubstransky
>
> --
Mathieu Longtin
1-514-803-8977
>> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
>> SparkPOC.jar 10 4.3
>>
>> I am submitting from the master node.
>>
>> By default it is running in client mode, in which the driver process is
>> attached to spark-shell.
>>
>> Do I need to set up some settings
Try to figure out what the env vars and arguments of the worker JVM and
Python process are. Maybe you'll get a clue.
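For reference, one thing the submission quoted above never does is ask for
executor resources, so the standalone defaults apply. A minimal sketch of a
more explicit submission (the master URL, memory sizes and core counts below
are placeholders, not values taken from this thread):

    spark-submit \
      --master spark://<master-host>:7077 \
      --deploy-mode client \
      --driver-class-path spark/sqljdbc4.jar \
      --driver-memory 4g \
      --executor-memory 4g \
      --total-executor-cores 8 \
      --class DemoApp SparkPOC.jar 10 4.3

MLlib itself parallelizes according to the partitioning of the input
DataFrame/RDD, so if the input arrives as a single-partition read (a plain
JDBC read without a partition column, for example), a repartition of the
input, e.g. df.repartition(32) with the number again a placeholder, is
usually needed before the executors see any real work.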
On Mon, Jul 4, 2016 at 11:42 AM Mathieu Longtin
wrote:
> I started with a download of 1.6.0. These days, we use a self-compiled
> 1.6.2.
>
> On Mon, Jul 4,
was built from source code or downloaded as a binary, though that should
> not technically change anything?
>
> On Mon, Jul 4, 2016 at 9:03 PM, Mathieu Longtin
> wrote:
>
>> 1.6.1.
>>
>> I have no idea. SPARK_WORKER_CORES should do the same.
>>
>> On
1.6.1.
I have no idea. SPARK_WORKER_CORES should do the same.
On Mon, Jul 4, 2016 at 11:24 AM Ashwin Raaghav wrote:
> Which version of Spark are you using? 1.6.1?
>
> Any ideas as to why it is not working in ours?
>
> On Mon, Jul 4, 2016 at 8:51 PM, Mathieu Longtin
> wrote:
> ode (1 parent and 3 workers). Limiting it via the spark-env.sh file by
> giving SPARK_WORKER_CORES=1 also didn't help.
>
> When you said it helped you and limited it to 2 processes in your cluster,
> how many cores did each machine have?
>
> On Mon, Jul 4, 2016 at 8:22 PM, Mathieu
ng "spark.executor.cores" to 1? And how can I
> specify "--cores=1" from the application?
>
> On Mon, Jul 4, 2016 at 8:06 PM, Mathieu Longtin
> wrote:
>
>> When running the executor, put --cores=1. We use this and I only see 2
>> pyspark process, one
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Limiting-Pyspark-daemons-tp27272.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
> --
Mathieu Longtin
1-514-803-8977
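To summarize the two knobs discussed in this thread: SPARK_WORKER_CORES caps
how many cores a standalone worker offers in total, while spark.executor.cores
caps what a single executor asks for. Broadly speaking, an executor runs at
most as many concurrent Python workers as it has task slots, so either setting
limits the number of pyspark daemons. A rough sketch, with placeholder values:

    # conf/spark-env.sh on each worker node (standalone deployment assumed)
    export SPARK_WORKER_CORES=1

    # or per application, at submit time
    spark-submit --conf spark.executor.cores=1 --conf spark.cores.max=4 ...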
Same JVMs.
On Wed, Jun 29, 2016 at 8:48 AM Huang Meilong wrote:
> Hi,
>
> In Spark, tasks from different applications run in different JVMs; then
> what about tasks from the same application?
>
--
Mathieu Longtin
1-514-803-8977
It turns out you can easily use a Python set, so I can send back a list of
failed files. Thanks.
On Wed, Jun 15, 2016 at 4:28 PM Ted Yu wrote:
> Have you looked at:
>
> https://spark.apache.org/docs/latest/programming-guide.html#accumulators
>
> On Wed, Jun 15, 2016 at 1:24 PM,
.
Is that possible?
--
Mathieu Longtin
1-514-803-8977
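A minimal sketch of the set-valued accumulator idea mentioned above (the file
names and the failure condition are made up for illustration):

    from pyspark import SparkContext
    from pyspark.accumulators import AccumulatorParam

    class SetAccumulatorParam(AccumulatorParam):
        def zero(self, initial):
            return set()
        def addInPlace(self, s1, s2):
            s1 |= s2              # merge partial sets coming back from executors
            return s1

    sc = SparkContext(appName="FailedFiles")
    failed = sc.accumulator(set(), SetAccumulatorParam())

    def process(path):
        try:
            if "bad" in path:     # placeholder for the real per-file work
                raise ValueError(path)
            return [path]
        except Exception:
            failed.add({path})    # record the name of the failing file
            return []

    ok = sc.parallelize(["a.txt", "bad.txt", "c.txt"]).flatMap(process)
    ok.count()                    # force evaluation so the accumulator fills up
    print(failed.value)           # e.g. {'bad.txt'}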
pache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Tue, May 24, 2016 at 4:04 PM, Mathieu Longtin
> wrote:
> > In standalone mode, executors assume they have access to a shared file
> > system. The driver creates the directory and the executors write files,
ispatcher-2]
> remote.RemoteActorRefProvider$RemotingTerminator
> (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting shut down.
>
>
> I have to do a ctrl-c to terminate the spark-submit process. This is
> really a weird problem and I have no idea how to fix it. Please let me
> know if there are any logs I should be looking at, or if I should be doing
> things differently here.
>
>
> --
Mathieu Longtin
1-514-803-8977
>
>
>
>
--
Mathieu Longtin
1-514-803-8977
w-to-set-the-degree-of-parallelism-in-Spark-SQL-tp26996.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>>
>>
> --
Mathieu Longtin
1-514-803-8977
am not convinced if it will use
> those nodes? Someone can possibly clarify this
>
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
> There's an interface you can implement to try
> that if you really want to (ExternalClusterManager), but it's
> currently "private[spark]" and it probably wouldn't be a very simple
> task.
>
>
> On Thu, May 19, 2016 at 10:45 AM, Mathieu Longtin
> wrote:
thieu?
>
> Cheers
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 19 May 2016 at 21:33, Mathieu Longtin wrote:
>
>> Driver memory is default. Executor memory depends on jo
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 1
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 19 May 2016 at 20:37, Mathieu Longtin wrote:
>
>> No master and no nod
alone servers?
>
>
> Thanks
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
ay I could do this without the master and worker managers?
Thanks!
--
Mathieu Longtin
1-514-803-8977
upport
> golang with Spark.
>
> -Thanks
> Sourav
>
--
Mathieu Longtin
1-514-803-8977
Abi wrote:
>
>
> On Tue, May 10, 2016 at 2:20 PM, Abi wrote:
>
>> Is there any example of this? I want to see how you write the
>> iterable example.
>
>
> --
Mathieu Longtin
1-514-803-8977
read two text files in Spark at the same time and
> associate them with the serial number. Is there a way of doing this in
> place, given that we know the directory structure? Or should we be
> transforming the data anyway to solve this?
>
--
Mathieu Longtin
1-514-803-8977
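One hedged way to read a set of text files and key each one by a serial
number taken from its path is sc.wholeTextFiles, which yields (path, content)
pairs. The directory layout below, /data/<serial>/*.txt, is an assumption
rather than something stated in the question:

    import os
    from pyspark import SparkContext

    sc = SparkContext(appName="FilesBySerial")
    files = sc.wholeTextFiles("/data/*/*.txt")          # RDD of (path, content)
    by_serial = files.map(
        lambda pc: (os.path.basename(os.path.dirname(pc[0])), pc[1]))
    print(by_serial.keys().distinct().collect())        # the serial numbers seen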
.
>
>
> Thanks,
> Divya
>
>
>
>
>
> --
Mathieu Longtin
1-514-803-8977
:
>>>>
>>>>> I see there is a library, spark-csv, which can be used for removing the
>>>>> header and processing CSV files. But it seems it works with SQLContext
>>>>> only. Is there a way to remove the header from CSV files without SQLContext?
>>>>>
>>>>> Thanks
>>>>> Ashutosh
>>>>>
>>>>
>>>>
>>>
>>> --
>>>
>>> M'BAREK Med Nihed,
>>> Fedora Ambassador, TUNISIA, Northern Africa
>>> http://www.nihed.com
>>>
>>> <http://tn.linkedin.com/in/nihed>
>>>
>>>
>>> --
Mathieu Longtin
1-514-803-8977
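Without SQLContext or spark-csv, the usual trick is to read the file as a
plain RDD and filter the header line out. A minimal sketch (the file path and
delimiter are assumptions):

    from pyspark import SparkContext

    sc = SparkContext(appName="DropCsvHeader")
    lines = sc.textFile("/path/to/data.csv")
    header = lines.first()                            # first line of the file
    data = lines.filter(lambda line: line != header)  # drops lines equal to the header
    rows = data.map(lambda line: line.split(","))
    print(rows.take(5))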
o predict a certain label. In a second RDD, you have
> historical data. So for each entry in the first RDD, you want to find
> similar entries in the second RDD and take, let's say, the average. Does
> that fit the Spark model? Is there any alternative?
>
> Thanks in advance
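A brute-force way to express "for each entry, average the labels of similar
historical entries" is a cartesian product followed by an aggregation. This is
only a sketch with made-up one-dimensional features and an arbitrary
similarity threshold, and it scales poorly; for large data a blocked or
approximate join (for example locality-sensitive hashing) would be needed:

    from pyspark import SparkContext

    sc = SparkContext(appName="SimilarityAverage")
    rdd_a = sc.parallelize([(1, 0.10), (2, 0.55)])       # (id, feature) to score
    rdd_b = sc.parallelize([(0.11, 3.0), (0.50, 7.0)])   # (feature, label) history

    THRESHOLD = 0.05                                     # arbitrary similarity cutoff
    pairs = (rdd_a.cartesian(rdd_b)
                  .filter(lambda ab: abs(ab[0][1] - ab[1][0]) <= THRESHOLD)
                  .map(lambda ab: (ab[0][0], (ab[1][1], 1))))
    averages = (pairs.reduceByKey(lambda x, y: (x[0] + y[0], x[1] + y[1]))
                     .mapValues(lambda s: s[0] / s[1]))
    print(averages.collect())                            # e.g. [(1, 3.0), (2, 7.0)]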