Hi bo
How do we start?
Is there a plan (onboarding, an architecture/design diagram, tasks lined up, etc.)?
Thanks
Sarath
> On Feb 23, 2022, at 10:27 AM, bo yang wrote:
>
>
> Hi Sarath, thanks for your interest and willingness to contribute! The project
> supports lo
Hi bo
I am interested in contributing.
But I don't have free access to any cloud provider, and I'm not sure how I can get
it. I know Google, AWS, and Azure only provide temporary free access; it may not be
sufficient.
Guidance is appreciated.
Sarath
> On Feb 23, 2022, at 2
I'm using Hadoop 1.0.4 and Spark 1.2.0.
I'm facing a strange issue. I have a requirement to read a small file from
HDFS, and all its content has to be read in one shot. So I'm using the Spark
context's wholeTextFiles API, passing the HDFS URL for the file.
When I try this from a spark shell it works
count and their types.
Any ideas how to tackle this?
Regards,
Sarath.
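A minimal sketch of the wholeTextFiles usage described above (the SparkContext sc and the HDFS path are placeholders):

// Read one small HDFS file in a single shot.
// wholeTextFiles returns an RDD of (filePath, fileContent) pairs, one pair per file.
val fileRdd = sc.wholeTextFiles("hdfs://namenode:54310/user/hduser/small-file.txt")

// For a single small file, take the lone pair and keep its content.
val content: String = fileRdd.first()._2
println(s"Read ${content.length} characters")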
On Sat, Oct 31, 2015 at 4:37 PM, ayan guha wrote:
> Can this be a solution?
>
> 1. Write a function which will take a string and convert it to an MD5 hash
> 2. From your base table, generate a string out of all columns yo
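A minimal sketch of ayan's suggestion above, assuming a hypothetical DataFrame baseTable and a recent enough Spark SQL for concat_ws; the separator and output column name are arbitrary:

import java.security.MessageDigest
import org.apache.spark.sql.functions.{col, concat_ws, udf}

// Hash an arbitrary string to its hex MD5 digest.
val md5Udf = udf { (s: String) =>
  val bytes = Option(s).getOrElse("").getBytes("UTF-8")
  MessageDigest.getInstance("MD5").digest(bytes).map("%02x".format(_)).mkString
}

// Concatenate all columns of the (hypothetical) base table into one string and hash it.
val hashed = baseTable.withColumn(
  "row_hash",
  md5Udf(concat_ws("|", baseTable.columns.map(col): _*)))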
t;SJ").withColumn("LINK_ID",
linkIDUDF(src_join("S1.RECORD_ID"),src("S2.RECORD_ID")));*
Then in further lines I'm not able to refer to "s1" columns from "src_link"
like -
*var src_link_s1 = src_link.as
<http://src_link.as>("SL").select($"S1.RECORD_ID");*
Please guide me.
Regards,
Sarath.
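For comparison, a minimal sketch of qualifying columns through DataFrame aliases; records1, records2, the KEY column, and linkIDUDF are hypothetical stand-ins for the objects in the mail above:

import org.apache.spark.sql.functions.col

// Alias each side of the join so its columns can be qualified later.
val s1 = records1.as("S1")
val s2 = records2.as("S2")
val srcJoin = s1.join(s2, col("S1.KEY") === col("S2.KEY"))

// Both join sides are still addressable through their aliases here.
val srcLink = srcJoin.withColumn("LINK_ID",
  linkIDUDF(col("S1.RECORD_ID"), col("S2.RECORD_ID")))

// Selecting (and renaming) the needed columns before any further re-aliasing
// avoids depending on the "S1"/"S2" qualifiers surviving a new alias such as .as("SL").
val srcLinkS1 = srcLink.select(col("S1.RECORD_ID").as("S1_RECORD_ID"), col("LINK_ID"))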
> Verify your executor/driver actually started with this option to
> rule out a config problem.
>
> On Wed, Jul 29, 2015 at 10:45 AM, Sarath Chandra
> wrote:
> > Yes.
> >
> > As mentioned in my mail at the end, I tried with both 256 and 512
> opt
single node Mesos
cluster on my laptop having 4 CPUs and 12GB RAM.
On Wed, Jul 29, 2015 at 2:49 PM, fightf...@163.com
wrote:
> Hi, Sarath
>
> Did you try to use and increase spark.executor.extraJavaOptions
> -XX:PermSize= -XX:MaxPermSize=
>
>
When I run the same from a spark shell it works fine.
As mentioned in some posts and blogs, I tried using the option
spark.driver.extraJavaOptions to increase the PermGen size; I tried with both
256 and 512, but still no luck.
Please help me resolve the space issue.
Thanks & Regards,
Sarath.
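A minimal sketch of wiring the options discussed above into a job, assuming a pre-Java-8 JVM (where the permanent generation still exists); the sizes mirror the 256/512 values tried in the thread:

// Executor-side PermGen sizing can be set through SparkConf.
val conf = new org.apache.spark.SparkConf()
  .setAppName("PermGenTuning")
  .set("spark.executor.extraJavaOptions", "-XX:PermSize=256m -XX:MaxPermSize=512m")
// Driver-side options set here only take effect if the driver JVM has not started yet
// (cluster mode); in client mode pass them on the command line instead, e.g.
//   spark-submit --driver-java-options "-XX:MaxPermSize=512m" ...
val sc = new org.apache.spark.SparkContext(conf)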
I am trying to train a large dataset consisting of 8 million data points and
20 million features using SVMWithSGD, but it is failing after running for
some time. I tried increasing num-partitions, driver-memory,
executor-memory, and driver-max-resultSize. I also tried reducing the size of
the dataset f
Hi,
I'm trying to train an SVM on the KDD2010 dataset (available from libsvm), but
I'm getting a "java.lang.OutOfMemoryError: Java heap space" error. The dataset
is really sparse and has around 8 million data points and 20 million
features. I'm using a cluster of 8 nodes (each with 8 cores and 64G RAM)
h$mVc$sp(Range.scala:141)
        at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1450)
        at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:156)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:203)
        at Test.main
ciliation.execution.utils.ExecutionUtils.<init>(ExecutionUtils.java:130)
... 2 more
Regards,
Sarath.
")
    .setSparkHome("/usr/local/spark-1.0.1-bin-hadoop1")
    .set("spark.executor.memory", "3g")
    .set("spark.cores.max", "4")
    .set("spark.task.cpus", "4")
    .set("spark.executor.uri",
"
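For readability, a minimal sketch of how such a SparkConf might be assembled; the app name, master URL, and executor URI are placeholders, and the values truncated above are not reproduced:

val conf = new org.apache.spark.SparkConf()
  .setAppName("Test")
  .setMaster("mesos://master:5050")
  .setSparkHome("/usr/local/spark-1.0.1-bin-hadoop1")
  .set("spark.executor.memory", "3g")
  .set("spark.cores.max", "4")
  .set("spark.task.cpus", "4")
  // Hypothetical location of the Spark distribution for Mesos executors.
  .set("spark.executor.uri", "hdfs://master:54310/spark/spark-1.0.1-bin-hadoop1.tgz")
val sc = new org.apache.spark.SparkContext(conf)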
change in the behavior. Also, in the spark job submission
program I'm calling SparkContext.stop at the end of execution. Sometimes
all jobs fail with status "Exited".
Please let me know what is going wrong and how to overcome the issue.
~Sarath
killed". And I'm not
finding any exceptions being thrown in the logs.
What could be going wrong?
...
var newLines = lines.flatMap(line => process(line));
newLines.saveAsTextFile(hdfsPath);
...
def process(line: String): Array[String] = {
  ...
  Array(str1, str2);
}
...
~Sarath.
ine));
newLines.saveAsTextFile(hdfsPath);
...
...
def myfunc(line: String): Array[String] = {
  line.split(";");
}
Thanks,
~Sarath.
Thanks Sean.
Please find attached my code. Let me know your suggestions/ideas.
Regards,
*Sarath*
On Wed, Sep 10, 2014 at 8:05 PM, Sean Owen wrote:
> You mention that you are creating a UserGroupInformation inside your
> function, but something is still serializing it. You should sho
s inside map method, does
it create a new instance for every RDD it is processing?
Thanks & Regards,
*Sarath*
On Sat, Sep 6, 2014 at 4:32 PM, Sean Owen wrote:
> I disagree that the generally right change is to try to make the
> classes serializable. Usually, classes that are not seriali
written its contents as an
anonymous function inside the map function. This time the execution succeeded.
I understood Sean's explanation, but I would appreciate references to a more
detailed explanation and examples of writing efficient Spark programs that
avoid such pitfalls.
~Sarath
On 06-Sep-2014 4:
Hi Akhil,
I've done this for the classes which are in my scope. But what to do with
classes that are out of my scope?
For example, org.apache.hadoop.io.Text.
Also, I'm using several 3rd-party libraries like "jeval".
~Sarath
On Fri, Sep 5, 2014 at 7:40 PM, Akhil Das
wrote:
&
.hadoop.io.Text.
How to overcome these exceptions?
~Sarath.
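A minimal sketch of one common way around this, assuming the serialization error comes from capturing an org.apache.hadoop.io.Text (or a similar non-serializable third-party object, such as a jeval evaluator) in the closure: construct the object inside the task rather than on the driver.

// Build non-serializable helpers once per partition, on the executor,
// so they never travel through the closure.
val processed = lines.mapPartitions { iter =>
  val text = new org.apache.hadoop.io.Text()   // not Serializable, but created locally
  iter.map { line =>
    text.set(line)
    // ... use `text` (or any third-party helper created above) here ...
    text.toString
  }
}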
I added the below 2 lines just before the SQL query line:
...
file1_schema.count;
file2_schema.count;
...
and it started working, but I couldn't figure out the reason.
Can someone please explain? What was happening earlier, and what is
happening with the addition of these 2 lines?
~Sarath
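For illustration, a hypothetical sketch of making that side effect explicit: caching the two schema RDDs and materializing them with count() before the query, so the query reuses already-computed data instead of recomputing the lineage (the query text is a placeholder):

// Cache first so the count() below actually persists the data it materializes.
file1_schema.cache()
file2_schema.cache()
file1_schema.count()   // forces evaluation of the lineage up to this point
file2_schema.count()

// The SQL query (hypothetical) now runs against the cached data.
val result = sqlContext.sql(
  "SELECT f1.id FROM file1 f1 JOIN file2 f2 ON f1.id = f2.id")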
'm killing it by pressing Ctrl+C in the terminal.
But the same code runs perfectly when executed from the spark shell.
~Sarath
On Thu, Jul 17, 2014 at 1:05 PM, Sonal Goyal wrote:
> Hi Sarath,
>
> Are you explicitly stopping the context?
>
> sc.stop()
>
>
>
>
> Best Re
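A minimal sketch of Sonal's suggestion, wrapping the job so the context is always stopped (conf, inputPath, and the job body are placeholders):

val sc = new org.apache.spark.SparkContext(conf)
try {
  // ... actual job logic goes here ...
  val lineCount = sc.textFile(inputPath).count()
  println(s"Lines: $lineCount")
} finally {
  sc.stop()   // lets the driver exit cleanly instead of lingering as RUNNING
}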
Hi Michael, Soumya,
Can you please check and let me know what the issue is and what I am missing?
Let me know if you need any logs to analyze.
~Sarath
On Wed, Jul 16, 2014 at 8:24 PM, Sarath Chandra <
sarathchandra.jos...@algofusiontech.com> wrote:
> Hi Michael,
>
> Tried it.
ATH $CONFIG_OPTS test.Test4 spark://master:7077
"/usr/local/spark-1.0.1-bin-hadoop1"
hdfs://master:54310/user/hduser/file1.csv
hdfs://master:54310/user/hduser/file2.csv
~Sarath
On Wed, Jul 16, 2014 at 8:14 PM, Michael Armbrust
wrote:
> What if you just run something like:
> sc.te
2014 at 7:59 PM, Soumya Simanta
wrote:
>
>
> Can you try submitting a very simple job to the cluster.
>
> On Jul 16, 2014, at 10:25 AM, Sarath Chandra <
> sarathchandra.jos...@algofusiontech.com> wrote:
>
> Yes it is appearing on the Spark UI, and remains there wit
Yes, it is appearing on the Spark UI, and remains there with state
"RUNNING" till I press Ctrl+C in the terminal to kill the execution.
Barring the statements to create the spark context, if I copy-paste the lines
of my code into the spark shell, it runs perfectly, giving the desired output.
~Sarath
anything going
wrong; all are info messages.
What else do I need to check?
~Sarath
On Wed, Jul 16, 2014 at 7:23 PM, Soumya Simanta
wrote:
> Check your executor logs for the output or if your data is not big collect
> it in the driver and print it.
>
>
>
> On Jul 16, 2014, at 9:21 AM
'm forcibly
killing it.
But the same program is working well when executed from a spark shell.
What is going wrong? What am I missing?
~Sarath