= 8 GB, executor-cores = 4
Memory:
8 GB, with roughly 40% reserved for internal use, leaves about 4.8 GB for
actual computation and storage. Let's assume I have not persisted anything;
in that case I could utilize the full 4.8 GB per executor.
Is it possible for me to use a 400 MB file for a broadcast join?
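For illustration, a minimal PySpark sketch of forcing the broadcast (the
paths, join key, and 500 MB threshold are assumptions; the ~400 MB table
still has to fit in driver and executor memory alongside everything else):

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("broadcast-join-sketch")
         # Raise the auto-broadcast threshold above the ~400 MB table size
         # (the default is only 10 MB).
         .config("spark.sql.autoBroadcastJoinThreshold", 500 * 1024 * 1024)
         .getOrCreate())

big = spark.read.parquet("s3://bucket/big_table")      # large fact table
small = spark.read.parquet("s3://bucket/small_table")  # ~400 MB lookup table

# broadcast() hints Spark to ship the small table to every executor.
joined = big.join(broadcast(small), on="key", how="inner")
joined.explain()  # the plan should show BroadcastHashJoin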
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெ
WARN TaskSetManager:66 - Lost task 0.0 in stage 8.0 (TID 14, localhost,
executor driver): java.lang.IllegalArgumentException: image == null!
        at javax.imageio.ImageTypeSpecifier.createFromRenderedImage(Unknown Source)
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Dear All,
I read about higher-order functions in the Databricks blog:
https://docs.databricks.com/spark/latest/spark-sql/higher-order-functions-lambda-functions.html
Is higher-order function support available in our (open-source) Spark?
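For what it's worth, the SQL higher-order functions from that blog
(transform, filter, exists, aggregate) did land in open-source Spark with
the 2.4 release; a minimal sketch, assuming Spark 2.4+:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hof-sketch").getOrCreate()
df = spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ["values"])
df.createOrReplaceTempView("t")

# transform() applies a lambda to every element of the array column.
spark.sql("SELECT transform(values, x -> x + 1) AS plus_one FROM t").show()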
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
getting reported correctly in EMR, and YARN itself
> has no control over it, so whatever you put in `spark.executor.cores` will
> be used,
> but in the ResourceManager you will only see 1 vcore used per nodemanager.
>
> On Mon, Feb 26, 2018 at 5:20 AM, Selvam Raman wrote:
>
emory.
>
> You see 5 executors because 4 are for the job and one is for the application
> master.
>
> See the used memory and the total memory.
>
> On Mon, Feb 26, 2018 at 12:20 PM, Selvam Raman wrote:
>
>> Hi,
>>
>> spark version - 2.0.0
>> spark distributi
: 2500.054
BogoMIPS: 5000.10
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-7
On Mon, Feb 26, 2018 at 10:20 AM, Selvam Raman wrote:
>
e 20 GB + 10% overhead RAM (22 GB),
10 cores (number of threads), 1 vCore (CPU).
Please correct me if my understanding is wrong.
How can I utilize the number of vCores in EMR effectively? Will vCores boost
performance?
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
la:294)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:158)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
I’m also not sure how well
> spacy serializes, so to debug this I would start off by moving the nlp =
> inside of my function and see if it still fails.
>
> On Thu, Feb 15, 2018 at 9:08 PM Selvam Raman wrote:
>
>> import spacy
>>
>> nlp = spacy.load(
of phrases.
def f(x) : print(x)
description =
xmlData.filter(col("dcterms:description").isNotNull()).select(col("dcterms:description").alias("desc"))
description.rdd.flatMap(lambda row: getPhrases(row.desc)).foreach(f)
When I try to access getPhrases I get the error below.
  File
"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyspark/rdd.py",
line 906, in fold
vals = self.mapPartitions(func).collect()
File
"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyspark/rdd.py",
li
y/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py",
line 476, in save
f(self, obj) # Call unbound method with explicit self
File
"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py",
line 751, in save_tuple
save(element)
File
"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py",
line 476, in save
f(self, obj) # Call unbound method with explicit self
File
"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyspark/cloudpickle.py",
line 368, in save_builtin_function
return self.save_function(obj)
File
"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyspark/cloudpickle.py",
line 247, in save_function
if islambda(obj) or obj.__code__.co_filename == '' or themodule
is None:
AttributeError: 'builtin_function_or_method' object has no attribute
'__code__'
Please help me.
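A minimal sketch of the suggestion quoted earlier in this thread (load
spaCy inside the function that runs on the executors, so the model object
is never pickled by cloudpickle); the model name and the noun-chunk
extraction standing in for getPhrases are assumptions:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spacy-sketch").getOrCreate()
df = spark.createDataFrame(
    [("Spark executors run Python workers for PySpark jobs.",)], ["desc"])

def get_phrases_partition(rows):
    # Import and load the model here, on the executor, so the nlp object is
    # never captured by the closure that cloudpickle has to serialize.
    import spacy
    nlp = spacy.load("en_core_web_sm")  # assumed model name
    for row in rows:
        for chunk in nlp(row.desc).noun_chunks:
            yield chunk.text

for phrase in df.rdd.mapPartitions(get_phrases_partition).collect():
    print(phrase)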
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
ema)
>
> empInfoSchema.json
>
> val empInfoStrDF = Seq((emp_info)).toDF("emp_info_str")
> empInfoStrDF.printSchema
> empInfoStrDF.show(false)
>
> val empInfoDF = empInfoStrDF.select(from_json('emp_info_str,
> empInfoSchema).as("emp_info"))
> empInfoDF.printSchema
>
> empInfoDF.select(struct("*")).show(false)
>
> empInfoDF.select("emp_info.name", "emp_info.address",
> "emp_info.docs").show(false)
>
> empInfoDF.select(explode('emp_info.getItem("name"))).show
>
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Can I get those details?
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Frequently I get YARN OOM and disk-full
issues.
Could you please share your thoughts?
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
)
How can I achieve the same DataFrame while reading from the source?
doc = spark.read.text("/Users/rs/Desktop/nohup.out")
How can I create an array-type "sentences" column from doc (a
DataFrame)?
The approach below creates more than one column:
rdd.map(lambda rdd: rdd[0]).map(lambda row:row.split(" "))
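A minimal sketch of one way to keep everything in a single array<string>
column, assuming the goal is a column named "sentences" (the path is taken
from the question):

from pyspark.sql import SparkSession
from pyspark.sql.functions import split, col

spark = SparkSession.builder.appName("array-column-sketch").getOrCreate()

doc = spark.read.text("/Users/rs/Desktop/nohup.out")  # single "value" column

# split() returns one array<string> column instead of many columns.
doc_arr = doc.select(split(col("value"), " ").alias("sentences"))
doc_arr.printSchema()  # sentences: array<string>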
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 15090"...
Killed
Node-45.dev has 8.9 GB free while it throws out of memory. Can anyone
please help me understand the issue?
On Mon, Apr 24, 2017 at 11:22 AM, Selvam Raman wrote:
> Hi,
>
> I have 1 master
; --num-executors 4
--executor-cores 2 --executor-memory 20g Word2VecExample.py
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Versions/2.7/lib/python2.7/pickle.py",
line 681, in _batch_setitems
save(v)
File
"/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py",
line 317, in save
self.save_global(obj, rv)
File
"/Users/rs/Downloads/spark-2.0.1-bin-hadoop2.7/pytho
Test2 2 1
Test3 3 2
Current approach:
1) Delete the rows in table1 where table1.composite_key = table2.composite_key.
2) Union table1 and table2 to get the updated result.
Is this the right approach? Is there any other way to achieve it?
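A minimal sketch of the same delete-then-union idea expressed as a
left_anti join plus union (the table contents and key column names are
assumptions):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("upsert-sketch").getOrCreate()

table1 = spark.createDataFrame([("Test1", 1, 1), ("Test2", 2, 0)],
                               ["k1", "k2", "val"])
table2 = spark.createDataFrame([("Test2", 2, 1), ("Test3", 3, 2)],
                               ["k1", "k2", "val"])
key = ["k1", "k2"]  # the composite key

# Keep only table1 rows whose key is absent from table2, then append table2.
updated = table1.join(table2, on=key, how="left_anti").union(table2)
updated.show()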
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
In Scala:
val ds = sqlContext.read.text("/home/spark/1.6/lines").as[String]
What is the equivalent code in PySpark?
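A small sketch, assuming the closest PySpark equivalents (there is no typed
Dataset API in Python, so you get either a DataFrame of strings or an RDD
of strings):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-text-sketch").getOrCreate()

df = spark.read.text("/home/spark/1.6/lines")   # DataFrame with a "value" column
lines = df.rdd.map(lambda row: row.value)       # RDD of str, closest to Dataset[String]
# Alternatively: spark.sparkContext.textFile("/home/spark/1.6/lines")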
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>> That's why your "source" should be defined as an Array[Struct] type
>> (which makes sense in this case: it has an undetermined length), so you
>> can explode it and get the description easily.
>>
>> Now you need to write your own UDF; maybe it can do what y
":{}
}
I have bzip2-compressed JSON files in the above format.
Some JSON rows contain two objects within source (like F1 and F2), sometimes
five (F1, F2, F3, F4, F5), etc. So the final schema will contain the
combination of all objects seen for the source field.
Now, every row will contain n objects, but only some
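For illustration, a sketch of the quoted suggestion (model source as an
array of structs and explode it); the F1/F2 field names and the description
field are loose assumptions based on this thread:

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, col

spark = SparkSession.builder.appName("explode-sketch").getOrCreate()

data = ('{"source": [{"name": "F1", "description": "d1"},'
        ' {"name": "F2", "description": "d2"}]}')
df = spark.read.json(spark.sparkContext.parallelize([data]))

# One output row per element of the array, however many objects a row carries.
df.select(explode(col("source")).alias("src")) \
  .select("src.name", "src.description") \
  .show()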
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
one column which is a LONGBLOB; if I convert it with
unbase64, I face this problem. I was able to write to Parquet without the
conversion.
So is there some limit on bytes per line? Please give me your suggestions.
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Hi,
Is there a way to read xls and xlsx files using Spark?
Is there any Hadoop InputFormat available for reading xls and xlsx files
that could be used in Spark?
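One possible approach, sketched under the assumption that the workbook fits
in driver memory and that pandas (with an xlsx engine such as openpyxl) is
installed: read it with pandas on the driver and convert to a Spark
DataFrame. The path and sheet are hypothetical.

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("xlsx-sketch").getOrCreate()

pdf = pd.read_excel("/path/to/report.xlsx", sheet_name=0)  # hypothetical path
df = spark.createDataFrame(pdf)
df.show()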
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
t 12:30 PM, Selvam Raman wrote:
> Hi,
>
> how can i take heap dump in EMR slave node to analyze.
>
> I have one master and two slave.
>
> if i enter jps command in Master, i could see sparksubmit with pid.
>
> But i could not see anything in slave node.
>
Hi,
How can I take a heap dump on an EMR slave node to analyze?
I have one master and two slaves.
If I enter the jps command on the master, I can see SparkSubmit with a PID.
But I cannot see anything on the slave node.
How can I take a heap dump for the Spark job?
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெ
nUID = 1L;
@Override
public void call(Iterator row) throws Exception
{
while(row.hasNext())
{
//Process data and insert into No-Sql DB
}
}
});
}
}
Now, where can I apply rdd.checkpoint()?
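A sketch of where the checkpoint could slot in, written in PySpark for
brevity (the thread's code is Java, but the call order is the same): set a
checkpoint directory once, checkpoint the RDD, run an action, then do the
foreachPartition write. The directory is hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("checkpoint-sketch").getOrCreate()
sc = spark.sparkContext
sc.setCheckpointDir("hdfs:///tmp/checkpoints")  # hypothetical directory

rdd = sc.parallelize(range(1000), numSlices=100)
rdd.checkpoint()   # mark the RDD for checkpointing
rdd.count()        # an action materializes the checkpoint

def write_partition(rows):
    for row in rows:
        pass       # process data and insert into the NoSQL DB here

rdd.foreachPartition(write_partition)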
Thanks,
selvam
On Thu, Dec 15, 2016 at 10:44 PM, Selvam Raman wrote:
> I am using java
re checkpoints on that directory that I called checkpoint.
>
>
> Thank You,
>
> Irving Duran
>
> On Thu, Dec 15, 2016 at 10:33 AM, Selvam Raman wrote:
>
>> Hi,
>>
>> is there any provision in spark batch for checkpoint.
>>
>> I am having huge
there a way for checkpoint
provision?
What I am expecting from the checkpoint is to restart from partition 71 and
run through to the end.
Please give me your suggestions.
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
er of times (and wanting to spare the resources
> of our submitting machines) we have now switched to use yarn cluster mode
> by default. This seems to resolve the problem.
>
> Hope this helps,
>
> Daniel
>
> On 29 Nov 2016 11:20 p.m., "Selvam Raman" wrote:
>
>>
.
Spark version: 2.0 (AWS EMR).
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Field type in Cassandra: List
I am trying to insert Collections.emptyList() from Spark into the Cassandra
list field. In Cassandra it is stored as a null object.
How can I avoid null values here?
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
ontains. b has just 85 rows and
> around 4964 bytes.
> Help is very much appreciated!!
>
> Thanks
> Swapnil
>
>
>
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
"file:///Users/rs/parti").rdd.partitions.length
res4: Int = 5
So how does Spark partition the Parquet data?
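A small sketch of the knob that usually drives this: when reading files,
the number of input partitions mostly follows the total input size and
spark.sql.files.maxPartitionBytes (128 MB by default in Spark 2.x), not
just the number of Parquet files. The path is the one from the question.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("parquet-partitions-sketch")
         .config("spark.sql.files.maxPartitionBytes", 64 * 1024 * 1024)  # 64 MB splits
         .getOrCreate())

df = spark.read.parquet("file:///Users/rs/parti")
print(df.rdd.getNumPartitions())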
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
rder by 1,2
spark.sql(query).show
When I checked the whole-stage code, it first reads all data from the table.
Why is it reading all the data from the table and doing a sort-merge join for
3 or 4 tables? Why is it not applying any filter value?
Though I have given the executor a large amount of memory, it still throws the
same error. When Spark SQL does the join, how does it utilize memory and cores?
Any guidelines would be greatly welcome.
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
faced the problem earlier.
Thanks,
Selvam R
On Mon, Oct 24, 2016 at 10:23 AM, Selvam Raman wrote:
> Hi All,
>
> Please help me.
>
> I have 10 (tables data) parquet file in s3.
>
> I am reading and storing as Dataset then registered as temp table.
>
> One table d
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
I found it. We can use pivot, which is similar to crosstab
in Postgres.
Thank you.
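For illustration, a minimal PySpark sketch of the pivot() equivalent of the
crosstab query quoted below, using the same rowid/attribute/value shape:

from pyspark.sql import SparkSession
from pyspark.sql.functions import first

spark = SparkSession.builder.appName("pivot-sketch").getOrCreate()

ct = spark.createDataFrame(
    [("test1", "att2", "val2"), ("test1", "att3", "val3"),
     ("test2", "att2", "val6"), ("test2", "att3", "val7")],
    ["rowid", "attribute", "value"])

(ct.groupBy("rowid")
   .pivot("attribute", ["att2", "att3"])  # listing the values avoids an extra scan
   .agg(first("value"))
   .show())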
On Oct 17, 2016 10:00 PM, "Selvam Raman" wrote:
> Hi,
>
> Please share me some idea if you work on this earlier.
> How can i develop postgres CROSSTAB function in spark.
>
> Po
Hi,
I am having 40+ structured data stored in s3 bucket as parquet file .
I am going to use 20 table in the use case.
There s a Main table which drive the whole flow. Main table contains 1k
record.
My use case is for every record in the main table process the rest of
table( join group by depend
('test2','att3','val7');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att4','val8');
SELECT *
FROM crosstab(
'select rowid, attribute, value
from ct
where attribute = ''att2'' or attribute = ''att3''
order by 1,2')
AS ct(row_name text, category_1 text, category_2 text, category_3 text);
 row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
 test1    | val2       | val3       |
 test2    | val6       | val7       |
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
What I am trying to achieve:
trigger a query to get numbers (i.e., 1, 2, 3, ..., n);
for every number I have to trigger another 3 queries.
Thanks,
selvam R
On Wed, Oct 12, 2016 at 4:10 PM, Selvam Raman wrote:
> Hi ,
>
> I am reading parquet file and creating temp table. when i am trying to
ang.Thread.run(Thread.java:745)
16/10/12 15:59:53 INFO SparkContext: Invoking stop() from shutdown hook
Please let me know if I am missing anything. Thank you for the help.
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
I mentioned parquet as input format.
On Oct 10, 2016 11:06 PM, "ayan guha" wrote:
> It really depends on the input format used.
> On 11 Oct 2016 08:46, "Selvam Raman" wrote:
>
>> Hi,
>>
>> How spark reads data from s3 and runs parallel task
RDD, then we can look at partitions.size or length to check
how many partitions a file has. But how is this accomplished for an
S3 bucket?
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
ported.
>
> There is an issue open[2]. I hope this is helpful.
>
> Thanks.
>
> [1] https://github.com/apache/spark/blob/27209252f09ff73c58e60c6df8aaba
> 73b308088c/sql/core/src/main/scala/org/apache/spark/sql/
> DataFrameReader.scala#L369
> [2] https://issues.apache.org/jira/brows
.
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Hi,
Need your input to make a decision.
We have n databases (i.e. Oracle, MySQL, etc.). I want to read
data from these sources, but how is fault tolerance maintained on the
source side?
If a source system goes down, how does Spark read the data?
--
Selvam Raman
"ல
ues)),schema)
In the schema fields I have declared the timestamp as
*StructField*("shipped_datetime", *DateType*).
When I try to show the result, it throws "java.util.Date cannot be converted
to java.sql.Date".
How can I solve the issue?
First I converted the CassandraScanRDD to
--
Selvam
It's very urgent. Please help me, guys.
On Sun, Sep 4, 2016 at 8:05 PM, Selvam Raman wrote:
> Please help me to solve the issue.
>
> spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.3.0
> --conf spark.cassandra.connection.host=**
>
> val df
ndra.DefaultSource.createRelation(DefaultSource.scala:56)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
a
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Please give me any suggestions in terms of DataFrames.
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
> will only skip writing tombstones.
>
> On Thu, Aug 25, 2016, 1:23 PM Selvam Raman wrote:
>
>> Hi ,
>>
>> Dataframe:
>> colA colB colC colD colE
>> 1 2 3 4 5
>> 1 2 3 null null
>> 1 null null null 5
>> null null 3 4 5
>>
>> I wa
)
Record 2:(1,2,3)
Record 3:(1,5)
Record 4:(3,4,5)
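A minimal sketch of one way to get, per row, only the non-null values
(variable-length records rather than fixed columns); the column names are
taken from the quoted example:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("drop-null-sketch").getOrCreate()

df = spark.createDataFrame(
    [(1, 2, 3, 4, 5), (1, 2, 3, None, None),
     (1, None, None, None, 5), (None, None, 3, 4, 5)],
    ["colA", "colB", "colC", "colD", "colE"])

# Keep only the non-null values of each row.
records = df.rdd.map(lambda row: tuple(v for v in row if v is not None))
for rec in records.collect():
    print(rec)  # (1, 2, 3, 4, 5), (1, 2, 3), (1, 5), (3, 4, 5)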
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>> In the following Window spec I want orderBy("") to be displayed
>> in descending order, please:
>>
>> val W = Window.partitionBy("col1").orderBy("col2")
>>
>> If I Do
>>
>> val W = Window.partitionBy("col1").orderBy("col2".desc)
>>
>> It throws error
>>
>> console>:26: error: value desc is not a member of String
>>
>> How can I achieve that?
>>
>> Thanking you
>>
>
>
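The error above comes from calling .desc on a plain String; ordering by a
Column fixes it (in Scala, orderBy(col("col2").desc)). A PySpark sketch of
the same idea:

from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, row_number

spark = SparkSession.builder.appName("window-desc-sketch").getOrCreate()

df = spark.createDataFrame([("a", 1), ("a", 3), ("b", 2)], ["col1", "col2"])

w = Window.partitionBy("col1").orderBy(col("col2").desc())
df.withColumn("rn", row_number().over(w)).show()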
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
() *function, but it gives only null values for the string; the same happens
with the *to_date()* function.
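A common cause of those nulls is a format mismatch; a sketch that passes
the pattern explicitly (the sample string and pattern are assumptions):

from pyspark.sql import SparkSession
from pyspark.sql.functions import unix_timestamp, from_unixtime, col

spark = SparkSession.builder.appName("to-date-sketch").getOrCreate()

df = spark.createDataFrame([("25/08/2016 13:45:00",)], ["ts_str"])

parsed = df.withColumn(
    "ts", from_unixtime(unix_timestamp(col("ts_str"), "dd/MM/yyyy HH:mm:ss")))
parsed.show(truncate=False)  # a null here means the pattern still does not match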
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
qlContext.sql("select site,valudf(collect_set(requests)) as test
from sel group by site").first
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
selvam R
On Tue, Aug 9, 2016 at 4:19 PM, Selvam Raman wrote:
> Example:
>
> sel1 test
> sel1 test
> sel1 ok
> sel2 ok
> sel2 test
>
>
> expected result:
>
> sel1, [test,ok]
> sel2,[test,ok]
>
> How to achieve the above result using spark datafra
Example:
sel1 test
sel1 test
sel1 ok
sel2 ok
sel2 test
expected result:
sel1, [test,ok]
sel2,[test,ok]
How can I achieve the above result using a Spark DataFrame?
Please suggest an approach.
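A minimal sketch for the expected result above: group by the first column
and collect the distinct values of the second (collect_set drops
duplicates, and set order is not guaranteed):

from pyspark.sql import SparkSession
from pyspark.sql.functions import collect_set

spark = SparkSession.builder.appName("collect-set-sketch").getOrCreate()

df = spark.createDataFrame(
    [("sel1", "test"), ("sel1", "test"), ("sel1", "ok"),
     ("sel2", "ok"), ("sel2", "test")],
    ["key", "value"])

df.groupBy("key").agg(collect_set("value").alias("values")).show(truncate=False)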
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
.
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Hi Team,
How can I use Spark as the execution engine in Sqoop2? I see the patch
(SQOOP-1532 <https://issues.apache.org/jira/browse/SQOOP-1532>), but it shows
as in progress.
So can we not use Sqoop on Spark at all?
Please help me if you have any idea.
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Hi,
What is skewed data?
I read that if the data is skewed, a join can take a long time to
finish (99 percent of tasks finish in seconds while the remaining 1 percent
take minutes to hours).
How do I handle skewed data in Spark?
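One common mitigation is key salting; a sketch under the assumption of one
hot key and a small dimension table (N and the data are made up): spread
the hot key across N sub-keys on the skewed side and replicate the other
side N times.

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, array, lit, floor, rand

spark = SparkSession.builder.appName("salting-sketch").getOrCreate()
N = 8  # number of salt buckets; tune to the degree of skew

skewed = spark.createDataFrame([("hot", i) for i in range(1000)] + [("cold", 1)],
                               ["k", "v"])
small = spark.createDataFrame([("hot", "a"), ("cold", "b")], ["k", "info"])

# Add a random salt on the skewed side ...
skewed_s = skewed.withColumn("salt", floor(rand() * N).cast("int"))
# ... and replicate each row of the other side once per salt value.
small_s = small.withColumn("salt", explode(array(*[lit(i) for i in range(N)])))

joined = skewed_s.join(small_s, on=["k", "salt"]).drop("salt")
print(joined.count())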
Thanks,
Selvam R
+91-97877-87724
Hi,
How can I connect to SparkR (which is available in a Linux environment) using
RStudio (Windows environment)?
Please help me.
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
XGBoost4J can integrate with Spark from version 1.6.
Currently I am using Spark 1.5.2. Can I use XGBoost instead of XGBoost4J?
Will both provide the same result?
Thanks,
Selvam R
+91-97877-87724
On Mar 15, 2016 9:23 PM, "Nan Zhu" wrote:
> Dear Spark Users and Developers,
>
> We (Distributed (De
ark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
... 3 m
Ql.scala:1217)
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
as well?
>
> I think this is related with this thread,
> http://apache-spark-user-list.1001560.n3.nabble.com/Error-while-running-example-scala-application-using-spark-submit-td10056.html
> .
>
>
> 2016-03-30 12:44 GMT+09:00 Selvam Raman :
>
>> Hi,
>>
>> i
]
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"