> +------------+-----------+
> |hash(40514X)|hash(41751)|
> +------------+-----------+
> | -1898845883|  916273350|
> +------------+-----------+
>
> scala> spark.sql("select hash('14589'),hash('40004')").show()
>
> +-
Hello All,
I am calculating the hash value of a few columns to determine whether each
record is an Insert/Delete/Update, but I found a scenario which is a little
weird: some of the records return the same hash value even though the keys are
totally different.
For instance,
scala> spark.sql("select ha
Nirav,
The withColumnRenamed() API might help, but it does not distinguish between
columns and renames all occurrences of the given column name. Alternatively,
use the select() API and rename the columns as you want.
Thanks & Regards,
Gokula Krishnan* (Gokul)*
On Mon, Jul 2, 2018 at 5:52 PM, Nirav Patel wrote:
> Expr is `df1(a) ===
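A minimal sketch of the select()-based rename being suggested (the DataFrames
and column names here are made up, not from Nirav's job):

import spark.implicits._   // already in scope in spark-shell

val df1 = Seq((1, "x"), (2, "y")).toDF("a", "b")
val df2 = Seq((1, "p"), (2, "q")).toDF("a", "c")

// After the join both sides carry a column named "a"; select() lets you keep
// each one under an unambiguous name instead of renaming every occurrence.
val joined = df1.join(df2, df1("a") === df2("a"))
  .select(df1("a").as("a_left"), df2("a").as("a_right"), df1("b"), df2("c"))

joined.show()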
Do you see any changes or improvements in the *Core-API* in Spark 2.X when
compared with Spark 1.6.0?
Thanks & Regards,
Gokula Krishnan* (Gokul)*
On Mon, Sep 25, 2017 at 1:32 PM, Gokula Krishnan D
wrote:
> Thanks for the reply. Forgot to mention that, our Batch ETL Jobs are in
the wall-clock time
3. Check the stages that got slower and compare the DAGs
4. Test with dynamic allocation disabled
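A sketch of what item 4 could look like when resubmitting for the comparison
(the executor counts and memory here are placeholders, not from the thread):

import org.apache.spark.{SparkConf, SparkContext}

// Pin the executor footprint so the 1.6.0 and 2.1.0 runs are compared like for like.
val conf = new SparkConf()
  .setAppName("etl-benchmark")
  .set("spark.dynamicAllocation.enabled", "false")
  .set("spark.executor.instances", "50")
  .set("spark.executor.cores", "4")
  .set("spark.executor.memory", "8g")
val sc = new SparkContext(conf)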
On Fri, Sep 22, 2017 at 2:39 PM, Gokula Krishnan D
wrote:
> Hello All,
>
> Currently our Batch ETL Jobs are in Spark 1.6.0 and planning to upgrade
> into Spark 2.1
Try to compare the CPU time instead of the wall-clock time
>
> 3. Check the stages that got slower and compare the DAGs
>
> 4. Test with dynamic allocation disabled
>
> On Fri, Sep 22, 2017 at 2:39 PM, Gokula Krishnan D <mailto:email2...@gmail.com>> wrote:
> Hello A
Hello All,
Currently our Batch ETL Jobs are on Spark 1.6.0 and we are planning to upgrade
to Spark 2.1.0.
With minor code changes (like configuration and SparkSession/sc) we were able
to execute the existing jobs on Spark 2.1.0.
But we noticed that the job completion timings are much better in Spark 1.6.0
but no
n/java/org/apache/hadoop/mapred/InputFormat.java#L85>,
> and the partition desired is at most a hint, so the final result could be a
> bit different.
>
> On 25 July 2017 at 19:54, Gokula Krishnan D wrote:
>
>> Excuse for the too many mails on this post.
>>
>> fo
Excuse me for the too many mails on this post.
I found a similar issue:
https://stackoverflow.com/questions/24671755/how-to-partition-a-rdd
Thanks & Regards,
Gokula Krishnan* (Gokul)*
On Tue, Jul 25, 2017 at 8:21 AM, Gokula Krishnan D
wrote:
> In addition to that,
>
> tried to read
In addition to that,
I tried to read the same file with 3000 partitions, but it used 3070
partitions and took more time than before; please refer to the attachment.
Thanks & Regards,
Gokula Krishnan* (Gokul)*
On Tue, Jul 25, 2017 at 8:15 AM, Gokula Krishnan D
wrote:
> Hello All,
>
>
Hello All,
I have an HDFS file with approx. *1.5 billion records* in 500 part files
(258.2 GB in size), and when I tried to execute the following I could see that
it used 2290 tasks, but shouldn't it be 500, like the number of HDFS part files?
val inputFile =
val inputRdd = sc.textFile(inputFile)
inputRdd.
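A sketch of why the task count differs from the file count (block size and path
are assumptions): sc.textFile() creates roughly one partition per input split,
and a split is at most one HDFS block, so 500 part files of ~500 MB each with a
128 MB block size come out to a few thousand splits; the optional minPartitions
argument is only a lower-bound hint.

// Hypothetical path; one task per input split, not per part file.
val inputRdd = sc.textFile("/data/input/part-*")
println(inputRdd.getNumPartitions)

// To actually run 500 tasks downstream, reduce partitions after reading.
val coalesced = inputRdd.coalesce(500)
println(coalesced.getNumPartitions)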
>> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-across-applications
>> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
>>
>> On Thu, Jul 20, 2017 at 2:02 PM, Gokula Krishnan D
>> wrote:
>>
>>> Mark, Thanks for the respon
ior
> Task's resources -- the fair scheduler is not preemptive of running Tasks.
>
> On Thu, Jul 20, 2017 at 1:45 PM, Gokula Krishnan D
> wrote:
>
>> Hello All,
>>
>> We are having cluster with 50 Executors each with 4 Cores so can avail
>> max. 200 Executors
Hello All,
We have a cluster with 50 executors, each with 4 cores, so at most 200 cores
are available.
I am submitting a Spark application (JOB A) with scheduler.mode set to FAIR and
dynamicAllocation=true, and it got all the available executors.
In the meantime, submitting another Spark Application (J
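A sketch of the submission being described (the executor cap is an assumption,
not from the thread). Note that spark.scheduler.mode=FAIR shares resources
between jobs inside a single application; to leave room for a second application
while dynamic allocation is on, capping spark.dynamicAllocation.maxExecutors on
JOB A is one common approach.

import org.apache.spark.SparkConf

val confA = new SparkConf()
  .setAppName("job-a")                                  // hypothetical name
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")         // needed for dynamic allocation
  .set("spark.dynamicAllocation.maxExecutors", "25")    // assumed cap: half the cluster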
Hello All,
Our Spark applications are designed to process HDFS files (Hive external
tables).
Recently we modified the Hive file sizes by setting the following parameters to
ensure that the files have an average size of 512 MB:
set hive.merge.mapfiles=true
set hive.merge.mapredfiles=true
se
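For reference, the Hive merge settings commonly used together for a ~512 MB
target are of the following shape (the size values here are illustrative
assumptions, not necessarily what was actually set):

set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.smallfiles.avgsize=536870912;  -- merge when average output size is below ~512 MB
set hive.merge.size.per.task=536870912;       -- target size of each merged file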
Hello Shiv,
Unit testing really helps when you follow a TDD approach. It's a safe way to
develop a program locally, and you can also make use of those test cases during
the build process with any of the continuous-integration tools (Bamboo,
Jenkins). That way you can ensure that the artifacts are be
Hello Sumit -
I could see that the SparkConf() specification is not mentioned in your
program, but the rest looks good.
Output:
By the way, I have used the README.md template
https://gist.github.com/jxson/1784669
Thanks & Regards,
Gokula Krishnan* (Gokul)*
On Tue, Sep 20, 2016 at 2:15 AM, Cha
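A minimal sketch of the missing SparkConf()/SparkContext setup (the object name,
app name, and master are placeholders, not taken from Sumit's program):

import org.apache.spark.{SparkConf, SparkContext}

object WordCountApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("readme-word-count")   // placeholder app name
      .setMaster("local[*]")             // placeholder; often supplied via spark-submit instead
    val sc = new SparkContext(conf)
    try {
      val counts = sc.textFile("README.md")
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
      counts.take(10).foreach(println)
    } finally {
      sc.stop()
    }
  }
}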
Hello All -
Hope, you are doing good.
I have a general question. I am working on Hadoop using Apache Spark.
At this moment we are not using Cassandra, but I would like to know the scope
of learning and using it in the Hadoop environment.
It would be great if you could provide the use cas
iter.toList.map(x => index + "," + x).iterator
> }
> x.mapPartitionsWithIndex(myfunc).collect()
> res10: Array[String] = Array(0,1, 0,2, 0,3, 1,4, 1,5, 1,6, 2,7, 2,8, 2,9)
>
> On Tue, Jan 12, 2016 at 2:06 PM, Gokula Krishnan D
> wrote:
>
>> Hello All -
Hello All -
I'm just trying to understand aggregate(), and in the meantime I got a
question.
*Is there any way to view the RDD data based on the partition?*
For instance, the following RDD has 2 partitions:
val multi2s = List(2,4,6,8,10,12,14,16,18,20)
val multi2s_RDD = sc.parallelize(multi2s
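A minimal sketch of viewing the data per partition, along the lines of the
mapPartitionsWithIndex() snippet quoted above (the explicit 2-partition split is
assumed from the description):

val multi2s = List(2,4,6,8,10,12,14,16,18,20)
val multi2s_RDD = sc.parallelize(multi2s, 2)

// Tag each element with the index of the partition it lives in.
def byPartition(index: Int, iter: Iterator[Int]): Iterator[String] =
  iter.map(x => index + "," + x)

multi2s_RDD.mapPartitionsWithIndex(byPartition).collect()
// e.g. Array(0,2, 0,4, 0,6, 0,8, 0,10, 1,12, 1,14, 1,16, 1,18, 1,20)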
Hello Vadim -
Alternatively, you can achieve this by using *window functions*, which are
available from 1.4.0.
*code_value.txt (Input)*
=
1000,200,Descr-200,01
1000,200,Descr-200-new,02
1000,201,Descr-201,01
1000,202,Descr-202-new,03
1000,202,Descr-202,01
1000,202,Descr-202-old,02
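A sketch of one way to apply a window function here (Vadim's goal is not visible
in this snippet, so this assumes it is "keep the latest description per id/code
based on the last column"; the column names are made up):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number   // 1.6+ name; earlier releases used rowNumber()
import sqlContext.implicits._                      // spark.implicits._ on Spark 2.x

val codes = sc.textFile("code_value.txt")
  .map(_.split(","))
  .map(a => (a(0), a(1), a(2), a(3)))
  .toDF("id", "code", "descr", "seq")

// Rank rows within each (id, code) group by descending sequence and keep the first.
val w = Window.partitionBy("id", "code").orderBy($"seq".desc)
codes.withColumn("rn", row_number().over(w))
  .where($"rn" === 1)
  .drop("rn")
  .show()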
Ted - Thanks for the updates. Then it's the same case with sc.parallelize()
or sc.makeRDD(), right?
Thanks & Regards,
Gokula Krishnan* (Gokul)*
On Tue, Dec 29, 2015 at 1:43 PM, Ted Yu wrote:
> From RDD.scala :
>
> def ++(other: RDD[T]): RDD[T] = withScope {
> this.union(other)
>
> They shou
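A quick sketch of the equivalence in the quoted RDD.scala source (the values are
arbitrary):

val a = sc.parallelize(1 to 3)
val b = sc.makeRDD(4 to 6)   // makeRDD(seq) is documented as identical to parallelize(seq)

// ++ simply delegates to union(), so both calls below return the same result.
(a ++ b).collect()           // Array(1, 2, 3, 4, 5, 6)
a.union(b).collect()         // Array(1, 2, 3, 4, 5, 6)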
You can try this, but I slightly modified the input structure since the first
two columns were not in JSON format.
[image: Inline image 1]
Thanks & Regards,
Gokula Krishnan* (Gokul)*
On Wed, Dec 23, 2015 at 9:46 AM, Eran Witkon wrote:
> Did you get a solution for this?
>
> On Tue, 22 Dec 2015 at
Hello All -
I tried to execute a Spark-Scala program in order to create a table in Hive
and faced a couple of errors, so I just tried to execute "show tables" and
"show databases".
I have already created a database named "test_db", but I encountered the
error "Database does not exist".
*N
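A sketch of the usual cause and fix (an assumption, since the program itself is
not shown): a plain SQLContext only sees its own temporary catalog, so a
database created in Hive shows up as "Database does not exist"; querying through
a HiveContext (or, on Spark 2.x, a SparkSession built with enableHiveSupport())
reads the Hive metastore instead.

// Spark 1.x shell
val hiveCtx = new org.apache.spark.sql.hive.HiveContext(sc)
hiveCtx.sql("show databases").show()
hiveCtx.sql("use test_db")
hiveCtx.sql("show tables").show()

// Spark 2.x equivalent, for reference:
// val spark = SparkSession.builder().enableHiveSupport().getOrCreate()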
2 AM, Gokula Krishnan D
wrote:
> Ok, then you can slightly change like
>
> [image: Inline image 1]
>
> Thanks & Regards,
> Gokula Krishnan* (Gokul)*
>
>
> On Wed, Dec 9, 2015 at 11:09 AM, Prashant Bhardwaj <
> prashant2006s...@gmail.com> wrote:
>
>
ords.
>
> On Wed, Dec 9, 2015 at 9:33 PM, Gokula Krishnan D
> wrote:
>
>> Hello Prashant -
>>
>> Can you please try like this :
>>
>> For the instance, input file name is "student_detail.txt" and
>>
>> ID,Name,Sex,Age
>>
Hello Prashant -
Can you please try like this:
For instance, the input file name is "student_detail.txt" and it contains:
ID,Name,Sex,Age
===
101,Alfred,Male,30
102,Benjamin,Male,31
103,Charlie,Female,30
104,Julie,Female,30
105,Maven,Male,30
106,Dexter,Male,30
107,Lundy,Male,32
108,Rita,Female,3
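A guessed sketch of what the suggested approach might look like (the actual code
from this reply is not visible here): read the file, skip the header, and split
each row into (ID, Name, Sex, Age) fields so they can be filtered or aggregated.

val studentRDD = sc.textFile("student_detail.txt")
val header = studentRDD.first()

// Keep only data rows: drop the header and the "===" underline shown in the mail.
val students = studentRDD
  .filter(line => line != header && !line.startsWith("="))
  .map(_.split(","))
  .map(f => (f(0), f(1), f(2), f(3)))

// e.g. count students by Sex
students.map(s => (s._3, 1)).reduceByKey(_ + _).collect()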
k. I think in Scala there are transformations such as .toPairRDD().
>
> On Sat, Dec 5, 2015 at 12:01 AM, Gokula Krishnan D
> wrote:
>
>> Hello All -
>>
>> In spark-shell when we press tab after . ; we could see the
>> possible list of transformations and actions.
>>
Hello All -
In spark-shell, when we press Tab after the "." on an RDD, we can see the
possible list of transformations and actions.
But I am unable to see the whole list. Is there any other way to get the rest
of the list? I'm mainly looking for sortByKey().
val sales_RDD = sc.textFile("Data/Scala/phone_sales.txt")
v
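A minimal sketch of getting to sortByKey() (the layout of phone_sales.txt is
assumed to be brand,quantity, which is not stated in the thread): sortByKey() is
added through an implicit conversion that only applies to RDDs of key/value
pairs with ordered keys, so tab completion on a plain RDD[String] will not list
it until the RDD is mapped to tuples.

val sales_RDD = sc.textFile("Data/Scala/phone_sales.txt")

// Assumed line format: brand,quantity (e.g. "iPhone,1200")
val sales_pairs = sales_RDD.map(_.split(",")).map(f => (f(0), f(1).toInt))

// Pair-RDD operations, including sortByKey(), are now available.
sales_pairs.sortByKey().collect()
sales_pairs.sortByKey(ascending = false).collect()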