Sorry, I had a typo; I meant repartitionBy("fieldOfJoin").
On May 2, 2017, 9:44 PM, "KhajaAsmath Mohammed"
wrote:
Hi Angel,
I am trying the code below, but I don't see any partitioning on the DataFrame.
val iftaGPSLocation_df = sqlContext.sql(iftaGPSLocQry)
import sqlContext._
import sqlContext.implicits._
datapoint_prq_df.join(geoCacheLoc_df)
Have you tried partitioning by the join field and running the join in segments,
filtering both tables to the same segment of data?
Example:
val tableA = DfA.repartition($"joinField").filter("firstSegment")
val tableB = DfB.repartition($"joinField").filter("firstSegment")
tableA.join(tableB, "joinField")
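The segmented-join idea above can be sketched with plain Scala collections standing in for the two DataFrames (a minimal sketch, not Spark code; the integer join keys and segment ranges are invented for illustration):

```scala
// Sketch of joining two "tables" segment by segment: filter both sides to
// the same slice of the join key, then join only within that slice.
object SegmentedJoin {
  def segmentedJoin(
      tableA: Seq[(Int, String)],
      tableB: Seq[(Int, String)],
      segments: Seq[Range]): Seq[(Int, String, String)] =
    segments.flatMap { seg =>
      // Keep only rows whose join key falls in this segment, on both sides.
      val aSeg = tableA.filter { case (k, _) => seg.contains(k) }
      val bByKey = tableB.filter { case (k, _) => seg.contains(k) }.toMap
      // Inner join within the segment.
      aSeg.flatMap { case (k, va) => bByKey.get(k).map(vb => (k, va, vb)) }
    }
}
```

In Spark the same shape would be a repartition on the join field plus a per-segment filter on both sides before each join, as the reply suggests, so each pass touches matching slices of both tables rather than everything at once.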
Table 1 (192 GB) is partitioned by year and month; all 192 GB of data is for
one month, i.e. April.
Table 2: 92 GB, not partitioned.
I have to perform a join on these tables now.
On Tue, May 2, 2017 at 1:27 PM, Angel Francisco Orta <
angel.francisco.o...@gmail.com> wrote:
Hello,
Are the tables partitioned?
If yes, what is the partition field?
Thanks
On May 2, 2017, 8:22 PM, "KhajaAsmath Mohammed"
wrote:
Hi,
I am trying to join two big tables in Spark, and the job has been running for
quite a long time without any results.
Table 1: 192GB
Table 2: 92 GB
Does any
My suspicion is that your input file partitions are small; hence only a small
number of tasks are started. Can you provide some more details, like how you
load the files and how the result size comes to around 500 GB?
Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)
https://in.linkedin.com/in/ri
Hi,
You can map your vertices RDD as follows:
val pairVertices = verticesRDD.map(vertex => (vertex, null))
This gives you a pair RDD. After the join, make sure that you remove the
superfluous null values.
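The same pad-then-clean-up pattern can be sketched on plain Scala Seqs standing in for the RDDs (a minimal sketch; the nested `(k, (null, v))` shape assumed for the join result mirrors the pair-RDD join output, and the helper names are invented):

```scala
object PairVertices {
  // Pad each vertex with a null so it has the key-value shape joins need.
  def toPairs[A](vertices: Seq[A]): Seq[(A, Null)] =
    vertices.map(v => (v, null))

  // After the join, drop the superfluous null padding, keeping the other side.
  def dropPadding[K, V](joined: Seq[(K, (Null, V))]): Seq[(K, V)] =
    joined.map { case (k, (_, v)) => (k, v) }
}
```

With RDDs the second step would be the analogous `map { case (k, (_, v)) => (k, v) }` on the joined result.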
On Tue, Dec 23, 2014 at 10:36 AM, Deep Pradhan
wrote:
> Hi,
> I have two RDDs, vertices and edg