Hello, I don't understand my error message.

Basically, all I am doing is (a minimal sketch follows this list):
- dfAgg = df.groupBy("S_ID").agg(count("*").as("infectedStreet"))
- dfRes = df.join(dfAgg, Seq("S_ID"), "left_outer")
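
For completeness, here is a minimal, self-contained version of what I am running. The toy df and the count("*") aggregation aliased to infectedStreet are just stand-ins matching the Aggregate node of the plan further down (the real df has many more columns):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.count

    val spark = SparkSession.builder()
      .appName("selfJoinRepro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy stand-in for the real df; only S_ID matters for the repro.
    val df = Seq((1L, 10L), (1L, 11L), (2L, 12L)).toDF("S_ID", "ID")

    // Count rows per S_ID, as in the Aggregate node shown in the plan below.
    val dfAgg = df.groupBy("S_ID").agg(count("*").as("infectedStreet"))

    // Left-outer join the per-S_ID counts back onto the original DataFrame.
    val dfRes = df.join(dfAgg, Seq("S_ID"), "left_outer")

    dfRes.show()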

However, I get this AnalysisException:
Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved
attribute(s) S_ID#1903L missing from
Dummy_ID#740,sex#37L,PERSONAL_STATUS#726L,W_DEP_CODE#736,W_SIZE#739L,
POSTAL_CODE#735,COUNTRY_CODE#730,ID#724L,Dummy_ID_1#741,DEP_CODE#729,
HOUSEHOLD_TYPE#733L,HOUSEHOLD_SIZE#734L,AGE#727L,W_ID#738L,H_ID#732L,
AGE_TYPE#728,S_ID#57L,NATIONALITY#731
in operator !Project [ID#724L, sex#37L, PERSONAL_STATUS#726L, AGE#727L,
AGE_TYPE#728, DEP_CODE#729, COUNTRY_CODE#730, NATIONALITY#731 AS Nationality#77,
H_ID#732L, HOUSEHOLD_TYPE#733L, HOUSEHOLD_SIZE#734L, POSTAL_CODE#735,
W_DEP_CODE#736, S_ID#1903L, W_ID#738L, W_SIZE#739L, Dummy_ID#740, Dummy_ID_1#741];;

What I don't understand is that the error says S_ID#1903L is missing,
even though everything looks fine in the logical plan:
+- Join LeftOuter, (S_ID#57L = S_ID#1903L)

   :- Project [W_ID#14L, H_ID#8L, ID#0L, sex#37L, category#97L, AGE#3L,
AGE_TYPE#4, DEP_CODE#5, COUNTRY_CODE#6, Nationality#77, HOUSEHOLD_TYPE#9L,
familySize#117L, POSTAL_CODE#11, W_DEP_CODE#12, S_ID#57L, workplaceSize#137L,
Dummy_ID#16, Dummy_ID_1#17, Inc_period#157, Time_inf#1064, Time_inc#200,
Health#1014, Inf_period#1039, infectedFamily#1355L, infectedWorker#1385L]

    +- Aggregate [S_ID#1903L], [S_ID#1903L, count(1) AS infectedStreet#1415L]

Does anyone have a clue about this?
Thanks,
