Does it have anything to do with the fact that the mail address is displayed as user @spark.apache.org <http://spark.apache.org/>? There is a space before '@'. This is how it appears in my mail client.
Sivakumaran

> On 08-Aug-2016, at 7:42 PM, Chris Mattmann <mattm...@apache.org> wrote:
>
> Weird!
>
> On 8/8/16, 11:10 AM, "Sean Owen" <so...@cloudera.com> wrote:
>
>> I also don't know what's going on with the "This post has NOT been
>> accepted by the mailing list yet" message, because actually the
>> messages always do post. In fact this has been sent to the list 4
>> times:
>>
>> https://www.mail-archive.com/search?l=user%40spark.apache.org&q=dueckm&submit.x=0&submit.y=0
>>
>> On Mon, Aug 8, 2016 at 3:03 PM, Chris Mattmann <mattm...@apache.org> wrote:
>>>
>>> On 8/8/16, 2:03 AM, "matthias.du...@fiduciagad.de"
>>> <matthias.du...@fiduciagad.de> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am writing to you because I am not sure whether I did everything
>>>> right when registering and subscribing to the Spark user list.
>>>>
>>>> I posted the appended question to the Spark User list after subscribing
>>>> and receiving the "WELCOME to user@spark.apache.org" mail from
>>>> "user-h...@spark.apache.org".
>>>> But this post is still in the state "This post has NOT been accepted by
>>>> the mailing list yet.".
>>>>
>>>> Is this because I forgot to do something or did something wrong with my
>>>> user account (dueckm)? Or is it because no member of the Spark User List
>>>> has reacted to that post yet?
>>>>
>>>> Thanks a lot for your help.
>>>>
>>>> Matthias
>>>>
>>>> Fiducia & GAD IT AG | www.fiduciagad.de
>>>> Frankfurt a. M. Local Court, HRB 102381 | Registered office: Hahnstr. 48,
>>>> 60528 Frankfurt a. M. | VAT ID DE 143582320
>>>> Management Board: Klaus-Peter Bruns (Chairman), Claus-Dieter Toben
>>>> (Deputy Chairman), Jens-Olaf Bartels, Martin Beyer, Jörg Dreinhöfer,
>>>> Wolfgang Eckert, Carsten Pfläging, Jörg Staff
>>>> Chairman of the Supervisory Board: Jürgen Brinkmann
>>>>
>>>> ----- Forwarded by Matthias Dück/M/FAG/FIDUCIA/DE on 08.08.2016
>>>> 10:57 -----
>>>>
>>>> From: dueckm <matthias.du...@fiduciagad.de>
>>>> To: user@spark.apache.org
>>>> Date: 04.08.2016 13:27
>>>> Subject: Are join/groupBy operations with wide Java Beans using Dataset
>>>> API much slower than using RDD API?
>>>>
>>>> ________________________________________
>>>>
>>>> Hello,
>>>>
>>>> I built a prototype that uses join and groupBy operations via the Spark
>>>> RDD API. Recently I migrated it to the Dataset API. Now it runs much
>>>> slower than with the original RDD implementation.
>>>> Did I do something wrong here? Or is this a price I have to pay for the
>>>> more convenient API?
>>>> Is there a known solution to deal with this effect (e.g. configuration
>>>> via "spark.sql.shuffle.partitions" - but how could I determine the
>>>> correct value)?
>>>> In my prototype I use Java Beans with a lot of attributes. Does this
>>>> slow down Spark operations with Datasets?
>>>>
>>>> Here I have a simple example that shows the difference:
>>>> JoinGroupByTest.zip
>>>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/JoinGroupByTest.zip>
>>>> - I build 2 RDDs and join and group them. Afterwards I count and display
>>>> the joined RDDs. (Method de.testrddds.JoinGroupByTest.joinAndGroupViaRDD())
>>>> - When I do the same actions with Datasets it takes approximately 40
>>>> times as long (Method
>>>> de.testrddds.JoinGroupByTest.joinAndGroupViaDatasets()).
>>>>
>>>> Thank you very much for your help.
>>>> Matthias
>>>>
>>>> PS1: Excuse me for sending this post more than once, but I am new to
>>>> this mailing list and probably did something wrong when
>>>> registering/subscribing, so my previous postings have not been
>>>> accepted ...
>>>>
>>>> PS2: See the appended screenshots taken from the Spark UI (jobs 0/1
>>>> belong to the RDD implementation, jobs 2/3 to the Dataset one):
>>>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/jobs.png>
>>>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/Job_RDD_Details.png>
>>>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n27473/Job_Dataset_Details.png>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Are-join-groupBy-operations-with-wide-Java-Beans-using-Dataset-API-much-slower-than-using-RDD-API-tp27473.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org