Re: RE : Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
. 2015 at 16:48, wrote: > Thanks a lot, why did you say "the most recent version"? > > - Original Message - > From: "Jörn Franke" > To: "nibiau" > Cc: banto...@gmail.com, user@spark.apache.org > Sent: Saturday, 3 October 2015 13:56:43 >

Re: RE : Re: HDFS small file generation problem

2015-10-03 Thread nibiau
Thanks a lot, why did you say "the most recent version"? - Original Message - From: "Jörn Franke" To: "nibiau" Cc: banto...@gmail.com, user@spark.apache.org Sent: Saturday, 3 October 2015 13:56:43 Subject: Re: RE : Re: HDFS small file generation problem Yes the m

Re: RE : Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
> After a CONCATENATE I suppose the records are still updatable. >> >> Please confirm whether it can be a solution for my use case, or suggest any other idea. >> >> Thanks a lot! >> Nicolas >> >> >> ----- Original Message - >> From: "Jörn Franke" >

RE : Re: HDFS small file generation problem

2015-10-03 Thread nibiau
firm whether it can be a solution for my use case, or suggest any other idea. Thanks a lot! Nicolas - Original Message - From: "Jörn Franke" To: nib...@free.fr, "Brett Antonides" Cc: user@spark.apache.org Sent: Saturday, 3 October 2015 11:17:51 Subject: Re: HDFS small file generation pro

Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
re still updatable. > > Please confirm whether it can be a solution for my use case, or suggest any other idea. > > Thanks a lot! > Nicolas > > > - Original Message - > From: "Jörn Franke" > To: nib...@free.fr, "Brett Antonides" > Cc: user@spark.apache.org >

Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
olas > > > - Original Message - > From: "Jörn Franke" > To: nib...@free.fr, "Brett Antonides" > Cc: user@spark.apache.org > Sent: Saturday, 3 October 2015 11:17:51 > Subject: Re: HDFS small file generation problem > > > > You can update data

Re: HDFS small file generation problem

2015-10-03 Thread nibiau
"Jörn Franke" To: nib...@free.fr, "Brett Antonides" Cc: user@spark.apache.org Sent: Saturday, 3 October 2015 11:17:51 Subject: Re: HDFS small file generation problem You can update data in Hive if you use the ORC format On Sat, 3 Oct 2015 at 10:42, < nib...@free.fr > wrote:

Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
Original Message - > From: nib...@free.fr > To: "Brett Antonides" > Cc: user@spark.apache.org > Sent: Friday, 2 October 2015 18:37:22 > Subject: Re: HDFS small file generation problem > > OK thanks, but can I also update data instead of inserting data? > >

Re: HDFS small file generation problem

2015-10-03 Thread Jagat Singh
er solutions? > > Nicolas > > - Original Message - > From: nib...@free.fr > To: "Brett Antonides" > Cc: user@spark.apache.org > Sent: Friday, 2 October 2015 18:37:22 > Subject: Re: HDFS small file generation problem > > OK thanks, but can I also update data inst

Re: HDFS small file generation problem

2015-10-03 Thread nibiau
7:22 Subject: Re: HDFS small file generation problem OK thanks, but can I also update data instead of inserting data? - Original Message - From: "Brett Antonides" To: user@spark.apache.org Sent: Friday, 2 October 2015 18:18:18 Subject: Re: HDFS small file generation problem I had a

Re: HDFS small file generation problem

2015-10-02 Thread nibiau
OK thanks, but can I also update data instead of inserting data? - Original Message - From: "Brett Antonides" To: user@spark.apache.org Sent: Friday, 2 October 2015 18:18:18 Subject: Re: HDFS small file generation problem I had a very similar problem and solved it with Hi

Re: HDFS small file generation problem

2015-10-02 Thread Brett Antonides
> From: "Jörn Franke" > To: nib...@free.fr, "user" > Sent: Monday, 28 September 2015 23:53:56 > Subject: Re: HDFS small file generation problem > > > > Use hadoop archive > > > > On Sun, 27 Sep 2015 at 15:36, < nib...@free.fr > wrote

Re: HDFS small file generation problem

2015-10-02 Thread nibiau
-- From: "Jörn Franke" To: nib...@free.fr, "user" Sent: Monday, 28 September 2015 23:53:56 Subject: Re: HDFS small file generation problem Use hadoop archive On Sun, 27 Sep 2015 at 15:36, < nib...@free.fr > wrote: Hello, I'm still investigating my small fil

Re: HDFS small file generation problem

2015-09-28 Thread Jörn Franke
Use hadoop archive On Sun, 27 Sep 2015 at 15:36, wrote: > Hello, > I'm still investigating the small file generation problem caused by my > Spark Streaming jobs. > Indeed, my Spark Streaming jobs are receiving a lot of small events (avg > 10kb), and I have to store them inside HDFS in ord

Re: HDFS small file generation problem

2015-09-27 Thread Deenar Toraskar
You could try a couple of things: a) use Kafka for stream processing: store current incoming events and the Spark Streaming job output in Kafka rather than on HDFS, and dual-write to HDFS too (in a micro-batched mode), so every x minutes. Kafka is better suited to processing lots of small events. b) Coales

Re: HDFS small file generation problem

2015-09-27 Thread ayan guha
I would suggest not writing small files to HDFS. Rather, you can hold them in memory, maybe off-heap, and then flush them to HDFS using another job, similar to https://github.com/ptgoetz/storm-hdfs (not sure if Spark already has something like it). On Sun, Sep 27, 2015 at 11:36 PM, wrote: >
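
The buffer-then-flush idea suggested above can be sketched roughly as follows. This is only a minimal illustration in plain Python, not code from the thread and not a Spark or storm-hdfs API: the `EventBuffer` class, its `flush_bytes` threshold, and the `batch-NNNNN.log` file naming are all hypothetical, and a local directory stands in for HDFS.

```python
import os


class EventBuffer:
    """Hold small events in memory and write them out as one larger file.

    Hypothetical helper for illustration only; a local directory stands in
    for an HDFS target, and events are assumed to contain no newline bytes.
    """

    def __init__(self, out_dir, flush_bytes=64 * 1024 * 1024):
        self.out_dir = out_dir          # assumed output directory
        self.flush_bytes = flush_bytes  # assumed size threshold per batch file
        self._events = []
        self._size = 0
        self._batch_no = 0

    def add(self, event: bytes):
        """Buffer one event; flush and return the file path once the threshold is reached."""
        self._events.append(event)
        self._size += len(event)
        if self._size >= self.flush_bytes:
            return self.flush()
        return None

    def flush(self):
        """Write all buffered events to a single newline-delimited file."""
        if not self._events:
            return None
        path = os.path.join(self.out_dir, f"batch-{self._batch_no:05d}.log")
        with open(path, "wb") as f:
            for event in self._events:
                f.write(event)
                f.write(b"\n")
        # Reset the buffer for the next batch.
        self._events = []
        self._size = 0
        self._batch_no += 1
        return path
```

With events of about 10 KB, a threshold near the HDFS block size would produce one file per several thousand events instead of one file per event; the right threshold, and how long unflushed data may safely sit in memory, depend on the job.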