Re: Small File to HDFS

2015-09-04 Thread Jörn Franke
Maybe you can tell us more about your use case? I have the feeling that we are missing something here.
On Thu, 3 Sept 2015 at 15:54, Jörn Franke wrote:
> Store them as a Hadoop archive (HAR)
>
> On Wed, 2 Sept 2015 at 18:07, wrote:
>> Hello,
>> I am currently using Spark Streaming to

Re: Small File to HDFS

2015-09-04 Thread Ted Yu
>>>>> and sometimes I will need to replace the content of a file with new content
>>>>> (remove/replace)
>>>>>
>>>>> Thanks a lot
>>>>> Nicolas
>>>>>

Re: Small File to HDFS

2015-09-04 Thread Jörn Franke
R ?
>>>> Basically, the names of my small files will be the keys of my records,
>>>> and sometimes I will need to replace the content of a file with new content
>>>> (remove/replace)
>>>>
>>>> Thanks a lot

Re: Small File to HDFS

2015-09-04 Thread Tao Lu
gt;>> >>> Tks a lot >>> Nicolas >>> >>> - Mail original - >>> De: "Jörn Franke" >>> À: nib...@free.fr >>> Cc: user@spark.apache.org >>> Envoyé: Jeudi 3 Septembre 2015 19:29:42 >>> Objet: Re:

Re: Small File to HDFS

2015-09-04 Thread Ted Yu
>> sometimes I will need to replace the content of a file with new content
>> (remove/replace)
>>
>> Thanks a lot
>> Nicolas
>>
>> - Original message -
>> From: "Jörn Franke"
>> To: nib...@free.fr
>> Cc: user@spark.apache

Re: Small File to HDFS

2015-09-03 Thread Jörn Franke
Jörn Franke"
> To: nib...@free.fr
> Cc: user@spark.apache.org
> Sent: Thursday, 3 September 2015 19:29:42
> Subject: Re: Small File to HDFS
>
> HAR is transparent and has hardly any performance overhead. You may decide not
> to compress or use a fast compression algorithm, such as snapp

Re: Small File to HDFS

2015-09-03 Thread nibiau
new content (remove/replace)
Thanks a lot
Nicolas
- Original message -
From: "Jörn Franke"
To: nib...@free.fr
Cc: user@spark.apache.org
Sent: Thursday, 3 September 2015 19:29:42
Subject: Re: Small File to HDFS
HAR is transparent and has hardly any performance overhead. You may decide not to

Re: Small File to HDFS

2015-09-03 Thread Jörn Franke
about performance?
>
> - Original message -
> From: "Jörn Franke"
> To: nib...@free.fr, user@spark.apache.org
> Sent: Thursday, 3 September 2015 15:54:42
> Subject: Re: Small File to HDFS
>
> Store them as a Hadoop archive (HAR)
>
> On Wed, 2 Se

Re: Small File to HDFS

2015-09-03 Thread Martin Menzel
: Thursday, 3 September 2015 15:54:42
> Subject: Re: Small File to HDFS
>
> Store them as a Hadoop archive (HAR)
>
> On Wed, 2 Sept 2015 at 18:07, <nib...@free.fr> wrote:
>
> Hello,
> I am currently using Spark Streaming to collect small messag

Re: Small File to HDFS

2015-09-03 Thread Tao Lu
R usage is: is it possible to use Pig on it,
> and what about performance?
>
> - Original message -
> From: "Jörn Franke"
> To: nib...@free.fr, user@spark.apache.org
> Sent: Thursday, 3 September 2015 15:54:42
> Subject: Re: Small File to HDFS
>
> Store

Re: Small File to HDFS

2015-09-03 Thread nibiau
My main question in the case of HAR usage is: is it possible to use Pig on it, and what about performance?
- Original message -
From: "Jörn Franke"
To: nib...@free.fr, user@spark.apache.org
Sent: Thursday, 3 September 2015 15:54:42
Subject: Re: Small File to HDFS
Store them as hado

Re: Small File to HDFS

2015-09-03 Thread Jörn Franke
Store them as a Hadoop archive (HAR).
On Wed, 2 Sept 2015 at 18:07, wrote:
> Hello,
> I am currently using Spark Streaming to collect small messages (events);
> each is under 50 KB, the volume is high (several million per day), and I have to
> store those messages in HDFS.
> I understood that stori

Re: Small File to HDFS

2015-09-03 Thread Ted Yu
Pig on it
>> directly?
>>
>> Thanks
>> Nicolas
>>
>> - Original message -
>> From: "Tao Lu"
>> To: nib...@free.fr
>> Cc: "Ted Yu", "user"
>> Sent: Wednesday, 2 September 2015 19:09:23
>> Subject:

Re: Small File to HDFS

2015-09-03 Thread Ndjido Ardo Bar
n the case of a big zip file, is it possible to easily run Pig on it
> directly?
>
> Thanks
> Nicolas
>
> - Original message -
> From: "Tao Lu"
> To: nib...@free.fr
> Cc: "Ted Yu", "user"
> Sent: Wednesday, 2 September 2015 19:0

Re: Small File to HDFS

2015-09-03 Thread nibiau
c: "Ted Yu", "user"
Sent: Wednesday, 2 September 2015 19:09:23
Subject: Re: Small File to HDFS
You may consider storing the messages in one big HDFS file and appending new messages to it. For instance: one message -> zip it -> append it to the HDFS file as one line.
On
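The "one message -> zip it -> append it as one line" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: a local file stands in for the HDFS file (a real HDFS client would use its own append call), zlib stands in for "zip", and the function names `append_message` and `read_messages` are hypothetical. Base64 is an added step, not from the thread, so that compressed bytes never contain a newline.

```python
import base64
import zlib
from pathlib import Path
from typing import List


def append_message(path: Path, message: bytes) -> None:
    """Compress one message and append it to `path` as a single line."""
    # Base64 keeps the compressed bytes free of newline characters,
    # so one line always corresponds to exactly one message.
    line = base64.b64encode(zlib.compress(message))
    with path.open("ab") as f:
        f.write(line + b"\n")


def read_messages(path: Path) -> List[bytes]:
    """Decode every non-empty line of `path` back into the original messages."""
    with path.open("rb") as f:
        return [
            zlib.decompress(base64.b64decode(line.strip()))
            for line in f
            if line.strip()
        ]
```

Note that an append-only file like this does not by itself support the remove/replace-by-key requirement mentioned elsewhere in the thread; that would need an extra convention, such as treating the last line for a key as the current value.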

Re: Small File to HDFS

2015-09-02 Thread Tao Lu
and
> don't want to add another database to the loop.
> Is it the only solution?
>
> Thanks
> Nicolas
>
> - Original message -
> From: "Ted Yu"
> To: nib...@free.fr
> Cc: "user"
> Sent: Wednesday, 2 September 2015 18:34:17
> Subject: Re:

Re: Small File to HDFS

2015-09-02 Thread nibiau
Hi,
I already store them in MongoDB in parallel for operational access and don't want to add another database to the loop.
Is it the only solution?
Thanks
Nicolas
- Original message -
From: "Ted Yu"
To: nib...@free.fr
Cc: "user"
Sent: Wednesday, 2 September 2015 18:34

Re: Small File to HDFS

2015-09-02 Thread Ted Yu
Instead of storing those messages in HDFS, have you considered storing them in a key-value store (e.g. HBase)?
Cheers
On Wed, Sep 2, 2015 at 9:07 AM, wrote:
> Hello,
> I am currently using Spark Streaming to collect small messages (events);
> each is under 50 KB, the volume is high (several million

Small File to HDFS

2015-09-02 Thread nibiau
Hello,
I am currently using Spark Streaming to collect small messages (events); each is under 50 KB, the volume is high (several million per day), and I have to store those messages in HDFS.
I understand that storing small files can be problematic in HDFS. How can I manage it?
Thanks
Nicolas
--
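For a rough sense of why one file per message is problematic: the HDFS NameNode keeps metadata for every file and block in memory, and a commonly quoted rule of thumb is about 150 bytes per object. The sketch below assumes that figure, assumes each small message is stored as its own single-block file, and uses a hypothetical 3 million files per day as a stand-in for "several million"; none of these numbers come from the thread.

```python
BYTES_PER_OBJECT = 150  # assumed rule-of-thumb NameNode cost per file or block object


def namenode_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate NameNode memory for `num_files` files (one file object plus its blocks)."""
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


files_per_day = 3_000_000  # hypothetical stand-in for "several million per day"
per_day = namenode_bytes(files_per_day)
print(f"{per_day / 1e6:.0f} MB of NameNode memory per day")
```

Under these assumptions the metadata alone grows by roughly 900 MB per day, which is why the replies in this thread suggest consolidating messages (HAR, one large appended file, or a key-value store) rather than writing one file each.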