According to the Spark SQL documentation, this project does indeed allow Python to be used for reading and writing tables, i.e. data that is not necessarily in a text format.
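For the record, here is roughly what I expect the PySpark side to look like. This is only a minimal sketch: the table name and columns are made up, and the exact entry point may be spelled hql() rather than sql() on a Spark 1.0 build with Hive support.

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="hive-read-example")
    hive_ctx = HiveContext(sc)

    # Read rows from an existing Hive table (stored via any Hive SerDe,
    # not just text). "some_table" is a hypothetical table name.
    rows = hive_ctx.sql("SELECT key, value FROM some_table")

    # Hand the result over to "classical" PySpark operations.
    for row in rows.take(5):
        print(row)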
Thanks a lot!

Bertrand Dechoux

On Thu, Apr 17, 2014 at 10:06 AM, Bertrand Dechoux <decho...@gmail.com> wrote:

> Thanks for the JIRA reference. I really need to look at Spark SQL.
>
> Am I right to understand that, thanks to Spark SQL, Hive data can be read (and
> it does not need to be in a text format) and then 'classical' Spark can work
> on this extraction?
>
> It seems logical, but I haven't worked with Spark SQL as of now.
>
> Does it also imply that the reverse is true? That I can write data as Hive data
> with Spark SQL using results from an arbitrary (Python) Spark application?
>
> Bertrand Dechoux
>
>
> On Thu, Apr 17, 2014 at 7:23 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
>> Yes, this JIRA would enable that. The Hive support also handles HDFS.
>>
>> Matei
>>
>> On Apr 16, 2014, at 9:55 PM, Jesvin Jose <frank.einst...@gmail.com> wrote:
>>
>> When this is implemented, can you load/save an RDD of pickled objects to
>> HDFS?
>>
>>
>> On Thu, Apr 17, 2014 at 1:51 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>
>>> Hi Bertrand,
>>>
>>> We should probably add a SparkContext.pickleFile and an
>>> RDD.saveAsPickleFile that will allow saving pickled objects. Unfortunately
>>> this is not in yet, but there is an issue up to track it:
>>> https://issues.apache.org/jira/browse/SPARK-1161.
>>>
>>> In 1.0, one feature we do have now is the ability to load binary data
>>> from Hive using Spark SQL's Python API. Later we will also be able to save
>>> to Hive.
>>>
>>> Matei
>>>
>>> On Apr 16, 2014, at 4:27 AM, Bertrand Dechoux <decho...@gmail.com> wrote:
>>>
>>> > Hi,
>>> >
>>> > I have browsed the online documentation and it is stated that PySpark
>>> only reads text files as sources. Is that still the case?
>>> >
>>> > From what I understand, the RDD can after this first step be any
>>> serialized Python structure, provided the class definitions are properly
>>> distributed.
>>> >
>>> > Is it not possible to read back those RDDs? That is, create a flow that
>>> parses everything and then, e.g. the next week, start from the binary,
>>> structured data?
>>> >
>>> > Technically, what is the difficulty? I would assume the code reading a
>>> binary Python RDD and the code reading a binary Python file to be quite
>>> similar. Where can I learn more about this subject?
>>> >
>>> > Thanks in advance
>>> >
>>> > Bertrand
>>>
>>
>>
>> --
>> We don't beat the reaper by living longer. We beat the reaper by living
>> well and living fully. The reaper will come for all of us. Question is,
>> what do we do between the time we are born and the time he shows up? -Randy
>> Pausch
>>
>>
>
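PS: For reference, a sketch of how the SparkContext.pickleFile / RDD.saveAsPickleFile API proposed in SPARK-1161 (quoted above) might be used once it lands. The method names follow the proposal in the thread and the HDFS path is made up; none of this works on current releases yet.

    from pyspark import SparkContext

    sc = SparkContext(appName="pickle-file-example")

    # Any picklable Python objects, e.g. records produced by an earlier parsing job.
    records = sc.parallelize([{"id": i, "payload": [i, i * 2]} for i in range(100)])

    # Proposed in SPARK-1161: persist the RDD as pickled objects on HDFS ...
    records.saveAsPickleFile("hdfs:///tmp/records.pickle")

    # ... and read it back later (e.g. the next week) without re-parsing the raw text.
    restored = sc.pickleFile("hdfs:///tmp/records.pickle")
    print(restored.count())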