Hi Junfeng, you are welcome. If users are extremely adamant about seeing only a few columns, see whether you can create a view over just the selected columns and give that to them, in case you are using the Hive metastore.
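(Not part of the original mail.) The view suggestion amounts to pinning a fixed column projection at read time, e.g. something like `CREATE VIEW app_one_v AS SELECT A, B, C FROM events` in HiveQL, where the table, view, and column names are illustrative, not from the thread. A plain-Python sketch of what such a projection gives each user group:

```python
# Sketch of the effect of a column-restricted view: each row keeps
# only the columns named in the view definition, so the unused
# column (D here) is simply not visible to the user.
# Column names A/B/C/D and the sample rows are illustrative.

def project(rows, view_columns):
    """Restrict rows to view_columns, like SELECT A, B, C FROM events."""
    return [{c: row.get(c) for c in view_columns} for row in rows]

rows = [
    {"A": 1, "B": 2, "C": 3, "D": None},  # app one never fills D
    {"A": 4, "B": 5, "C": 6, "D": None},
]

for row in project(rows, ["A", "B", "C"]):
    print(row)  # only A, B, C survive the projection
```

The advantage over rewriting the data is that nothing has to be scanned or copied: the view is just metadata in the metastore, and Parquet being columnar means the hidden column is never read anyway.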
Regards,
Gourav

On Sun, Apr 8, 2018 at 3:28 AM, Junfeng Chen <darou...@gmail.com> wrote:

> Hi,
> Thanks for explaining!
>
> Regard,
> Junfeng Chen
>
> On Wed, Apr 4, 2018 at 7:43 PM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
>> Hi,
>>
>> I do not think that in a columnar database it makes much of a difference. The amount of data that you will be parsing will not be much anyway.
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Wed, Apr 4, 2018 at 11:02 AM, Junfeng Chen <darou...@gmail.com> wrote:
>>
>>> Our users ask for it....
>>>
>>> Regard,
>>> Junfeng Chen
>>>
>>> On Wed, Apr 4, 2018 at 5:45 PM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>>
>>>> Hi Junfeng,
>>>>
>>>> Can I ask why it is important to remove the empty column?
>>>>
>>>> Regards,
>>>> Gourav Sengupta
>>>>
>>>> On Tue, Apr 3, 2018 at 4:28 AM, Junfeng Chen <darou...@gmail.com> wrote:
>>>>
>>>>> I am trying to read data from Kafka and write it in Parquet format via Spark Streaming. The problem is that the data from Kafka have a variable structure. For example, app one has columns A, B, C and app two has columns B, C, D, so the data frame I read from Kafka has all columns A, B, C, D. When I write the dataframe to Parquet files partitioned by app name, the Parquet file for app one also contains column D, where column D is empty and actually contains no data. So how can I filter out the empty columns when writing the dataframe to Parquet?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Regard,
>>>>> Junfeng Chen
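(Not part of the original mail.) For the question that started the thread, one common approach is to split the frame by app name and, for each partition, select only the columns that actually contain data before writing. The helper names below (`split_by_app`, `non_empty_columns`) are my own, not a Spark API; this is a pure-Python sketch of the pruning logic, with the rough PySpark equivalent noted in a comment:

```python
# Sketch: per-app column pruning before a partitioned write.
# Helper names and sample rows are illustrative, not a Spark API.

def non_empty_columns(rows):
    """Columns holding at least one non-null value across rows."""
    cols = set()
    for row in rows:
        cols.update(c for c, v in row.items() if v is not None)
    return sorted(cols)

def split_by_app(rows, app_key="app"):
    """Group rows by the partitioning column."""
    parts = {}
    for row in rows:
        parts.setdefault(row[app_key], []).append(row)
    return parts

rows = [
    {"app": "one", "A": 1, "B": 2, "C": 3, "D": None},
    {"app": "two", "A": None, "B": 5, "C": 6, "D": 7},
]

for app, part in split_by_app(rows).items():
    keep = [c for c in non_empty_columns(part) if c != "app"]
    # In PySpark this step is roughly:
    #   df.filter(col("app") == app).select(keep).write.parquet(path)
    print(app, keep)
```

Note that deciding emptiness per column in real Spark requires scanning each partition (e.g. one null-count aggregation per column), which is exactly why the view/projection route suggested above can be the cheaper option when the per-app schemas are known in advance.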