Thanks for your time, everyone :) Much appreciated.
I solved it using the jq utility, since I was dealing with JSON. Here is the script:

find . -name '*.txt' -exec cat '{}' + | jq -s '.' > output.txt

(A short demonstration of what this does is sketched after the quoted thread below.)

Thanks,
Sid

On Tue, Apr 26, 2022 at 9:37 PM Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:

> Also, the bash script seems to read .txt files, not .json:
>
> for f in Agent/*.txt; do cat "${f}" >> merged.json; done
>
> On Tue, Apr 26, 2022 at 6:03 PM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
>> Hi,
>>
>> What version of Spark are you using? And where is the data stored?
>>
>> I am not quite sure that just using a bash script will help, because
>> concatenating all the files into a single file does not create valid JSON.
>>
>> Regards,
>> Gourav
>>
>> On Tue, Apr 26, 2022 at 3:44 PM Sid <flinkbyhe...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Can somebody help me with the below problem?
>>>
>>> https://stackoverflow.com/questions/72015557/dealing-with-large-number-of-small-json-files-using-pyspark
>>>
>>> Thanks,
>>> Sid
>>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
>
> +47 480 94 297
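
P.S. For the archives, a minimal sketch of why plain concatenation was not enough and what jq's slurp flag (-s) does. The file names a.txt and b.txt are just for illustration:

# Two sample input files, each holding a single JSON object:
printf '{"id": 1}' > a.txt
printf '{"id": 2}' > b.txt

# Plain concatenation produces {"id": 1}{"id": 2} in merged.json,
# which is a stream of separate objects, not one valid JSON document:
cat a.txt b.txt > merged.json

# jq's slurp mode (-s) reads every input value and wraps them in an array,
# so output.txt holds [{"id":1},{"id":2}], a single valid JSON document:
find . -name '*.txt' -exec cat '{}' + | jq -s '.' > output.txt

One caveat with the find version: output.txt itself matches '*.txt', so rerunning the pipeline in the same directory would slurp the previous output back in. Writing the result outside the tree, or with a different extension, avoids that.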