Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API

2024-09-29 Thread Asif Ali Hirekumbi via Python-list
Dear Python Experts,

I am working with the Kenna Application's API to retrieve vulnerability
data. The API endpoint provides a single, massive JSON file in gzip format,
approximately 60 GB in size. Handling such a large dataset in one go is
proving to be quite challenging, especially in terms of memory management.

I am looking for guidance on how to efficiently stream this data and
process it in chunks using Python. Specifically, I am wondering if there’s
a way to use the requests library or any other libraries that would allow
us to pull data from the API endpoint in a memory-efficient manner.

Here are the relevant API endpoints from Kenna:

   - Kenna API Documentation
   
   - Kenna Vulnerabilities Export
   

If anyone has experience with similar use cases or can offer any advice, it
would be greatly appreciated.

Thank you in advance for your help!

Best regards
Asif Ali
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API

2024-09-29 Thread Abdur-Rahmaan Janhangeer via Python-list
I don't know if you have tried Polars, but it seems to work well with JSON data:

import polars as pl
pl.read_json("file.json")
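
One caveat: pl.read_json reads the whole file into memory, so on its own it
may not suit a 60 GB export. Below is a minimal standard-library sketch of the
chunked approach asked about. It assumes the decompressed export is a single
top-level JSON array, which this thread does not confirm. The names
gunzip_chunks, decode_utf8 and iter_array_items are hypothetical helpers, not
part of Polars or the Kenna API.

```python
import codecs
import json
import zlib
from typing import Iterable, Iterator


def gunzip_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Decompress a gzip byte stream piece by piece."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail


def decode_utf8(byte_chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode UTF-8 incrementally so split multi-byte characters survive."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def iter_array_items(text_chunks: Iterable[str]) -> Iterator[object]:
    """Yield the elements of one top-level JSON array, one at a time.

    Assumption: the document is a single JSON array. An element that is
    still incomplete is re-parsed when more text arrives, so very large
    single elements are handled inefficiently. Malformed input is not
    reported; the generator simply stops when the text runs out.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    started = False
    for piece in text_chunks:
        buffer += piece
        while True:
            # Drop whitespace and element separators before the next token.
            buffer = buffer.lstrip(" \t\r\n,")
            if not buffer:
                break
            if not started:
                if buffer[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                started = True
                buffer = buffer[1:]
                continue
            if buffer[0] == "]":
                return
            try:
                item, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # element is incomplete, wait for more text
            rest = buffer[end:].lstrip()
            if not rest or rest[0] not in ",]":
                # The element may be cut mid-token (e.g. "1." of "1.5").
                break
            yield item
            buffer = buffer[end:]
```

With requests, the byte chunks could come from
requests.get(url, stream=True).iter_content(chunk_size=1 << 20). If the server
marks the response with Content-Encoding: gzip, iter_content already
decompresses it and the gunzip_chunks step should be skipped. Each yielded
item can then be processed or written out before the next one is parsed, so
only the current element and a small buffer are held in memory.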

Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius




Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API

2024-09-29 Thread Asif Ali Hirekumbi via Python-list
Thanks, Abdur-Rahmaan.
I will give it a try!

Thanks
Asif
