May I forward this report to the Spark list as well?
Thanks.
Wes Peng wrote:
Hello,
This weekend I ran a test against a big dataset; Spark, Drill, MySQL, and
PostgreSQL were involved.
This is the final report:
https://blog.cloudcache.net/handles-the-file-larger-than-memory/
The simple conclusion: I just ran a test showing that even on a single node
(local deployment), Spark can handle data whose size is much larger than the
total memory.
My test VM (2 GB RAM, 2 cores):
$ free -m
              total    used    free   shared   buff/cache   available
Mem:           1992    1845
I once had a 100+ GB file computed on 3 nodes, each with only 24 GB of
memory, and the job completed fine. So from my experience, a Spark cluster
works correctly on files larger than memory by spilling them to disk.
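Roughly the shape of the local-mode test, as a minimal sketch. The input
path and key column are made up; the point is that Spark streams each
partition through memory and spills shuffle data to disk, so the input may
exceed total RAM:

import org.apache.spark.sql.SparkSession

object LargerThanMemoryTest {
  def main(args: Array[String]): Unit = {
    // Local single-node session; give the JVM a small heap via
    // spark-submit --driver-memory 1g so it stays far below the file size.
    val spark = SparkSession.builder()
      .appName("larger-than-memory-test")
      .master("local[2]")
      .config("spark.local.dir", "/tmp/spark-spill") // where spill files land
      .getOrCreate()

    // /data/big.csv is a hypothetical stand-in for the large input file.
    val df = spark.read.option("header", "true").csv("/data/big.csv")

    // groupBy forces a shuffle; blocks that do not fit in memory are
    // spilled to spark.local.dir instead of failing with an OOM.
    df.groupBy("some_key").count()
      .write.mode("overwrite").parquet("/tmp/counts")

    spark.stop()
  }
}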
Thanks
rajat kumar wrote:
> Tested this with exec
How many executors do you have?
rajat kumar wrote:
Tested this with executors of 5 cores and 17 GB memory each. Data volume is
really high, around 1 TB.
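For reference, that sizing corresponds to settings like the sketch below.
The instance count is illustrative, not from this thread; on YARN or
Kubernetes it (or dynamic allocation) is what answers "how many executors":

import org.apache.spark.sql.SparkSession

// Hypothetical sizing matching the figures above: 5 cores and 17 GB
// per executor. spark.executor.instances is an illustrative value.
val spark = SparkSession.builder()
  .appName("one-tb-job")
  .config("spark.executor.cores", "5")
  .config("spark.executor.memory", "17g")
  .config("spark.executor.instances", "50") // illustrative only
  .getOrCreate()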
I made a simple test of query time for several SQL engines, including MySQL,
Hive, Drill, and Spark. The report:
https://cloudcache.net/data/query-time-mysql-hive-drill-spark.pdf
It may have no special meaning, just for fun. :)
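If you want to reproduce a rough timing on the Spark side, spark-shell has a
built-in helper; a one-liner sketch (the table name is hypothetical):

// SparkSession.time prints the wall-clock time of evaluating the block.
spark.time {
  spark.sql("SELECT count(*) FROM my_big_table").show()
}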
Regards.
Have a look at this:
https://github.com/LucaCanali/sparkMeasure
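A minimal sketch of its stage-level API, following the sparkMeasure README;
start spark-shell with the package on the classpath (the version coordinate
below is an assumption, check the repo for the current one):

// spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.12:0.23
val stageMetrics = ch.cern.sparkmeasure.StageMetrics(spark)

// Runs the query and prints aggregated stage metrics: number of stages
// and tasks, executor run time, shuffle and I/O statistics, etc.
stageMetrics.runAndMeasure {
  spark.sql("SELECT count(*) FROM range(1000) CROSS JOIN range(1000)").show()
}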
On 2022/1/20 1:18, Prasad Bhalerao wrote:
Is there any way we can profile Spark applications that shows the number of
invocations of Spark APIs and their execution times, just the way JProfiler
shows all the details?
How large is the file? In my experience, reading an Excel file from the data
lake and loading it as a DataFrame works great.
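A sketch of what that looks like, assuming the third-party spark-excel
package and a hypothetical ADLS Gen2 path:

// Requires the spark-excel package, e.g.
// spark-shell --packages com.crealytics:spark-excel_2.12:<version>
val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("header", "true")
  .load("abfss://container@myaccount.dfs.core.windows.net/in/report.xlsx")

df.printSchema()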
Thanks
On 2022-01-18 22:16, Heta Desai wrote:
Hello,
I have zip files on an SFTP location. I want to download/copy those
files and put them into Azure Data Lake. Once the zip files
Are you using IvyVPN, which may be causing this problem? If the VPN software
silently changes network URLs, you should avoid using it.
Regards.
On Wed, Dec 22, 2021 at 1:48 AM Pralabh Kumar wrote:
> Hi Spark Team
>
> I am building Spark in a VPN, but the unit test case below is failing.
> This is p