Thank you, that works.
*Sincerely yours,*
*Raymond*
On Tue, Jun 19, 2018 at 4:36 PM, Nicolas Paris wrote:
> Hi Raymond
>
> Spark works well on a single machine too, since it benefits from multiple
> cores.
> The csv parser is based on univocity, and you might use the
> "spark.read.csv" syntax instead of the RDD API; from my experience,
> this will do better than any other csv parser.
Hi Raymond
Spark works well on a single machine too, since it benefits from multiple
cores.
The csv parser is based on univocity, and you might use the
"spark.read.csv" syntax instead of the RDD API; from my experience,
this will do better than any other csv parser.
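For readers following along, here is a minimal sketch of the DataFrame-based read Nicolas is recommending; the header/inferSchema options are ordinary spark.read settings chosen for illustration, not something he specified:

    import org.apache.spark.sql.SparkSession

    // Local-mode session; "local[*]" uses all cores on the single machine.
    val spark = SparkSession.builder()
      .appName("UserBehaviorCsv")
      .master("local[*]")
      .getOrCreate()

    // spark.read.csv goes through the built-in, univocity-backed csv source.
    val df = spark.read
      .option("header", "false")      // assuming no header row, per the sample line in the thread
      .option("inferSchema", "true")  // one extra pass over the data to guess column types
      .csv("C:\\RXIE\\Learning\\Data\\Alibaba\\UserBehavior\\UserBehavior.csv")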
2018-06-19 16:43 GMT+02:00 Raymond Xie:
Thank you Matteo, Aakash and Georg:
I am attempting to get some stats first; the data looks like:
1,4152983,2355072,pv,1511871096
I would like to find out the count by key of (UserID, Behavior Type):
val bh_count =
sc.textFile("C:\\RXIE\\Learning\\Data\\Alibaba\\UserBehavior\\UserBehavior.csv").map(_.split(","))
Single machine? Any other framework will perform better than Spark
On Tue, 19 Jun 2018 at 09:40, Aakash Basu wrote:
> Georg, just asking, can Pandas handle such a big dataset? And if that data is
> further passed into any of the sklearn modules?
>
> On Tue, Jun 19, 2018 at 10:35 AM, Georg Heiler wrote:
Georg, just asking, can Pandas handle such a big dataset? And if that data is
further passed into any of the sklearn modules?
On Tue, Jun 19, 2018 at 10:35 AM, Georg Heiler wrote:
> use pandas or dask
>
> If you do want to use spark, store the dataset as parquet / orc, and then
> continue to perform analytical queries on that dataset.
use pandas or dask
If you do want to use spark, store the dataset as parquet / orc, and then
continue to perform analytical queries on that dataset.
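As a rough sketch of the csv-to-parquet route Georg mentions; the output path and the grouping columns in the final example query are illustrative assumptions, not part of his message:

    // spark is the SparkSession (available as `spark` in spark-shell).
    // One-time conversion: read the csv once and persist it as parquet.
    spark.read
      .option("inferSchema", "true")
      .csv("C:\\RXIE\\Learning\\Data\\Alibaba\\UserBehavior\\UserBehavior.csv")
      .write.mode("overwrite")
      .parquet("C:\\RXIE\\Learning\\Data\\Alibaba\\UserBehavior\\UserBehavior.parquet")

    // Later analytical queries read the columnar parquet copy instead of re-parsing the csv.
    val behaviors = spark.read.parquet("C:\\RXIE\\Learning\\Data\\Alibaba\\UserBehavior\\UserBehavior.parquet")
    behaviors.groupBy("_c0", "_c3").count().show(10)   // _c0.._c4 are Spark's default names for a header-less csv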
Raymond Xie wrote on Tue, 19 June 2018 at 04:29:
> I have a 3.6GB csv dataset (4 columns, 100,150,807 rows); my environment
> is 20GB ssd ha