Hi,
I am trying to figure out a way to find the size of *persisted* dataframes
using *sparkContext.getRDDStorageInfo()*.
The RDDStorageInfo object has information about the number of bytes stored
in memory and on disk.
For example:
I have 3 dataframes which I have cached.
df1.cache()
df2.cache()
df3.cache()
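A rough sketch of the loop I have in mind (jsc here is assumed to be the
JavaSparkContext, and the dataframes have to be materialized by an action
after cache() before anything shows up):

    import org.apache.spark.storage.RDDInfo;

    // getRDDStorageInfo() reports only RDDs that are currently persisted
    for (RDDInfo info : jsc.sc().getRDDStorageInfo()) {
        System.out.println(info.name() + " -> memory: " + info.memSize()
            + " bytes, disk: " + info.diskSize() + " bytes");
    }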
Hi,
I have compressed data of size 500GB. I am repartitioning this data since
the underlying data is very skewed and is causing a lot of issues for the
downstream jobs.
During repartitioning the *shuffle writes* are not getting compressed, and due to
this I am running into disk space issues. Below is the
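For reference, these are the settings I would expect to control shuffle
compression (a sketch; spark.shuffle.compress already defaults to true, so
something in the job config may be overriding it):

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
        .set("spark.shuffle.compress", "true")        // compress map-side shuffle output files
        .set("spark.shuffle.spill.compress", "true")  // compress data spilled during shuffles
        .set("spark.io.compression.codec", "lz4");    // codec used for shuffle/spill compression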
> If you are looking for a workaround, the JIRA ticket clearly shows you how to
> increase your driver heap. 1G in today's world really is kind of small.
>
>
> Yong
>
>
> --
> *From:* Bahubali Jain
> *Sent:* Thursday, March 16, 2017 10:34 PM
> Executing a SQL statement with a large number of partitions requires a
> large amount of driver memory even when there are no requests to collect data
> back to the driver.
Hi,
While saving a dataset using *mydataset.write().csv("outputlocation")* I am running
into an exception:
*"Total size of serialized results of 3722 tasks (1024.0 MB) is bigger than
spark.driver.maxResultSize (1024.0 MB)"*
Does it mean that for saving a dataset the whole of the data is sent back to the driver?
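If it helps, the knob itself is spark.driver.maxResultSize; my understanding
is that the limit is being hit by the per-task results and metadata flowing
back to the driver rather than by the data itself being collected. A sketch:

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
        .set("spark.driver.maxResultSize", "4g");  // default is 1g; "0" disables the limit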
Hi,
Do we have any feature selection techniques implementation (wrapper
methods, embedded methods) available in Spark ML?
Thanks,
Baahu
--
Twitter:http://twitter.com/Baahu
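Spark ML does ship filter-style selection (ChiSqSelector in
org.apache.spark.ml.feature); what I am after are wrapper/embedded methods.
For completeness, a minimal sketch of the filter approach, assuming a
Dataset<Row> df with "features" and "label" columns:

    import org.apache.spark.ml.feature.ChiSqSelector;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    ChiSqSelector selector = new ChiSqSelector()
        .setNumTopFeatures(50)             // keep the 50 most predictive features
        .setFeaturesCol("features")
        .setLabelCol("label")
        .setOutputCol("selectedFeatures");
    Dataset<Row> selected = selector.fit(df).transform(df);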
s the model, then
> add the model to the pipeline where it will only transform.
>
> val featureVectorIndexer = new VectorIndexer()
> .setInputCol("feature")
> .setOutputCol("indexedfeature")
> .setMaxCategories(180)
> .fit(completeDataset)
Hi,
I had run into a similar exception: "java.util.NoSuchElementException: key
not found: ".
After further investigation I realized it is happening because the VectorIndexer
is being fit on the training dataset and not on the entire dataset.
In the dataframe I have 5 categories, each of these has to go through
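A sketch of the fix I am trying, mirroring the quoted Scala snippet in Java
(completeDataset, trainingData and testData are placeholders for my own
dataframes):

    import org.apache.spark.ml.feature.VectorIndexer;
    import org.apache.spark.ml.feature.VectorIndexerModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Fit on the complete dataset so every category value is seen once,
    // then reuse the fitted model to transform the train/test splits.
    VectorIndexerModel indexerModel = new VectorIndexer()
        .setInputCol("feature")
        .setOutputCol("indexedfeature")
        .setMaxCategories(180)
        .fit(completeDataset);

    Dataset<Row> indexedTrain = indexerModel.transform(trainingData);
    Dataset<Row> indexedTest = indexerModel.transform(testData);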
Hi,
We have a requirement wherein we need to process a set of XML files; each of
the XML files contains several records (e.g.:
data of record 1..
data of record 2..
Expected output is
Since we needed the file name as well in the output, we chose wholeTextFiles(). We
had to go against
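For context, the shape of what we are doing (a sketch; jsc, the input path
and the "</record>" delimiter are placeholders, and it assumes the Spark 2.x
Java API where flatMap returns an Iterator):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;

    // Each element is (fileName, entireFileContent), which keeps the file name around.
    JavaPairRDD<String, String> files = jsc.wholeTextFiles("hdfs:///data/xml/input");
    JavaRDD<String> records = files.flatMap(fileAndContent -> {
        String fileName = fileAndContent._1();
        List<String> out = new ArrayList<>();
        for (String rec : fileAndContent._2().split("</record>")) {
            out.add(fileName + "\t" + rec.trim());
        }
        return out.iterator();
    });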
Hi,
How would the DAG look for the code below?
JavaRDD rdd1 = context.textFile();
JavaRDD rdd2 = rdd1.map();
rdd1 = rdd2.map();
Does this lead to any kind of cycle?
Thanks,
Baahu
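For what it is worth, my understanding is that there is no cycle: reassigning
the Java variable rdd1 only re-points the reference, and the lineage already
built for rdd2 still refers to the original textFile RDD, so the DAG stays a
straight line. One way to check is to print the lineage (sketch):

    // Prints the dependency chain of the final RDD: textFile -> map -> map.
    System.out.println(rdd1.toDebugString());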
Hi,
Why doesn't JavaRDD have saveAsNewAPIHadoopFile() associated with it?
Thanks,
Baahu
--
Twitter:http://twitter.com/Baahu
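In case anyone hits the same thing: saveAsNewAPIHadoopFile() is defined on
JavaPairRDD, so the RDD has to be turned into key/value pairs first. A rough
sketch (rdd is assumed to be a JavaRDD<String>, and the output path is a
placeholder):

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.spark.api.java.JavaPairRDD;
    import scala.Tuple2;

    // Convert to a pair RDD; the save method is then available.
    JavaPairRDD<NullWritable, Text> pairs =
        rdd.mapToPair(line -> new Tuple2<>(NullWritable.get(), new Text(line)));
    pairs.saveAsNewAPIHadoopFile("hdfs:///out/records",
        NullWritable.class, Text.class, TextOutputFormat.class);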
Hi,
How do we read files from multiple directories using newAPIHadoopFile()?
Thanks,
Baahu
--
Twitter:http://twitter.com/Baahu
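A sketch of one approach (paths are placeholders): my understanding is that
the path argument goes through FileInputFormat, which accepts comma-separated
paths and globs, so multiple directories can be passed in a single call:

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.spark.api.java.JavaPairRDD;

    JavaPairRDD<LongWritable, Text> input = jsc.newAPIHadoopFile(
        "hdfs:///data/dir1,hdfs:///data/dir2,hdfs:///data/2015-*",
        TextInputFormat.class, LongWritable.class, Text.class,
        jsc.hadoopConfiguration());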
Hi,
I have a requirement in which I plan to use Spark Streaming.
I am supposed to calculate the access count for certain webpages. I receive
the webpage access information through log files.
By access count I mean "how many times was the page accessed *till now*".
I have the log files for the past 2 years
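The running-count part I am picturing looks roughly like this (a sketch;
jssc, pageHits as a JavaDStream<String> of page names and the checkpoint path
are placeholders, and the Optional import assumes the Spark 2.x Java API):

    import java.util.List;
    import org.apache.spark.api.java.Optional;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import scala.Tuple2;

    // updateStateByKey keeps a per-page total across batches; it requires checkpointing.
    jssc.checkpoint("hdfs:///checkpoints/access-counts");
    JavaPairDStream<String, Long> counts = pageHits
        .mapToPair(page -> new Tuple2<>(page, 1L))
        .updateStateByKey((List<Long> values, Optional<Long> state) -> {
            long sum = state.isPresent() ? state.get() : 0L;
            for (Long v : values) sum += v;
            return Optional.of(sum);
        });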
Feb 11, 2015 at 6:35 AM, Akhil Das
> wrote:
> > Did you try :
> >
> > temp.saveAsHadoopFiles("DailyCSV",".txt", String.class,
> String.class,(Class)
> > TextOutputFormat.class);
> >
> > Thanks
> > Best Regards
> >
Hi,
I am facing issues while writing data from a streaming RDD to HDFS.
JavaPairDStream temp;
...
...
temp.saveAsHadoopFiles("DailyCSV",".txt", String.class,
String.class,TextOutputFormat.class);
I see compilation issues as below...
The method saveAsHadoopFiles(String, String, Class, Class, Cla
Hi,
I am trying to use textFileStream("some_hdfs_location") to pick up new files
from an HDFS location. I am seeing pretty strange behavior though.
textFileStream() is not detecting new files when I "move" them from a
location within HDFS to the location at which textFileStream() is checking for
new file
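My current understanding (please correct me if wrong): a move/rename within
HDFS keeps the file's original modification time, and textFileStream() only
picks up files whose modification time falls inside the current batch window,
so moved files look "old" and get skipped. Copying instead of moving gives the
file a fresh timestamp; a sketch with placeholder paths:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // copy (not rename) into the watched directory so the file gets a new mtime
    FileUtil.copy(fs, new Path("/staging/log1.txt"),
                  fs, new Path("/watched/log1.txt"),
                  false /* deleteSource */, conf);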
Hi,
You can associate all the messages of a 3-minute interval with a unique key and
then group by that key and finally add up.
Thanks
On Dec 1, 2014 9:02 PM, "pankaj" wrote:
> Hi,
>
> My incoming message has a time stamp as one field and I have to perform
> aggregation over 3 minute of time slice.
>
> Message
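To make the keying idea concrete, something along these lines (a sketch
assuming Java 8 lambdas; messages and its epochMillis/value fields are
placeholders for whatever the parsed message type looks like):

    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import scala.Tuple2;

    // Truncate each event's timestamp down to its 3-minute bucket, then sum per bucket.
    long bucketMs = 3 * 60 * 1000L;
    JavaPairDStream<Long, Long> perSlice = messages
        .mapToPair(m -> new Tuple2<>((m.epochMillis / bucketMs) * bucketMs, m.value))
        .reduceByKey((a, b) -> a + b);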
Hi,
Can anybody help me with this please? I haven't been able to find the problem
:(
Thanks.
On Nov 15, 2014 4:48 PM, "Bahubali Jain" wrote:
> Hi,
> Trying to use spark streaming, but I am struggling with word count :(
> I want to consolidate the output of the word count (not on a
Hi,
Trying to use spark streaming, but I am struggling with word count :(
I want to consolidate the output of the word count (not on a per-window basis), so
I am using updateStateByKey(), but for some reason this is not working.
The function itself is not being invoked (I do not see the sysout output on the
console)
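For the archives, the driver-side pieces that are easy to miss with
updateStateByKey (a sketch; jssc, the counts DStream and the checkpoint path
are placeholders):

    // A checkpoint directory is required by updateStateByKey, an output
    // operation is needed to trigger execution, and the context must be started.
    jssc.checkpoint("hdfs:///checkpoints/wordcount");
    counts.print();
    jssc.start();
    jssc.awaitTermination();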
Hi,
I have a custom key class. In this class equals() and hashCode() have been
overridden.
I have a JavaPairRDD which has this class as the key. When groupByKey() or
reduceByKey() is called, a null object is being passed to the
*equals*(Object obj) method, and as a result the grouping is failing.
Is t
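In case it helps the discussion, the shape of key class I would expect to
work (a sketch with made-up fields; the key also needs to be Serializable,
and equals() has to tolerate null and foreign types):

    import java.io.Serializable;
    import java.util.Objects;

    public class AccessKey implements Serializable {
        private final String page;
        private final String user;

        public AccessKey(String page, String user) {
            this.page = page;
            this.user = user;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof AccessKey)) return false;  // also covers obj == null
            AccessKey other = (AccessKey) obj;
            return Objects.equals(page, other.page) && Objects.equals(user, other.user);
        }

        @Override
        public int hashCode() {
            return Objects.hash(page, user);
        }
    }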