Hi,
How do we read files from multiple directories using newAPIHadoopFile()?
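For context, my understanding is that the path argument accepts a
comma-separated list of directories (or globs), the same way
FileInputFormat does; a minimal sketch, with placeholder paths (sc is the
JavaSparkContext):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.api.java.JavaPairRDD;

// Several input directories in one call: list them comma-separated.
JavaPairRDD<LongWritable, Text> rdd = sc.newAPIHadoopFile(
        "hdfs:///data/dir1,hdfs:///data/dir2,hdfs:///data/dir3/*",
        TextInputFormat.class, LongWritable.class, Text.class,
        new Configuration());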
Thanks,
Baahu
--
Twitter:http://twitter.com/Baahu
Hi,
Why doesn't JavaRDD have saveAsNewAPIHadoopFile() associated with it?
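For reference, the workaround I am using in the meantime (paths are
placeholders, sc is the JavaSparkContext): saveAsNewAPIHadoopFile() is
defined on JavaPairRDD, since Hadoop OutputFormats are key/value oriented,
so I convert with mapToPair() first.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

// Convert the JavaRDD to key/value pairs, then save with the new API.
JavaRDD<String> lines = sc.textFile("hdfs:///input");
JavaPairRDD<Text, Text> pairs = lines.mapToPair(
        s -> new Tuple2<>(new Text(s), new Text("")));
pairs.saveAsNewAPIHadoopFile("hdfs:///output",
        Text.class, Text.class, TextOutputFormat.class);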
Thanks,
Baahu
--
Twitter:http://twitter.com/Baahu
Hi,
We have a requirement wherein we need to process a set of XML files; each
of the XML files contains several records (e.g.:
data of record 1..
data of record 2..
Expected output is
Since we needed the file name as well in the output, we chose
wholeTextFiles(). We had to go against
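A minimal sketch of the wholeTextFiles() approach (Spark 2.x flatMap
signature; the record delimiter and path are placeholders, sc is the
JavaSparkContext):

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;

// wholeTextFiles() yields (fileName, fileContent) pairs, so the file name
// can be carried into every output record.
JavaPairRDD<String, String> files = sc.wholeTextFiles("hdfs:///xml/input");
JavaRDD<String> records = files.flatMap(kv -> {
    List<String> out = new ArrayList<>();
    for (String record : kv._2().split("</record>")) {  // placeholder split
        out.add(kv._1() + "\t" + record.trim());
    }
    return out.iterator();
});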
Hi,
How would the DAG look for the code below?
JavaRDD<String> rdd1 = context.textFile("input");   // placeholder path
JavaRDD<String> rdd2 = rdd1.map(s -> s.trim());     // placeholder function
rdd1 = rdd2.map(s -> s.toUpperCase());              // placeholder function
Does this lead to any kind of cycle?
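My understanding (happy to be corrected) is that the reassignment only
rebinds the Java variable; the lineage stays a straight chain, textFile ->
map -> map, so no cycle. I am checking it with toDebugString():

// Prints the lineage of whatever RDD the variable currently points at;
// the reassignment just drops our reference to the first map's result.
System.out.println(rdd1.toDebugString());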
Thanks,
Baahu
Hi,
I had run into a similar exception, "java.util.NoSuchElementException: key
not found: ".
After further investigation I realized it happens because the VectorIndexer
is fitted on the training dataset only, and not on the entire dataset.
In the dataframe I have 5 categories, each of these has to go through
> ... fit() returns the model, then
> add the model to the pipeline where it will only transform.
>
> val featureVectorIndexer = new VectorIndexer()
> .setInputCol("feature")
> .setOutputCol("indexedfeature")
> .setMaxCategories(180)
> .fit(completeDataset)
Hi,
Do we have any feature selection technique implementations (wrapper
methods, embedded methods) available in Spark ML?
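The closest built-in I have found so far is ChiSqSelector, which is a
filter method rather than a wrapper or embedded one; a minimal sketch (df
and the column names are placeholders), in case I am missing something
beyond it:

import org.apache.spark.ml.feature.ChiSqSelector;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// ChiSqSelector ranks features with a chi-squared test against the label
// and keeps the top-ranked ones.
ChiSqSelector selector = new ChiSqSelector()
        .setNumTopFeatures(50)
        .setFeaturesCol("features")
        .setLabelCol("label")
        .setOutputCol("selectedFeatures");
Dataset<Row> selected = selector.fit(df).transform(df);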
Thanks,
Baahu
--
Twitter:http://twitter.com/Baahu
Hi,
While saving a dataset using *mydataset.write().csv("outputlocation")* I am
running into an exception
*"Total size of serialized results of 3722 tasks (1024.0 MB) is bigger than
spark.driver.maxResultSize (1024.0 MB)"*
Does it mean that for saving a dataset whole of
issues.apache.org
> Executing a sql statement with a large number of partitions requires a
> high memory space for the driver even when there are no requests to
> collect data back to the driver.
>
>
>
> --
> *From:* Bahubali Jain
> *Sent:* Thursday,
If you are looking for a workaround, the JIRA ticket clearly shows you how
> to increase your driver heap. 1 GB in today's world really is kind of small.
>
>
> Yong
>
>
> --
> *From:* Bahubali Jain
> *Sent:* Thursday, March 16, 2017 10:34 PM
> *
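For anyone hitting the same error, a sketch of the workaround described
above (the value is illustrative; the driver heap itself has to be set at
submit time, e.g. via --driver-memory, not from application code):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// spark.driver.maxResultSize caps the total serialized task results the
// driver will accept; raising it trades the error for more driver memory.
SparkConf conf = new SparkConf()
        .setAppName("csv-writer")  // placeholder app name
        .set("spark.driver.maxResultSize", "4g");
JavaSparkContext sc = new JavaSparkContext(conf);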
Hi,
I have compressed data of size 500 GB. I am repartitioning this data since
the underlying data is very skewed and is causing a lot of issues for the
downstream jobs.
During repartitioning, the *shuffle writes* are not getting compressed, and
due to this I am running into disk space issues. Below is the
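For reference, the settings I am checking; my understanding is that these
govern shuffle compression and that the first two default to true:

import org.apache.spark.SparkConf;

// spark.shuffle.compress covers map-side shuffle output,
// spark.shuffle.spill.compress covers data spilled during shuffles,
// and spark.io.compression.codec picks the codec used for both.
SparkConf conf = new SparkConf()
        .set("spark.shuffle.compress", "true")
        .set("spark.shuffle.spill.compress", "true")
        .set("spark.io.compression.codec", "lz4");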
Hi,
I am trying to figure out a way to find the size of *persisted* dataframes
using *sparkContext.getRDDStorageInfo()*.
The RDDStorageInfo object has information about the number of bytes stored
in memory and on disk.
For example, I have 3 dataframes which I have cached:
df1.cache()
df2.cache()
df3.cache()
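A sketch of what I am attempting, based on my reading of the
RDDStorageInfo accessors ("spark" is the SparkSession):

import org.apache.spark.storage.RDDStorageInfo;

// One entry per persisted RDD; cached DataFrames appear through their
// underlying RDDs, and memSize()/diskSize() are reported in bytes.
for (RDDStorageInfo info : spark.sparkContext().getRDDStorageInfo()) {
    System.out.println(info.name() + ": memory=" + info.memSize()
            + " bytes, disk=" + info.diskSize() + " bytes");
}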
Hi,
I am facing issues while writing data from a streaming RDD to HDFS.
JavaPairDStream<String, String> temp;
...
...
temp.saveAsHadoopFiles("DailyCSV", ".txt", String.class,
        String.class, TextOutputFormat.class);
I see compilation issues as below...
The method saveAsHadoopFiles(String, String, Class, Class, Cla
On Feb 11, 2015 at 6:35 AM, Akhil Das wrote:
> > Did you try :
> >
> > temp.saveAsHadoopFiles("DailyCSV", ".txt", String.class,
> >         String.class, (Class) TextOutputFormat.class);
> >
> > Thanks
> > Best Regards
> >
> > O
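For completeness, a fragment along the lines of Akhil's suggestion that
compiles for me, reusing temp from my earlier mail; the raw Class cast
sidesteps the generic bound on the output-format parameter:

import org.apache.hadoop.mapred.TextOutputFormat;

// TextOutputFormat is generic, so the raw class literal does not satisfy
// Class<F extends OutputFormat<?, ?>> without an (unchecked) raw cast.
temp.saveAsHadoopFiles("DailyCSV", ".txt", String.class, String.class,
        (Class) TextOutputFormat.class);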
Hi,
I have a requirement in which I plan to use Spark Streaming.
I am supposed to calculate the access count for certain webpages. I receive
the webpage access information through log files.
By access count I mean "how many times was the page accessed *till now*".
I have the log files for the past 2 years
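What I have in mind is updateStateByKey() to keep a running total; a
sketch, assuming pageHits is already a JavaPairDStream<String, Long> of
(page, 1L) parsed from the logs:

import java.util.List;
import org.apache.spark.api.java.Optional;
import org.apache.spark.streaming.api.java.JavaPairDStream;

// The state per page is the cumulative count "till now"; each batch folds
// its new hits into the previous total.
JavaPairDStream<String, Long> totals = pageHits.updateStateByKey(
        (List<Long> newHits, Optional<Long> state) -> {
            long sum = state.orElse(0L);
            for (Long hit : newHits) {
                sum += hit;
            }
            return Optional.of(sum);
        });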
Hi,
I have a custom key class. In this class, equals() and hashCode() have been
overridden.
I have a JavaPairRDD which has this class as the key. When groupByKey() or
reduceByKey() is called, a null object is being passed to the
*equals*(Object obj) method, and as a result the grouping is failing.
Is t
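Meanwhile, the shape of the fix I am trying (the field is a placeholder):
the equals() contract requires x.equals(null) to return false, so the
override must handle null instead of dereferencing the argument.

import java.io.Serializable;
import java.util.Objects;

public class MyKey implements Serializable {
    private final String id;  // placeholder field

    public MyKey(String id) { this.id = id; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof MyKey)) return false;  // false for null too
        return Objects.equals(id, ((MyKey) obj).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}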
Hi,
Trying to use Spark Streaming, but I am struggling with word count :(
I want to consolidate the output of the word count (not on a per-window
basis), so I am using updateStateByKey(), but for some reason this is not
working.
The function itself is not being invoked (I do not see the sysout output on
the console).
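While debugging I am double-checking two things; my understanding is that
updateStateByKey() needs a checkpoint directory, and that without an output
operation such as print() the update function is never invoked (a sketch,
with a placeholder path; counts is the updateStateByKey() result):

jssc.checkpoint("hdfs:///spark/checkpoints");
counts.print();
jssc.start();
jssc.awaitTermination();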
Hi,
Can anybody help me with this, please? I haven't been able to find the
problem :(
Thanks.
On Nov 15, 2014 4:48 PM, "Bahubali Jain" wrote:
> Hi,
> Trying to use Spark Streaming, but I am struggling with word count :(
> I want to consolidate the output of the word count (not on a
Hi,
You can associate all the messages of a 3-minute interval with a unique key,
then group by that key and finally add them up.
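Roughly like this (a sketch; messages is assumed to be a JavaRDD<Msg>,
where Msg with timestampMillis/value fields stands in for your real
message type):

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// Truncate each timestamp down to the start of its 3-minute slice, key by
// that bucket, then sum per bucket.
long sliceMs = 3 * 60 * 1000L;
JavaPairRDD<Long, Long> perSlice = messages
        .mapToPair((Msg m) -> new Tuple2<>(
                (m.timestampMillis / sliceMs) * sliceMs, m.value))
        .reduceByKey(Long::sum);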
Thanks
On Dec 1, 2014 9:02 PM, "pankaj" wrote:
> Hi,
>
> My incoming message has a timestamp as one field, and I have to perform
> aggregation over 3-minute time slices.
>
> Message
Hi,
I am trying to use textFileStream("some_hdfs_location") to pick up new
files from an HDFS location. I am seeing pretty strange behavior though.
textFileStream() is not detecting new files when I "move" them from a
location within HDFS to the location at which textFileStream() is checking
for new files
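In case it helps others: my understanding (an assumption on my part) is
that textFileStream() selects files by modification time, and a rename
within HDFS keeps the old timestamp, so moved files look too old to pick
up. Copying into the watched directory gives the file a fresh timestamp; a
sketch with illustrative paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Copy (rather than rename) so the file's modification time in the watched
// directory is current.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
FileUtil.copy(fs, new Path("/staging/data.txt"),
        fs, new Path("/watched/data.txt"),
        false /* deleteSource */, conf);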