Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Saatvik Shah
> …any issues just by sizing an app; I would first check memory size, CPU allocations and so on…
>
> Best,
>
> On Tue, Jul 18, 2017 at 3:30 PM, Saatvik Shah wrote:
>
>> Hi Riccardo,
>>
>> Yes, thanks for suggesting I do that.
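
Sizing settings like the ones mentioned are easiest to see in one place; below is a minimal sketch assuming a PySpark entry point, with placeholder values. The spark.ui.retained* limits are the knobs most directly related to a UI that struggles on large workloads.

    # Minimal sketch (placeholder values): resource sizing plus the Spark UI
    # retention limits that bound how much job/stage/task state the UI keeps.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("large-workload")
             .config("spark.executor.memory", "8g")   # executor heap
             .config("spark.executor.cores", "4")     # cores per executor
             .config("spark.driver.memory", "4g")     # driver heap; the UI runs here
             .config("spark.ui.retainedJobs", "100")
             .config("spark.ui.retainedStages", "100")
             .config("spark.ui.retainedTasks", "10000")
             .getOrCreate())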

Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Saatvik Shah
>> …message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-crashes-on-Large-Workloads-tp28873.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: PySpark working with Generators

2017-07-05 Thread Saatvik Shah
…not heard of.

Thanks and Regards,
Saatvik Shah

On Fri, Jun 30, 2017 at 10:16 AM, Jörn Franke wrote:
> In this case I do not see so many benefits of using Spark. Is the data volume high?
> Alternatively I recommend to convert the proprietary format into a format Spark understands…
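
As a rough illustration of the generator idea being discussed, here is a sketch that parallelizes file paths and lets each task stream records out of its file. read_records is a hypothetical stand-in for the proprietary reader, and the paths are assumed to be visible to every executor.

    # Sketch only: read_records() is a placeholder for a proprietary parser
    # that yields one record at a time as a Row.
    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    def read_records(path):
        with open(path, "rb") as f:          # assumes a shared filesystem
            for i, line in enumerate(f):
                yield Row(path=path, record_no=i, size=len(line))

    paths = ["/data/part-0001.bin", "/data/part-0002.bin"]   # hypothetical paths
    records = sc.parallelize(paths, len(paths)).flatMap(read_records)
    df = spark.createDataFrame(records)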

Re: PySpark working with Generators

2017-06-30 Thread Saatvik Shah
…Regards,
Saatvik Shah

On Fri, Jun 30, 2017 at 12:50 AM, Mahesh Sawaiker <mahesh_sawai...@persistent.com> wrote:
> Wouldn't this work if you load the files in HDFS and let the partitions be equal to the amount of parallelism you want?
>
> *From:* Saatvik Shah [ma…
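
A small sketch of that suggestion, assuming line-oriented files already on HDFS; the path and the partition count (here 64) are placeholders chosen to match the desired parallelism.

    # Sketch: ask for roughly as many partitions as the parallelism you want.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    rdd = sc.textFile("hdfs:///data/input/", minPartitions=64)
    rdd = rdd.repartition(64)   # or reshape explicitly after loading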

Re: PySpark working with Generators

2017-06-29 Thread Saatvik Shah
Hey Ayan,

This isn't a typical text file - it's a proprietary data format for which a native Spark reader is not available.

Thanks and Regards,
Saatvik Shah

On Thu, Jun 29, 2017 at 6:48 PM, ayan guha wrote:
> If your files are in the same location you can use sc.wholeTextFiles. If not, sc.textFile…
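
For reference, a minimal sketch of the wholeTextFiles route (the API name is wholeTextFiles, plural); the directory path is a placeholder.

    # Sketch: wholeTextFiles yields (file path, full file content) pairs,
    # which fits directories of many small files.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    pairs = sc.wholeTextFiles("hdfs:///data/small_files/")
    parsed = pairs.mapValues(lambda text: text.splitlines())
    print(parsed.keys().take(3))   # a few of the file paths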

Re: Merging multiple Pandas dataframes

2017-06-22 Thread Saatvik Shah
> …then after some (say 5) I would write to disk and reload. At that point you should call unpersist to free the memory as it is no longer relevant.
>
> Thanks,
> Assaf.
>
> *From:* Saatvik Shah [mailto:saatvikshah1...@gmail.com
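
A sketch of the batch-write-and-unpersist pattern described above, assuming the pandas frames arrive as an iterable and share a schema; the checkpoint interval and output path are placeholders.

    # Sketch: union incrementally; every few merges, write the running result
    # to disk, unpersist any cached copy, and reload to keep memory use small.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def merge_frames(pandas_frames, checkpoint_every=5, path="/tmp/merged.parquet"):
        merged = None
        for i, pdf in enumerate(pandas_frames, start=1):
            sdf = spark.createDataFrame(pdf)
            merged = sdf if merged is None else merged.union(sdf)
            if i % checkpoint_every == 0:
                merged.write.mode("overwrite").parquet(path)  # materialize to disk
                merged.unpersist()             # free cached blocks, if any were persisted
                merged = spark.read.parquet(path)             # reload; lineage is truncated
        return merged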

Re: Merging multiple Pandas dataframes

2017-06-20 Thread Saatvik Shah
> …suggestions for optimizing this process further?
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Merging-multiple-Pandas-dataframes-tp28770.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Best alternative for Category Type in Spark Dataframe

2017-06-17 Thread Saatvik Shah
elect("col1").filter("col1 in ('happy')") >>>>> } >>>>> override def copy(extra: ParamMap): Transformer = ??? >>>>> @DeveloperApi >>>>> override def transformSchema(schema: StructType): StructType ={ >>

Re: Best alternative for Category Type in Spark Dataframe

2017-06-16 Thread Saatvik Shah
Hi Pralabh,

I want the ability to create a column such that its values are restricted to a specific set of predefined values. For example, suppose I have a column called EMOTION: I want to ensure each row value is one of HAPPY, SAD, ANGRY, NEUTRAL, NA.

Thanks and Regards,
Saatvik Shah

On Fri, Jun 16…
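
For illustration, one way to enforce that constraint in PySpark is sketched below, assuming a DataFrame df that already has the EMOTION column; values outside the allowed set are either coerced to NA or used to fail fast.

    # Sketch: restrict EMOTION to a fixed vocabulary.
    from pyspark.sql import functions as F

    ALLOWED = ["HAPPY", "SAD", "ANGRY", "NEUTRAL", "NA"]

    # Option 1: coerce anything outside the set to "NA".
    df = df.withColumn(
        "EMOTION",
        F.when(F.col("EMOTION").isin(ALLOWED), F.col("EMOTION")).otherwise(F.lit("NA")),
    )

    # Option 2: fail fast if unexpected values are present.
    bad = df.filter(~F.col("EMOTION").isin(ALLOWED)).count()
    assert bad == 0, "EMOTION contains values outside the allowed set"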

Re: Best alternative for Category Type in Spark Dataframe

2017-06-16 Thread Saatvik Shah
…Regards,
Saatvik Shah

On Fri, Jun 16, 2017 at 1:42 AM, 颜发才(Yan Facai) wrote:
> You can use some Transformers to handle categorical data. For example, StringIndexer encodes a string column of labels to a column of label indices:
> http://spark.apache.org/docs/latest/ml-features.html
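
A minimal StringIndexer sketch in PySpark, assuming the same hypothetical df with an EMOTION column as above:

    # Sketch: StringIndexer maps each distinct label to a numeric index
    # (the most frequent label gets index 0.0).
    from pyspark.ml.feature import StringIndexer

    indexer = StringIndexer(inputCol="EMOTION", outputCol="EMOTION_index")
    indexed = indexer.fit(df).transform(df)
    indexed.select("EMOTION", "EMOTION_index").show()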