It's purely for estimation, used when guessing whether it's safe to do a broadcast join; it does not restrict the size of the data. We picked an arbitrary number that we thought was larger than the common case (it's better to overestimate to avoid OOM).
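
To make the idea concrete, here is a rough sketch of how a per-type default size could feed a broadcast decision. This is not Spark's actual code: the function names are hypothetical, the 4096-byte string size is the value discussed in this thread, and the int/long sizes and the 10 MB threshold are assumptions chosen for illustration.

```python
# Illustrative sketch only; not Spark's implementation.
# Per-type size guesses in bytes. 4096 for strings comes from this thread;
# the int and long values are assumptions for the example.
DEFAULT_SIZE_BYTES = {"int": 4, "long": 8, "string": 4096}

# Assumed broadcast threshold (10 MB) for the example.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def estimated_size(schema, row_count):
    """Estimate table size as (sum of per-column default sizes) * rows."""
    row_bytes = sum(DEFAULT_SIZE_BYTES[col_type] for col_type in schema)
    return row_bytes * row_count


def can_broadcast(schema, row_count):
    """Treat a table as broadcastable if its estimate fits the threshold."""
    return estimated_size(schema, row_count) <= BROADCAST_THRESHOLD_BYTES


if __name__ == "__main__":
    schema = ["int", "string"]
    for rows in (1000, 10000):
        print(rows, estimated_size(schema, rows), can_broadcast(schema, rows))
```

With these assumed numbers, 1,000 rows of (int, string) are estimated at 4,100,000 bytes and fit under the threshold, while 10,000 rows are estimated at 41,000,000 bytes and do not. The large string guess only makes the planner more conservative; nothing here touches the stored values themselves.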

On Wed, Oct 7, 2015 at 10:11 PM, vivek bhaskar <vivekw...@gmail.com> wrote:
> I want to understand what the use of the default size for a given datatype is.
>
> The following link mentions that it's for internal size estimation.
>
> https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/DataType.html
>
> The above behavior is also reflected in the code, where the default value
> seems to be used for stats purposes only.
>
> But then we have the default size of the String datatype as 4096; why did we
> go for this random number? Or will it also restrict the size of data? Any
> further elaboration on how the string datatype works will also help.
>
> Regards,
> Vivek