Re: run reduceByKey on huge data in spark

2015-06-30 Thread barge.nilesh
"I 'm using 50 servers , 35 executors per server, 140GB memory per server" 35 executors *per server* sounds kind of odd to me. With 35 executors per server and server having 140gb, meaning each executor is going to get only 4gb, 4gb will be divided in to shuffle/storage memory fractions... assumi

Re: Spark SQL and Hive interoperability

2015-05-09 Thread barge.nilesh
hi, try your first method, but create an *external* table in Hive. Like: hive -e "CREATE EXTERNAL TABLE people (name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';"
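
A hypothetical sketch of the Spark side, assuming the table above exists in the Hive metastore that Spark is pointed at (hive-site.xml on the classpath); the query itself is illustrative:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("hive-interop"))
val hiveContext = new HiveContext(sc)

// Dropping an EXTERNAL table removes only the metastore entry, not the
// underlying files, which is why it interoperates safely with Spark SQL.
hiveContext.sql("SELECT name, age FROM people").collect().foreach(println)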

Re: Schema change on Spark Hive (Parquet file format) table not working

2014-10-07 Thread barge.nilesh
To find the root cause, I installed Hive 0.12 separately and ran the exact same test through the Hive CLI, and it *passed*. So it looks like a problem with Spark SQL. Has anybody else faced this issue (Hive Parquet table schema change)? Should I create a JIRA ticket for this?

Re: timestamp not implemented yet

2014-10-01 Thread barge.nilesh
Parquet format seems comparatively better for analytic loads; it has performance and compression benefits for large analytic workloads. A workaround could be to use the long datatype to store the epoch timestamp value. If you already have existing Parquet files (Impala tables), then you may need to consider...
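
A hypothetical sketch of that workaround, reusing the Hive 0.12-era Parquet SerDe syntax from the related thread below; the table/column names, and the assumption that epoch seconds (not millis) are stored, are mine:

import java.sql.Timestamp
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("epoch-ts-workaround"))
val hiveContext = new HiveContext(sc)

// Declare the timestamp column as BIGINT instead of TIMESTAMP.
hiveContext.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS events (id INT, ts_epoch BIGINT)
    |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    |STORED AS
    |  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    |  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'""".stripMargin)

// Rebuild java.sql.Timestamp values on the way out.
val withTs = hiveContext.sql("SELECT id, ts_epoch FROM events")
  .map(row => (row.getInt(0), new Timestamp(row.getLong(1) * 1000L)))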

Re: timestamp not implemented yet

2014-09-30 Thread barge.nilesh
Spark 1.1 comes with Hive 0.12, and Hive 0.12 doesn't support the timestamp datatype for the Parquet format: https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-Limitations

Re: Schema change on Spark Hive (Parquet file format) table not working

2014-09-30 Thread barge.nilesh
code snippet in short: hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS people_table (name STRING, age INT) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'"); h...
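
The archived snippet is cut off after the CREATE statement. A hypothetical sketch of how such a table is typically populated and read back in Spark 1.1; the file path, the Person case class, and the temp-table name are illustrative, not from the original post:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("parquet-table-demo"))
val hiveContext = new HiveContext(sc)
import hiveContext.createSchemaRDD  // implicit RDD -> SchemaRDD conversion in Spark 1.1

case class Person(name: String, age: Int)

// Build a SchemaRDD from a tab-separated file and expose it to SQL.
val people = sc.textFile("people.txt")
  .map(_.split("\t"))
  .map(p => Person(p(0), p(1).trim.toInt))
people.registerTempTable("people_src")

// Write into the Parquet-backed Hive table and read it back.
hiveContext.sql("INSERT OVERWRITE TABLE people_table SELECT name, age FROM people_src")
hiveContext.sql("SELECT name, age FROM people_table").collect().foreach(println)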

Schema change on Spark Hive (Parquet file format) table not working

2014-09-29 Thread barge.nilesh
I am using the following releases: Spark 1.1 (built using sbt/sbt -Dhadoop.version=2.2.0 -Phive assembly), Apache HDFS 2.2. My job is able to create/add/read data in Hive Parquet-formatted tables using HiveContext. But after changing the schema, the job is not able to read existing data and throws the following...
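
A hedged repro sketch of the reported sequence; the archived message is cut off before the error text and doesn't show the exact schema change, so the ALTER TABLE below (adding a column) is an assumption:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("schema-change-repro"))
val hiveContext = new HiveContext(sc)

hiveContext.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS people_table (name STRING, age INT)
    |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    |STORED AS
    |  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    |  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'""".stripMargin)

// ... populate the table (see the reply with the snippet above), then:
hiveContext.sql("ALTER TABLE people_table ADD COLUMNS (city STRING)")

// Per the report, reading pre-existing data now fails under Spark SQL 1.1,
// while the same sequence passes in the Hive 0.12 CLI.
hiveContext.sql("SELECT * FROM people_table").collect()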