In your JSON snippet, 111 and 222 are quoted, i.e., they are strings.
Thus they are automatically inferred as string rather than tinyint by
|jsonRDD|. Try this in the Spark shell:
|val sparkContext = sc
import org.apache.spark.sql._
import sparkContext._
val sqlContext = new SQLContext(sparkContext)
import sqlContext._
|val r0 = jsonRDD(makeRDD("""{"a": "111", "b": "222"}""" :: """{"a": "111", "b": "222"}""" :: Nil))
r0.printSchema()
|
You’ll see the inferred schema:
|root
|-- a: string (nullable = true)
|-- b: string (nullable = true)
|
Now remove the quotes around 111 and 222 and repeat the snippet above:
|val r0 = jsonRDD(makeRDD("""{"a": 111, "b": 222}""" :: """{"a": 111, "b": 222}""" :: Nil))
r0.printSchema()
|
You’ll see:
|root
|-- a: integer (nullable = true)
|-- b: integer (nullable = true)
|
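The distinction comes straight from JSON's grammar: a quoted token is a
string, a bare numeric token is a number. A toy plain-Scala sketch of that
rule (this is only an illustration of the idea, not Spark's actual
inference code; the function name is made up):

```scala
// Toy classifier mirroring the distinction jsonRDD's inference makes:
// a quoted token is a JSON string, a bare numeric token is a number.
def jsonScalarType(token: String): String =
  if (token.length >= 2 && token.startsWith("\"") && token.endsWith("\"")) "string"
  else if (token.nonEmpty && token.forall(_.isDigit)) "integer"
  else "unknown"

println(jsonScalarType("\"111\""))  // string
println(jsonScalarType("111"))      // integer
```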
If you’re not satisfied with the auto-inferred schema, you can also
specify your own (for example, if you want tinyint rather than integer):
|import org.apache.spark.sql.catalyst.types._
// SQL TINYINT is mapped to short integer in Spark SQL
val schema = StructType(StructField("a", ShortType, true)
  :: StructField("b", ShortType, true) :: Nil)
val r1 = applySchema(r0, schema)
r1.printSchema()
|
Here it is:
|root
|-- a: short (nullable = true)
|-- b: short (nullable = true)
|
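One caveat worth keeping in mind: if the JSON keeps the quotes, the columns
are inferred as string, and the values then need an actual conversion
rather than just a relabeled schema. A plain-Scala sketch of that
conversion (the variable names are illustrative; toShort parses the text
and throws on non-numeric input):

```scala
// With quoted JSON fields the data arrives as strings; "111".toShort
// parses the text into a Short rather than reinterpreting the bytes.
val rows = Seq(("111", "222"), ("111", "222"))
val converted = rows.map { case (a, b) => (a.toShort, b.toShort) }
println(converted)  // List((111,222), (111,222))
```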
On 10/18/14 9:50 PM, valgrind_girl wrote:
The complete code is as follows:
JavaHiveContext ctx;
JavaSchemaRDD schemas = ctx.jsonRDD(arg0);
schemas.insertInto("test", true);
JavaSchemaRDD teeagers = ctx.hql("SELECT a, b FROM test");
List<String> teeagerNames1 = teeagers.map(new Function<Row, String>() {
    private static final long serialVersionUID = 1L;

    @Override
    public String call(Row arg1) throws Exception {
        System.out.println(arg1.length());
        System.out.println("Name:" + arg1.getString(0));
        System.out.println("Name:" + arg1.getString(1));
        return "Name:" + arg1.getString(0);
    }
}).collect();
The input is:
{"a":"111","b":"222"}
{"a":"111","b":"222"}
The output is:
2
Name:111
Name:NULL
while from Hive:
hive> select * from test;
OK
111 NULL
111 NULL
The schema of the test table is:
create table test(
a tinyint,
b tinyint
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
So, what's going wrong? Can the insertInto function be applied to a Hive table?
Why do I keep getting 111,NULL instead of 111,222?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/a-hivectx-insertinto-issue-can-inertinto-function-be-applied-to-a-hive-table-tp16738.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.