In your JSON snippet, 111 and 222 are quoted, i.e., they are strings.
Thus they are automatically inferred as string rather than tinyint by
|jsonRDD|. Try this in the Spark shell:
|val sparkContext = sc
import org.apache.spark.sql._
import sparkContext._
val sqlContext = new SQLContext(sparkContext)
import sqlContext._
|val r0 = jsonRDD(makeRDD("""{"a": "111", "b": "222"}""" :: """{"a": "111", "b": "222"}""" :: Nil))
r0.printSchema()
|
You’ll see the inferred schema:
|root
|-- a: string (nullable = true)
|-- b: string (nullable = true)
|
Now remove the quotes around 111 and 222 and repeat the snippet above:
|val r0 = jsonRDD(makeRDD("""{"a": 111, "b": 222}""" :: """{"a": 111, "b": 222}""" :: Nil))
r0.printSchema()
|
You’ll see:
|root
|-- a: integer (nullable = true)
|-- b: integer (nullable = true)
|
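The distinction comes straight from JSON's grammar: a quoted token is a
string, a bare numeric token is a number. A toy plain-Scala sketch of that
rule (this is only an illustration of the idea, not Spark's actual
inference code; the function name is made up):

```scala
// Toy classifier mirroring the distinction jsonRDD's inference makes:
// a quoted token is a JSON string, a bare numeric token is a number.
def jsonScalarType(token: String): String =
  if (token.length >= 2 && token.startsWith("\"") && token.endsWith("\"")) "string"
  else if (token.nonEmpty && token.forall(_.isDigit)) "integer"
  else "unknown"

println(jsonScalarType("\"111\""))  // string
println(jsonScalarType("111"))      // integer
```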
If you’re not satisfied with the auto-inferred schema, you can also
specify your own (for example, if you want tinyint rather than integer):
|import org.apache.spark.sql.catalyst.types._
// SQL TINYINT is mapped to short integer in Spark SQL
val schema = StructType(StructField("a", ShortType, true)
  :: StructField("b", ShortType, true) :: Nil)
val r1 = applySchema(r0, schema)
r1.printSchema()
|
Here it is:
|root
|-- a: short (nullable = true)
|-- b: short (nullable = true)
|
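One caveat worth keeping in mind: if the JSON keeps the quotes, the columns
are inferred as string, and the values then need an actual conversion
rather than just a relabeled schema. A plain-Scala sketch of that
conversion (the variable names are illustrative; toShort parses the text
and throws on non-numeric input):

```scala
// With quoted JSON fields the data arrives as strings; "111".toShort
// parses the text into a Short rather than reinterpreting the bytes.
val rows = Seq(("111", "222"), ("111", "222"))
val converted = rows.map { case (a, b) => (a.toShort, b.toShort) }
println(converted)  // List((111,222), (111,222))
```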
On 10/18/14 9:50 PM, valgrind_girl wrote:
The complete code is as follows:
JavaHiveContext ctx;
JavaSchemaRDD schemas = ctx.jsonRDD(arg0);
schemas.insertInto("test", true);
JavaSchemaRDD teeagers = ctx.hql("SELECT a, b FROM test");
List<String> teeagerNames1 = teeagers.map(new Function<Row, String>() {
    private static final long serialVersionUID = 1L;

    @Override
    public String call(Row arg1) throws Exception {
        System.out.println(arg1.length());
        System.out.println("Name:" + arg1.getString(0));
        System.out.println("Name:" + arg1.getString(1));
        return "Name:" + arg1.getString(0);
    }
}).collect();
The input is:
{"a":"111","b":"222"}
{"a":"111","b":"222"}
The output is:
2
Name:111
Name:NULL
while from Hive:
hive> select * from test;
OK
111 NULL
111 NULL
The schema of the test table is:
create table test(
a tinyint,
b tinyint
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
So, what's going wrong? Can the insertInto function be applied to a Hive table?
Why do I keep getting 111,NULL instead of 111,222?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/a-hivectx-insertinto-issue-can-inertinto-function-be-applied-to-a-hive-table-tp16738.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.