... serialized. You can get the value out on the driver side and refer to it in foreach; then the value would be serialized with the lambda expr and sent to the workers.

On Fri, Jun 16, 2017 at 2:29 AM, Anton Kravchenko <kravchenko.anto...@gmail.com> wrote:
How would one access a broadcast variable from within a ForeachPartitionFunction in the Spark (2.0.1) Java API?
Integer _bcv = 123;
Broadcast<Integer> bcv = spark.sparkContext().broadcast(_bcv);
Dataset<Row> df_sql = spark.sql("select * from atable");
df_sql.foreachPartition(new ForeachPartitionFunction<Row>() {
    public void call(Iterator<Row> t) throws Exception {
        // how to read bcv from here?
    }
});
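A minimal sketch of one way to do this, assuming Spark 2.x and the table name atable from the question; it goes through JavaSparkContext (rather than the SparkContext.broadcast call above, which needs an explicit ClassTag from Java) and reads the value inside call() via bcv.value(). The class and app names are placeholders. As noted in the reply above, referencing a plain driver-side value also works for small payloads; it is simply serialized with the lambda instead of being broadcast.

import java.util.Iterator;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BroadcastInForeachPartition {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("bcv-demo").getOrCreate();

        // Broadcast via the JavaSparkContext wrapper so no explicit ClassTag is needed.
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
        final Broadcast<Integer> bcv = jsc.broadcast(123);

        Dataset<Row> df_sql = spark.sql("select * from atable");

        // The Broadcast handle is captured by the serializable function and shipped to
        // the executors; bcv.value() reads the broadcast payload on the executor side.
        df_sql.foreachPartition(new ForeachPartitionFunction<Row>() {
            @Override
            public void call(Iterator<Row> t) throws Exception {
                Integer v = bcv.value();
                while (t.hasNext()) {
                    System.out.println(t.next() + " -> broadcast value " + v);
                }
            }
        });
    }
}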
Kazuaki Ishizaki

> From: Anton Kravchenko
> To: "user @spark"
> Date: 2017/06/14 01:16
> Subject: Java access to internal representation of DataTypes.DateType
How would one access the internal representation of DataTypes.DateType from the Spark (2.0.1) Java API?

According to
https://github.com/apache/spark/blob/51b1c1551d3a7147403b9e821fcc7c8f57b4824c/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DateType.scala
a DateType value is internally represented as the number of days since 1970-01-01.
public void call(Iterator<Row> t) throws Exception {
    List<String> rows = new ArrayList<>();
    while (t.hasNext()) {
        Row irow = t.next();
        rows.add(irow.toString());
    }
    System.out.println(rows.toString());
}
On Tue, May 30, 2017 at 2:01 PM, Anton Kravchenko wrote:

> On Tue, May 30, 2017 at 10:58 AM, Anton Kravchenko <kravchenko.anto...@gmail.com> wrote:
What would be a Java equivalent of the Scala code below?

def void_function_in_scala(ipartition: Iterator[Row]): Unit = {
  var df_rows = ArrayBuffer[String]()
  for (irow <- ipartition) {
    df_rows += irow.toString
  }
}

val df = spark.read.csv("file:///C:/input_data/*.csv")
df.rdd.foreachPartition(convert_to_sas_single_partition)

def convert_to_sas_single_partition(ipartition: Iterator[Row]): Unit = {
  for (irow <- ipartition) {
    // ...
  }
}
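A minimal Java sketch of the same per-partition pattern; the CSV path is taken from the question, while the class and app names are placeholders. It uses a lambda cast to ForeachPartitionFunction<Row> in place of the named Scala function.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ForeachPartitionJavaEquivalent {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("foreach-partition-demo").getOrCreate();
        Dataset<Row> df = spark.read().csv("file:///C:/input_data/*.csv");

        // Per-partition equivalent of void_function_in_scala: collect each row's
        // string form into a list, then act on the whole partition at once.
        df.foreachPartition((ForeachPartitionFunction<Row>) partition -> {
            List<String> rows = new ArrayList<>();
            while (partition.hasNext()) {
                rows.add(partition.next().toString());
            }
            System.out.println(rows);
        });
    }
}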
Hi there,

When I do an RDD map with more than 22 columns, I get "error: too many arguments for unapply pattern, maximum = 22".

scala> val rddRes = rows.map{ case Row(col1,..col23) => Row(...) }

Is there a known way to get around this issue?
p.s. Here is a full traceback:
C:\spark-2.0.1-bin-hadoop2.7>b
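The message comes from Scala's limit on the number of arguments in an unapply pattern, so one common workaround is to avoid destructuring the Row and read its fields by position instead (row.getInt(i), row.toSeq, Row.fromSeq). Below is a minimal sketch of positional access written against the Java Dataset API; the method name, the assumed all-integer schema, and the +1 transform are placeholders.

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.catalyst.encoders.RowEncoder;

public class WideRowMap {
    // Transform a wide Dataset<Row> without destructuring it: every field is read by
    // position, so the 22-argument pattern limit never comes into play.
    static Dataset<Row> addOneToEveryColumn(Dataset<Row> rows) {
        return rows.map((MapFunction<Row, Row>) row -> {
            Object[] values = new Object[row.length()];
            for (int i = 0; i < row.length(); i++) {
                values[i] = row.getInt(i) + 1;   // positional access, no pattern match
            }
            return RowFactory.create(values);
        }, RowEncoder.apply(rows.schema()));   // Encoder<Row> built from the existing schema
    }
}

The same positional idea carries over to the Scala RDD code in the question, e.g. building the result with Row.fromSeq(row.toSeq.map(...)) instead of a 23-argument pattern.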
Thanks,
Divya

On 9 December 2016 at 00:53, Anton Kravchenko <kravchenko.anto...@gmail.com> wrote:

> Hello,
>
> I wonder if there is a way (preferably efficient) in Spark to reshape a
> Hive table and save it to Parquet.
I am looking for something like:

// prepare input data
val input_schema = StructType(Seq(
  StructField("col1", IntegerType),
  StructField("col2", IntegerType),
  StructField("col3", IntegerType)))
val input_data = spark.createDataFrame(
  sc.parallelize(Seq(
    Row(1, 2, 3),
    Row(4, 5, 6))),
  input_schema)
Hello,

I wonder if there is a way (preferably efficient) in Spark to reshape a Hive table and save it to Parquet.

Here is a minimal example. Input Hive table:

col1 col2 col3
1    2    3
4    5    6

Output Parquet:

col1 newcol2
1    [2 3]
4    [5 6]

p.s. The real input Hive table has ~1000 columns.

Thank you,
Anton
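One way to do this without listing ~1000 columns by hand is to build the array() call from the Dataset's own schema. A minimal sketch, assuming the non-key columns share a common type (array() requires that); the Hive table name and output path below are placeholders, and col1 is taken to be the key column from the example.

import java.util.Arrays;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

public class ReshapeToArrayColumn {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("reshape-demo").enableHiveSupport().getOrCreate();

        Dataset<Row> input = spark.table("input_hive_table");   // placeholder table name

        // Collect every column except the key column col1 into one array column;
        // building the Column[] from the schema keeps this workable at ~1000 columns.
        Column[] rest = Arrays.stream(input.columns())
                .filter(name -> !name.equals("col1"))
                .map(functions::col)
                .toArray(Column[]::new);

        Dataset<Row> reshaped = input.select(
                functions.col("col1"),
                functions.array(rest).as("newcol2"));

        reshaped.write().parquet("/output/path/reshaped_parquet");   // placeholder path
    }
}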