2 => val2
> //... cases up to 26
> }
> }
>
> hence expecting an approach to convert SchemaRDD to RDD without using
> Tuple or Case Class as we have restrictions in Scala 2.10
>
> Regards
> Satish Chandra
>
def canEqual(that: Any): Boolean = that.isInstanceOf[MyRecord]
def productArity: Int = 26 // example value: the number of fields
def productElement(n: Int): Any = n match {
  case 0 => val1
  case 1 => val2
  //... cases up to 25 (productElement is zero-based)
}
}
hence expecting an approach to convert SchemaRDD to RDD without using
Tuple or Case Class as we have restrictions in Scala 2.10
Have you seen this thread?
http://search-hadoop.com/m/q3RTt9YBFr17u8j8&subj=Scala+Limitation+Case+Class+definition+with+more+than+22+arguments
On Fri, Oct 16, 2015 at 7:41 AM, satish chandra j
wrote:
> Hi All,
> To convert SchemaRDD to RDD below snipped is working if SQL sta
Hi All,
To convert a SchemaRDD to an RDD, the snippet below works if the SQL statement has
fewer than 22 columns in a row, as per the tuple restriction:
rdd.map(row => row.toString)
But if the SQL statement has more than 22 columns, the above snippet will fail with the
error "*object Tuple27 is not a member of
, September 29, 2015 at 5:09 PM
To: Daniel Haviv, user
Subject: RE: Converting a DStream to schemaRDD
Something like:
dstream.foreachRDD { rdd =>
val df = sqlContext.read.json(rdd)
df.select(…)
}
https://spark.apache.org/docs/latest/streaming-programming-guide.html#output-operations
work with it as if it were a standard RDD dataset.
Ewan
From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com]
Sent: 29 September 2015 15:03
To: user
Subject: Converting a DStream to schemaRDD
Hi,
I have a DStream which is a stream of RDD[String].
How can I pass a DStream to sqlContext.jsonRDD
Hi,
I have a DStream which is a stream of RDD[String].
How can I pass a DStream to sqlContext.jsonRDD and work with it as a DF ?
Thank you.
Daniel
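A minimal sketch of the foreachRDD approach suggested above, assuming Spark 1.4+ with a sqlContext in scope and a DStream[String] of JSON named dstream (the table name is illustrative); on the 1.1/1.2 SchemaRDD API the equivalent call is sqlContext.jsonRDD(rdd):
import org.apache.spark.rdd.RDD
dstream.foreachRDD { (rdd: RDD[String]) =>
  if (!rdd.isEmpty()) {                      // skip empty micro-batches
    val df = sqlContext.read.json(rdd)       // infer the schema from the JSON strings
    df.registerTempTable("events")
    sqlContext.sql("SELECT count(*) FROM events").show()
  }
}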
wrote:
>
>> I wrote a brief howto on building nested records in spark and storing
>> them in parquet here:
>> http://www.congiu.com/creating-nested-data-parquet-in-spark-sql/
>>
>> 2015-06-23 16:12 GMT-07:00 Richard Catlin :
>>
>>> How do I create a DataFrame(SchemaRDD) with a nested array of Rows in a
>>> column? Is there an example? Will this store as a nested parquet file?
>>>
>>> Thanks.
>>>
>>> Richard Catlin
>>>
>>
>>
>
015-06-23 16:12 GMT-07:00 Richard Catlin :
>
>> How do I create a DataFrame(SchemaRDD) with a nested array of Rows in a
>> column? Is there an example? Will this store as a nested parquet file?
>>
>> Thanks.
>>
>> Richard Catlin
>>
>
>
I wrote a brief howto on building nested records in spark and storing them
in parquet here:
http://www.congiu.com/creating-nested-data-parquet-in-spark-sql/
2015-06-23 16:12 GMT-07:00 Richard Catlin :
> How do I create a DataFrame(SchemaRDD) with a nested array of Rows in a
> column? Is
How do I create a DataFrame(SchemaRDD) with a nested array of Rows in a
column? Is there an example? Will this store as a nested parquet file?
Thanks.
Richard Catlin
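A hedged sketch of one way to build such a nested-array column (Spark 1.3+ API; every name and path below is illustrative). Parquet stores the array column as a repeated group, so the nesting survives the round trip:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
val innerStruct = StructType(Seq(
  StructField("item", StringType, nullable = true),
  StructField("qty", IntegerType, nullable = true)))
val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("lines", ArrayType(innerStruct), nullable = true)))   // array of Rows
val rows = sc.parallelize(Seq(
  Row(1L, Seq(Row("a", 2), Row("b", 3))),
  Row(2L, Seq(Row("c", 1)))))
val df = sqlContext.createDataFrame(rows, schema)
df.saveAsParquetFile("/tmp/nested.parquet")   // on 1.4+ prefer df.write.parquet(...)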
Depending on your spark version, you can convert schemaRDD to a dataframe
and then use .show()
On 30 May 2015 10:33, "Minnow Noir" wrote:
> I"m trying to debug query results inside spark-shell, but finding it
> cumbersome to save to file and then use file system utils to
ay to present the contents of an RDD/SchemaRDD on the
screen in a formatted way? For example, say I want to take() the first 30
lines/rows in an *RDD and present them in a readable way on the screen so
that I can see what's missing or invalid. Obviously, I'm just trying to
sample the result
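A small sketch of both options (assumes a SchemaRDD/DataFrame named results and a plain RDD named rdd; names are illustrative):
results.show(30)                       // tabular output, DataFrame API on Spark 1.3+
results.take(30).foreach(println)      // plain Row output, works on a SchemaRDD too
rdd.take(30).foreach(println)          // for an ordinary RDD, format the elements yourself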
Hello
I am trying to create a SchemaRDD from an RDD of case classes. Depending on
an argument to the program, the program reads data of specified type and
maps it to the correct case class. But this throws an exception. I am using
Spark version 1.1.0 and Scala version 2.10.4
The exception can be
Hello
I am trying to create a SchemaRDD from an RDD of case classes. Depending on
an argument to the program these case classes should be different. But this
throws an exception. I am using Spark version 1.1.0 and Scala version 2.10.4
The exception can be reproduced by:
val table = "t
Hi all, following the
import com.datastax.spark.connector.SelectableColumnRef;
import com.datastax.spark.connector.japi.CassandraJavaUtil;
import org.apache.spark.sql.SchemaRDD;
import static com.datastax.spark.connector.util.JavaApiHelper.toScalaSeq;
import scala.collection.Seq;
SchemaRDD
one long list
of columns as I would be able to find some weird stuff by doing that. So my
question is the following:
1. Does SchemaRDD support something like multi-value attributes? It might look
like an array of values that lives in just one
column, although it’s not clear how I’d aggregate
Hi experts!
I would like to know whether there is any way to store a schemaRDD to Cassandra.
If yes, then how do I store it into an existing Cassandra column family, and into a new
column family?
Thanks
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/saving-schemaRDD-to-cassandra
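A hedged sketch using the spark-cassandra-connector, mapping each Row to a tuple first (keyspace, table, and column names are illustrative, and the target table must already exist, e.g. created beforehand with cqlsh for the "new column family" case):
import com.datastax.spark.connector._
val schemaRdd = sqlContext.sql("SELECT id, name, score FROM my_table")
schemaRdd
  .map(r => (r.getInt(0), r.getString(1), r.getDouble(2)))
  .saveToCassandra("my_keyspace", "my_cf", SomeColumns("id", "name", "score"))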
g our assumptions about partitioning.
On Mon, Mar 23, 2015 at 10:22 AM, Stephen Boesch wrote:
>
> Is there a way to take advantage of the underlying datasource partitions
> when generating a DataFrame/SchemaRDD via catalyst? It seems from the sql
> module that the only options are Ran
Is there a way to take advantage of the underlying datasource partitions
when generating a DataFrame/SchemaRDD via catalyst? It seems from the sql
module that the only options are RangePartitioner and HashPartitioner - and
further that those are selected automatically by the code. It was not
Looks like if I use unionAll this works.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Using-regular-rdd-transforms-on-schemaRDD-tp22105p22107.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
Hi All,
I was wondering how RDD transformations work on schemaRDDs. Is there a way
to force the RDD transform to keep the schemaRDD type, or do I need to
recreate the schemaRDD by applying the applySchema method?
Currently what I have is an array of SchemaRDDs and I just want to do a
union
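A minimal sketch of the unionAll route mentioned above, assuming every SchemaRDD in the array shares the same schema (names are illustrative):
val schemaRdds: Array[org.apache.spark.sql.SchemaRDD] = ???   // the existing array
val combined = schemaRdds.reduce(_ unionAll _)                // still a SchemaRDD, no applySchema needed
combined.registerTempTable("combined")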
I have loaded the following data from a parquet file, stored in a schemaRDD
[7654321,2015-01-01 00:00:00.007,0.49,THU]
Since, in spark version 1.1.0, the parquet format doesn't support saving
timestamp values, I have saved the timestamp data as string. Can you please
tell me how to iterate over the data in this sc
Spark Version - 1.1.0
Scala - 2.10.4
I have loaded the following data from a parquet file, stored in a schemaRDD
[7654321,2015-01-01 00:00:00.007,0.49,THU]
Since, in spark version 1.1.0, the parquet format doesn't support saving
timestamp values, I have saved the timestamp data as string. Ca
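A hedged sketch of iterating over such rows and parsing the string column back into java.sql.Timestamp (the column positions and types follow the example row and are assumptions):
import java.sql.Timestamp
import java.text.SimpleDateFormat
val parsed = schemaRdd.mapPartitions { rows =>
  val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")   // one formatter per partition; not thread-safe
  rows.map { r =>
    val ts = new Timestamp(fmt.parse(r.getString(1)).getTime)
    (r(0), ts, r.getDouble(2), r.getString(3))
  }
}
parsed.take(5).foreach(println)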
As far as I know, registerTempTable is just a Map[String, SchemaRDD]
insertion, nothing that would be measurable. But there are no
distributed/RDD operations involved, I think.
Tobias
transformer classes for feature extraction, and if I need to save the
input and maybe the output SchemaRDD of the transform function in every
transformer, this may not be very efficient.
Thanks
On Tue, Mar 10, 2015 at 8:20 PM, Tobias Pfeiffer wrote:
> Hi,
>
> On Tue, Mar 10, 2015 at 2:13 PM, Ces
Hi,
On Tue, Mar 10, 2015 at 2:13 PM, Cesar Flores wrote:
> I am new to the SchemaRDD class, and I am trying to decide between using SQL
> queries and Language Integrated Queries (
> https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD
> ).
>
> C
They should have the same performance, as they are compiled down to the
same execution plan.
Note that starting in Spark 1.3, SchemaRDD is renamed DataFrame:
https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
On Tue, Mar 10, 2015 at 2:13
I am new to the SchemaRDD class, and I am trying to decide between using SQL
queries and Language Integrated Queries (
https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD
).
Can someone tell me what the main difference is between the two approaches,
besides using
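A small sketch of the point in the answer above (same execution plan for both styles), assuming a SchemaRDD named people registered as a table and the Spark 1.1/1.2 DSL; all names are illustrative:
import sqlContext._                                        // brings the Symbol-based DSL into scope
val viaSql = sqlContext.sql("SELECT name, age FROM people WHERE age > 21")
val viaDsl = people.where('age > 21).select('name, 'age)   // Language Integrated Query
println(viaSql.queryExecution.executedPlan)                // compare the physical plans
println(viaDsl.queryExecution.executedPlan)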
Hi Wush,
I'm CC'ing user@spark.apache.org (which is the new list) and BCC'ing
u...@spark.incubator.apache.org.
In Spark 1.3, schemaRDD is in fact being renamed to DataFrame (see:
https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.h
Dear all,
I am a new spark user from R.
After exploring the schemaRDD, I notice that it is similar to data.frame.
Is there a feature like `model.matrix` in R to convert schemaRDD to model
matrix automatically according to the type without explicitly converting
them one by one?
Thanks,
Wush
Hi, in the roadmap of Spark in 2015 (link:
http://files.meetup.com/3138542/Spark%20in%202015%20Talk%20-%20Wendell.pptx),
I saw SchemaRDD is designed to be the basis of BOTH Spark
Streaming and Spark SQL.
My question is: what's the typical usage of SchemaRDD in a Spark
Streaming applic
have seen it looks like spark sql is
> the way to go and how I would go about this would be to load in the csv
> file
> into an RDD and convert it into a schemaRDD by injecting in the schema via
> a
> case class.
>
> What I want to avoid is hard coding in the case class itself. I
Hi All,
I am currently trying to build out a Spark job that would basically convert
a csv file into parquet. From what I have seen, it looks like Spark SQL is
the way to go, and the way I would go about this would be to load the csv file
into an RDD and convert it into a schemaRDD by injecting in
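A hedged sketch of that schema-injection step without a hard-coded case class, building a programmatic StructType instead (Spark 1.3+ API shown; the path, delimiter and column names are illustrative):
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
val header = Array("id", "name", "price")            // could also be read from the CSV's first line
val schema = StructType(header.map(name => StructField(name, StringType, nullable = true)))
val rows = sc.textFile("/data/input.csv")
  .map(_.split(","))                                 // naive split; no quoted-field handling
  .map(fields => Row.fromSeq(fields))
val df = sqlContext.createDataFrame(rows, schema)    // applySchema(rows, schema) on Spark 1.1/1.2
df.saveAsParquetFile("/data/output.parquet")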
round and see others have asked similar questions.
>
> Given a schemaRDD, I extract a result set that contains numbers, both Ints and
> Doubles. How do I construct an RDD[Vector]? In 1.2 I wrote the results to a
> text file and then read them back in, splitting them with some code I found in
I've been searching around and see others have asked similar questions.
Given a schemaRDD, I extract a result set that contains numbers, both Ints and
Doubles. How do I construct an RDD[Vector]? In 1.2 I wrote the results to a
text file and then read them back in, splitting them with some code I fou
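A minimal sketch of the conversion, assuming every column of the extracted result is numeric (Int or Double); the pattern match is just defensive:
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
val vectors: RDD[Vector] = schemaRdd.map { r =>
  Vectors.dense(r.toSeq.map {
    case i: Int    => i.toDouble
    case d: Double => d
    case other     => other.toString.toDouble        // fallback for e.g. Long or numeric strings
  }.toArray)
}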
>
>> Hi Michael,
>>
>> I think that the feature (convert a SchemaRDD to a structured class RDD)
>> is
>> now available. But I didn't understand in the PR how exactly to do this.
>> Can
>> you give an example or doc links?
>>
>> Best regards
>
eb 22, 2015 at 11:51 AM, stephane.collot wrote:
> Hi Michael,
>
> I think that the feature (convert a SchemaRDD to a structured class RDD) is
> now available. But I didn't understand in the PR how exactly to do this.
> Can
> you give an example or doc links?
>
> B
Hi Michael,
I think that the feature (convert a SchemaRDD to a structured class RDD) is
now available. But I didn't understand in the PR how exactly to do this. Can
you give an example or doc links?
Best regards
--
View this message in context:
http://apache-spark-user-list.10015
Hi,
can someone explain how to trap the SQL exception for a query executed using a
SchemaRDD?
I mean, for example, when a table is not found.
thanks in advance,
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-get-SchemaRDD-SQL-exceptions-i-e-table-not-found
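A small sketch of trapping the failure by wrapping the query in scala.util.Try (the exact exception type differs by version, e.g. a RuntimeException or AnalysisException for a missing table):
import scala.util.{Failure, Success, Try}
Try(sqlContext.sql("SELECT * FROM no_such_table").collect()) match {
  case Success(rows) => rows.foreach(println)
  case Failure(e)    => println(s"Query failed: ${e.getMessage}")   // e.g. table not found
}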
timisation-in-SchemaRDD-tp21555p21613.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Hi All,
I have a use case where I have cached my schemaRDD and I want to launch
executors just on the partition which I know of (prime use-case of
PartitionPruningRDD).
I tried something like following :-
val partitionIdx = 2
val schemaRdd = hiveContext.table("myTable") //myTable is
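A hedged sketch of the PartitionPruningRDD idea (the partition index and table name are illustrative; this assumes a SchemaRDD, which is itself an RDD[Row] — on a 1.3+ DataFrame you would prune df.rdd instead):
import org.apache.spark.rdd.PartitionPruningRDD
val schemaRdd = hiveContext.table("myTable")
schemaRdd.cache().count()                                           // materialise the cache
val partitionIdx = 2
val pruned = PartitionPruningRDD.create(schemaRdd, idx => idx == partitionIdx)
pruned.count()                                                      // only that one partition is computed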
Why don't you just map rdd's rows to lines and then call saveAsTextFile()?
On 3.2.2015. 11:15, Hafiz Mujadid wrote:
I want to write the whole schemaRDD to a single file in HDFS but am facing the
following exception
org.apache.hadoop.ipc.RemoteException
Hi,
Any thoughts ?
Thanks,
On Sun, Feb 1, 2015 at 12:26 PM, Manoj Samel
wrote:
> Spark 1.2
>
> SchemaRDD has schema with decimal columns created like
>
> x1 = new StructField("a", DecimalType(14,4), true)
>
> x2 = new StructField("b", DecimalType(14,4)
I want to write the whole schemaRDD to a single file in HDFS but am facing the
following exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /test/data/data1.csv (inode 402042): File does not exist. Holder
DFSClient_NONMAPREDUCE_-564238432_57
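A minimal sketch of the map-to-lines suggestion from the reply above (the output path is illustrative; coalesce(1) is only sensible if the data fits comfortably in one task):
schemaRdd
  .map(_.mkString(","))                // one delimited line per Row
  .coalesce(1)                         // single part-file in the output directory
  .saveAsTextFile("hdfs:///test/data/out")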
I think I found the issue causing it.
I was calling schemaRDD.coalesce(n).saveAsParquetFile to reduce the number
of partitions in parquet file - in which case the stack trace happens.
If I compress the partitions before creating schemaRDD then the
schemaRDD.saveAsParquetFile call works for
Spark 1.2
SchemaRDD has schema with decimal columns created like
x1 = new StructField("a", DecimalType(14,4), true)
x2 = new StructField("b", DecimalType(14,4), true)
Registering as a SQL temp table and doing SQL queries on these columns,
including SUM etc., works fine, s
Hi,
I am getting a stack overflow error when querying a schemardd comprised of
parquet files. This is (part of) the stack trace:
Caused by: java.lang.StackOverflowError
at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at
scala.collection.TraversableOnce
Nathan <nathan.mccar...@quantium.com.au>, Michael Armbrust <mich...@databricks.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: SparkSQL schemaRDD & MapPartitions calls - performance
issues -
> Cheers,
> Nathan
>
> From: Cheng Lian
> Date: Monday, 12 January 2015 1:21 am
> To: Nathan , Michael Armbrust <
> mich...@databricks.com>
> Cc: "user@spark.apache.org"
>
> Subject: Re: SparkSQL schemaRDD & MapPartitions calls - performance
>
"user@spark.apache.org" <user@spark.apache.org>
Subject: Re: SparkSQL schemaRDD & MapPartitions calls - performance issues -
columnar formats?
On 1/11/15 1:40 PM, Nathan McCarthy wrote:
Thanks Cheng & Michael! Makes sense. Appreciate the tips!
Idiomatic scala isn't performant. I’ll definitely start using while loo
Nathan <nathan.mccar...@quantium.com.au>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: SparkSQL schemaRDD & MapPartitions calls - performance
issues - columnar formats?
The other thing to note here i
Michael Armbrust <mich...@databricks.com>
Date: Saturday, 10 January 2015 3:41 am
To: Cheng Lian <lian.cs@gmail.com>
Cc: Nathan <nathan.mccar...@quantium.com.au>,
"user@spark.apache.org" <user@spark.apache.org>
"unwrapping" processes you mentioned below).
>
>
> Now this takes around ~49 seconds… Even though test1 table is 100%
> cached. The number of partitions remains the same…
>
> Now if I create a simple RDD of a case class HourSum(hour: Int, qty:
> Double, sales: Double)
>
49 seconds… Even though test1 table is 100%
cached. The number of partitions remains the same…
Now if I create a simple RDD of a case class HourSum(hour: Int, qty:
Double, sales: Double)
Convert the SchemaRDD;
val rdd = sqlC.sql("select * from test1").map{ r =>
HourSum(r.getInt(1), r.
Any ideas? :)
From: Nathan <nathan.mccar...@quantium.com.au>
Date: Wednesday, 7 January 2015 2:53 pm
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: SparkSQL schemaRDD & MapPartitions calls - performance
ithIndex.map(_.swap).iterator
}.reduceByKey((a,b) => (a._1 + b._1, a._2 + b._2)).collect().foreach(println)
Now this takes around ~49 seconds… Even though test1 table is 100% cached. The
number of partitions remains the same…
Now if I create a simple RDD of a case class HourSum(hour: Int, qty
Hi Michael,
On Tue, Jan 6, 2015 at 3:43 PM, Michael Armbrust
wrote:
> Oh sorry, I'm rereading your email more carefully. It's only because you
> have some setup code that you want to amortize?
>
Yes, exactly that.
Concerning the docs, I'd be happy to contribute, but I don't really
understand w
SQL doesn't really have great support for partitions in general...
> We do support for Hive TGFs though and we could possibly add better scala
> syntax for this concept or something else.
>
> On Mon, Jan 5, 2015 at 9:52 PM, Tobias Pfeiffer wrote:
>
>> Hi,
>>
>>
Pfeiffer wrote:
> Hi,
>
> I have a SchemaRDD where I want to add a column with a value that is
> computed from the rest of the row. As the computation involves a
> network operation and requires setup code, I can't use
> "SELECT *, myUDF(*) FROM rdd",
> but I w
Hi,
I have a SchemaRDD where I want to add a column with a value that is
computed from the rest of the row. As the computation involves a
network operation and requires setup code, I can't use
"SELECT *, myUDF(*) FROM rdd",
but I wanted to use a combination of:
- get schema of
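A hedged sketch of the combination being described here: do the setup once per partition, append the computed value to each Row, then re-apply an extended schema (Spark 1.3+ types shown, applySchema on 1.1/1.2; ExpensiveClient and all other names are hypothetical):
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
val oldSchema = schemaRdd.schema
val newSchema = StructType(oldSchema.fields :+ StructField("extra", StringType, nullable = true))
val augmented = schemaRdd.mapPartitions { rows =>
  val client = ExpensiveClient.connect()             // hypothetical per-partition setup code
  rows.map(r => Row.fromSeq(r.toSeq :+ client.lookup(r)))
}
val withExtra = sqlContext.createDataFrame(augmented, newSchema)   // applySchema(augmented, newSchema) on 1.1/1.2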
schemaRDD.first,list) and see if you get
anything
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-to-RDD-String-tp20846p20910.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
You might also try the following, which I think is equivalent:
schemaRDD.map(_.mkString(","))
On Wed, Dec 24, 2014 at 8:12 PM, Tobias Pfeiffer wrote:
> Hi,
>
> On Wed, Dec 24, 2014 at 3:18 PM, Hafiz Mujadid
> wrote:
>>
>> I want to convert a schemaRDD into
Hi,
On Wed, Dec 24, 2014 at 3:18 PM, Hafiz Mujadid
wrote:
>
> I want to convert a schemaRDD into RDD of String. How can we do that?
>
> Currently I am doing it like this, which does not convert correctly: there is no
> exception, but the resultant strings are empty
>
> here is my code
>
...@gmail.com]
*Sent:* Wednesday, December 24, 2014 4:26 AM
*To:* user@spark.apache.org
*Subject:* SparkSQL: CREATE EXTERNAL TABLE with a SchemaRDD
Hi spark users,
I'm trying to create external table using HiveContext after creating a
schemaRDD and saving the RDD into a parquet file on hdfs.
I
Hi dears!
I want to convert a schemaRDD into an RDD of String. How can we do that?
Currently I am doing it like this, which does not convert correctly: there is no
exception, but the resultant strings are empty.
here is my code
def SchemaRDDToRDD(schemaRDD: SchemaRDD): RDD[String] = {
var
@spark.apache.org
Subject: SparkSQL: CREATE EXTERNAL TABLE with a SchemaRDD
Hi spark users,
I'm trying to create external table using HiveContext after creating a
schemaRDD and saving the RDD into a parquet file on hdfs.
I would like to use the schema in the schemaRDD (rdd_table) when I creat
Hi spark users,
I'm trying to create external table using HiveContext after creating a
schemaRDD and saving the RDD into a parquet file on hdfs.
I would like to use the schema in the schemaRDD (rdd_table) when I create
the external table.
For example:
rdd_table.saveAsParquetFile("/
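A hedged sketch of the rest of that idea (paths and columns are illustrative): save the parquet file, then point an external Hive table at the same location:
rdd_table.saveAsParquetFile("/user/hive/warehouse/rdd_table_parquet")
hiveContext.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS rdd_table_ext (id BIGINT, name STRING)
  STORED AS PARQUET
  LOCATION '/user/hive/warehouse/rdd_table_parquet'
""")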
I'm using JDBCRDD
<https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.rdd.JdbcRDD>
+ HBase JDBC driver <http://phoenix.apache.org/> + schemaRDD
<https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD>
make sure to use
Hi ,
Can someone help me , Any pointers would help.
Thanks
Subacini
On Fri, Dec 19, 2014 at 10:47 PM, Subacini B wrote:
> Hi All,
>
> Is there any API that can be used directly to write schemaRDD to HBase??
> If not, what is the best way to write schemaRDD to HBase.
>
> Thanks
> Subacini
>
Hi All,
Is there any API that can be used directly to write a schemaRDD to HBase?
If not, what is the best way to write a schemaRDD to HBase?
Thanks
Subacini
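No direct schemaRDD-to-HBase API comes up in this thread, so a hedged sketch of the usual fallback is to write per partition with the plain HBase client (HBase 1.x client API assumed; the table, column family, and column positions are illustrative):
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
schemaRdd.foreachPartition { rows =>
  val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())   // one connection per partition
  val table = conn.getTable(TableName.valueOf("my_table"))
  rows.foreach { r =>
    val put = new Put(Bytes.toBytes(r.getString(0)))                           // row key from the first column
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes(r.getString(1)))
    table.put(put)
  }
  table.close()
  conn.close()
}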
> jdata.select(Star(Node), 'seven.getField("mod"), 'eleven.getField("mod"))
>
> You need to import org.apache.spark.sql.catalyst.analysis.Star in advance.
>
> #2
>
> After you make the transform above, you do not need to make SchemaRDD
> manually.
> Because that jdata.select
ends up being, but it is certainly possible.
>
> On Thu, Dec 11, 2014 at 3:55 AM, nitin wrote:
>>
>> Can we take this as a performance improvement task in Spark-1.2.1? I can
>> help
>> contribute for this.
>>
>>
>>
>>
>> --
>> View
After you make the transform above, you do not need to build the SchemaRDD
manually, because jdata.select() returns a SchemaRDD and you can operate on it
directly.
For example, the following code snippet will return a new SchemaRDD with a
longer Row:
val t1 = jdata.select(Star(Node), 'seven.getFie
over for how to do this if my added value is a
> scala function, with no luck.
>
> Let's say I have a SchemaRDD with columns A, B, and C, and I want to add a
> new column, D, calculated using Utility.process(b, c), and I want (of
> course) to pass in the value B and C from each ro
nitin wrote:
>
> Can we take this as a performance improvement task in Spark-1.2.1? I can
> help
> contribute for this.
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-partition-on-specific-column-val
(1) I understand about immutability; that's why I said I wanted a new
SchemaRDD.
(2) I specifically asked for a non-SQL solution that takes a SchemaRDD and
results in a new SchemaRDD with one new function.
(3) The DSL stuff is a big clue, but I can't find adequate documentation
for it
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val d1 = sc.parallelize(1 to 10).map { i => Person(i,i+1,i+2)}
val d2 = d1.select('id, 'score, 'id + 'score)
d2.foreach(println)
2014-12-12 14:11 GMT+08:00 Nathan Kronenfeld :
> Hi, there.
>
> I'm trying to understand how to augment d
Hi, there.
I'm trying to understand how to augment data in a SchemaRDD.
I can see how to do it if I can express the added values in SQL - just run
"SELECT *, valueCalculation AS newColumnName FROM table"
I've been searching all over for how to do this if my added value is a
scala function, with no luck.
Can we take this as a performance improvement task in Spark-1.2.1? I can help
contribute for this.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-partition-on-specific-column-values-tp20350p20623.html
Sent from the Apache Spark User List mailing
I've created a JIRA: https://issues.apache.org/jira/browse/SPARK-4782
>>>
>>> Jianshi
>>>
>>> On Sun, Dec 7, 2014 at 2:32 PM, Jianshi Huang
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> What's the best way to
https://issues.apache.org/jira/browse/SPARK-4782
>>
>> Jianshi
>>
>> On Sun, Dec 7, 2014 at 2:32 PM, Jianshi Huang
>> wrote:
>>
>>> Hi,
>>>
>>> What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD?
>>>
>>> I
I've created a JIRA: https://issues.apache.org/jira/browse/SPARK-4782
>
> Jianshi
>
> On Sun, Dec 7, 2014 at 2:32 PM, Jianshi Huang
> wrote:
>
>> Hi,
>>
>> What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD?
>>
>> I'm currently converting each Ma
Hmm..
I've created a JIRA: https://issues.apache.org/jira/browse/SPARK-4782
Jianshi
On Sun, Dec 7, 2014 at 2:32 PM, Jianshi Huang
wrote:
> Hi,
>
> What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD?
>
> I'm currently converting ea
Hi,
What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD?
I'm currently converting each Map to a JSON String and doing
JsonRDD.inferSchema.
How about adding inferSchema support to Map[String, Any] directly? It would
be very useful.
Thanks,
--
Jianshi Huang
LinkedI
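A hedged sketch of the non-JSON alternative: fix a schema up front and build Rows directly from the maps (Spark 1.3+ API shown, applySchema on 1.1/1.2; the field names and types are assumptions, and missing keys become null):
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)))
val maps: RDD[Map[String, Any]] = ???                // the input RDD[Map[String, Any]]
val rows = maps.map(m => Row(schema.fieldNames.map(name => m.getOrElse(name, null)): _*))
val schemaRdd = sqlContext.createDataFrame(rows, schema)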
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-partition-on-specific-column-values-tp20350p20424.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> --
nge
before JOIN step) and improve overall performance?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-partition-on-specific-column-values-tp20350p20424.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
oid the partitioning based on ID by
preprocessing it (and then cache it).
Thanks in Advance
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-partition-on-specific-column-values-tp20350.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hi All,
My question is about the lazy evaluation mode of SchemaRDD, I guess. I know lazy
mode is good; however, I still have this requirement.
For example, here is the first SchemaRDD, named results (select * from table
where num > 1 and num < 4):
results: org.apache.spark.sql.SchemaRDD =
SchemaRDD[
gs
>> where rid = 'dd4455ee' and module = 'query' ")
>>
>> when I run filteredLogs.collect.foreach(println), I see all of the 16GB of
>> data loaded.
>>
>> How do I load only the columns used in the filters first, and then load the
>> payload
row matching the filter criteria?
>
> Let me know if this can be done in a different way.
>
> Thanks you,
> Vishnu.
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-SQL-loading-projection-columns-tp20189
Thanks! I'll give it a try.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Standard-SQL-tool-access-to-SchemaRDD-tp20197p20202.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Jim
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Standard-SQL-tool-access-to-SchemaRDD-tp20197.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---
http://apache-spark-user-list.1001560.n3.nabble.com/Standard-SQL-tool-access-to-SchemaRDD-tp20197.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-SQL-loading-projection-columns-tp20189.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail
>
> On Sat, Nov 29, 2014 at 12:57 AM, Michael Armbrust > wrote:
>
>> You probably don't need to create a new kind of SchemaRDD. Instead I'd
>> suggest taking a look at the data sources API that we are adding in Spark
>> 1.2. There is not a ton of docu
Hi Michael,
About this new data source API, what type of data sources would it support?
Does it have to be RDBMS necessarily?
Cheers
On Sat, Nov 29, 2014 at 12:57 AM, Michael Armbrust
wrote:
> You probably don't need to create a new kind of SchemaRDD. Instead I'd
> suggest
You probably don't need to create a new kind of SchemaRDD. Instead I'd
suggest taking a look at the data sources API that we are adding in Spark
1.2. There is not a ton of documentation, but the test cases show how to
implement the various interfaces
<https://github.com/apache/spar
Hi,
I am evaluating Spark for an analytic component where we do batch
processing of data using SQL.
So, I am particularly interested in Spark SQL and in creating a SchemaRDD
from an existing API [1].
This API exposes elements in a database as datasources. Using the methods
allowed by this data
takeOrdered, etc.
On Wed, Nov 26, 2014 at 5:05 AM, Jörg Schad wrote:
> Hi,
> I have a short question regarding the compute() of a SchemaRDD.
> For SchemaRDD the actual queryExecution seems to be triggered via
> collect(), while the compute triggers only the compute() of the parent