Re: custom rdd - do I need a hadoop input format?

2019-09-18 Thread Marcelo Valle
To implement a custom RDD with getPartitions, I have to extend `NewHadoopRDD` and specify the Hadoop input format class, right? Which input format could I specify so that the file isn't read all at once and my getPartitions method can split it by block? On Tue, 17 Sep 2019 at 18:53, Arun Maha

Re: custom rdd - do I need a hadoop input format?

2019-09-17 Thread Arun Mahadevan
You can do it with a custom RDD implementation. You will mainly implement "getPartitions" (the logic to split your input into partitions) and "compute" (to compute and return the values from the executors). On Tue, 17 Sep 2019 at 08:47, Marcelo Valle wrote: > Just to
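A minimal sketch of the approach described in this reply, assuming Spark 2.4 or later, a file path that is readable at the same location on every executor, and a line count known up front. The names `LineBlockRDD`, `BlockPartition`, and `blockRanges` are hypothetical and do not come from the thread; no Hadoop input format is involved here.

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type: one "block" covering lines [from, until).
case class BlockPartition(index: Int, from: Int, until: Int) extends Partition

object LineBlockRDD {
  // Split totalLines into consecutive [from, until) ranges of at most
  // linesPerBlock lines each. Assumes linesPerBlock > 0.
  def blockRanges(totalLines: Int, linesPerBlock: Int): Seq[(Int, Int)] =
    (0 until totalLines by linesPerBlock).map { start =>
      (start, math.min(start + linesPerBlock, totalLines))
    }
}

class LineBlockRDD(sc: SparkContext, path: String, totalLines: Int, linesPerBlock: Int)
  extends RDD[String](sc, Nil) { // Nil: this RDD has no parent RDDs

  // Runs on the driver: describes the blocks without reading any data.
  override protected def getPartitions: Array[Partition] =
    LineBlockRDD.blockRanges(totalLines, linesPerBlock).zipWithIndex.map {
      case ((from, until), i) => BlockPartition(i, from, until): Partition
    }.toArray

  // Runs on an executor: returns only the lines that belong to this block.
  override def compute(split: Partition, context: TaskContext): Iterator[String] = {
    val block = split.asInstanceOf[BlockPartition]
    val source = scala.io.Source.fromFile(path)
    // Close the file when the task finishes, whether it succeeds or fails.
    context.addTaskCompletionListener[Unit](_ => source.close())
    source.getLines().slice(block.from, block.until)
  }
}
```

Each task in this sketch still scans the file from the beginning to reach its block, so it illustrates the getPartitions/compute split rather than an efficient reader; a seekable source or precomputed byte offsets would avoid the repeated scan.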

Re: custom rdd - do I need a hadoop input format?

2019-09-17 Thread Marcelo Valle
with spark On Tue, 17 Sep 2019 at 16:28, Marcelo Valle wrote: > Hi, > > I want to create a custom RDD which will read n lines in sequence from a > file, which I call a block, and each block should be converted to a Spark > DataFrame to be processed in parallel. > > Qu

custom rdd - do I need a hadoop input format?

2019-09-17 Thread Marcelo Valle
Hi, I want to create a custom RDD which will read n lines in sequence from a file, which I call a block, and each block should be converted to a Spark DataFrame to be processed in parallel. Question - do I have to implement a custom Hadoop input format to achieve this? Or is it possible to do it

Re: Custom RDD: Report Size of Partition in Bytes to Spark

2016-07-04 Thread Pedro Rodriguez
TaskMetrics#inputMetrics by yourself. // maropu On Mon, Jul 4, 2016 at 11:46 AM, Pedro Rodriguez wrote: Hi All, I noticed on some Spark jobs it shows you input/output read size. I am implementing a custom RDD which reads files and would like to report these metrics to Spark since they are available

Re: Custom RDD: Report Size of Partition in Bytes to Spark

2016-07-03 Thread Takeshi Yamamuro
How about using `SparkListener`? You can collect IO statistics through TaskMetrics#inputMetrics by yourself. // maropu On Mon, Jul 4, 2016 at 11:46 AM, Pedro Rodriguez wrote: > Hi All, > > I noticed on some Spark jobs it shows you input/output read size. I am > implementing a cust
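A rough sketch of what this suggestion could look like, assuming Spark 2.x or later (where `TaskMetrics.inputMetrics` is a plain object rather than an `Option`). The class name `InputBytesListener` is hypothetical. Note that it only collects the bytes Spark has already recorded for each task; it does not by itself report sizes from a custom RDD.

```scala
import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Hypothetical listener that sums inputMetrics.bytesRead over finished tasks.
class InputBytesListener extends SparkListener {
  private val totalBytes = new AtomicLong(0L)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    // taskMetrics can be null for tasks that failed before reporting metrics.
    if (metrics != null) {
      totalBytes.addAndGet(metrics.inputMetrics.bytesRead)
    }
  }

  // Total input bytes seen so far across all completed tasks.
  def bytesRead: Long = totalBytes.get()
}
```

It would be registered on the driver with `sc.addSparkListener(new InputBytesListener)` before the job runs.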

Custom RDD: Report Size of Partition in Bytes to Spark

2016-07-03 Thread Pedro Rodriguez
Hi All, I noticed on some Spark jobs it shows you input/output read size. I am implementing a custom RDD which reads files and would like to report these metrics to Spark since they are available to me. I looked through the RDD source code and a couple different implementations and the best I

Re: Custom RDD in spark, cannot find custom method

2016-03-28 Thread Ted Yu
oject then the custom method can be called in the main function and it >> works. >> I misunderstand the usage of custom rdd, the custom rdd does not have to be >> written to the spark project like UnionRDD, CogroupedRDD, and just add it to >> your own project. >>

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
inally I write a separate spark application and add the MyRDD.scala to > the project then the custom method can be called in the main function and > it works. > I misunderstand the usage of custom rdd, the custom rdd does not have to > be written to the spark project like UnionRDD,

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
n the custom method can be called in the main function and it works. I misunderstand the usage of custom rdd, the custom rdd does not have to be written to the spark project like UnionRDD, CogroupedRDD, and just add it to your own project. On Mon, Mar 28, 2016 at 4:28 AM, Ted Yu wrote:

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Ted Yu
t;>> and the customable method in PairRDDFunctions.scala is >>> >>> def customable(partitioner: Partitioner): RDD[(K, V)] = self.withScope >>> { >>> new MyRDD[K, V](self, partitioner) >>> } >>> >>> Thanks:) >>> >

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Alexander Krasnukhin
Well, passing state between custom methods is trickier. But why don't you merge both methods into one and then no need to pass state. -- Alexander aka Six-Hat-Thinker > On 27 Mar 2016, at 19:24, Tenghuan He wrote: > > Hi Alexander, > Thanks for your reply > > In th

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
Hi Alexander, Thanks for your reply. In the custom RDD, there are some fields I have defined so that both the custom method and the compute method can see and operate on them. Can a method in an implicit class do that? On Mon, Mar 28, 2016 at 1:09 AM, Alexander Krasnukhin wrote: > Extending bre

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
2016 at 12:28 AM, Ted Yu wrote: >> >>> Can you show the full stack trace (or top 10 lines) and the snippet >>> using your MyRDD ? >>> >>> Thanks >>> >>> On Sun, Mar 27, 2016 at 9:22 AM, Tenghuan He >>> wrote: >>> >

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Alexander Krasnukhin
new MyRDD[K, V](self, partitioner) >> } >> >> Thanks:) >> >> On Mon, Mar 28, 2016 at 12:28 AM, Ted Yu wrote: >> >>> Can you show the full stack trace (or top 10 lines) and the snippet >>> using your MyRDD ? >>> >>> Thanks

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Ted Yu
> Thanks:) > > On Mon, Mar 28, 2016 at 12:28 AM, Ted Yu wrote: > >> Can you show the full stack trace (or top 10 lines) and the snippet using >> your MyRDD ? >> >> Thanks >> >> On Sun, Mar 27, 2016 at 9:22 AM, Tenghuan He >> wrote: >> >

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
s > > On Sun, Mar 27, 2016 at 9:22 AM, Tenghuan He wrote: > >> Hi everyone, >> >> I am creating a custom RDD which extends RDD and adds a custom method, >> however the custom method cannot be found. >> The custom RDD looks like the following: >

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Ted Yu
Can you show the full stack trace (or top 10 lines) and the snippet using your MyRDD ? Thanks On Sun, Mar 27, 2016 at 9:22 AM, Tenghuan He wrote: > Hi everyone, > > I am creating a custom RDD which extends RDD and adds a custom method, > however the custom method cannot be foun

Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
Hi everyone, I am creating a custom RDD which extends RDD and adds a custom method, however the custom method cannot be found. The custom RDD looks like the following: class MyRDD[K, V]( var base: RDD[(K, V)], part: Partitioner ) extends RDD[(K, V)](base.context, Nil) { def

Custom RDD for Proprietary MPP database

2015-10-05 Thread VJ Anand
Hi, I need to build an RDD that supports a custom-built database (which is sharded) across several nodes. I need to build an RDD that can support and provide the partitions specific to this database. I would like to do this in Java - I see there are JavaRDD, and other specific RDD available - my qu

Re: custom RDD in java

2015-07-01 Thread Feynman Liang
Silvio Fiorito < >> silvio.fior...@granturing.com> wrote: >> >>> Sure, you can create custom RDDs. Haven’t done so in Java, but in >>> Scala absolutely. >>> >>> From: Shushant Arora >>> Date: Wednesday, July 1, 2015 at 1:44 PM

Re: custom RDD in java

2015-07-01 Thread Shushant Arora
>> >> From: Shushant Arora >> Date: Wednesday, July 1, 2015 at 1:44 PM >> To: Silvio Fiorito >> Cc: user >> Subject: Re: custom RDD in java >> >> ok..will evaluate these options but is it possible to create RDD in >> java? >> >

Re: custom RDD in java

2015-07-01 Thread Feynman Liang
or...@granturing.com> wrote: > Sure, you can create custom RDDs. Haven’t done so in Java, but in Scala > absolutely. > > From: Shushant Arora > Date: Wednesday, July 1, 2015 at 1:44 PM > To: Silvio Fiorito > Cc: user > Subject: Re: custom RDD in java > > ok..wil

Re: custom RDD in java

2015-07-01 Thread Silvio Fiorito
Sure, you can create custom RDDs. Haven’t done so in Java, but in Scala absolutely. From: Shushant Arora Date: Wednesday, July 1, 2015 at 1:44 PM To: Silvio Fiorito Cc: user Subject: Re: custom RDD in java ok..will evaluate these options but is it possible to create RDD in java? On Wed, Jul 1

Re: custom RDD in java

2015-07-01 Thread Shushant Arora
if you need to run this in Spark could you just use the > existing JdbcRDD? > > > From: Shushant Arora > Date: Wednesday, July 1, 2015 at 10:19 AM > To: user > Subject: custom RDD in java > > Hi > > Is it possible to write custom RDD in java? > > Requiremen

Re: custom RDD in java

2015-07-01 Thread Silvio Fiorito
If all you’re doing is just dumping tables from SQLServer to HDFS, have you looked at Sqoop? Otherwise, if you need to run this in Spark could you just use the existing JdbcRDD? From: Shushant Arora Date: Wednesday, July 1, 2015 at 10:19 AM To: user Subject: custom RDD in java Hi Is it

custom RDD in java

2015-07-01 Thread Shushant Arora
Hi, is it possible to write a custom RDD in Java? The requirement is: I have a list of SQL Server tables that need to be dumped to HDFS. So I have a List tables = {dbname.tablename,dbname.tablename2..}; then JavaRDD rdd = javasparkcontext.parallelize(tables); JavaRDD<String> tablecont

Re: Implementing custom RDD in Java

2015-05-26 Thread Alex Robbins
bcRDD but in Java. > > I am looking to do something similar to what they have done here: > https://github.com/lagerspetz/TimeSeriesSpark/blob/master/src/spark/timeseries/dynamodb/DynamoDbRDD.scala. > This one reads data from Dynamo, my custom RDD would query DynamoDB for the > S3 file key

Re: Re: is there any easier way to define a custom RDD in Java

2015-05-25 Thread Ted Yu
ew this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/is-there-any-easier-way-to-define-a-custom-RDD-in-Java-tp6917p23027.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > >

Re: Re: is there any easier way to define a custom RDD in Java

2015-05-25 Thread swaranga
Has this changed now? Can a new RDD be implemented in Java? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/is-there-any-easier-way-to-define-a-custom-RDD-in-Java-tp6917p23027.html

Re: Implementing custom RDD in Java

2015-05-25 Thread Swaranga Sarma
series/dynamodb/DynamoDbRDD.scala. This one reads data from Dynamo, my custom RDD would query DynamoDB for the S3 file keys, and then load them from S3. On Mon, May 25, 2015 at 8:19 PM, Alex Robbins wrote: > If a Hadoop InputFormat already exists for your data source, you can load > it from there. Oth

Implementing custom RDD in Java

2015-05-25 Thread swaranga
and could not find any resources. Any pointers? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Implementing-custom-RDD-in-Java-tp23026.html

Implementing custom RDD in Java

2015-05-25 Thread Swaranga Sarma
Hello, I have a custom data source and I want to load the data into Spark to perform some computations. For this I see that I might need to implement a new RDD for my data source. I am a complete Scala noob and I am hoping that I can implement the RDD in Java only. I looked around the internet an

Cluster Aware Custom RDD

2015-01-16 Thread Jim Carroll
Hello all, I have a custom RDD for fast loading of data from a non-partitioned source. The partitioning happens in the RDD implementation by pushing data from the source into queues picked up by the current active partitions in worker threads. This works great on a multi-threaded single host

Re: Re: is there any easier way to define a custom RDD in Java

2014-06-04 Thread bluejoe2008
easier way to define a custom RDD in Java Hey There, This is only possible in Scala right now. However, this is almost never needed since the core API is fairly flexible. I have the same question as Andrew... what are you trying to do with your RDD? - Patrick On Wed, Jun 4, 2014 at 7:49 AM, Andrew Ash

Re: is there any easier way to define a custom RDD in Java

2014-06-04 Thread Patrick Wendell
do you want your custom RDD to do that the normal ones > don't? > > > On Wed, Jun 4, 2014 at 6:30 AM, bluejoe2008 wrote: >> >> hi, folks, >> is there any easier way to define a custom RDD in Java? >> I am wondering if I have to define a new java cl

Re: is there any easier way to define a custom RDD in Java

2014-06-04 Thread Andrew Ash
Just curious, what do you want your custom RDD to do that the normal ones don't? On Wed, Jun 4, 2014 at 6:30 AM, bluejoe2008 wrote: > hi, folks, > is there any easier way to define a custom RDD in Java? > I am wondering if I have to define a new java class which exte

is there any easier way to define a custom RDD in Java

2014-06-04 Thread bluejoe2008
hi, folks, is there any easier way to define a custom RDD in Java? I am wondering if I have to define a new java class which extends RDD from scratch? It is really a hard job for developers! 2014-06-04 bluejoe2008

Re: Custom RDD

2014-03-10 Thread Prashant Sharma
Hi David, There are many implementations of RDD available in org.apache.spark. All you have to do is implement the RDD class. Of course, this is not possible from Java AFAIK. Prashant Sharma On Tue, Mar 11, 2014 at 1:00 AM, David Thomas wrote: > Is there any guide available on creating a cus

Re: Custom RDD

2014-03-10 Thread Mayur Rustagi
copy paste? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi <https://twitter.com/mayur_rustagi> On Mon, Mar 10, 2014 at 12:30 PM, David Thomas wrote: > Is there any guide available on creating a custom RDD? >

Custom RDD

2014-03-10 Thread David Thomas
Is there any guide available on creating a custom RDD?