Could you post the code you are using for the streaming context?
On Mon, Mar 28, 2016 at 10:31 AM, lokeshkumar wrote:
> Hi forum
>
> For some reason if I include a twitter receiver and start the streaming
> context, I get the below exception not sure why
> Can someone let me know if anyone ha
Hi Ted
I changed
def customable(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
to
def customable(partitioner: Partitioner): MyRDD[K, V] = self.withScope {
After rebuilding the whole Spark project (since that takes a long time, I didn't
do it the way you suggested at first), it also works.
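For reference, a minimal sketch of what the patched method (presumably in
PairRDDFunctions) might look like; only the signature change comes from this
thread, the body is an assumption:

  def customable(partitioner: Partitioner): MyRDD[K, V] = self.withScope {
    new MyRDD[K, V](self, partitioner)  // hypothetical body: wrap this pair RDD in MyRDD
  }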
Thanks
On Mon,
Hi forum
For some reason, if I include a Twitter receiver and start the streaming
context, I get the below exception; I am not sure why.
Can someone let me know if anyone has already encountered this issue, or
whether I am doing something wrong?
java.lang.ArithmeticException: / by zero
at org.apache.spark
Hmm, I finally found that the following syntax works better:
scala> val mappingFunction = (key: String, value: Option[Int], state:
State[Int]) => {
| Option(key)
| }
mappingFunction: (String, Option[Int], org.apache.spark.streaming.State[Int])
=> Option[String] = <function3>
scala> val spec = S
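Presumably the truncated line builds a StateSpec; a minimal sketch of how this
is typically wired up (the stream name pairStream is hypothetical):

  val spec = StateSpec.function(mappingFunction)
  // pairStream is assumed to be a DStream[(String, Int)]
  val stateStream = pairStream.mapWithState(spec)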
Hi,
I'm testing some sample code and I get this error. Here is the sample code I use:
scala> import org.apache.spark.streaming._
import org.apache.spark.streaming._
scala> def mappingFunction(key: String, value: Option[Int], state: State[Int]):
Option[String] = {
| Option(key)
| }
ma
Hi Wenchao,
I used the steps described on this page and it works great; you can give it a try :)
http://danielnee.com/2015/01/setting-up-intellij-for-spark/
On Mon, Mar 28, 2016 at 9:38 AM, 吴文超 wrote:
> for the simplest word count,
> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word =
For now they are free.
Sent from my Samsung Galaxy smartphone.
Original message
From: Sage Meng
Date: 3/28/2016 04:14 (GMT+02:00)
To: Raymond Honderdors
Cc: mich.talebza...@gmail.com, user@spark.apache.org
Subject: Re: Does SparkSql has official jdbc/odbc driver?
Hi Ra
Thanks very much Ted
I added MyRDD.scala to the Spark source code and rebuilt the whole Spark
project, but using myrdd.asInstanceOf[MyRDD] doesn't work. It seems that MyRDD
is not exposed to the spark-shell.
Finally I wrote a separate Spark application and added MyRDD.scala to that
project, then the
I run PySpark with CSV support like so: IPYTHON=1 pyspark --packages
com.databricks:spark-csv_2.10:1.4.0
I don't want to type this --packages argument each time. Is there a config
item for --packages? I can't find one in the reference at
http://spark.apache.org/docs/latest/configuration.html
If t
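If it helps, one option is the spark.jars.packages property in
conf/spark-defaults.conf, which (as far as I know) mirrors the --packages flag:

  spark.jars.packages  com.databricks:spark-csv_2.10:1.4.0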
Hi All,
I'm running into an error that's not making a lot of sense to me, and
couldn't find sufficient info on the web to answer it myself.
BTW, you can also reply on Stack Overflow:
http://stackoverflow.com/questions/36254005/nosuchelementexception-in-chisqselector-fit-method-version-1-6-0
I'v
for the simplest word count,
val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word,
1)).reduceByKey((a, b) => a + b)
In the idea IDE, it shows reduceByKey can not be resolved.
Maybe it is the problem of version incompatible。What do you use to construct
the projec
t?
Hi Raymond Honderdors,
are odbc/jdbc drivers for spark sql from databricks free, or the drivers
from databricks can only be used on databricks's spark-sql release?
2016-03-25 17:48 GMT+08:00 Raymond Honderdors :
> Recommended drivers for spark / thrift are the once from databricks (simba)
>
>
It turned out that there was a conflict for this within the Spark
environment. The error was resolved after I shaded out org.iq80.snappy
via the Maven Shade plugin. Just posting so others may benefit.
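For anyone hitting the same conflict, a sketch of the kind of
maven-shade-plugin relocation meant here (the shaded prefix is only
illustrative):

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
      <relocations>
        <relocation>
          <pattern>org.iq80.snappy</pattern>
          <shadedPattern>shaded.org.iq80.snappy</shadedPattern>
        </relocation>
      </relocations>
    </configuration>
  </plugin>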
On Sun, Mar 20, 2016 at 11:40 PM, Akhil Das
wrote:
> Looks like a jar conflict, could you p
Could you please share your code, so that I can try it?
--
Be well!
Jean Morozov
On Sun, Mar 27, 2016 at 5:20 PM, 吴文超 wrote:
> I am a newbie to Spark. When I use IntelliJ IDEA to write some Scala code,
> I found it reports an error when using Spark's implicit conversions, e.g. when I
> use the RDD as P
Hi,
A while back I was looking for a functional programming way to filter out
transactions older than n months, etc.
This turned out to be pretty easy.
I get today's date as follows:
var today = sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(),
'yyyy-MM-dd') ").collect.apply(0).getString(0)
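A sketch of the kind of age filter this enables, assuming a DataFrame df with a
date column paymentdate (all names are only illustrative):

  import org.apache.spark.sql.functions._

  val n = 6  // keep only transactions from the last n months
  val recent = df.filter(col("paymentdate") >= add_months(current_date(), -n))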
CSV data
Thanks guys. But the issue seems orthogonal to which output committer is
used, no?
When writing out a dataframe as parquet, does the job recover if one task
crashes mid-way, leaving a half written file? What we observe is that when
the task is re-tried, it tries to open a "new" file of the same nam
My interpretation is that the variable myrdd has static type RDD as far as the
REPL is concerned, though it is actually an instance of MyRDD.
Using asInstanceOf in spark-shell should allow you to call your custom
method.
Here is declaration of RDD:
abstract class RDD[T: ClassTag](
You can extend RDD and include your custom logic in th
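To spell out the asInstanceOf suggestion, a sketch reusing the names from the
thread (the type parameters and the custom method name are assumptions):

  val myrdd = baseRDD.customable(part)
  val casted = myrdd.asInstanceOf[MyRDD[Int, String]]
  casted.someCustomMethod()  // hypothetical custom method defined on MyRDD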
Well, passing state between custom methods is trickier. But why don't you merge
both methods into one, so there is no need to pass state?
--
Alexander
aka Six-Hat-Thinker
> On 27 Mar 2016, at 19:24, Tenghuan He wrote:
>
> Hi Alexander,
> Thanks for your reply
>
> In the custom rdd, there are some
Hi Alexander,
Thanks for your reply
In the custom RDD, there are some fields I have defined so that both the custom
method and the compute method can see and operate on them; can a method in an
implicit class achieve that?
On Mon, Mar 28, 2016 at 1:09 AM, Alexander Krasnukhin wrote:
> Extending breaks ch
Thanks Ted,
but I have a doubt: as the code above (line 4) in the spark-shell shows,
myrdd is already a MyRDD; does that not make sense?
1 scala> val part = new org.apache.spark.HashPartitioner(10)
2 scala> val baseRDD = sc.parallelize(1 to 10).map(x => (x,
"hello")).partitionBy(part).ca
Extending breaks chaining and is not nice. I think it is much better to write an
implicit class with extra methods. This way you add new methods without
touching the hierarchy at all, i.e.
object RddFunctions {
implicit class RddFunctionsImplicit[T](rdd: RDD[T]) {
/***
* Cache RDD and name it in o
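To make the idea concrete, a self-contained version of the sketch above (the
method name and body are my guesses, continuing the truncated comment):

  import org.apache.spark.rdd.RDD

  object RddFunctions {
    implicit class RddFunctionsImplicit[T](rdd: RDD[T]) {
      // Cache the RDD and name it in one call (illustrative example)
      def cacheAs(name: String): RDD[T] = rdd.setName(name).cache()
    }
  }

  // usage: import RddFunctions._, then anyRdd.cacheAs("baseRDD")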
bq. def customable(partitioner: Partitioner): RDD[(K, V)] =
self.withScope {
In the above, you declare the return type as RDD, while you actually intended
to declare MyRDD as the return type.
Or, you can cast myrdd to MyRDD in the spark-shell.
BTW I don't think it is good practice to add a custom method to b
Hi Ted,
The codes are running in spark-shell
scala> val part = new org.apache.spark.HashPartitioner(10)
scala> val baseRDD = sc.parallelize(1 to 10).map(x => (x,
"hello")).partitionBy(part).cache()
scala> val myrdd = baseRDD.customable(part) // here customable is a method
added to the abstra
Can you show the full stack trace (or the top 10 lines) and the snippet using
your MyRDD?
Thanks
On Sun, Mar 27, 2016 at 9:22 AM, Tenghuan He wrote:
> Hi everyone,
>
> I am creating a custom RDD which extends RDD and adds a custom method;
> however, the custom method cannot be found.
> The
Hi everyone,
I am creating a custom RDD which extends RDD and adds a custom method; however,
the custom method cannot be found.
The custom RDD looks like the following:
class MyRDD[K, V](
var base: RDD[(K, V)],
part: Partitioner
) extends RDD[(K, V)](base.context, Nil) {
def *
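For completeness, a rough self-contained skeleton of such a wrapper RDD; the
ClassTag bounds, the one-parent constructor, and the overrides are additions to
make it compile, not the original code:

  import scala.reflect.ClassTag
  import org.apache.spark.{Partition, Partitioner, TaskContext}
  import org.apache.spark.rdd.RDD

  class MyRDD[K: ClassTag, V: ClassTag](
      var base: RDD[(K, V)],
      part: Partitioner
    ) extends RDD[(K, V)](base) {  // one-parent constructor registers the dependency

    override val partitioner: Option[Partitioner] = Some(part)

    override protected def getPartitions: Array[Partition] = base.partitions

    override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] =
      base.iterator(split, context)  // simply delegate to the wrapped RDD

    // the custom method from the thread; the body here is only a placeholder
    def customable(p: Partitioner): MyRDD[K, V] = this
  }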
I am a newbie to Spark. When I use IntelliJ IDEA to write some Scala code, I
found it reports an error when using Spark's implicit conversions, e.g. when I
use the RDD as a pair RDD to get the reduceByKey function. However, the project
can run normally on the cluster.
As somebody says it needs import org.apac
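The truncated import is presumably org.apache.spark.SparkContext._, which
brings the pair-RDD implicits into scope on older Spark versions (on 1.3+ a
matching spark-core dependency in the build is usually enough). A minimal
sketch:

  import org.apache.spark.SparkContext._  // pair-RDD implicits for pre-1.3 Spark

  val textFile = sc.textFile("input.txt")  // hypothetical input path; sc comes from the shell
  val wordCounts = textFile.flatMap(line => line.split(" "))
    .map(word => (word, 1))
    .reduceByKey((a, b) => a + b)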
Pretty simple; as usual it is a combination of ETL and ELT.
Basically, CSV files are loaded into a staging directory on the host and
compressed before being pushed into HDFS (a rough command sketch follows the list):
1. ETL --> Get rid of the header blank line on the csv files
2. ETL --> Compress the csv files
3. ETL --> Put the compressed CVF
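A rough sketch of steps 1-3 as shell commands (all paths are placeholders):

  sed -i '1d;/^$/d' /staging/sales.csv               # 1. drop the header and blank lines
  gzip /staging/sales.csv                            # 2. compress
  hdfs dfs -put /staging/sales.csv.gz /data/staging/ # 3. push the compressed file into HDFS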
Please take a look at the MyRDD class in:
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
There is scaladoc for the class. See how getPreferredLocations() is
implemented.
Cheers
On Sun, Mar 27, 2016 at 2:01 AM, chenyong wrote:
> Thank you Ted for your reply.
>
> Your ex
Hello all,
We were using the old "Artificial Neural Network":
https://github.com/apache/spark/pull/1290
This code appears to have been incorporated in 1.5.2 but it's only exposed
publicly via the MultilayerPerceptronClassifier. Is there a way to use the
old feedforward/backprop non-classificatio
Hi,
I have a DF of user events from a log file and I'm trying to construct sessions
for each user. A session is a list of events from the same user, where the time
difference between two consecutive events (when sorted by time) within the
session isn't greater than 30 minutes.
Currently I'm using the DF's Lag functio
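A sketch of the lag-based approach, assuming a DataFrame events with user_id
and event_time (epoch seconds) columns; all names are illustrative:

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions._

  val w = Window.partitionBy("user_id").orderBy("event_time")

  val sessions = events
    .withColumn("prev_time", lag("event_time", 1).over(w))
    // flag an event as the start of a new session when the gap exceeds 30 minutes
    .withColumn("new_session",
      when(col("event_time") - col("prev_time") > 30 * 60, 1).otherwise(0))
    // running sum of the flags yields a per-user session id
    .withColumn("session_id", sum("new_session").over(w))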
Hi Eric and Michael:
I ran into this problem with Spark 1.4.1 too. The error stack is:
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$
at
org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.sc
Thank you Ted for your reply. Your example code is a little bit difficult for me
to understand. Could you please give me an explanation of it?
From: cy...@hotmail.com
To: user@spark.apache.org
Subject: whether a certain piece can be assigned to a specified node by some
code in my program
[+spark list again] (I did not want to send "commercial spam" to the list :-))
The reduce function for CMSs is element-wise addition, and the reverse
reduce function is element-wise subtraction.
The heavy hitters list does not have monoid addition, but you can
cheat. I suggest creating a heavy hi
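For concreteness, a tiny sketch of those two functions for a count-min sketch
stored as a flat array of counters (the representation is an assumption):

  type CMS = Array[Long]

  // element-wise addition: the reduce function for reduceByKeyAndWindow
  def merge(a: CMS, b: CMS): CMS = a.zip(b).map { case (x, y) => x + y }

  // element-wise subtraction: the inverse reduce function as the window slides
  def unmerge(a: CMS, b: CMS): CMS = a.zip(b).map { case (x, y) => x - y }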