Yes, I got it working once but I can't exactly remember how.
I think what I did was the following:
· To the environment variables, add a variable named PYTHONPATH with
the path to your pyspark python directory (in my case,
C:\spark-2.1.0-bin-hadoop2.7\python)
· To the environmen
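For what it's worth, a sketch of what that first variable might look like from a Windows command prompt (the py4j zip name under python\lib is a guess and depends on the Spark distribution):

    set PYTHONPATH=C:\spark-2.1.0-bin-hadoop2.7\python;C:\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip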
I am committing offsets to Kafka after my output has been stored, using the
commitAsync API.
My question is: if I increase or decrease the number of Kafka partitions, will
the saved offsets become invalid?
Thanks
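For context, a minimal sketch of the pattern I mean, using the spark-streaming-kafka-0-10 API (broker, group id and topic below are placeholders):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    val conf = new SparkConf().setAppName("commit-after-output")
    val ssc = new StreamingContext(conf, Seconds(10))
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "my-group",
      "enable.auto.commit" -> (false: java.lang.Boolean))
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams))

    stream.foreachRDD { rdd =>
      // Grab the offset ranges before any shuffle, while the RDD is still Kafka-backed.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... store the batch output somewhere durable ...
      // Only after the output is stored, commit the offsets back to Kafka.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }
    ssc.start()
    ssc.awaitTermination()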
Hello all, I am building Spark 1.6.2 and I have run into a problem when running mvn test.
The command is mvn -e -Pyarn -Phive -Phive-thriftserver
-DwildcardSuites=org.apache.spark.serializer.ProactiveClosureSerializationSuite
test
and the test error is
ProactiveClosureSerializationSuite:
- throws expected
Through a JDBC connection to the Spark Thrift Server we execute Hive SQL. In
Hive on Spark you can control permissions by extending a hook that checks a
table's read or write permission; for Spark on Hive, what is the extension point?
Anyone have this working - either in 1.X or 2.X?
thanks
Try using --packages to include the jars. From the error it seems it's looking
for a main class in the jar, but you are running a Python script...
On 25 Feb 2017 10:36 pm, "Raymond Xie" wrote:
That's right Anahita, however, the class name is not indicated in the
original github project so I don't know wh
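To make the --packages suggestion concrete, a hedged example for the kafka_wordcount.py script (the exact artifact coordinates depend on the Spark and Scala versions installed):

    spark-submit --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 \
      kafka_wordcount.py <zookeeper-quorum> <topic>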
Thank you very much Marco,
I am a beginner in this area; is it possible for you to show me what you
think the right script should be to get it executed in the terminal?
*Sincerely yours,*
*Raymond*
On Sat, Feb 25, 2017 at 6:00 PM, Marco Mistroni
That's right Anahita; however, the class name is not indicated in the
original GitHub project, so I don't know what class should be used here. The
GitHub page only says:
and then run the example
`$ bin/spark-submit --jars \
external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar
\
exa
You're welcome.
You need to specify the class. I meant something like this:
spark-submit /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar --class "give the name of the class"
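(Side note, hedged: spark-submit normally expects options such as --class to come before the application jar, e.g. spark-submit --class com.example.MainClass /path/to/app.jar; anything placed after the jar is passed to the application as arguments.)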
On Saturday, February 25, 2017, Raymond Xie wrote:
> Thank you, it is still not
Thank you, it is still not working:
[image: Inline image 1]
By the way, here is the original source:
https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/kafka_wordcount.py
*Sincerely yours,*
*Raymond*
On Sat, Feb
Hi,
I think if you remove --jars, it will work. Like:
spark-submit /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
I had the same problem before and solved it by removing --jars.
Cheers,
Anahita
On Saturday, February 25, 2017, Raymond Xie wrot
You should read (again?) the Spark documentation about submitting an
application: http://spark.apache.org/docs/latest/submitting-applications.html
Try with the Pi computation example available with Spark.
For example:
./bin/spark-submit --class org.apache.spark.examples.SparkPi
examples/jars/sp
I am doing Spark Streaming on a Hortonworks sandbox and am stuck here
now. Can anyone tell me what is wrong with the following code, what
exception it causes, and how I can fix it? Thank you very much in advance.
spark-submit --jars
/usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-124
Hello,
I'm trying to run pyspark using the following setup:
- spark 1.6.1 standalone cluster on ec2
- virtualenv installed on master
- app is run using the following command:
export PYSPARK_DRIVER_PYTHON=/path_to_virtualenv/bin/python
export PYSPARK_PYTHON=/usr/bin/python
/root/spark/bin/spark-
I am reading in a single small file from Hadoop with wholeText. If I
process each line and create a row with two cells (the first cell equal
to the name of the file, the second cell equal to the line), that code
runs fine.
But if I just add two lines of code and change the first cell based on
p
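For reference, a rough Scala sketch of the working version described above (paths and names are placeholders, and this assumes sc.wholeTextFiles on Spark 2.x is what's meant):

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val spark = SparkSession.builder().appName("wholeText-rows").getOrCreate()
    val sc = spark.sparkContext

    // wholeTextFiles yields (fileName, fileContent) pairs, one per file.
    val rows = sc.wholeTextFiles("hdfs:///path/to/small-file.txt")
      .flatMap { case (fileName, content) =>
        content.split("\n").map(line => Row(fileName, line))
      }

    val schema = StructType(Seq(
      StructField("file", StringType),
      StructField("line", StringType)))
    spark.createDataFrame(rows, schema).show(5, truncate = false)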
On 24 Feb 2017, at 07:47, Femi Anthony <femib...@gmail.com> wrote:
Have you tried reading using s3n, which is a slightly older protocol? I'm not
sure how compatible s3a is with older versions of Spark.
I would absolutely not use s3n with a 1.2 GB file.
There is a WONTFIX JIRA on how it
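For completeness, a minimal sketch of going through s3a instead, assuming a hadoop-aws jar and a matching AWS SDK are on the classpath (bucket, path, and credential sources below are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("s3a-read"))
    // Credentials can also come from instance profiles or other providers.
    sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
    val lines = sc.textFile("s3a://some-bucket/some-1.2gb-file.csv")
    println(lines.count())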
Hi, I think you are using the local mode of Spark. There are mainly four modes: local, standalone, YARN, and Mesos. Also, "blocks" are an HDFS concept, while "partitions" are a Spark concept.
liangyihuai
---Original---
From: "Jacek Laskowski"
Date: 2017/2/25 02:45:20
To: "prithish"; Cc: "user"; Su
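As a quick illustration of those four modes, the master URL passed to spark-submit is what selects among them (host names and jar are placeholders):

    spark-submit --master local[*]            app.jar   # local
    spark-submit --master spark://master:7077 app.jar   # standalone
    spark-submit --master yarn                app.jar   # YARN
    spark-submit --master mesos://master:5050 app.jar   # Mesos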
One of the ways of ingesting data into HDFS is to use a Spark JDBC connection
to connect to the source and ingest the data into the underlying files or Hive
tables.
One question that has come up is: under controlled test conditions, what would
the measurements of I/O, CPU, etc. be across the cluster?
Assuming not usin
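For illustration, a rough sketch of that ingestion path (JDBC URL, table, and credentials are placeholders; Spark 2.x DataFrame API):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("jdbc-ingest")
      .enableHiveSupport()
      .getOrCreate()

    // Pull the source table over JDBC...
    val src = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/SRC")
      .option("dbtable", "SCOTT.SALES")
      .option("user", "scott")
      .option("password", "tiger")
      .load()

    // ...and land it in HDFS as a Hive table (or write files directly).
    src.write.mode("overwrite").saveAsTable("staging.sales")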