I’m trying to run the stateful network word count at
https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/stateful_network_wordcount.py
using the command:
./bin/spark-submit
examples/src/main/python/streaming/stateful_network_wordcount.py
localhost
I am also running
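For reference, the usage line in that example is stateful_network_wordcount.py <hostname> <port>, so the submit command needs a port argument as well, plus a Netcat server feeding text to that port. Following the example's own docstring:

nc -lk 9999
./bin/spark-submit examples/src/main/python/streaming/stateful_network_wordcount.py localhost 9999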
I am trying to use Naive Bayes for a project of mine in Python and I want
to obtain the probability value after having built the model.
Suppose I have two classes - A and B. Currently there is an API to find
which class a sample belongs to (predict). Now, I want to find the
probability of it belonging to each of the classes.
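If predict alone is not enough, the trained pyspark.mllib.classification.NaiveBayesModel exposes its log class priors (pi) and log conditional probabilities (theta), so the posteriors can be recomputed by hand. A minimal sketch under that assumption; the normalization step is mine, not part of the API:

import numpy as np

def class_probabilities(model, features):
    x = np.asarray(features)
    log_post = model.pi + model.theta.dot(x)   # unnormalized log P(class | x)
    p = np.exp(log_post - log_post.max())      # subtract the max for numerical stability
    return dict(zip(model.labels, p / p.sum()))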
4:59 AM, "Samarth Mailinglist" <
> mailinglistsama...@gmail.com> wrote:
>
>> I am trying to run a job written in python with the following command:
>>
>> bin/spark-submit --master spark://localhost:7077
>> /path/spark_solution_basic.py --py-files /path/*.py
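A guess from the command line alone: spark-submit treats everything after the application script as arguments to the script itself, so --py-files has to come before /path/spark_solution_basic.py, and it expects a comma-separated list of .py/.zip/.egg files rather than a shell glob. A hypothetical corrected form (the dependency names are placeholders):

bin/spark-submit --master spark://localhost:7077 --py-files /path/dep1.py,/path/dep2.py /path/spark_solution_basic.py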
Check this video out:
https://www.youtube.com/watch?v=dmL0N3qfSc8&list=UURzsq7k4-kT-h3TDUBQ82-w
On Mon, Nov 17, 2014 at 9:43 AM, Deep Pradhan
wrote:
> Hi,
> Is there any way to know which of my functions performs better in Spark? In
> other words, say I have achieved the same thing using two different approaches?
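There is no built-in comparator, but since transformations are lazy, each variant can be timed around an action. A minimal sketch, where make_rdd_v1 and make_rdd_v2 are hypothetical stand-ins for the two implementations:

import time

def time_variant(build_rdd):
    # build_rdd: hypothetical zero-argument function returning the RDD to test
    start = time.time()
    build_rdd().count()  # an action forces the lazy pipeline to actually run
    return time.time() - start

print(time_variant(make_rdd_v1), time_variant(make_rdd_v2))

The per-stage timings in the Spark web UI (port 4040 on the driver by default) give a finer-grained view.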
I am trying to run a job written in python with the following command:
bin/spark-submit --master spark://localhost:7077
/path/spark_solution_basic.py --py-files /path/*.py --files
/path/config.properties
I always get an exception that config.properties is not found:
INFO - IOError: [Errno 2] No such file or directory
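Two things worth checking here. First, as noted above, spark-submit treats everything after the application script as the script's own arguments, so --py-files and --files must precede /path/spark_solution_basic.py. Second, a file shipped with --files lands in each node's working directory under its base name; PySpark's SparkFiles helper resolves that local copy. A sketch:

from pyspark import SparkFiles

path = SparkFiles.get("config.properties")  # local path of the copy shipped via --files
with open(path) as f:
    props = f.read()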
I was about to ask this question.
On Wed, Nov 12, 2014 at 3:42 PM, Andrew Ash wrote:
> Jeremy,
>
> Did you complete this benchmark in a way that's shareable with those
> interested here?
>
> Andrew
>
> On Tue, Apr 15, 2014 at 2:50 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>>
Instead of a local file path, use an HDFS URI.
For example (in Python):
data = sc.textFile("hdfs://localhost/user/someuser/data")
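If the NameNode is not on the default port, include it explicitly; the 9000 here is only an assumption and depends on fs.defaultFS in core-site.xml:

data = sc.textFile("hdfs://localhost:9000/user/someuser/data")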
On Wed, Nov 12, 2014 at 10:12 AM, rapelly kartheek
wrote:
> Hi
>
> I am trying to access a file in HDFS from spark "source code". Basically,
> I am tweaking the spark
> json[indexMap[i]] = values[i]
> return json
>
> *doc_ids = data.mapPartitions(mapper)*
>
> On Mon, May 19, 2014 at 8:00 AM, Samarth Mailinglist <
> mailinglistsama...@gmail.com> wrote:
>
>> db = MongoClient()['spark_test_db']
>> *collec
db = MongoClient()['spark_test_db']
*collec = db['programs']*

def mapper(val):
    asc = val.encode('ascii', 'ignore')
    json = convertToJSON(asc, indexMap)
    collec.insert(json)  # *this is not working*

def convertToJSON(string, indexMap):
    values = string.strip().split(",")
    json = {}
Hi all,
I am trying to store the results of a reduce into mongo.
I want to share the variable "collection" in the mappers.
Here's what I have so far (I'm using pymongo)
db = MongoClient()['spark_test_db']
collec = db['programs']
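One likely cause, judging from the snippet: a MongoClient created on the driver cannot be pickled and shipped to the workers, so the connection has to be opened inside the task, once per partition. A sketch reusing the names above (convertToJSON, indexMap, and data are as defined in the snippet; foreachPartition is the RDD method for side effects):

from pymongo import MongoClient

def save_partition(lines):
    client = MongoClient()                           # one connection per partition, opened on the worker
    collec = client['spark_test_db']['programs']
    for val in lines:
        asc = val.encode('ascii', 'ignore')
        collec.insert(convertToJSON(asc, indexMap))  # insert() matches the pymongo API used above
    client.close()

data.foreachPartition(save_partition)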