Re: Using Spark like a search engine

2015-05-24 Thread ayan guha
Yes, Spark will be useful for the following areas of your application: 1. Running the same function on every CV in parallel to score it. 2. Improving the scoring function through better access to classification and clustering algorithms, within and beyond MLlib. These are the first benefits you can start with and then thin
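As a rough illustration of the first point (plain Python, not Spark code and not the poster's system), the sketch below applies one scoring function to every CV in parallel and keeps the best matches. The `score` function, the `skills` field, and the sample data are hypothetical; in Spark the same shape would be a map over an RDD of CVs followed by a top-k action such as `top`.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def score(cv, requirements):
    # Hypothetical scoring function: number of required skills the CV lists.
    return len(requirements & cv["skills"])


def top_matches(cvs, requirements, k=10):
    # Apply the same function to every CV in parallel, then keep the k best.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda cv: score(cv, requirements), cvs))
    # Pair each score with its CV id; nlargest sorts by score, then by id.
    return heapq.nlargest(k, zip(scores, (cv["id"] for cv in cvs)))


# Made-up sample data, for illustration only.
sample_cvs = [
    {"id": "a", "skills": {"java", "spark"}},
    {"id": "b", "skills": {"python"}},
    {"id": "c", "skills": {"java", "spark", "scala"}},
]
sample_requirements = {"java", "spark", "scala"}
best_two = top_matches(sample_cvs, sample_requirements, k=2)
```

Using `heapq.nlargest` avoids sorting all scores when only a handful of matches are returned.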

IntelliJ IDEA import Spark source code error

2015-05-24 Thread huangzheng
Hi all, I have recently wanted to learn the Spark source code. I cloned the Spark code from git, ran the sbt gen-idea command, and imported the project into IntelliJ, which gave the error below. Could anyone help me? The Spark version is 1.4 and the operating system is Windows 7.

The stage is slow when I have a for loop inside (Java)

2015-05-24 Thread allanjie
Hi all, I only have one stage, which is "mapToPair", and inside the function I have a for loop that runs about 133433 times. It then becomes slow; when I replace 133433 with just 133, it runs very fast. But I think this is a simple operation even in plain Java. You can look at th

RE: Using Spark like a search engine

2015-05-24 Thread ankur chauhan
Hi, I am sure you can use Spark for this, but it seems like a problem that should be delegated to a text-based indexing technology like Elasticsearch or something based on Lucene to serve the requests. Spark can be used to prepare the data that is fed to the indexing service. Using spark

Re: Re: how to distributed run a bash shell in spark

2015-05-24 Thread madhu phatak
Hi, You can use the pipe operator if you are running a shell script or Perl script on some data. More information on my blog. Regards, Madhukara Phatak http://datamantra.io/ On Mon, May 25, 2015 at 8:02 AM, wrote: > Thanks Akhil, > >your c
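To make the idea behind the pipe operator concrete, here is a minimal plain-Python sketch (an assumption-laden stand-in, not Spark code and not the code from the blog): each partition's elements are written to an external command's standard input, one per line, and the command's output lines become the new elements. The helper `pipe_partition`, the `UPPER` command, and the sample partitions are hypothetical names introduced only for this illustration.

```python
import subprocess
import sys


def pipe_partition(elements, command):
    """Send elements to the command's stdin, one per line; return its stdout lines."""
    completed = subprocess.run(
        command,
        input="".join(f"{element}\n" for element in elements),
        capture_output=True,
        text=True,
        check=True,  # raise if the external command exits with an error
    )
    return completed.stdout.splitlines()


# Stand-in for a shell or Perl script: upper-cases every input line.
UPPER = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin: print(line.strip().upper())",
]

# Two made-up partitions; the command is launched once per partition.
partitions = [["a", "b"], ["c"]]
piped = [pipe_partition(partition, UPPER) for partition in partitions]
```

Launching the command once per partition rather than once per element keeps the process start-up cost low, which is the main reason to pipe whole partitions.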

Using Spark like a search engine

2015-05-24 Thread Сергей Мелехин
Hi! We are developing a scoring system for recruitment. A recruiter enters vacancy requirements, and we score tens of thousands of CVs against these requirements and return, e.g., the top 10 matches. We do not use full-text search and sometimes don't even filter input CVs prior to scoring (some vacancies do not hav

RE: SparkSQL errors in 1.4 rc when using with Hive 0.12 metastore

2015-05-24 Thread Cheng, Hao
Thanks for reporting this. We intend to support multiple metastore versions in a single build (hive-0.13.1) by introducing the IsolatedClientLoader, but you are probably hitting a bug; please file a JIRA issue for this. I will also keep investigating. Hao From: Mark Hamstra [mail

Re: How to use zookeeper in Spark Streaming

2015-05-24 Thread Ted Yu
I think the ZooKeeper watcher code should reside in task code. I haven't found a guide on this subject so far. Cheers On Sun, May 24, 2015 at 7:15 PM, bit1...@163.com wrote: > Can someone please help me on this? > > -- > bit1...@163.com > > > *From:* bit1...@163.com > *发

Re: Re: how to distributed run a bash shell in spark

2015-05-24 Thread luohui20001
Thanks Akhil, your code is a big help to me, because a Perl script is exactly the thing I want to try to run in Spark. I will give it a try. Thanks & best regards! San.Luo - Original Message - From: Akhil Das To: 罗辉 Cc: user Subject: Re: how to distributed run a bas

Re: How to use zookeeper in Spark Streaming

2015-05-24 Thread bit1...@163.com
Can someone please help me on this? bit1...@163.com From: bit1...@163.com Sent: 2015-05-24 13:53 To: user Subject: How to use zookeeper in Spark Streaming Hi, In my Spark Streaming application, when the application starts and gets running, the tasks running on the worker nodes need to be notified

Powered by Spark listing

2015-05-24 Thread Michael Roberts
Information Innovators, Inc. http://www.iiinfo.com/ Spark, Spark Streaming, Spark SQL, MLLib Developing data analytics systems for federal healthcare, national defense and other programs using Spark on YARN. -- This page tracks the users of Spark. To add yourself to the list, please email user@spa

Re: Doubts about SparkSQL

2015-05-24 Thread Renato Marroquín Mogrovejo
Hi all, Many thanks for all the responses, but I think I pressed enter too quickly without correctly explaining what I meant. What I mean is whether the optimizer is able to optimize the processing when an inner query contains a blocking operator like an aggregation and if it knows the partitioning sch

Help optimizing some spark code

2015-05-24 Thread Tal
Hi, I'm running this piece of code in my program: smallRdd.join(largeRdd) .groupBy { case (id, (_, X(a, _, _))) => a } .map { case (a, iterable) => a-> iterable.size } .sortBy({ case (_, count) => count }, ascending = false) .take(k) where basically smallRdd is an rd

Re: Trying to connect to many topics with several DirectConnect

2015-05-24 Thread Akhil Das
I used to hit an NPE when I didn't add all the dependency jars to my context while running in standalone mode. Can you try adding all these dependencies to your context? sc.addJar("/home/akhld/.ivy2/cache/org.apache.spark/spark-streaming-kafka_2.10/jars/spark-streaming-kafka_2.10-1.3.1.jar")

Re: how to distributed run a bash shell in spark

2015-05-24 Thread Akhil Das
You mean you want to execute some shell commands from Spark? Here's something I tried a while back. https://github.com/akhld/spark-exploit Thanks Best Regards On Sun, May 24, 2015 at 4:53 PM, wrote: > hello there > > I am trying to run a app in which part of it needs to run a > shell.how

Re: Spark Streaming - Design considerations/Knobs

2015-05-24 Thread Maiti, Samya
Really good list to brush up on the basics. Just one input, regarding * An RDD's processing is scheduled by driver's jobscheduler as a job. At a given point of time only one job is active. So, if one job is executing the other jobs are queued. We can have multiple jobs running in a given applicat

Re: Spark dramatically slow when I add "saveAsTextFile"

2015-05-24 Thread Joe Wass
This may sound like an obvious question, but are you sure that the program is doing any work when you don't have a saveAsTextFile? If there are transformations but no actions to actually collect the data, there's no need for Spark to execute the transformations. As to the question of 'is this taki
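The point about transformations and actions can be illustrated without Spark. The plain-Python sketch below is only an analogy, not Spark's API: the built-in lazy `map` stands in for a transformation and `list` stands in for an action. The names `expensive` and `calls` are made up for the example.

```python
calls = []  # records every element that was actually processed


def expensive(x):
    calls.append(x)  # side effect that proves work happened
    return x * 2


data = [1, 2, 3]

# "Transformation": builds a lazy pipeline, nothing is computed yet.
lazy = map(expensive, data)
work_before_action = len(calls)

# "Action": forces the pipeline to run and collects the results.
result = list(lazy)
work_after_action = len(calls)
```

Timing the pipeline without the final step would measure almost nothing, which is why a job can look fast until an action such as saveAsTextFile is added.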

Re: Strange ClassNotFound exception

2015-05-24 Thread Ted Yu
Can you pastebin the class path ? Thanks > On May 24, 2015, at 5:02 AM, boci wrote: > > Yeah, I have same jar with same result, I run in docker container and I using > same docker container with my another project... the only difference is the > postgresql jdbc driver and the custom RDD...

Re: SparkSQL errors in 1.4 rc when using with Hive 0.12 metastore

2015-05-24 Thread Mark Hamstra
This discussion belongs on the dev list. Please post any replies there. On Sat, May 23, 2015 at 10:19 PM, Cheolsoo Park wrote: > Hi, > > I've been testing SparkSQL in 1.4 rc and found two issues. I wanted to > confirm whether these are bugs or not before opening a jira. > > *1)* I can no longer

Re: Strange ClassNotFound exception

2015-05-24 Thread boci
Yeah, I have the same jar with the same result. I run in a Docker container, and I am using the same Docker container with another project of mine... the only difference is the PostgreSQL JDBC driver and the custom RDD... no additional dependencies (both are single jars generated with the same assembly configuration with the same dep

how to distributed run a bash shell in spark

2015-05-24 Thread luohui20001
Hello there, I am trying to run an app in which part of it needs to run a shell. How can I run a shell distributed in a Spark cluster? Thanks. Here's my code: import java.io.IOException; import java.util.ArrayList; import java.util.List; import org.apache.spark.SparkConf; import org.apache.spark.api.

Spark dramatically slow when I add "saveAsTextFile"

2015-05-24 Thread allanjie
*Problem Description*: The program runs in a standalone Spark cluster (1 master, 6 workers with 8 GB RAM and 2 cores). Input: a 468 MB file with 133433 records stored in HDFS. Output: just a 2 MB file that will be stored in HDFS. The program has two map operations and one reduceByKey operation. Finally I save

Re: Spark Streaming - Design considerations/Knobs

2015-05-24 Thread Tathagata Das
Blocks are replicated immediately, before the driver launches any jobs using them. On Thu, May 21, 2015 at 2:05 AM, Hemant Bhanawat wrote: > Honestly, given the length of my email, I didn't expect a reply. :-) > Thanks for reading and replying. However, I have a follow-up question: > > I don't t