RE: Using Spark like a search engine

2015-05-24 Thread Ankur Chauhan
Hi, I am sure you can use Spark for this, but it seems like a problem that should be delegated to a text-based indexing technology like Elasticsearch, or something else built on Lucene, to serve the requests. Spark can be used to prepare the data that is fed to the indexing service. Using Spark
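To make the "Spark prepares, the index serves" split concrete, here is a minimal sketch of what the preparation step might emit: records formatted as the newline-delimited body accepted by Elasticsearch's `_bulk` endpoint (an action line naming the index and id, then the document itself). The index name `docs` and the `id` field are hypothetical, and in a Spark job this formatting could run per partition before the body is posted to the cluster.

```python
import json
from typing import Iterable, Mapping


def to_bulk_ndjson(records: Iterable[Mapping], index: str, id_field: str = "id") -> str:
    """Format records as a newline-delimited Elasticsearch bulk request body.

    Each record produces two lines: an "index" action line carrying the target
    index and document id, followed by the document source. Every line,
    including the last one, is newline-terminated as the bulk format requires.
    """
    lines = []
    for record in records:
        doc = dict(record)
        # Hypothetical id field: moved into the action metadata, not the source.
        doc_id = doc.pop(id_field)
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)


body = to_bulk_ndjson([{"id": 1, "title": "Spark"}], index="docs")
```

Keeping the formatting in a pure function like this makes it easy to test independently of both Spark and the search cluster.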

Re: use S3-Compatible Storage with spark

2015-07-17 Thread Ankur Chauhan
The endpoint is the property you want to set. I would look at the source for that. > On Jul 17, 2015, at 08:55, Sujit Pal wrote: > Hi Schmirr, > The part after the s3n:// is your bucket name and folder name, i.e. > s3n://${bucket_name}/${folder_name}[/${subfolder_name}]
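As a small illustration of the URL layout quoted above, the sketch below splits an `s3n://bucket/folder[/subfolder]` URL into the bucket name and the folder (key prefix). Accepting `s3` and `s3a` as well is an assumption added for convenience. Note that the storage endpoint is not part of this URL at all; as the reply says, it is a separate configuration property.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split s3n://bucket/folder[/subfolder] into (bucket, key prefix)."""
    parsed = urlparse(url)
    # Assumption: treat the s3, s3n and s3a schemes alike for this split.
    if parsed.scheme not in ("s3", "s3n", "s3a"):
        raise ValueError(f"not an S3 URL: {url}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket name: {url}")
    # The bucket is the authority part; everything after it is the folder path.
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_url("s3n://my-bucket/logs/2015/07"))
```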

Spark streaming and session windows

2015-08-07 Thread Ankur Chauhan
new window. Any help would be appreciated. -- Ankur Chauhan

[Spark Streaming] Session based windowing like in google dataflow

2015-08-07 Thread Ankur Chauhan
new window. Any help would be appreciated. -- Ankur Chauhan
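Only the closing words of the question survive here, but the subject is gap-based session windows as in Google Dataflow: events for a key belong to one session until a silence longer than a configured gap occurs. The plain-Python sketch below shows only that grouping rule on a finite batch of `(key, timestamp)` events; in Spark Streaming, one possible approach would be to carry open sessions across batches in per-key state (for example with `updateStateByKey`), which is not shown. Here a session's end is its last event's timestamp, whereas Dataflow extends a session's window by the gap duration.

```python
from typing import Dict, Iterable, List, Tuple

# (session start, session end, number of events)
Session = Tuple[float, float, int]


def sessionize(events: Iterable[Tuple[str, float]], gap: float) -> Dict[str, List[Session]]:
    """Group (key, timestamp) events into per-key sessions.

    An event within `gap` of the previous event for the same key extends the
    current session; a longer silence closes it and the next event opens a new one.
    """
    by_key: Dict[str, List[float]] = {}
    for key, ts in events:
        by_key.setdefault(key, []).append(ts)

    sessions: Dict[str, List[Session]] = {}
    for key, stamps in by_key.items():
        stamps.sort()
        start = end = stamps[0]
        count = 1
        out: List[Session] = []
        for ts in stamps[1:]:
            if ts - end <= gap:
                # Still inside the current session: extend it.
                end = ts
                count += 1
            else:
                # Gap exceeded: close the session and start a new one.
                out.append((start, end, count))
                start = end = ts
                count = 1
        out.append((start, end, count))
        sessions[key] = out
    return sessions
```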

deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Ankur Chauhan
Spark jobs reach out to a separate HDFS/Tachyon cluster. -- Ankur Chauhan

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Ankur Chauhan
HDFS configuration to talk to S3 or the HDFS DataNode) and the Mesos slave process. Is this correct? On 31/03/2015 16:43, Haoyuan Li wrote: > Tachyon should be co-located with Spark in this case. > Best, > Haoyuan > On Tue, Mar 31, 2015 at 4:30 PM, Ankur Chauhan

Mesos - spark task constraints

2015-04-02 Thread Ankur Chauhan
available) and then prefer other nodes. I realize that the "prefer" part may not be possible, but I at least want to start by getting them to run only on the Tachyon-enabled nodes. Also, if someone could give me a pointer to the Mesos scheduler code in Spark, that would be great.

spark mesos deployment : starting workers based on attributes

2015-04-03 Thread Ankur Chauhan
behavior. Thanks! -- Ankur Chauhan

Re: spark mesos deployment : starting workers based on attributes

2015-04-03 Thread Ankur Chauhan
Hi, Thanks! I'll add the JIRA. I'll also try to work on a patch this weekend. -- Ankur Chauhan On 03/04/2015 13:23, Tim Chen wrote: > Hi Ankur, > There isn't a way to do that yet, but it's simple to add.

Re: spark mesos deployment : starting workers based on attributes

2015-04-04 Thread Ankur Chauhan
Hi, Created issue: https://issues.apache.org/jira/browse/SPARK-6707 I would really appreciate ideas/views/opinions on this feature. -- Ankur Chauhan On 03/04/2015 13:23, Tim Chen wrote: > Hi Ankur, > There isn't a way to do that
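The feature discussed in this thread amounts to filtering Mesos resource offers by slave attributes, so that executors start only on, for example, Tachyon-enabled nodes. Below is a minimal sketch of that filtering logic in plain Python rather than Spark's Scala scheduler. The `attr:v1,v2;attr2:v3` constraint syntax and the attribute names `tachyon` and `zone` are assumptions for illustration, not necessarily what was adopted for SPARK-6707, and offer attributes are simplified to text values.

```python
from typing import Dict, Set


def parse_constraints(spec: str) -> Dict[str, Set[str]]:
    """Parse a hypothetical 'attr:v1,v2;attr2:v3' spec into attr -> allowed values.

    An attribute listed without values (e.g. 'tachyon') only requires presence.
    """
    constraints: Dict[str, Set[str]] = {}
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, values = part.partition(":")
        constraints[name.strip()] = {v.strip() for v in values.split(",") if v.strip()}
    return constraints


def offer_matches(attributes: Dict[str, str], constraints: Dict[str, Set[str]]) -> bool:
    """Accept an offer only if every constrained attribute is present with an allowed value."""
    for name, allowed in constraints.items():
        if name not in attributes:
            return False
        if allowed and attributes[name] not in allowed:
            return False
    return True


wanted = parse_constraints("tachyon:true")
print(offer_matches({"tachyon": "true", "zone": "us-east-1a"}, wanted))
```

A scheduler using this rule would decline non-matching offers outright; the "prefer" behavior mentioned earlier would need a ranking step instead of a hard filter.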

Spark + Mesos + HDFS resource split

2015-04-27 Thread Ankur Chauhan
(or EC2 instance or VM). My question is: what is the recommended way to split resources? How much memory and CPU should I preallocate for HDFS, and how much should I leave allocatable by Mesos? In addition, is there a rule-of-thumb recommendation for this? -- Ankur Chauhan

Nightly builds/releases?

2015-05-04 Thread Ankur Chauhan
Hi, Does anyone know if Spark has nightly builds, or an equivalent that provides binaries that have passed a CI build, so that one could try out the bleeding edge without having to compile? -- Ankur

Re: Nightly builds/releases?

2015-05-04 Thread Ankur Chauhan
owse/SPARK-1517 >> On May 4, 2015, at 10:25 PM, Ankur Chauhan wrote: >> Hi, >> Does anyone know if spark has any nightly builds or equivalent that provides binaries that have passed a CI build so that one could try out the bl

Re: history server

2015-05-07 Thread Ankur Chauhan
Hi, Sorry, this may be a little off topic, but I tried searching for docs on the history server and couldn't find much. Can someone point me to a doc or give me a point of reference for the use and intent of a history server? -- Ankur > On 7 May 2015, at 12:06, Koert Kuipers wrote: > got

Spark streaming updating a large window more frequently

2015-05-08 Thread Ankur Chauhan
dating it every 5 seconds". I would really appreciate pointers to code samples or blogs that could help me identify best practices. -- Ankur Chauhan
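One standard way to update a large window frequently without recomputing it from scratch is incremental maintenance: on each slide, add the batches that entered the window and subtract the ones that left. Spark Streaming's `reduceByKeyAndWindow` accepts an inverse reduce function for this purpose (and requires checkpointing when it is used). The plain-Python sketch below illustrates only the add/subtract idea on per-batch totals. The window length is not recoverable from the truncated message, so as a hypothetical example, a 1-hour window updated every 5 seconds with 5-second batches would mean `window_batches=720` and `slide_batches=1`.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def sliding_sums(batch_totals: Iterable[int], window_batches: int, slide_batches: int) -> Iterator[int]:
    """Yield the windowed sum every `slide_batches` batches.

    The window covers the most recent `window_batches` batch totals. The running
    total is updated incrementally: each new batch is added, and the batch that
    falls out of the window is subtracted, instead of re-summing the whole window.
    Early windows are partially filled until enough batches have arrived.
    """
    if window_batches <= 0 or slide_batches <= 0:
        raise ValueError("window and slide must be positive")
    window: Deque[int] = deque()
    running = 0
    for i, total in enumerate(batch_totals, start=1):
        window.append(total)
        running += total
        if len(window) > window_batches:
            running -= window.popleft()  # the "inverse reduce" step
        if i % slide_batches == 0:
            yield running


print(list(sliding_sums([1, 2, 3, 4, 5, 6], window_batches=3, slide_batches=2)))
```

The cost per slide depends on the number of batches entering and leaving, not on the window length, which is what makes a large window with a short slide affordable.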

kafka + Spark Streaming with checkPointing fails to start with

2015-05-12 Thread Ankur Chauhan
usercontent.com/ankurcha/f35df63f0d8a99da0be4/raw/ec96b932540ac87577e4ce8385d26699c1a7d05e/spark-console.log Could someone tell me what causes this problem? I tried looking at the stack trace, but I am not familiar enough with the codebase to make solid assertions. Any ideas as to what may be happening here? -- Ankur Chauhan

kafka + Spark Streaming with checkPointing fails to restart

2015-05-13 Thread Ankur Chauhan
c9 6b932540ac87577e4ce8385d26699c1a7d05e/spark-console.log Could someone tell me what causes this problem? I tried looking at the stack trace, but I am not familiar enough with the codebase to make solid assertions. Any ideas as to what may be happening here? -- Ankur

Re: how to monitor multi directories in spark streaming task

2015-05-13 Thread Ankur Chauhan
I would suggest creating one DStream per directory and then using StreamingContext#union(...) to get a union DStream. -- Ankur On 13/05/2015 00:53, hotdog wrote: > I want to use fileStream in Spark Streaming to monitor multiple HDFS directories,

Re: how to monitor multi directories in spark streaming task

2015-05-13 Thread Ankur Chauhan
2015/05/12/data.txt > /user/root/2015/05/13/data.txt > like this, > with one new directory each day. > How do I create the new DStream for tomorrow's new directory (/user/root/2015/05/13/)? >> On May 13, 2015, at 4:59 PM, Ankur Chauhan wrote: >>
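Creating one DStream per directory requires knowing the directory names when the streaming job is set up. A small helper that derives them from dates in the `/user/root/YYYY/MM/DD` layout shown in the question is sketched below; each path could then be passed to a file stream and the streams combined with `StreamingContext#union(...)`. The helper names are hypothetical. Note that Spark Streaming does not allow new streaming computations to be added once a context has started, so this only helps with directories that are known (or predictable) at setup time.

```python
from datetime import date, timedelta
from typing import List

# Base path taken from the example layout in the quoted question.
BASE = "/user/root"


def day_dir(day: date, base: str = BASE) -> str:
    """Return the directory for one day, e.g. /user/root/2015/05/13."""
    return f"{base}/{day:%Y/%m/%d}"


def day_dirs(start: date, days: int, base: str = BASE) -> List[str]:
    """Return `days` consecutive daily directories starting at `start`."""
    return [day_dir(start + timedelta(days=n), base) for n in range(days)]


print(day_dirs(date(2015, 5, 12), 2))
```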

data schema and serialization format suggestions

2015-05-13 Thread Ankur Chauhan
this problem. At a high level, the requirements are fairly simple:
1. Simple and easy to understand and extend.
2. Usable in places other than Spark (I would want to use them in other applications and tools).
3. Ability to play nicely with Parquet and Kafka (nice to have).
-- Ankur Ch

Re: kafka + Spark Streaming with checkPointing fails to restart

2015-05-14 Thread Ankur Chauhan
Thanks everyone, that was the problem. The "create new streaming context" function was supposed to set up the stream processing as well as the checkpoint directory. I had missed the checkpoint setup step entirely. With that done, everything works as

Spark on Mesos vs Yarn

2015-05-14 Thread Ankur Chauhan
were equally important, but has this changed now that Spark has almost reached the 1.4.0 stage? -- Ankur Chauhan

Re: Spark on Mesos vs Yarn

2015-05-15 Thread Ankur Chauhan
e some links to the JIRA and pull requests so that I can keep track of the progress/features. Again, thanks for replying. -- Ankur Chauhan On 15/05/2015 00:39, Tim Chen wrote: > Hi Ankur, > This is a great question, as I've heard similar concerns about Spark on Mesos.