Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-12 Thread Haoyuan Li
This link should be helpful: https://alluxio.org/docs/1.7/en/Running-Spark-on-Alluxio.html Best regards, Haoyuan (HY) alluxio.com | alluxio.org | powered by Alluxio On Thu, Apr 12, 2018 at 6:32 PM, jb44 wrote: > I'm runn

Re: Writing files to s3 with out temporary directory

2017-11-22 Thread Haoyuan Li
This blog post / tutorial may be helpful for running Spark in the cloud with Alluxio. Best regards, Haoyuan On Mon, Nov 20, 2017 at 2:12 PM, lucas.g...@gmail.com wrote: > That sounds like allot of work and if I understand you correctly it

Re: How to keep RDDs in memory between two different batch jobs?

2015-07-22 Thread Haoyuan Li
-- Haoyuan Li CEO, Tachyon Nexus <http://www.tachyonnexus.com/>

Re: How to stop making Multiple copies in memory when running multiple Spark jobs?

2015-07-05 Thread Haoyuan Li
You can also find more info here: http://tachyon-project.org/master/Running-Spark-on-Tachyon.html Hope this helps. Haoyuan On Tue, Jun 30, 2015 at 11:28 PM, Himanshu Mehra < himanshumehra@gmail.com> wrote: > Hi neprasad, > > You should give a try to Tachyon system. or any other in memory db

Fast big data analytics with Spark on Tachyon in Baidu

2015-05-12 Thread Haoyuan Li
Dear all, We’re organizing a meetup on May 28th at IBM in Foster City that might be of interest to the Spark community. The focus is a production use case of Spark and Tachyon at Baidu. You can sign up here: http://www.meetup.com/Tachyon/events/2

Re: tachyon on machines launched with spark-ec2 scripts

2015-04-24 Thread Haoyuan Li
onnect(Socket.java:579) > at > tachyon.org.apache.thrift.transport.TSocket.open(TSocket.java:180) > ... 20 more > > > What do I need to do before I can use tachyon? > > thanks > Daniel > -- Haoyuan Li CEO, Tachyon Nexus <http://www.tachyonnexus.com/> AMPLab, EECS, UC Berkeley http://www.cs.berkeley.edu/~haoyuan/

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-04-01 Thread Haoyuan Li
memory) and the rest >> >for regular mesos tasks? >> >> >This means, on each slave node I would have tachyon worker (+ hdfs >> >configuration to talk to s3 or the hdfs datanode) and the mesos slave >> process. Is this correct? >> >> >> > > > -- > --Sean > > -- Haoyuan Li AMPLab, EECS, UC Berkeley http://www.cs.berkeley.edu/~haoyuan/

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Haoyuan Li
ve node I would have tachyon worker (+ hdfs > configuration to talk to s3 or the hdfs datanode) and the mesos slave > process. Is this correct? > On each slave node, you would run a Tachyon worker. For the under file system (underfs), you can configure Tachyon to use S3, HDFS, or another supported storage system. Best, Haoyuan > > On 31/03/2015

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Haoyuan Li

Re: StorageLevel: OFF_HEAP

2015-03-18 Thread Haoyuan Li
k launch worker-0} >>>> TachyonFS.java[connect]:364) - Invalid method name: >>>> 'getUserUnderfsTempFolder' >>>> ERROR [2015-03-16 22:22:54,050] ({Executor task launch worker-0} >>>> TachyonFS.java[getFileId]:1020) - Invalid method name: 'user_getFileId' >>>> >>>> Is this because of a version mis-match? >>>> >>>> On a different note, I was wondering if Tachyon has been used in a >>>> production environment by anybody in this group? >>>> >>>> Appreciate your help with this. >>>> >>>> >>>> - Ranga >>>> >>>> >>> >> > -- Haoyuan Li AMPLab, EECS, UC Berkeley http://www.cs.berkeley.edu/~haoyuan/

Re: RE: Building spark over specified tachyon

2015-03-15 Thread Haoyuan Li
th maven and targeting on specific tachyon > version (let's say the most recent 0.6.0 release), > > how should that be done? What maven compile command should be like ? > > > > Thanks, > > Sun. > > > -- > > fightf...@163.com > > -- Haoyuan Li AMPLab, EECS, UC Berkeley http://www.cs.berkeley.edu/~haoyuan/

Re: Spark or Tachyon: capture data lineage

2015-01-02 Thread Haoyuan Li
a graph like (A and B)->C->E. > > Is this something already possible with spark/tachyon? If not, do you > think it is possible? Does anyone mind to share their experience in > capturing the data lineage in a data processing pipeline? > > Best Regards, > &

Re: spark broadcast unavailable

2014-12-10 Thread Haoyuan Li
to some kind of database,although I prefer save data in >> memory. >> >> here is come code snippets: >> val esRdd = kafkaDStreams.flatMap(_.split("\\n")) >> .map{ >> case esregex(datetime, time_request) => >> var ipInfo:Array[String]=Array

Re: Persist kafka streams to text file, tachyon error?

2014-11-22 Thread Haoyuan Li
at java.net.Socket.connect(Socket.java:579) > at tachyon.org.apache.thrift.transport.TSocket.open(TSocket.java:180) > ... 31 more > 14/11/21 14:17:54 ERROR storage.TachyonBlockManager: Failed 10 attempts to > create tachyon dir in > /tmp_spark_tachyon/spark-3dbec68b-f5b8-45e1-bb68-370439839d4a/ > > I looked at the code. It has the following part. Is that a problem? > > .persist(StorageLevel.OFF_HEAP) > > Any advice? > > Thank you! > > J > > -- Haoyuan Li AMPLab, EECS, UC Berkeley http://www.cs.berkeley.edu/~haoyuan/

Re: Saving very large data sets as Parquet on S3

2014-10-24 Thread Haoyuan Li
ave-a-multi-terabyte-schemardd-in-parquet-format-on-s3 > + > http://stackoverflow.com/questions/26321947/multipart-uploads-to-amazon-s3-from-apache-spark > + > http://stackoverflow.com/questions/26291165/spark-sql-unable-to-complete-writing-parquet-data-with-a-large-number-of-shards > > thanks > Daniel > > > -- Haoyuan Li AMPLab, EECS, UC Berkeley http://www.cs.berkeley.edu/~haoyuan/

Fwd: Second Bay Area Tachyon meetup: October 21st, hosted by Pivotal (Limited Space)

2014-10-02 Thread Haoyuan Li
-- Forwarded message -- From: Haoyuan Li Date: Thu, Oct 2, 2014 at 10:12 AM Subject: Second Bay Area Tachyon meetup: October 21st, hosted by Pivotal (Limited Space) To: tachyon-us...@googlegroups.com Hi folks, We've posted the second Tachyon meetup featuring exciting up

First Bay Area Tachyon meetup: August 25th, hosted by Yahoo! (Limited Space)

2014-08-19 Thread Haoyuan Li
Hi folks, We've posted the first Tachyon meetup, which will be on August 25th and is hosted by Yahoo! (Limited Space): http://www.meetup.com/Tachyon/events/200387252/ . Hope to see you there! Best, Haoyuan -- Haoyuan Li AMPLab, EECS, UC Berkeley http://www.cs.berkeley.edu/~haoyuan/

Re: share/reuse off-heap persisted (tachyon) RDD in SparkContext or saveAsParquetFile on tachyon in SQLContext

2014-08-11 Thread Haoyuan Li
-- Haoyuan Li AMPLab, EECS, UC Berkeley http://www.cs.berkeley.edu/~haoyuan/