Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0
(TID 4, 172.31.61.41): java.lang.IllegalArgumentException: Unknown codec:
com.hadoop.compression.lzo.LzoCodec
Could you please help me read the file with pyspark?
Thank you for your help,
Cheers,
Bertrand
Thanks for your prompt reply.
I will follow https://issues.apache.org/jira/browse/SPARK-2394 and will let
you know if everything works.
Cheers,
Bertrand
Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
(TID 3, 172.31.12.23): java.lang.IllegalArgumentException: Unknown codec:
com.hadoop.compression.lzo.LzoCodec
Thanks for your help,
Cheers,
Bertrand
File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling
z:org.apache.spark.api.python.PythonRDD.newAPIHadoopFile.
: java.lang.ClassNotFoundException: com.hadoop.mapreduce.LzoTextInputFormat
Thanks for your help,
File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling
z:org.apache.spark.api.python.PythonRDD.newAPIHadoopFile.
: java.lang.ClassNotFoundException: com.hadoop.mapreduce.LzoTextInputFormat
Could you please help me read an LZO file with pyspark?
Thank you for your help,
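For anyone landing on this thread: below is a minimal sketch of what the read
can look like once the hadoop-lzo jar and its native libraries are actually
available to the driver and the executors. The input format and codec class
names are the ones from the errors above; the path and app name are
placeholders.

    # Sketch only: assumes hadoop-lzo is installed on every node, its jar is
    # on the Spark classpath (e.g. via --jars) and its native libraries are
    # visible (e.g. via spark.executor.extraLibraryPath).
    from pyspark import SparkContext

    sc = SparkContext(appName="read-lzo-sketch")  # placeholder app name

    # LzoTextInputFormat comes from hadoop-lzo; keys are byte offsets
    # (LongWritable) and values are lines (Text).
    lines = sc.newAPIHadoopFile(
        "hdfs:///path/to/ngrams.lzo",  # placeholder path
        "com.hadoop.mapreduce.LzoTextInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
    )
    print(lines.take(2))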
2. Who is or was using *interruptOnCancel*? Did you get burned? Is it still
working without any incident?
Thanks in advance for any info, feedbacks and war stories.
Bertrand Dechoux
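For readers who do not know the flag: it is the third argument of
setJobGroup. A minimal pyspark illustration follows; the group id and the
work itself are placeholders, and the caveat in the comment is the documented
reason the flag is off by default.

    from pyspark import SparkContext

    sc = SparkContext(appName="interrupt-on-cancel-sketch")

    # With interruptOnCancel=True, cancelling the job group calls
    # Thread.interrupt() on the executor task threads instead of only
    # flagging the tasks as killed. The documented risk is HDFS-1208:
    # HDFS may respond to Thread.interrupt() by marking a node as dead.
    sc.setJobGroup("my-group", "cancellable work", interruptOnCancel=True)

    rdd = sc.parallelize(range(1000000))
    # ... start an action from another thread, then cancel the whole group:
    sc.cancelJobGroup("my-group")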
proven wrong (which would also imply EsperTech
is dead, which I also doubt...)
Bertrand
On Mon, Sep 14, 2015 at 2:31 PM, Todd Nist wrote:
> Stratio offers a CEP implementation based on Spark Streaming and the
> Siddhi CEP engine. I have not used the below, but they may be of some
>
?
Which ones?
LibDAI, which created the supported format, "supports parameter learning of
conditional probability tables by Expectation Maximization" according to
the documentation. Is it your reference tool?
Bertrand
On Thu, Dec 15, 2016 at 5:21 AM, Bryan Cutler wrote:
> I'll ch
as Michael already explained.
Bertrand
On Mon, Jun 16, 2014 at 1:23 PM, Michael Cutler wrote:
> Hello Wei,
>
> I talk from experience of writing many HPC distributed applications using
> Open MPI (C/C++) on x86, PowerPC and Cell B.E. processors, and Parallel
> Virtual Machine (PVM) way
not a
good idea to compete against the optimizer; it is of course also true for
'BigData'.
Bertrand
On Sun, Jun 22, 2014 at 1:32 PM, Flavio Pompermaier
wrote:
> Hi folks,
> I was looking at the benchmark provided by Cloudera at
> http://blog.cloudera.com/blog/2014/05/new-sql
d version of it.
Regards
Bertrand Dechoux
t the Pig 0.13
release?
Is the pluggable execution engine flexible enough to avoid having
Spork as a fork of Pig? Pig + Spark + Fork = Spork :D
As a (for now) external observer, I am glad to see competition in that
space. It can only be good for the community in the end.
Bertrand Dechoux
' for Spark.
@Zhang: Could you elaborate on your reference to Twitter?
Bertrand Dechoux
On Tue, Jul 8, 2014 at 4:04 AM, 张包峰 wrote:
> Hi guys, previously I checked out the old "spork" and updated it to Hadoop
> 2.0, Scala 2.10.3 and Spark 0.9.1, see github project of mine
>
A picture is worth a thousand... Well, a picture with this dataset, what
you are expecting and what you get, would help answer your initial
question.
Bertrand
On Thu, Jul 10, 2014 at 10:44 AM, Wanda Hawk wrote:
> Can someone please run the standard kMeans code on this input wit
A patch proposal on the apache JIRA for Spark?
https://issues.apache.org/jira/browse/SPARK/
Bertrand
On Thu, Jul 10, 2014 at 2:37 PM, Rahul Bhojwani wrote:
> And also there is a small bug in the implementation, as I mentioned
> earlier.
>
> This is my first time I am re
concept. As long as you apply functions with no side effects (i.e. the only
impact is the returned results), then you just need to ignore results from
additional attempts of the same task/operator.
Bertrand Dechoux
On Tue, Jul 15, 2014 at 9:34 PM, Andrew Ash wrote:
> Hi Nan,
>
>
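A tiny illustration of that distinction (the external-store function is
hypothetical):

    from pyspark import SparkContext

    sc = SparkContext(appName="side-effect-sketch")
    rdd = sc.parallelize(range(100))

    # Pure function: a retried or speculative attempt of the same task is
    # harmless, because only one copy of the returned results is kept.
    squares = rdd.map(lambda x: x * x).collect()

    # Side-effecting function (hypothetical): if a task is re-attempted,
    # the external write may happen more than once, so it must be
    # idempotent to be safe.
    def save_to_external_store(x):
        pass  # e.g. an HTTP POST or a database insert

    rdd.foreach(save_to_external_store)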
And you might want to apply clustering before. It is unlikely that every user
and every item is unique.
Bertrand Dechoux
On Fri, Jul 18, 2014 at 9:13 AM, Nick Pentreath
wrote:
> It is very true that making predictions in batch for all 1 million users
> against the 10k items will be
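A rough sketch of that idea with MLlib's KMeans, assuming the user feature
vectors (for instance the user factors of an ALS model) are already available
as an RDD of (userId, vector) pairs; every name and number below is
illustrative:

    from pyspark import SparkContext
    from pyspark.mllib.clustering import KMeans

    sc = SparkContext(appName="cluster-then-recommend-sketch")

    # Placeholder: an RDD of (userId, feature vector) pairs.
    userFeatures = sc.pickleFile("hdfs:///path/to/userFeatures")

    # Group the ~1M users into a few hundred clusters...
    model = KMeans.train(userFeatures.values(), k=200)

    # ...then score the 10k items once per cluster centroid instead of
    # once per user, and serve each user its cluster's recommendations.
    assignments = userFeatures.mapValues(model.predict)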
> Is there any documentation from Cloudera on how to run Spark apps on CDH
> Manager deployed Spark?
Asking the cloudera community would be a good idea.
http://community.cloudera.com/
In the end, only Cloudera will quickly fix issues with CDH...
Bertrand Dechoux
On Wed, Jul 23, 2014 at 9:28
Well, anyone can open an account on the Apache JIRA and post a new
ticket/enhancement/issue/bug...
Bertrand Dechoux
On Fri, Jul 25, 2014 at 4:07 PM, Sparky wrote:
> Thanks for the suggestion. I can confirm that my problem is that I have
> files with zero bytes. It's a known bug and is
Has another name already been discussed? It could be keep() or remove().
But take() could also be reused: instead of providing a number, the
filter function could be requested.
Regards
Bertrand
I understand the explanation, but I had to try. However, the change could be
made without breaking anything, but that's another story.
Regards
Bertrand
Bertrand Dechoux
On Thu, Feb 27, 2014 at 2:05 PM, Nick Pentreath wrote:
> filter comes from the Scala collection method "filter"
out. I understand that the ROI is
likely not worth it.
Thanks for the feedback
Bertrand
On Thu, Feb 27, 2014 at 3:38 PM, Nick Pentreath wrote:
> Agree that filter is perhaps unintuitive. Though the Scala collections API
> has "filter" and "filterNot" which
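For the record, a small illustration of the naming discussion: Scala
collections have both filter and filterNot, RDDs only have filter, and
negating the predicate is the usual workaround.

    from pyspark import SparkContext

    sc = SparkContext(appName="filter-sketch")
    rdd = sc.parallelize(range(10))

    is_even = lambda x: x % 2 == 0

    kept = rdd.filter(is_even)                      # keeps matching elements
    removed = rdd.filter(lambda x: not is_even(x))  # the 'filterNot' workaround

    print(kept.collect())     # [0, 2, 4, 6, 8]
    print(removed.collect())  # [1, 3, 5, 7, 9]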
In a single phrase: if you understand what map() does and what a flatten()
might do, then flatMap() is like a map() followed by a flatten().
As previously said, the concepts in themselves are not Spark specific.
Bertrand
On Wed, Mar 12, 2014 at 1:19 PM, Xuefeng Wu wrote:
> It is the s
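A one-liner illustration of that phrase with pyspark:

    from pyspark import SparkContext

    sc = SparkContext(appName="flatmap-sketch")
    lines = sc.parallelize(["a b", "c d e"])

    # map() produces one output element per input element...
    print(lines.map(lambda line: line.split()).collect())
    # [['a', 'b'], ['c', 'd', 'e']]

    # ...while flatMap() is the same map() followed by a flatten().
    print(lines.flatMap(lambda line: line.split()).collect())
    # ['a', 'b', 'c', 'd', 'e']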
But you might run into performance issues. I don't know the subject with
regard to Spark, but with Hadoop MapReduce, Sqoop might be a solution in
order to handle the database with care.
Bertrand Dechoux
On Fri, Mar 14, 2014 at 4:47 AM, Christopher Nguyen wrote:
> Nicholas,
>
> > (Can we
I don't know the Spark issue, but the Hadoop context is clear.
old API -> org.apache.hadoop.mapred
new API -> org.apache.hadoop.mapreduce
You might only need to change your import.
Regards
Bertrand
On Wed, Mar 19, 2014 at 11:29 AM, Pariksheet Barapatre wrote:
> Hi,
>
>
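In pyspark the split is visible in the two reader methods; a sketch with the
stock text input formats (paths are placeholders):

    from pyspark import SparkContext

    sc = SparkContext(appName="hadoop-api-sketch")

    # Old API: input formats live under org.apache.hadoop.mapred.
    old_api = sc.hadoopFile(
        "hdfs:///path/to/data",  # placeholder
        "org.apache.hadoop.mapred.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
    )

    # New API: input formats live under org.apache.hadoop.mapreduce.
    new_api = sc.newAPIHadoopFile(
        "hdfs:///path/to/data",  # placeholder
        "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
    )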
this subject?
Thanks in advance
Bertrand
Spark SQL as of now.
Does it also imply that the reverse is true? That I can write data as Hive
data with Spark SQL, using results from a random (Python) Spark application?
Bertrand Dechoux
On Thu, Apr 17, 2014 at 7:23 AM, Matei Zaharia wrote:
> Yes, this JIRA would enable that. The Hive support
According to the Spark SQL documentation, indeed, this project allows
Python to be used while reading/writing tables, i.e. data which is not
necessarily in text format.
Thanks a lot!
Bertrand Dechoux
On Thu, Apr 17, 2014 at 10:06 AM, Bertrand Dechoux wrote:
> Thanks for the JIRA reference. I rea
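For completeness, a minimal sketch of the reverse direction with the
HiveContext API as it later shipped in pyspark; the table names are
placeholders and this assumes Spark was built with Hive support:

    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="hive-from-python-sketch")
    hive = HiveContext(sc)

    # Read an existing Hive table from a Python application...
    rows = hive.sql("SELECT * FROM some_table")  # placeholder table

    # ...and write results back as a Hive-managed table (HiveQL CTAS).
    hive.sql("CREATE TABLE some_table_copy AS SELECT * FROM some_table")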
Cool, thanks for the link.
Bertrand Dechoux
On Mon, Apr 21, 2014 at 7:31 PM, Nick Pentreath wrote:
> Also see: https://github.com/apache/spark/pull/455
>
> This will add support for reading sequencefile and other inputformat in
> PySpark, as long as the Writables are either simple
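Once that support landed, the simple-Writable case is a one-liner; a sketch
(the path is a placeholder):

    from pyspark import SparkContext

    sc = SparkContext(appName="sequencefile-sketch")

    # SequenceFiles whose key/value Writables are simple types
    # (IntWritable, Text, ...) are converted to Python objects for you.
    pairs = sc.sequenceFile("hdfs:///path/to/seqfile")  # placeholder
    print(pairs.take(5))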
dfs.namenode.path.based.cache.refresh.interval.ms might be too large?
You might want to ask a broader mailing list. This is not related to Spark.
Bertrand
On Fri, May 16, 2014 at 2:56 AM, hequn cheng wrote:
> I tried centralized cache step by step following the Apache Hadoop official
> website, but it seems centralized cache d
http://spark-summit.org ?
Bertrand
On Thu, May 8, 2014 at 2:05 AM, Ian Ferreira wrote:
> Folks,
>
> I keep getting questioned on real world experience of Spark as in mission
> critical production deployments. Does anyone have some war stories to share
> or know of reso