Fwd: Help: Configure `secrets.CODECOV_TOKEN` for GitHub Spark repo to ensure successful daily scheduling of workflow `build_coverage.yml`

2025-01-07 Thread Pan Bingkun
Hi, all, As described in the email, I would like to add a configuration item *`secrets.CODECOV_TOKEN`* in *Apache Spark Settings*. The problem is that I don't have permission to *see this settings tab*, and I don't know who has the permission to see it. *Can anyone tell me how to do

Fwd: Why spark-submit works with package not with jar

2024-05-04 Thread Mich Talebzadeh
Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is correct t

Fwd: [SparkListener] Accessing classes loaded via the '--packages' option

2024-05-04 Thread Damien Hawes
Hi folks, I'm contributing to the OpenLineage project, specifically the Apache Spark integration. My current focus is on extending the project to support data lineage extraction for Spark Streaming, beginning with Apache Kafka sources and sinks. I've encountered an obstacle when attempting to acc

Fwd: Apache Spark 3.4.3 (?)

2024-04-07 Thread Mich Talebzadeh
Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is correct to

Fwd: 📅 Wednesday: Join 6 Members at "Ofir Press | Complementing Scale: Novel Guidance Methods for Improving LMs"

2023-08-24 Thread Mich Talebzadeh
There is a combined Apache Spark and AI meetup in London. An online session worth attending for some? HTH Mich view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. An

Fwd: Recap on current status of "SPIP: Support Customized Kubernetes Schedulers"

2023-08-22 Thread Mich Talebzadeh
I found some of the notes on Volcano and my tests back in Feb 2022. I did my volcano tests on Spark 3.1.1. The results were not very great then. Hence I asked in thread from @santosh, if any updated comparisons are available. I will try the test with Spark 3.4.1 at some point. Maybe some users have

Fwd: Question about ARRAY_INSERT between Spark and Databricks

2023-08-14 Thread Ran Tao
> Forward to dev Yes, Databricks Runtime 13.0, 13.1, and 13.2 are all OK and have the same behavior as open source Apache Spark 3.4.x. But I think the Databricks docs need to be updated[1]. It's confusing. [1] https://docs.databricks.com/en/sql/language-manual/functions/array_insert.ht

Fwd: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Pavan Kotikalapudi
Hi Spark Dev, I have extended traditional DRA to work for structured streaming use-case. Here is an initial Implementation draft PR https://github.com/apache/spark/pull/42352 and design doc: https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing Please

Fwd: Time to start publishing Spark Docker Images?

2023-05-09 Thread Mich Talebzadeh
Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and a

Fwd: Auto-reply: Re: [DISCUSS] Show Python code examples first in Spark documentation

2023-02-26 Thread Mich Talebzadeh
Hi, Can someone disable the below login from the Spark forums please? It sounds like someone left, and this email now sends a spam-type auto-reply any time we respond. thanks view my Linkedin profile https://en.everybodywiki.com/Mich

Fwd: Shutting down spark structured streaming when the streaming process completed current process

2023-02-07 Thread Mich Talebzadeh
Resending this feature request and proposing a possible solution. Can someone advise if I need to complete a Spark project improvement proposal? Thanks view my Linkedin profile https

Fwd: CRAN submission SparkR 3.2.0

2021-10-20 Thread Felix Cheung
-- Forwarded message - From: Gregor Seyer Date: Wed, Oct 20, 2021 at 4:42 AM Subject: Re: CRAN submission SparkR 3.2.0 To: Felix Cheung , CRAN < cran-submissi...@r-project.org> Thanks, Please add \value to .Rd files regarding exported methods and explain the functions results in

Fwd: Unpacking and using external modules with PySpark inside k8s

2021-07-21 Thread Mich Talebzadeh
Hi, I am aware that some fellow members in this dev group were involved in creating scripts for running Spark on Kubernetes:
# To build additional PySpark docker image
$ ./bin/docker-image-tool.sh -r -t my-tag -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
The problem I have

Fwd: CRAN package SparkR

2021-06-13 Thread Felix Cheung
It looks like they would not allow caching the Spark distribution. I'm not sure what can be done about this. If I recall, the package should remove this during test. Or maybe make spark.install() optional (hence getting user confirmation?) -- Forwarded message - Date: Sun, Jun 13,

Fwd: Companies and Organizations listing for datapipelines.com

2020-08-12 Thread roland
Hi there, Just following up on my previous email since I haven't heard back. Some more info about the project: - recently upgraded to use spark 3.0.0 - uses spark-core and spark-sql modules - AWS S3 support - JDBC support Short description: Create and schedule data pipelines step-by-step using

Fwd: Announcing ApacheCon @Home 2020

2020-07-01 Thread Felix Cheung
-- Forwarded message - We are pleased to announce that ApacheCon @Home will be held online, September 29 through October 1. More event details are available at https://apachecon.com/acah2020 but there’s a few things that I want to highlight for you, the members. Yes, the CFP has

Fwd: Automatic PR labeling

2020-04-02 Thread Hyukjin Kwon
Seems like this email missed cc'ing the mailing list; forwarding it for trackability. -- Forwarded message - From: Ismaël Mejía Date: Thu, Apr 2, 2020 at 4:46 PM Subject: Re: Automatic PR labeling To: Hyukjin Kwon +1 Just for ref there is a really simple GitHub App for this: https:

Fwd: dataframe null safe joins given a list of columns

2020-02-08 Thread Enrico Minack
Hi Devs, I am forwarding this from the user mailing list. I agree that the <=> version of join(Dataset[_], Seq[String]) would be useful. Does any PMC member consider this useful enough to be added to the Dataset API? I'd be happy to create a PR in that case. Enrico Forwarded message
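The null-safe join being requested can be sketched in plain Python to show the semantics. This is a toy nested-loop illustration with made-up helper names and sample data, not Spark's API; in Spark the null-safe operator is `<=>` (SQL) / `Column.eqNullSafe`, and it works on Columns rather than Python lists:

```python
def null_safe_eq(a, b):
    """SQL `<=>` semantics, modelling NULL as None:
    NULL <=> NULL is true, NULL <=> x is false, else ordinary equality."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

def null_safe_join(left, right, key_cols):
    """Toy inner join of two lists of row-dicts on key_cols, matching
    rows with <=> instead of = so that NULL keys pair up."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(null_safe_eq(l[c], r[c]) for c in key_cols)
    ]

left = [{"k": None, "a": 1}, {"k": 2, "a": 2}]
right = [{"k": None, "b": 10}, {"k": 3, "b": 30}]
print(null_safe_join(left, right, ["k"]))  # [{'k': None, 'a': 1, 'b': 10}]
```

With a plain equi-join the two NULL-keyed rows would not match; that difference is the whole point of the proposed `<=>` variant of join(Dataset[_], Seq[String]).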

Fwd: Spark 2.4.4, RPC encryption and Python

2020-01-19 Thread Luca Toscano
Hi everybody, trying to ask the same question in dev@ since some info about how to debug this would be helpful :) Thanks in advance, Luca -- Forwarded message - From: Luca Toscano Date: Thu, Jan 16, 2020 at 09:16 Subject: Spark 2.4.4, RPC encryption and Python To: Hi ev

Fwd: [apache/spark] [SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+ (#26332)

2019-10-30 Thread Sean Owen
I wanted to raise this to dev@. So, updating dropwizard metrics from 3.2.x to 4.x might be important for JDK 11 support. Our tests pass as-is without this update. But we don't test some elements of this metrics support, like Ganglia integration. And I have heard reports that downstream custom usag

Fwd: [PMCs] Any project news or announcements this week?

2019-10-20 Thread Sean Owen
I wonder if we are likely to have a Spark 3.0 preview release this week? no rush, but if we do, let's CC Sally to maybe mention at ApacheCon. -- Forwarded message - From: Sally Khudairi Date: Sun, Oct 20, 2019 at 4:00 PM Subject: [PMCs] Any project news or announcements this week?

Fwd: Check

2019-09-27 Thread Aniket Khandelwal
Hi all, I was stuck on a problem that I faced recently. The problem statement is like : Event Bean consists of eventId, eventTag, text, . We need to run a spark job that aggregates the eventTag column and picks top K1 of them. Additionally, we need for each eventTag, list of eventIds (first K2
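The aggregation described (top K1 tags by count, plus the first K2 event ids per tag) can be sketched in plain Python before translating it to a Spark job. The function name and sample data below are made up for illustration:

```python
from collections import defaultdict

def top_tags(events, k1, k2):
    """Given (event_id, event_tag) pairs, return the k1 most frequent
    tags, each with its count and the first k2 event ids seen for it."""
    counts = defaultdict(int)
    first_ids = defaultdict(list)
    for event_id, tag in events:
        counts[tag] += 1
        if len(first_ids[tag]) < k2:
            first_ids[tag].append(event_id)
    top = sorted(counts, key=counts.get, reverse=True)[:k1]
    return [(tag, counts[tag], first_ids[tag]) for tag in top]

events = [(1, "a"), (2, "b"), (3, "a"), (4, "a"), (5, "b"), (6, "c")]
print(top_tags(events, k1=2, k2=2))  # [('a', 3, [1, 3]), ('b', 2, [2, 5])]
```

In Spark the same shape is a groupBy on the tag column with a count plus a bounded collect of ids, followed by ordering and a limit of K1.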

Fwd: Custom aggregations: modular and lightweight solutions?

2019-08-21 Thread Andrew Leverentz
Hi All, Apologies for cross-posting this, but I'm wondering if the dev list might be a better place for my questions below. For now, I'm developing a set of utilities for my own use, but if I can get these utilities working, I'd like to see if it might be worth contributing them to the Spark projec

Fwd: The result of Math.log(3.0) is different on x86_64 and aarch64?

2019-07-29 Thread Sean Owen
That is really interesting re: the recent threads about the value of log() and pow() in the JVM. I think it's worth copying to dev@ here. -- Forwarded message - From: Tianhua huang Date: Mon, Jul 29, 2019 at 5:28 AM Subject: Fwd: The result of Math.log(3.0) is different on x
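The x86_64/aarch64 discrepancy comes down to the last bits of the double: java.lang.Math.log is only required to be within 1-2 ulp of the exact result (so different platform libm implementations may legally differ), while StrictMath pins down the same fdlibm result everywhere. A small stdlib sketch for inspecting the bit pattern on a given machine (no particular final bit is asserted, since that is exactly what varies):

```python
import math
import struct

# Bit-level view of log(3.0) on this machine: a platform/libm difference
# shows up as a different final bit of the IEEE-754 double.
x = math.log(3.0)
bits = struct.unpack("<Q", struct.pack("<d", x))[0]
print(f"log(3.0) = {x!r}")
print(f"bits     = {bits:#018x}")

# Two doubles that differ by 1 ulp -- the smallest representable gap,
# and the kind of drift the thread observed between architectures.
neighbour = struct.unpack("<d", struct.pack("<Q", bits + 1))[0]
print(f"1-ulp neighbour = {neighbour!r}")
```

Running this on both architectures and comparing the printed bit patterns makes the 1-ulp difference visible directly, rather than relying on decimal printing.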

Fwd: Sample date_trunc error for webpage (https://spark.apache.org/docs/2.3.0/api/sql/#date_trunc )

2019-07-07 Thread Chris Lambertus
Spark, We received this message. I have not ACKd it. -Chris INFRA > Begin forwarded message: > > From: "binggan1989" > Subject: Sample date_trunc error for webpage > (https://spark.apache.org/docs/2.3.0/api/sql/#date_trunc ) > Date: July 5, 2019 at 2:54:54 AM PDT > To: "webmaster" > Reply-T

Fwd: Apache Roadshow India in 2020?

2019-06-04 Thread Swapnil M Mane
Hello Spark Devs, Just for your information, we are exploring the possibility of an Apache Roadshow India in 2020 [1]. If you are interested and want to give feedback or help with planning, please subscribe to d...@community.apache.org and provide feedback there. [1] https://s.apache.org/BJTl - B

Fwd: Uncaught Exception Handler in master

2019-04-16 Thread Alessandro Liparoti
Hi everyone, I have a Spark library where I would like to perform some action before an uncaught exception happens (log it, increment an error metric, ...). I tried multiple times to use setUncaughtExceptionHandler in the current Thread but this doesn't work. If I spawn another thread this works fine. A
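On the JVM, a per-thread handler must be installed on a thread before the exception escapes it, and Spark installs its own default handler in executors, which may be why overriding it from a library is tricky. Python's stdlib has an analogous hook, threading.excepthook (Python 3.8+); a minimal sketch of the pattern:

```python
import threading

caught = []

def log_uncaught(args):
    # args carries thread, exc_type, exc_value, exc_traceback --
    # the place to log or bump an error metric.
    caught.append((args.thread.name, args.exc_type.__name__))
    print(f"uncaught in {args.thread.name}: {args.exc_type.__name__}")

# Installed once; fires for any exception escaping a Thread.run().
threading.excepthook = log_uncaught

t = threading.Thread(target=lambda: 1 / 0, name="worker")
t.start()
t.join()
print(caught)  # [('worker', 'ZeroDivisionError')]
```

Like the JVM case, the hook only sees exceptions that escape spawned threads, not ones raised in the main thread, matching the observation that spawning another thread makes it work.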

Fwd: Dataset schema incompatibility bug when reading column partitioned data

2019-04-11 Thread Dávid Szakállas
+dev for more visibility. Is this a known issue? Is there a plan for a fix? Thanks, David > Begin forwarded message: > > From: Dávid Szakállas > Subject: Dataset schema incompatibility bug when reading column partitioned > data > Date: 2019. March 29. 14:15:27 CET > To: u...@spark.apache.org >

Fwd: CRAN submission SparkR 2.3.3

2019-02-24 Thread Shivaram Venkataraman
FYI here is the note from CRAN from submitting 2.3.3. There were some minor issues with the package description file in our CRAN submission. We are discussing with the CRAN team about this and also Felix has a patch to address this for upcoming releases. One thing I was wondering is that if there

Fwd: [NOTICE] Mandatory relocation of Apache git repositories on git-wip-us.apache.org

2018-12-07 Thread Sean Owen
See below: Apache projects are migrating to a new git infrastructure, and are seeking projects to volunteer to move earlier than later. I believe Spark should volunteer. This should mostly affect committers, who would need to point to the new remote. It could affect downstream consumers of the Apa

Fwd: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-05 Thread Felix Cheung
FYI. SparkR submission failed. It seems to detect Java 11 correctly with vignettes but not skipping tests as would be expected. Error: processing vignette ‘sparkr-vignettes.Rmd’ failed with diagnostics: Java version 8 is required for this package; found version: 11.0.1 Execution halted * checkin

Fwd: [VOTE] SPARK 2.4.0 (RC3)

2018-10-11 Thread Wenchen Fan
Forgot to cc dev-list -- Forwarded message - From: Wenchen Fan Date: Thu, Oct 11, 2018 at 10:14 AM Subject: Re: [VOTE] SPARK 2.4.0 (RC3) To: Cc: Sean Owen Ah sorry guys, I just copy-paste the voting email from the last RC and forgot to update the date :P The voting should be

Fwd: data source api v2 refactoring

2018-09-04 Thread Ryan Blue
Latest from Wenchen in case it was dropped. -- Forwarded message - From: Wenchen Fan Date: Mon, Sep 3, 2018 at 6:16 AM Subject: Re: data source api v2 refactoring To: Cc: Ryan Blue , Reynold Xin , < dev@spark.apache.org> Hi Mridul, I'm not sure what's going on, my email was CC

Fwd: [CRAN-pretest-archived] CRAN submission SparkR 2.2.2

2018-07-09 Thread Shivaram Venkataraman
The upcoming 2.2.2 release was submitted to CRAN. I think there are some known issues on Windows, but does anybody know what the following error with Netty is? > WARNING: Illegal reflective access by > io.netty.util.internal.PlatformDependent0$1 > (file:/home/hornik/.cache/spark/spark-2.2.2

Fwd: Beam's recent community development work

2018-07-02 Thread Sean Owen
Worth, I think, a read and consideration from Spark folks. I'd be interested in comments; I have a few reactions too. -- Forwarded message - From: Kenneth Knowles Date: Sat, Jun 30, 2018 at 1:15 AM Subject: Beam's recent community development work To: , , Griselda Cuevas < g...@ap

Fwd: [SparkML] Random access in SparseVector will slow down inference stage for some tree based models

2018-07-01 Thread Vincent Wang
Hi there, I'm using *GBTClassifier* for some classification jobs and find the performance of the scoring stage is not quite satisfying. The trained model has about 160 trees and the input feature vector is sparse and its size is about 20+. After some digging, I found the model will repeatedly and rand

Fwd: array_contains in package org.apache.spark.sql.functions

2018-06-14 Thread 刘崇光
-- Forwarded message -- From: 刘崇光 Date: Thu, Jun 14, 2018 at 11:08 AM Subject: array_contains in package org.apache.spark.sql.functions To: u...@spark.apache.org Hello all, I ran into a use case in project with spark sql and want to share with you some thoughts about the functio

Fwd: [CRAN-pretest-archived] CRAN submission SparkR 2.3.1

2018-06-12 Thread Shivaram Venkataraman
Corresponding to the Spark 2.3.1 release, I submitted the SparkR build to CRAN yesterday. Unfortunately it looks like there are a couple of issues (full message from CRAN is forwarded below) 1. There are some builds started with Java 10 (http://home.apache.org/~shivaram/SparkR_2.3.1_check_results/

Fwd: Help needed in R documentation generation

2018-02-27 Thread Mihály Tóth
Hi, Actually, when I open the link you provided and click on - for example - 'sin', the page does not seem to describe that function at all. I get the same effect locally. I have attached a screenshot of this: [image: Inline image 1] I tried with Chrome and then with Safari

Fwd: ***UNCHECKED*** [jira] [Resolved] (SPARK-23218) simplify ColumnVector.getArray

2018-01-26 Thread Sean Owen
This is an example of the "*** UNCHECKED ***" message I was talking about -- it's part of the email subject rather than JIRA. -- Forwarded message - From: Xiao Li (JIRA) Date: Fri, Jan 26, 2018 at 11:18 AM Subject: ***UNCHECKED*** [jira] [Resolved] (SPARK-23218) simplify ColumnVec

Fwd: Publishing official docker images for KubernetesSchedulerBackend

2017-12-19 Thread Erik Erlandson
Here are some specific questions I'd recommend for the Apache Spark PMC to bring to ASF legal counsel: 1) Does the philosophy described on LEGAL-270 still represent a sanctioned approach to publishing releases via container image? 2) If the transitive closure of pulled-in licenses on each of these

Fwd: [jira] [Commented] (SPARK-21728) Allow SparkSubmit to use logging

2017-08-30 Thread Jacek Laskowski
Hi, I think that's the code change (by Marcelo Vanzin) that has changed how logging works as of now which seems not to load conf/log4j.properties by default. Can anyone explain how it's supposed to work in 2.3? I could not figure it out from the code and conf/log4j.properties is not picked up (bu

Fwd: spark dataframe jdbc Amazon RDS problem

2017-08-29 Thread 刘虓
+dev -- Forwarded message -- From: 刘虓 Date: 2017-08-27 1:02 GMT+08:00 Subject: Re: spark dataframe jdbc Amazon RDS problem To: user my code is here:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
mysql_jdbc_url = 'mydb/test'
table = "test"
props

Fwd: SPIP: Spark on Kubernetes

2017-08-17 Thread Timothy Chen
-- Forwarded message -- From: Timothy Chen Date: Thu, Aug 17, 2017 at 2:48 PM Subject: Re: SPIP: Spark on Kubernetes To: Marcelo Vanzin Hi Marcelo, Agree with your points, and I had that same thought around Resource staging server and like to share that with Spark on Mesos (onc

Re: Fwd: spark git commit: [SPARK-21472][SQL] Introduce ArrowColumnVector as a reader for Arrow vectors.

2017-07-21 Thread Liang-Chi Hsieh
Yeah, I think it should be "ColumnVector.Array". Already ping @ueshin for this issue. Jacek Laskowski wrote > Hi, > > Looks like the change has broken the build for me: > > [INFO] --- scala-maven-plugin:3.2.2:doc-jar (attach-scaladocs) @ > spark-sql_2.11 --- > /Users/jacek/dev/oss/spark/sql/co

Fwd: spark git commit: [SPARK-21472][SQL] Introduce ArrowColumnVector as a reader for Arrow vectors.

2017-07-21 Thread Jacek Laskowski
Hi, Looks like the change has broken the build for me: [INFO] --- scala-maven-plugin:3.2.2:doc-jar (attach-scaladocs) @ spark-sql_2.11 --- /Users/jacek/dev/oss/spark/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ArrowColumnVector.java:243: error: not found: type Array public

Fwd: Does pyspark worker really use pipe?

2017-07-19 Thread Cyanny LIANG
Hello, As the PySpark internals wiki says, the PySpark worker uses a pipe to communicate, not a socket. https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals I have checked the pyspark/worker.py code:
if __name__ == '__main__':
    # Read a local port to connect to from stdin
    java_port = int
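The snippet above shows both mechanisms at once: stdin (a pipe) is only the bootstrap over which the JVM hands the worker a port, and the row data then flows over a local socket to that port. A minimal stdlib sketch of just the stdin handshake, with a made-up port number:

```python
import subprocess
import sys

# Child process: read a port from stdin, as pyspark/worker.py does.
child_code = (
    "import sys\n"
    "java_port = int(sys.stdin.readline())\n"
    "print('worker got port', java_port)\n"
)

# Parent process: spawn the child and hand it the port over the pipe.
proc = subprocess.run(
    [sys.executable, "-c", child_code],
    input="50007\n",
    capture_output=True,
    text=True,
)
print(proc.stdout.strip())  # worker got port 50007
```

So the wiki's "pipe" wording and the socket in the code are not contradictory; they describe two stages of the same worker startup.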

Fwd: Testing Apache Spark with JDK 9 Early Access builds

2017-07-14 Thread Matei Zaharia
FYI, the JDK group at Oracle is reaching out to see whether anyone wants to test with JDK 9 and give them feedback. Just contact them directly if you'd like to. -- Forwarded message -- From: dalibor topic Date: Wed, Jul 12, 2017 at 3:16 AM Subject: Testing Apache Spark with JDK 9

Fwd: A question about rdd transformation

2017-06-22 Thread Lionel Luffy
add dev list. Who can help on below question? Thanks & Best Regards, LL -- Forwarded message -- From: Lionel Luffy Date: Fri, Jun 23, 2017 at 11:20 AM Subject: Re: A question about rdd transformation To: u...@spark.apache.org Now I found the root cause is a Wrapper class in Any

Re: Fwd: [SparkSQL] Project using NamedExpression

2017-04-03 Thread Aviral Agarwal
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)

Re: Fwd: [SparkSQL] Project using NamedExpression

2017-03-28 Thread Liang-Chi Hsieh
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolEx

Re: Fwd: [SparkSQL] Project using NamedExpression

2017-03-27 Thread Aviral Agarwal
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo

Re: Fwd: [SparkSQL] Project using NamedExpression

2017-03-24 Thread Aviral Agarwal
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> This might be because the Expression is unresolved.
>
> Any help would be appreciated

Re: Fwd: [SparkSQL] Project using NamedExpression

2017-03-23 Thread Liang-Chi Hsieh
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> This might be because the Expressio

Fwd: [SparkSQL] Project using NamedExpression

2017-03-21 Thread Aviral Agarwal
Hi guys, I want to transform a Row using NamedExpression. Below is the code snippet that I am using:
def apply(dataFrame: DataFrame, selectExpressions: java.util.List[String]): RDD[UnsafeRow] = {
  val exprArray = selectExpressions.map(s => Column(SqlParser.parseExpression(s)).named)

Fwd: [VOTE] Release Apache Hadoop 2.8.0 (RC3)

2017-03-20 Thread Steve Loughran
There's an RC of Hadoop 2.8.0 out for people to build and test against. This is the last chance for people to find problems and complain before it gets out the door. Spark happily builds against it; my tests all work. For anyone wanting to do the build, the maven arguments are: -Phive,yarn,hadoo

Fwd: Question on Spark's graph libraries roadmap

2017-03-13 Thread enzo
Please see this email trail: no answer so far on the user@spark board, so I am trying the developer board for better luck. The question: I am a bit confused by the current roadmap for graph and graph analytics in Apache Spark. I understand that we have had for some time two libraries (the following

Fwd: Build completed: spark 866-master

2017-03-04 Thread Shivaram Venkataraman
I'm not sure why the AppVeyor updates are coming to the dev list. Hyukjin -- Do you know if we made any recent changes that might have caused this ? Thanks Shivaram -- Forwarded message -- From: AppVeyor Date: Sat, Mar 4, 2017 at 2:46 PM Subject: Build completed: spark 866-maste

Fwd: tylerchap...@yahoo-inc.com is no longer with Yahoo! (was: Dealing with missing columns in SPARK SQL in JSON)

2017-02-14 Thread Aseem Bansal
Can someone please remove tylerchap...@yahoo-inc.com from the mailing list? I was told in a spark JIRA that dev mailing list is the right place to ask for this. -- Forwarded message -- From: Yahoo! No Reply Date: Tue, Feb 14, 2017 at 8:00 PM Subject: tylerchap...@yahoo-inc.com is

Fwd: Handling Skewness and Heterogeneity

2017-02-14 Thread Anis Nasir
Dear all, Can you please comment on the below mentioned use case. Thanking you in advance Regards, Anis -- Forwarded message - From: Anis Nasir Date: Tue, 14 Feb 2017 at 17:01 Subject: Handling Skewness and Heterogeneity To: Dear All, I have few use cases for spark streami

Fwd: Google Summer of Code 2017 is coming

2017-02-03 Thread Jacek Laskowski
Hi, Is this something Spark considering? Would be nice to mark issues as GSoC in JIRA and solicit feedback. What do you think? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jac

Fwd: Question about SPARK-11374 (skip.header.line.count)

2016-12-08 Thread Dongjoon Hyun
+dev I forgot to add @user. Dongjoon. -- Forwarded message - From: Dongjoon Hyun Date: Thu, Dec 8, 2016 at 16:00 Subject: Question about SPARK-11374 (skip.header.line.count) To: Hi, All. Could you give me your opinion? There is an old SPARK issue, SPARK-11374, about rem

Fwd: SparkILoop doesn't run

2016-11-17 Thread Mohit Jaggi
I am trying to use SparkILoop to write some tests (shown below) but the test hangs with the following stack trace. Any idea what is going on?
import org.apache.log4j.{Level, LogManager}
import org.apache.spark.repl.SparkILoop
import org.scalatest.{BeforeAndAfterAll, FunSuite}
class SparkReplSpec

Fwd: using Spark Streaming with Kafka 0.9/0.10

2016-11-15 Thread aakash aakash
Re-posting it at dev group. Thanks and Regards, Aakash -- Forwarded message -- From: aakash aakash Date: Mon, Nov 14, 2016 at 4:10 PM Subject: using Spark Streaming with Kafka 0.9/0.10 To: user-subscr...@spark.apache.org Hi, I am planning to use Spark Streaming to consume mes

Fwd:

2016-11-15 Thread Anton Okolnychyi
Hi, I have experienced a problem using the Datasets API in Spark 1.6, while almost identical code works fine in Spark 2.0. The problem is related to encoders and custom aggregators. *Spark 1.6 (the aggregation produces an empty map):* implicit val intStringMapEncoder: Encoder[Map[Int, String]]

Fwd: Large variation in spark in Task Deserialization Time

2016-10-17 Thread Pulasthi Supun Wickramasinghe
Hi Devs/All, I am seeing a huge variation in spark Task Deserialization Time for my collect and reduce operations. While most tasks complete within 100ms, a few take more than a couple of seconds, which slows the entire program down. I have attached a screen shot of the web UI where you can see the

Fwd: Looking for a Spark-Python expert

2016-10-07 Thread Boris Lenzinger
Hi all, I don't know where to post this announcement so I apologize for polluting the ML with such a mail. I'm looking for an expert in Spark 2.0 and its Python API. I have a customer that is looking for an expertise mission (for one month, but I guess it can spread over 2 months given the goals to

Re: Fwd: Question regarding merging to two RDDs

2016-09-17 Thread WangJianfei
maybe you can use a DataFrame, with the header file as a schema -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-Question-regarding-merging-to-two-RDDs-tp18971p18977.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com

Fwd: Question regarding merging to two RDDs

2016-09-17 Thread Hiral Mehta
Hi, I have two separate csv files one with header and other with data. I read those two files in 2 different RDDs and now I need to merge both the RDDs. I tried various options such as union, zip, join but none worked for my problem. What is the best way to merge two RDDs so that the header and d
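The suggested fix (use the header file as a schema) can be sketched with the stdlib: read the header file once, then apply it to every data row. File contents are inlined via StringIO for illustration; with Spark the same idea is passing the parsed header as the DataFrame schema:

```python
import csv
import io

# Stand-ins for the two separate files: one holds only the header,
# the other holds only data rows.
header_file = io.StringIO("id,name,score\n")
data_file = io.StringIO("1,alice,90\n2,bob,85\n")

# Read the header once, then zip it against every data record.
columns = next(csv.reader(header_file))
rows = [dict(zip(columns, rec)) for rec in csv.reader(data_file)]
print(rows)
```

This avoids union/zip/join entirely: the header is metadata to apply to each row, not a row to be merged into the data RDD.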

Re: Fwd: HANA data access from SPARK

2016-09-12 Thread whitefalcon
na-in-memory/blog/2016/09/09/calling-hana-views-from-apache-spark Regards Aron -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-HANA-data-access-from-SPARK-tp16412p18921.html Sent from the Apache Spark Developers List mailing list archive at Nabbl

Re: Fwd: seeing this message repeatedly.

2016-09-05 Thread Radoslaw Gruchalski
All your workers go via public IP. Do you have the ports opened? Why public IP? Is it not better to use the private 10.x address? – Best regards, Radek Gruchalski ra...@gruchalski.com On September 5, 2016 at 11:49:30 AM, kant kodali (kanth...@gmail.com) wrote: -- Forwarded message ---

Fwd: seeing this message repeatedly.

2016-09-05 Thread kant kodali
-- Forwarded message -- From: kant kodali Date: Sat, Sep 3, 2016 at 5:39 PM Subject: seeing this message repeatedly. To: "user @spark" Hi Guys, I am running my driver program on my local machine and my spark cluster is on AWS. The big question is I don't know what are the rig

Fwd: KMeans calls takeSample() twice?

2016-08-30 Thread Georgios Samaras
-- Forwarded message -- From: Georgios Samaras Date: Tue, Aug 30, 2016 at 9:49 AM Subject: Re: KMeans calls takeSample() twice? To: "Sean Owen [via Apache Spark Developers List]" < ml-node+s1001551n18788...@n3.nabble.com> I am not sure what you want me to check. Note that I see t

Fwd: Anyone else having trouble with replicated off heap RDD persistence?

2016-08-23 Thread Michael Allman
FYI, I posted this to user@ and have followed up with a bug report: https://issues.apache.org/jira/browse/SPARK-17204 Michael > Begin forwarded message: > > From: Michael Allman > Subject: Anyone else having trouble with replicated off heap

Fwd: Outer Explode needed

2016-07-25 Thread Don Drake
No response on the Users list, I thought I would repost here. See below. -Don -- Forwarded message -- From: Don Drake Date: Sun, Jul 24, 2016 at 2:18 PM Subject: Outer Explode needed To: user I have a nested data structure (array of structures) that I'm using the DSL df.explod

Fwd: spark graphx storage RDD memory leak

2016-04-11 Thread zhang juntao
yes, I use version 1.6, and thanks Ted > Begin forwarded message: > > From: Robin East > Subject: Re: spark graphx storage RDD memory leak > Date: April 12, 2016 at 2:13:10 AM GMT+8 > To: zhang juntao > Cc: Ted Yu , dev@spark.apache.org > > this looks like https://issues.apache.org/jira/brows

Fwd: spark graphx storage RDD memory leak

2016-04-10 Thread zhang juntao
thanks Ted for replying. These three lines can't release the param graph cache; it only releases g ( graph.mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg)).cache() ) ConnectedComponents.scala param graph will be cached in ccGraph and won't be released in Pregel def run[VD: ClassTag, ED: ClassT

Fwd: Master options Cluster/Client descrepencies.

2016-03-29 Thread satyajit vegesna
Hi All, I have written a Spark program on my dev box. IDE: IntelliJ, Scala version: 2.11.7, Spark version: 1.6.1. It runs fine from the IDE, by providing proper input and output paths including master. But when I try to deploy the code in my cluster made of below, Spark version:

Fwd: Apache Spark Exception in thread “main” java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class

2016-03-19 Thread satyajit vegesna
Hi, Scala version: 2.11.7 (had to upgrade the Scala version to enable case classes to accept more than 22 parameters.) Spark version: 1.6.1. PFB pom.xml. Getting below error when trying to set up Spark in the IntelliJ IDE, 16/03/16 18:36:44 INFO spark.SparkContext: Running Spark version 1.6.1 Exception

RE: Fwd: DF creation

2016-03-18 Thread Diwakar Dhanuskodi
Import sqlContext.implicits._ before using df() Sent from Samsung Mobile. Original message From: satyajit vegesna Date: 19/03/2016 06:00 (GMT+05:30) To: u...@spark.apache.org, dev@spark.apache.org Cc: Subject: Fwd: DF creation Hi, I am trying to create separate

Fwd: DF creation

2016-03-18 Thread satyajit vegesna
Hi, I am trying to create a separate val reference to object DATA (as shown below), case class data(name:String,age:String) Creation of this object is done separately and the reference to the object is stored into val data. I use val samplerdd = sc.parallelize(Seq(data)) to create the RDD. org.apa

Re: Fwd: spark master ui to proxy app and worker ui

2016-03-06 Thread Gurvinder Singh
I wonder if anyone has any feedback on this. I can look into implementing it, but would like to know if such functionality could be merged back into master. If yes, please let me know and point me in the right direction to get started. Regards, Gurvinder On 03/04/2016 09:25 AM, Gurvinder Singh wrote: > Fo

Fwd: Spark SQL drops the HIVE table in "overwrite" mode while writing into table

2016-03-05 Thread Dhaval Modi
Regards, Dhaval Modi dhavalmod...@gmail.com -- Forwarded message -- From: Dhaval Modi Date: 5 March 2016 at 20:31 Subject: Spark SQL drops the HIVE table in "overwrite" mode while writing into table To: u...@spark.apache.org Hi Team, I am facing an issue while writing a dataframe

Fwd: spark master ui to proxy app and worker ui

2016-03-04 Thread Gurvinder Singh
Forwarding to development mailing list, as it might be more relevant here to ask for it. I am wondering if I miss something in the documentation that it might be possible already. If yes then please point me to the documentation as how to achieve it. If no, then would it make sense to implement it

Fwd: Aggregation + Adding static column + Union + Projection = Problem

2016-02-26 Thread Jiří Syrový
Hi, I've recently noticed a bug in Spark (branch 1.6) that appears if you do the following. Let's have some DataFrame called df. 1) Aggregate multiple columns on the DataFrame df and store the result as result_agg_1 2) Do another aggregation of multiple columns, but with one fewer grouping column

Fwd: HANA data access from SPARK

2016-02-23 Thread Dushyant Rajput
Hi, I am writing a Python app to load data from SAP HANA.
dfr = DataFrameReader(sqlContext)
df = dfr.jdbc(url='jdbc:sap://ip_hana:30015/?user=&password=', table=table)
df.show()
It throws a serialization error: py4j.protocol.Py4JJavaError: An error occurred while calling o59.showString. : or

RE: Fwd: Writing to jdbc database from SparkR (1.5.2)

2016-02-07 Thread Felix Cheung
Correct :) _ From: Sun, Rui Sent: Sunday, February 7, 2016 5:19 AM Subject: RE: Fwd: Writing to jdbc database from SparkR (1.5.2) To: , Felix Cheung , Andrew Holway This should be solved by your pending PR https://github.com/apache

RE: Fwd: Writing to jdbc database from SparkR (1.5.2)

2016-02-07 Thread Sun, Rui
This should be solved by your pending PR https://github.com/apache/spark/pull/10480, right? From: Felix Cheung [mailto:felixcheun...@hotmail.com] Sent: Sunday, February 7, 2016 8:50 PM To: Sun, Rui ; Andrew Holway ; dev@spark.apache.org Subject: RE: Fwd: Writing to jdbc database from SparkR

RE: Fwd: Writing to jdbc database from SparkR (1.5.2)

2016-02-07 Thread Felix Cheung
posed. On Sat, Feb 6, 2016 at 11:57 PM -0800, "Sun, Rui" wrote: DataFrameWriter.jdbc() does not work? From: Felix Cheung [mailto:felixcheun...@hotmail.com] Sent: Sunday, February 7, 2016 9:54 AM To: Andrew Holway ; dev@spark.apache.org Subject: Re: Fwd: Writing to jdbc da

RE: Fwd: Writing to jdbc database from SparkR (1.5.2)

2016-02-07 Thread Felix Cheung
ew Holway ; dev@spark.apache.org Subject: Re: Fwd: Writing to jdbc database from SparkR (1.5.2) Unfortunately I couldn't find a simple workaround. It seems to be an issue with DataFrameWriter.save() that does not work with the jdbc source/format. For instance, this does not work in Scala either: df

RE: Fwd: Writing to jdbc database from SparkR (1.5.2)

2016-02-06 Thread Sun, Rui
DataFrameWriter.jdbc() does not work? From: Felix Cheung [mailto:felixcheun...@hotmail.com] Sent: Sunday, February 7, 2016 9:54 AM To: Andrew Holway ; dev@spark.apache.org Subject: Re: Fwd: Writing to jdbc database from SparkR (1.5.2) Unfortunately I couldn't find a simple workaround. It see

Re: Fwd: Writing to jdbc database from SparkR (1.5.2)

2016-02-06 Thread Felix Cheung
jdbc:mysql://something.rds.amazonaws.com:3306?user=user&password=password").option("dbtable", "table").save() For Spark 1.5.x, it seems the best option would be to write a JVM wrapper and call it from R. _ From: Andrew Holwa

Fwd: Writing to jdbc database from SparkR (1.5.2)

2016-02-06 Thread Andrew Holway
Hi, I have a thread on u...@spark.apache.org but I think this might require developer attention. I'm reading data from a database, and this is working well: > df <- read.df(sqlContext, source="jdbc", url="jdbc:mysql://database.foo.eu-west-1.rds.amazonaws.com:3306?user=user&password=pass

Fwd: Spark integration with HCatalog (specifically regarding partitions)

2016-01-19 Thread Elliot West
(Cross posted from u...@spark.apache.org) Hello, I am in the process of evaluating Spark (1.5.2) for a wide range of use cases. In particular I'm keen to understand the depth of the integration with HCatalog (aka the Hive Metastore). I am very encouraged when browsing the source contained within

Fwd: Elasticsearch sink for metrics

2016-01-18 Thread Pete Robbins
The issue I had was with the ElasticsearchReporter and how it maps e.g. a Gauge in JSON. The "value" was typed to whatever the first Gauge was, e.g. int, which caused issues with some of my other gauges, which were double. As I say, I've just started looking at this and was wanting to see if this was al
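The failure mode described here is Elasticsearch's dynamic mapping: the first document indexed fixes a field's type, so a later gauge whose value is a double conflicts with a "value" field already mapped from an int. A plain-Python sketch of that first-write-wins behavior, an illustration rather than Elasticsearch client code (`Index` and `dynamic_mapping_type` are hypothetical names):

```python
# Plain-Python sketch of the dynamic-mapping problem: the first gauge
# document seen fixes the field's type, so later gauges with double
# values conflict with a field already mapped as an integer type.

def dynamic_mapping_type(value):
    # Simplified stand-in for Elasticsearch's dynamic type detection.
    return "long" if isinstance(value, int) else "double"

class Index:
    def __init__(self):
        self.mapping = {}

    def index_doc(self, doc):
        for field, value in doc.items():
            t = dynamic_mapping_type(value)
            fixed = self.mapping.setdefault(field, t)  # first write wins
            if fixed != t:
                raise TypeError(f"field {field!r} mapped as {fixed}, got {t}")

idx = Index()
idx.index_doc({"value": 3})        # first Gauge is an int -> "value" mapped as long
try:
    idx.index_doc({"value": 0.5})  # a later double Gauge conflicts
    conflict = False
except TypeError:
    conflict = True
print(conflict)  # True
```

The usual fix is to declare an explicit mapping (or index template) that types numeric metric fields as double up front, instead of letting the first document decide.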

Fwd: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread 임정택
Hi, First of all, I'm sorry if you received this mail before (Apache Spark user mailing group). I posted this mail to the user mailing group but didn't receive any information about resolving it, so I'd like to post this to the dev mailing group. This link points to the thread I'm forwarding, so if you feel co

Fwd: dataframe slow down with tungsten turn on

2015-11-05 Thread gen tang
ance / bug issues, you’d better try 1.5.1 or > 1.5.2rc version. > > > > *From:* gen tang [mailto:gen.tan...@gmail.com ] > *Sent:* Thursday, November 5, 2015 12:43 PM > *To:* dev@spark.apache.org > *Subject:* Fwd: dataframe slow down with tungsten turn on > > > > H

Fwd: dataframe slow down with tungsten turn on

2015-11-04 Thread gen tang
Hi, In fact, I tested the same code on Spark 1.5 with Tungsten turned off. The result is much the same as with Tungsten turned on. It seems it is not a Tungsten problem; it is simply that Spark 1.5 is slower than Spark 1.4. Is there any idea about why this happens? Thanks a lot in advance
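For anyone reproducing this comparison: Spark 1.5 exposed a switch for the Tungsten execution paths, which is presumably what "turned off" means in this thread. A spark-defaults.conf fragment, assuming the 1.5-era flag name (this flag was removed in later releases):

```properties
# spark-defaults.conf: disable the Tungsten code paths in Spark 1.5
# so the same workload can be benchmarked with and without them
spark.sql.tungsten.enabled   false
```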

Fwd: [jira] [Created] (HADOOP-12527) Upgrade Avro dependency to 1.7.7

2015-10-29 Thread Steve Loughran
Coming from the AWS dev team. He's probably got a point: even if you cut Avro from the Hadoop POM transitives, if you run in a Hadoop cluster with the Hadoop classpaths getting in, there could be a conflict. 1. Has anyone seen this? 2. When Spark went up to Avro 1.7.7, what broke? It's that risk of b
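One conventional guard against the transitive-dependency conflict described here is to exclude Avro from the Hadoop client artifact and pin the version the application actually needs. A hypothetical Maven fragment as an illustration (the hadoop-client version shown is an assumption, not taken from the thread):

```xml
<!-- pom.xml sketch: exclude Avro pulled in transitively by hadoop-client,
     then declare the Avro version the application actually needs. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.7.1</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.7.7</version>
</dependency>
```

Note this only controls the application's own classpath; as the thread points out, cluster-side Hadoop classpaths can still reintroduce a conflicting Avro at runtime.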

Fwd: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-20 Thread Reynold Xin
With Jerry's permission, sending this back to the dev list to close the loop. -- Forwarded message -- From: Jerry Lam Date: Tue, Oct 20, 2015 at 3:54 PM Subject: Re: If you use Spark 1.5 and disabled Tungsten mode ... To: Reynold Xin Yup, coarse grained mode works just fine. :
