>>> We would like to acknowledge all community members for contributing to
>>> this
>>> release. This release would not have been possible without you.
>>>
>>> Dongjoon Hyun
>>>
>>
>
> --
>
>
--
---
Takeshi Yamamuro
>>> and early feedback to
>>> this release. This release would not have been possible without you.
>>>
>>> To download Spark 3.1.1, head over to the download page:
>>> http://spark.apache.org/downloads.html
>>>
>>> To view the release notes:
>>> https://spark.apache.org/releases/spark-release-3-1-1.html
>>>
>>>
--
---
Takeshi Yamamuro
> Is there any performance penalty for using Scala's BigDecimal? It's more
> convenient from an API point of view than java.math.BigDecimal.
>
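For illustration (not from the original reply): scala.math.BigDecimal is a thin wrapper around java.math.BigDecimal, so converting between the two is cheap; a minimal sketch:

    import java.math.{BigDecimal => JBigDecimal}

    val j: JBigDecimal    = new JBigDecimal("123.456")
    val s: BigDecimal     = BigDecimal(j)   // wrap the Java value
    val back: JBigDecimal = s.bigDecimal    // unwrap again without copying the digits

The wrapper mainly adds operator syntax and a MathContext, so the API convenience usually costs little more than the extra allocation.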
--
---
Takeshi Yamamuro
2020 at 2:31 PM Takeshi Yamamuro
> wrote:
>
>> Hi,
>>
>> Please see an example code in
>> https://github.com/gaborgsomogyi/spark-jdbc-connection-provider (
>> https://github.com/apache/spark/pull/29024).
>> Since it depends on the service loader, I think you
they are not used. Do I need to register them somehow?
> Could someone share a relevant example?
> Thx.
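For illustration (not from the original reply), a sketch of the ServiceLoader registration step, assuming Spark 3.1's JdbcConnectionProvider developer API (verify the abstract members against your Spark version; class and package names below are made up):

    // Hypothetical provider, for illustration only.
    package com.example

    import java.sql.{Connection, Driver}
    import org.apache.spark.sql.jdbc.JdbcConnectionProvider

    class MyConnectionProvider extends JdbcConnectionProvider {
      override val name: String = "my-provider"

      // Claim URLs this provider knows how to handle.
      override def canHandle(driver: Driver, options: Map[String, String]): Boolean =
        options.get("url").exists(_.startsWith("jdbc:example:"))

      override def getConnection(driver: Driver, options: Map[String, String]): Connection =
        driver.connect(options("url"), new java.util.Properties())

      // Added in Spark 3.1: return true if getConnection changes the JVM security
      // context (e.g. performs a Kerberos login).
      override def modifiesSecurityContext(driver: Driver, options: Map[String, String]): Boolean =
        false
    }

The class only takes effect if the jar also ships a provider-configuration file, META-INF/services/org.apache.spark.sql.jdbc.JdbcConnectionProvider, containing the single line com.example.MyConnectionProvider; without that file the ServiceLoader never instantiates the class, which matches the "they are not used" symptom above.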
--
---
Takeshi Yamamuro
leases/spark-release-3-0-1.html
>>
>> We would like to acknowledge all community members for contributing to
>> this release. This release would not have been possible without you.
>>
>>
>> Thanks,
>> Ruifeng Zheng
>>
>>
--
---
Takeshi Yamamuro
This release would not have been possible
> without you.
>
> To download Spark 3.0.0, head over to the download page:
> http://spark.apache.org/downloads.html
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-0-0.html
>
>
>
>
--
---
Takeshi Yamamuro
>>> Note that you might need to clear your browser cache or
>>> to use `Private`/`Incognito` mode according to your browsers.
>>>
>>> To view the release notes:
>>> https://spark.apache.org/releases/spark-release-2.4.6.html
>>>
>>> We would like to acknowledge all community members for contributing to
>>> this
>>> release. This release would not have been possible without you.
>>>
>>
--
---
Takeshi Yamamuro
your browsers.
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-2.4.5.html
>>
>> We would like to acknowledge all community members for contributing to
>> this
>> release. This release would not have been possible without you.
>>
>> Dongjoon Hyun
>>
>
--
---
Takeshi Yamamuro
possible
>> without you.
>>
>> To download Spark 3.0.0-preview2, head over to the download page:
>> https://archive.apache.org/dist/spark/spark-3.0.0-preview2
>>
>> Happy Holidays.
>>
>> Yuming
>>
>
>
> --
>
--
---
Takeshi Yamamuro
I'm waiting for
> SPARK-27900.
> > Please let me know if there is another issue.
> >
> > Thanks,
> > Dongjoon.
>
>
>
--
---
Takeshi Yamamuro
://spark.apache.org/downloads.html
To view the release notes:
https://spark.apache.org/releases/spark-release-2-3-3.html
We would like to acknowledge all community members for contributing to
this release. This release would not have been possible without you.
Best,
Takeshi
--
---
Takeshi Yamamuro
Hi,
I filed a jira: https://issues.apache.org/jira/browse/SPARK-26540
On Thu, Jan 3, 2019 at 10:04 PM Takeshi Yamamuro
wrote:
> Hi,
>
> I checked that v2.2/v2.3/v2.4/master had the same issue, so can you file a
> jira?
> I looked over the related code and then I think we need
>
>
--
---
Takeshi Yamamuro
d into memory, OOM occurs.
> If there is some option to make SparkSQL use Disk if memory not enough?
>
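For illustration (not from the original reply): Spark SQL already spills sorts, aggregations and joins to disk under memory pressure, but explicitly cached data and oversized partitions can still trigger OOM. Two common knobs, sketched assuming a Spark 2.x session and a made-up input path:

    import org.apache.spark.storage.StorageLevel

    val df = spark.read.parquet("/path/to/input")   // hypothetical input

    // Cache with a storage level that is allowed to fall back to local disk.
    df.persist(StorageLevel.MEMORY_AND_DISK)

    // Use more (smaller) shuffle partitions so each task's working set fits in memory.
    spark.conf.set("spark.sql.shuffle.partitions", "400")   // default is 200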
--
---
Takeshi Yamamuro
The problem is that filterSelectivity gets a NaN value in my case, and
> NaN cannot be converted to BigDecimal.
> I can try adding a simple if that checks for the NaN value and test whether this helps.
> I will also try to understand why, in my case, I am getting NaN.
>
> Best,
> Michael
>
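A sketch of the guard discussed above (illustrative only; the real fix belongs in Spark's filter-selectivity estimation rather than in user code):

    // computeSelectivity() is a hypothetical stand-in for the estimated selectivity.
    val selectivity: Double = computeSelectivity()
    val safeSelectivity: BigDecimal =
      if (selectivity.isNaN) BigDecimal(1.0)   // fall back to "no filtering"
      else BigDecimal(selectivity)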
> ...optimizedPlan(QueryExecution.scala:66)
>   at org.apache.spark.sql.execution.QueryExecution$$anonfun$toString$2.apply(QueryExecution.scala:204)
>   at org.apache.spark.sql.execution.QueryExecution$$anonfun$toString$2.apply(QueryExecution.scala:204)
>   at org.apache.spark.sql.execution.QueryExecution.stringOrError(QueryExecution.scala:100)
>   at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:204)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
>   at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
>   at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:458)
>   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:437)
>   at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393)
>
>
>
> This exception only comes, if the statistics exist for the hive tables
> being used.
>
> Has anybody already seen something like this ?
> Any assistance would be greatly appreciated!
>
> Best,
> Michael
>
>
>
--
---
Takeshi Yamamuro
tion is used here.
>> }
>> }
>>
>> val shouldDropHeader = parser.options.headerFlag && file.start == 0
>> UnivocityParser.parseIterator(lines, shouldDropHeader, parser,
>> schema)
>> }
>>
>>
>> It seems like a bug.
>> Is there anyone who had the same problem before?
>>
>>
>> Best wishes,
>> Han-Cheol
>>
>> --
>> ==
>> Han-Cheol Cho, Ph.D.
>> Data scientist, Data Science Team, Data Laboratory
>> NHN Techorus Corp.
>>
>> Homepage: https://sites.google.com/site/priancho/
>> ==
>>
>
>
>
>
--
---
Takeshi Yamamuro
927764/spark-jdbc-
> oracle-long-string-fields
>
> Regards,
> Georg
>
--
---
Takeshi Yamamuro
gg(e).show()
>>
>> and exception is
>>
>> org.apache.spark.sql.AnalysisException: Undefined function:
>> 'percentile_approx'. This function is neither a registered temporary
>> function nor a permanent function registered
>>
>> I've also tryid with callUDF
>>
>> Regards.
>>
>> --
>> Ing. Ivaldi Andres
>>
>
>
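For illustration (not from the original reply): on Spark 2.1+ percentile_approx is a built-in SQL function, so it can be reached through expr() without registering anything in Hive; a hedged sketch with made-up column names:

    import org.apache.spark.sql.functions.expr

    val medians = df.groupBy("key")
      .agg(expr("percentile_approx(value, 0.5)").as("median"))
    medians.show()

On older versions, DataFrame.stat.approxQuantile (Spark 2.0+) or enabling Hive support are the usual workarounds.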
--
---
Takeshi Yamamuro
Takeshi, Jörn Franke,
>>>
>>> The problem is that even if I increase maxColumns, some lines still
>>> have more columns than the limit I set, and it will cost a lot of memory.
>>> So I just want to skip any line that has more columns than the maxColumns I
>>
I did some investigation into the univocity
> <https://github.com/uniVocity/univocity-parsers> library, but the way it
> handles this is to throw an error, which is why Spark stops the process.
>
> How can I skip the invalid row and just continue parsing the next valid one?
> Are there any libs that could replace univocity for that job?
>
> Thanks & regards,
> Chanh
> --
> Regards,
> Chanh
>
>
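For illustration (not from the original reply), assuming the built-in CSV reader of Spark 2.x: the mode option controls what happens to rows the parser cannot handle, which is the usual way to skip them instead of failing the job:

    val df = spark.read
      .option("header", "true")
      .option("mode", "DROPMALFORMED")   // or "PERMISSIVE" to keep bad rows as nulls
      .option("maxColumns", "20480")     // forwarded to univocity; raise only as needed
      .csv("/path/to/input.csv")         // hypothetical path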
--
---
Takeshi Yamamuro
feature, how it decides how many partitions to coalesce to, and what counts
>>> as a "native data source"? I couldn't find any mention of this feature in
>>> the SQL Programming Guide and Google was not helpful.
>>>
>>> --
>>> Daniel Siegmann
>>> Senior Software Engineer
>>> *SecurityScorecard Inc.*
>>> 214 W 29th Street, 5th Floor
>>> New York, NY 10001
>>>
>>> --
> Best Regards,
> Ayan Guha
>
--
---
Takeshi Yamamuro
ntainer_1491889279272_0040_01_03/pyspark.zip/pyspark/worker.py",
> line 106, in
> func = lambda _, it: map(mapper, it)
> File
> "/home/hadoop/hdtmp/nm-local-dir/usercache/hadoop/appcache/application_1491889279272_0040/container_1491889279272_0040_01_000003/pyspark.zip/pyspark/worker.py",
> line 92, in
> mapper = lambda a: udf(*a)
> File
> "/home/hadoop/hdtmp/nm-local-dir/usercache/hadoop/appcache/application_1491889279272_0040/container_1491889279272_0040_01_03/pyspark.zip/pyspark/worker.py",
> line 70, in
> return lambda *a: f(*a)
> File "", line 3, in
> TypeError: sequence item 0: expected string, NoneType found
>
>
--
---
Takeshi Yamamuro
3", "b": "bar" } |
>
>
> to Spark DataFrame:
>
> | id | a   | b   |
> +----+-----+-----+
> | 1  | 123 | xyz |
> | 2  | 3   | bar |
>
>
> I'm using Spark 1.6 .
>
> Thanks
>
>
> JF
>
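For illustration (not from the original reply), a hedged sketch that should work on Spark 1.6, where get_json_object can pull individual fields out of a JSON string column (the `id` and `json` column names are illustrative):

    import org.apache.spark.sql.functions.{col, get_json_object}

    val flat = df.select(
      col("id"),
      get_json_object(col("json"), "$.a").as("a"),
      get_json_object(col("json"), "$.b").as("b"))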
--
---
Takeshi Yamamuro
> On 11 February 2017 at 12:43, Takeshi Yamamuro
> wrote:
>
>> Hi,
>>
MSc
> PhD Researcher, INSIGHT Centre for Data Analytics
> National University of Ireland, Galway
> IDA Business Park, Dangan, Galway, Ireland
> Web: http://www.reza-analytics.eu/index.html
> <http://139.59.184.114/index.html>
>
--
---
Takeshi Yamamuro
(orgClassName1, orgClassName2,dist)
>
> }).toDF("orgClassName1", "orgClassName2", "dist");
>
>
>
>
>
>
>
--
---
Takeshi Yamamuro
why are you creating 1 DStream per shard? It
> should be one DStream corresponding to the Kinesis stream, shouldn't it?
>
> On Fri, Jan 27, 2017 at 8:09 PM, Takeshi Yamamuro
> wrote:
>
>> Hi,
>>
>> Just a guess though, Kinesis shards sometimes have skew data.
>>
or this particular example, the
> driver prints out between 20 and 30 for the count value. I expected to see
> the count operation parallelized across the cluster. I think I must just be
> misunderstanding something fundamental! Can anyone point out where I'm
> going wrong?
>
> Yours in confusion,
> Graham
>
>
--
---
Takeshi Yamamuro
at these files had been there since the start of my
> streaming application I should have checked the time stamp before doing rm
> -rf. Please let me know if I am wrong
>
> Sent from my iPhone
>
> On Jan 26, 2017, at 4:24 PM, Takeshi Yamamuro
> wrote:
>
> Yea, I think so
Driver"
>> df = sqlContext.read.jdbc(url=url,table=table,properties={"user":
>> user,"password":password,"driver":driver})
>>
>>
>> Still the issue persists.
>>
>> On Fri, Jan 27, 2017 at 11:19 AM, Takeshi Yamamuro >
, Jan 25, 2017 at 11:30 AM, kant kodali wrote:
>
>> I have a bunch of .index and .data files like that filling up my disk. I am
>> not sure what the fix is. I am running Spark 2.0.2 in standalone mode.
>>
>> Thanks!
>>
>>
>>
>>
>
>
--
---
Takeshi Yamamuro
.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:209)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> --
> Best Regards,
> Ayan Guha
>
--
---
Takeshi Yamamuro
ssorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> Any idea why it's happening? A possible bug in spark?
>
> Thanks,
> Dzung.
>
>
>
--
---
Takeshi Yamamuro
adPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
> Attached is the code that you can use to reproduce the error.
>
> Thanks
> Ankur
>
>
>
--
---
Takeshi Yamamuro
can be cleaned up?
>
> I have seen Generators are allowed to terminate() but my Expression(s) do
> not need to emit 0..N rows.
>
--
---
Takeshi Yamamuro
scenario are Strings coming from kinesis stream
>
> is there a way to explicitly purge RDD after last step in M/R process once
> and for all ?
>
> thanks much!
>
> On Fri, Jan 20, 2017 at 2:35 AM, Takeshi Yamamuro
> wrote:
>
>> Hi,
>>
>> AFAIK, the block
ql.DataFrame = [x: string]
>
> scala> x.as[Array[Byte]].printSchema
> root
> |-- x: string (nullable = true)
>
> scala> x.as[Array[Byte]].map(x => x).printSchema
> root
> |-- value: binary (nullable = true)
>
> why does the first schema show string instead of binary?
>
--
---
Takeshi Yamamuro
saying java.lang.Long can't be converted
>> to org.apache.hadoop.hive.serde2.io.DoubleWritable
>>
>>
>>
>> its working fine on hive but throwing error on spark-sql
>>
>> I am importing the below packages.
>> import java.util.*;
>> import org.apache.hadoop.hive.serde2.objectinspector.*;
>> import org.apache.hadoop.io.LongWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.hive.serde2.io.DoubleWritable;
>>
>> Please let me know why it causes an issue in Spark when it runs perfectly
>> fine on Hive.
>>
>
--
---
Takeshi Yamamuro
1 timestamp column and a bunch of strings. I will need to
> convert that
> to something compatible with Mongo's ISODate.
>
> kr
> marco
>
>
--
---
Takeshi Yamamuro
on on some
>
> Is there a way to "release" these blocks and free them up? The app is a sample m/r.
>
> I attempted rdd.unpersist(false) in the code, but that did not lead to
> memory being freed up.
>
> thanks much in advance!
>
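For illustration (not from the original reply): unpersist(false) only requests asynchronous removal, so the blocks may linger for a while; a blocking call waits until they are actually dropped:

    rdd.unpersist(blocking = true)   // rdd is the cached RDD from the question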
--
---
Takeshi Yamamuro
:43 WARN hive.HiveContext$$anon$2: Persisting partitioned
> data source relation `test`.`my_test` into Hive metastore in Spark SQL
> specific format, which is NOT compatible with Hive. Input path(s):
> hdfs://nameservice1/user/hive/warehouse/test.db/my_test
>
> looking at hdfs
st and tried a generic UDF with an object inspector
> implementation, which successfully ran on both Hive and spark-sql.
>
> please share me the git hub link or source code file
>
> Thanks in advance
> Sirisha
>
--
---
Takeshi Yamamuro
s.size // 4
>
>
>
--
---
Takeshi Yamamuro
. I do see that the intent for limit may be such that no two
> limit paths should occur in a single DAG.
>
> What do you think? What is the correct explanation?
>
> Anton
>
--
---
Takeshi Yamamuro
stingRDD[key#0,nested#1,
> nestedArray#2,nestedObjectArray#3,value#4L]
>
> How can I make Spark to use HashAggregate (like the count(*) expression)
> instead of SortAggregate with my UDAF?
>
> Is it intentional? Is there an issue tracking this?
>
> ---
> Regards,
> Andy
>
--
---
Takeshi Yamamuro
st the variables?
>
--
---
Takeshi Yamamuro
dia
> available at https://ndownloader.figshare.com/files/5036392
>
> Where could I read up more about the managed memory leak? Any pointers on what
> might be the issue would be highly helpful
>
> thanks
> appu
>
>
>
>
--
---
Takeshi Yamamuro
HDFS. My
> installation does not enable this HDFS feature, so I would like to disable
> WAL in Spark.
>
>
>
> Thanks,
>
> Tim
>
>
>
--
---
Takeshi Yamamuro
> 2. If not, can I repartition the stream data before processing? If yes, how,
> since JavaDStream has only one repartition method, which takes a number of
> partitions and not a partitioner function? So it will randomly
> repartition the DStream data.
>
> Thanks
>
>
>
>
>
>
> set a breakpoint to the location that calls it and attempt to step into the
> code, or reference a line of the stacktrace that should take me into the
> code. Any idea how to properly set Janino to debug the Catalyst-generated
> code more directly?
>
> Best,
> Alek
>
--
---
Takeshi Yamamuro
ember 15, 2016 8:44 PM
>>>> *To:* Jörn Franke
>>>> *Cc:* User
>>>> *Subject:* Re: AVRO File size when caching in-memory
>>>>
>>>>
>>>>
>>>> Anyone?
>>>>
>>>>
>>>>
>>>> On Tue, Nov 15, 2016 at 10:45 AM, Prithish wrote:
>>>>
>>>> I am using 2.0.1 and databricks avro library 3.0.1. I am running this
>>>> on the latest AWS EMR release.
>>>>
>>>>
>>>>
>>>> On Mon, Nov 14, 2016 at 3:06 PM, Jörn Franke
>>>> wrote:
>>>>
>>>> spark version? Are you using tungsten?
>>>>
>>>>
>>>> > On 14 Nov 2016, at 10:05, Prithish wrote:
>>>> >
>>>> > Can someone please explain why this happens?
>>>> >
>>>> > When I read a 600kb AVRO file and cache this in memory (using
>>>> cacheTable), it shows up as 11mb (storage tab in Spark UI). I have tried
>>>> this with different file sizes, and the size in-memory is always
>>>> proportionate. I thought Spark compresses when using cacheTable.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
--
---
Takeshi Yamamuro
custom RDD can help to find the node for the key-->node.
>> there is a getPreferredLocation() method.
>> But not sure, whether this will be persistent or can vary for some edge
>> cases?
>>
>> Thanks in advance for you help and time !
>>
>> Regards,
>> Manish
>>
>
>
--
---
Takeshi Yamamuro
Thanks!
>
>
>
> On Mon, Nov 14, 2016 at 7:36 PM, Takeshi Yamamuro
> wrote:
>
>> Is "aws kinesis get-shard-iterator --shard-iterator-type LATEST" not
>> enough for your usecase?
>>
>> On Mon, Nov 14, 2016 at 10:23 PM, Shushant Arora <
>> shu
On Mon, Nov 14, 2016 at 5:43 PM, Takeshi Yamamuro
> wrote:
>
>> Hi,
>>
>> The time interval can be controlled by `IdleTimeBetweenReadsInMillis`
>> in KinesisClientLibConfiguration though,
>> it is not configurable in the current implementation.
>>
>> The
ta from kinesis .
>
> Means the stream batch interval cannot be less than spark.streaming.blockInterval,
> and this should be configurable. Also, is there any minimum value for the
> streaming batch interval?
>
> *Thanks*
>
>
--
---
Takeshi Yamamuro
// maropu
On Mon, Nov 14, 2016 at 1:20 PM, janardhan shetty
wrote:
> Hi,
>
> Is there any easy way of converting a dataframe column from SparseVector
> to DenseVector using
>
> import org.apache.spark.ml.linalg.DenseVector API ?
>
> Spark ML 2.0
>
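For illustration (not from the original reply), a hedged sketch for Spark ML 2.0: a small UDF over the Vector interface converts each value to a DenseVector (the column names are illustrative):

    import org.apache.spark.ml.linalg.Vector
    import org.apache.spark.sql.functions.{col, udf}

    val toDense   = udf { v: Vector => v.toDense }
    val converted = df.withColumn("features_dense", toDense(col("features")))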
--
---
Takeshi Yamamuro
eaming?
>
> Is there any limitation on interval checkpoint - minimum of 1second in
> spark streaming with kinesis. But as such there is no limit on checkpoint
> interval in KCL side ?
>
> Thanks
>
> On Tue, Oct 25, 2016 at 8:36 AM, Takeshi Yamamuro
> wrote:
>
>> I&
i.
>
>
>
> On Tue, Oct 25, 2016 at 7:07 AM, Takeshi Yamamuro
> wrote:
>
>> Hi,
>>
>> The only thing you can do for Kinesis checkpoints is tune the interval of
>> them.
>> https://github.com/apache/spark/blob/master/external/kinesis
>> -asl/s
oothly the setup was :) Thx for that.
>>
>> Servus
>> Andy
>>
>>
>>
>
>
> --
>
>
> *Sincerely yoursEgor Pakhomov*
>
--
---
Takeshi Yamamuro
umbers ourselves in Kinesis as
> it is in Kafka low level consumer ?
>
> Thanks
>
>
--
---
Takeshi Yamamuro
is: is it possible to share data frame/dataset based
> temporary tables through Spark thrift server between multiple spark
> sessions?
>
> Thanks
> Herman.
>
>
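For illustration (not from the original reply): since Spark 2.1, global temporary views live in the global_temp database and are visible to every session of the same SparkSession, and hence of one Thrift server instance; a hedged sketch:

    df.createGlobalTempView("shared_view")   // df and the view name are illustrative

    // A different session on the same server can then query it:
    spark.sql("SELECT * FROM global_temp.shared_view").show()

Views created with createOrReplaceTempView remain scoped to a single session.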
>
>
--
---
Takeshi Yamamuro
ed with hadoop dependency of 2.7.2 and we
> use this setting.
> We've sort of "verified" it's used by configuring log of file output
> commiter
>
> On 30 September 2016 at 03:12, Takeshi Yamamuro
> wrote:
>
>> Hi,
>>
>> FYI: Seems
>> `
>
>
--
---
Takeshi Yamamuro
Any advice is appreciated.
> Thank you!
>
>
>
>
>
--
---
Takeshi Yamamuro
m 0.2 to 0.8 and this solves the
> problem. But in the documentation I have found that this is a deprecated
> parameter.
>
> As I understand it, it was replaced by spark.memory.fraction. How to
> modify this parameter while taking into account the sort and storage on
> HDFS?
>
> Thanks.
>
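For illustration (not from the original reply): with the unified memory manager (Spark 1.6+), the corresponding knobs are spark.memory.fraction and spark.memory.storageFraction, set before the context is created; a hedged sketch:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.memory.fraction", "0.8")          // share of heap for execution + storage
      .set("spark.memory.storageFraction", "0.5")   // part of that share protected for cached blocks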
--
---
Takeshi Yamamuro
>> val jdbcDF = sqlContext.read.format("jdbc").options(
>>   Map("url" -> "jdbc:postgresql://dbserver:port/database?user=user&password=password",
>>       "dbtable" -> "schema.table")).load()
>>
>> jdbcDF.show
>>
>>
>> If anyone can help, please let me know.
>>
>> Thanks,
>> Ben
>>
>>
>
--
---
Takeshi Yamamuro
gt;> So I don't think it is going to give you much difference. Unless they
>>> have recently changed the design of STS.
>>>
>>> HTH
>>>
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>>
>>>
>>> On 13 September 2016 at 22:32, Benjamin Kim wrote:
>>>
>>>> Does anyone have any thoughts about using Spark SQL Thriftserver in
>>>> Spark 1.6.2 instead of HiveServer2? We are considering abandoning
>>>> HiveServer2 for it. Some advice and gotcha’s would be nice to know.
>>>>
>>>> Thanks,
>>>> Ben
>>>>
>>>>
>>>
>>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>
--
---
Takeshi Yamamuro
hat you mean, can you give an example?
>
>
>
> Hagai.
>
>
>
> *From: *Takeshi Yamamuro
> *Date: *Monday, September 12, 2016 at 7:24 PM
> *To: *Hagai Attias
> *Cc: *"user@spark.apache.org"
> *Subject: *Re: Debugging a spark application in a none lazy mode
>
>
--
---
Takeshi Yamamuro
satisfy this need(Highly Skewed), because of it, if the
>> numPartitions is set to 104, 102 tasks are finished in a minute, 1 task
>> finishes in 20 mins and the last one takes forever.
>>
>> Is there anything I could do to distribute the data evenly into
>> par
at java.lang.Thread.run(Thread.java:745)
>
> env info
>
> spark on yarn (cluster)
> scalaVersion := "2.10.6"
> libraryDependencies += "org.apache.spark" %% "spark-core"  % "1.6.0" % "provided"
> libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.6.0" % "provided"
>
>
> THANKS
>
>
> --
> cente...@gmail.com
>
--
---
Takeshi Yamamuro
Oh, sorry. I forgot to attach the URL:
https://www.mail-archive.com/user@spark.apache.org/msg55723.html
// maropu
On Tue, Sep 6, 2016 at 2:41 PM, Morten Hornbech
wrote:
> Sorry. Seen what? I think you forgot a link.
>
> Morten
>
> On 6 Sep 2016 at 04:51, Takeshi Ya
>
>
--
---
Takeshi Yamamuro
fine (btw the grouped dataframe is 1.5MB
> when cached in memory and I have more than 4GB per executor with 8
> executors, the full dataframe is ~8GB)
>
>
>
> Thanks,
>
> Assaf.
>
>
>
> --
>
--
---
Takeshi Yamamuro
rtitions) inside its own memory.
>
> Since the dataset for d1 is used in two separate joins, should I also
> persist it to prevent reading it from disk again? Or would broadcasting the
> data already take care of that?
>
>
> Thank you,
> Jestin
>
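For illustration (not from the original reply), a hedged sketch: broadcast() only marks the join strategy, so persisting d1 is still the usual way to avoid scanning its source twice when it feeds two separate joins (d2, d3 and the join column are illustrative names):

    import org.apache.spark.sql.functions.broadcast

    val d1c = d1.persist()
    val j1  = d2.join(broadcast(d1c), "key")
    val j2  = d3.join(broadcast(d1c), "key")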
--
---
Takeshi Yamamuro
afaik no.
// maropu
On Thu, Aug 25, 2016 at 9:16 PM, Tal Grynbaum
wrote:
> Is/was there an option similar to DirectParquetOutputCommitter to write
> json files to S3 ?
>
> On Thu, Aug 25, 2016 at 2:56 PM, Takeshi Yamamuro
> wrote:
>
>> Hi,
>>
>> Seems thi
>> When Spark saves anything to S3 it creates temporary files. Why? Asking
>> this because it requires the access credentials to be given
>> delete permissions along with write permissions.
>>
>
--
---
Takeshi Yamamuro
; | +- Scan
> org.apache.spark.sql.cassandra.CassandraSourceRelation@49243f65[id#0L,avg#2]
> PushedFilters: [Or(EqualTo(id,94),EqualTo(id,2))] |
>
> +--+--+
>
>
> Filters are pushed down, so I cannot see why it is performing such a big
from v_points d where id in (90,2) group by id;
>
> query is again fast.
>
> How can I get the 'execution plan' of the query?
>
> And also, how can I kill the long running submitted tasks?
>
> Thanks all!
>
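For illustration (not from the original reply), two hedged sketches assuming a Spark 2.x session (table and column names are taken from the query above):

    // 1) The physical plan, including pushed filters, is printed by explain():
    spark.sql("select avg(avg) from v_points where id in (90, 2) group by id").explain(true)

    // 2) Work submitted from the same application can be cancelled by job group:
    spark.sparkContext.setJobGroup("adhoc", "exploratory query", interruptOnCancel = true)
    // ... run the query ...
    spark.sparkContext.cancelJobGroup("adhoc")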
--
---
Takeshi Yamamuro
boundary if we
> are not specifying anything.
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>
--
---
Takeshi Yamamuro
;
> > (Obviously having fewer large files is better but I don't control the
> file generation side of this)
> >
> > Tips much appreciated
>
>
>
>
--
---
Takeshi Yamamuro
see the patch (SQOOP-1532
> <https://issues.apache.org/jira/browse/SQOOP-1532>), but it
> shows as in progress.
>
> So can we not use Sqoop on Spark?
>
> Please help me if you have an any idea.
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>
--
---
Takeshi Yamamuro
park has
> >> the hooks to allow me to try ;-)
> >>
> >> Cheers,
> >> Tim
> >>
> >>
> >
> >
> > --
> > Ing. Marco Colombo
>
>
>
--
---
Takeshi Yamamuro
he in memory size
> of the dataframe halfway through the spark job. So I would need to stop the
> context and recreate it in order to set this config.
>
> Is there any better way to set this? How
> does spark.sql.shuffle.partitions work differently than .repartition?
>
> Brandon
>
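For illustration (not from the original reply): spark.sql.shuffle.partitions is a runtime SQL conf, so it can be changed mid-job without recreating the context, whereas repartition(n) inserts an explicit shuffle for one DataFrame lineage only; a hedged sketch:

    spark.conf.set("spark.sql.shuffle.partitions", "800")   // affects subsequent joins/aggregations
    val widened = df.repartition(800)                        // affects only this DataFrame

    // On 1.x the equivalent runtime call is sqlContext.setConf("spark.sql.shuffle.partitions", "800").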
--
---
Takeshi Yamamuro
Driver.java:425)
>> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
>> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Error in query: cannot recognize input near 'parquetTable' 'USING' 'org'
>> in table name; line 2 pos 0
>>
>>
>> Am I using it in the wrong way?
>>
>>
>>
>>
>>
>> thanks
>>
>
--
---
Takeshi Yamamuro
>> On Jul 24, 2016, at 5:34 PM, janardhan shetty
>> wrote:
>>
>> We have data in Bz2 compression format. Any links on converting it to
>> Parquet in Spark, and also performance benchmarks and use-case study materials?
>>
>>
>>
>
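For illustration (not from the original reply): bzip2 is decompressed transparently by the Hadoop input layer, so the conversion itself is read-then-write; a hedged sketch assuming line-oriented text content and made-up paths:

    val raw = spark.read.textFile("s3://bucket/in/*.bz2")
    raw.toDF("line").write.parquet("s3://bucket/parquet_out")

Adjust the reader (csv, json, ...) to the actual record format; the compression codec does not change the API.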
--
---
Takeshi Yamamuro
t;>>- *Cores in use:* 28 Total, 2 Used
>>>- *Memory in use:* 56.0 GB Total, 1024.0 MB Used
>>>- *Applications:* 1 Running, 6 Completed
>>>- *Drivers:* 0 Running, 0 Completed
>>>- *Status:* ALIVE
>>>
>>> Each worker has 8 cores and 4GB memory.
>>>
>>> My questions is how do people running in production decide these
>>> properties -
>>>
>>> 1) --num-executors
>>> 2) --executor-cores
>>> 3) --executor-memory
>>> 4) num of partitions
>>> 5) spark.default.parallelism
>>>
>>> Thanks,
>>> Kartik
>>>
>>>
>>>
>>
>
--
---
Takeshi Yamamuro
iveChunkList.readAll(ParquetFileReader.java:755)
>> at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:494)
>> at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader.checkEndOfRowGroup(UnsafeRowParquetRecord
>>
>
Because it's only
>> tested locally with local mode. If I deploy on a Mesos cluster, what would
>> happen?
>>
>> Need you guys suggests some solutions for that. Thanks.
>>
>> Chanh
>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>
--
---
Takeshi Yamamuro
ks:
>
> path = '/data/train_parquet/0_0_0.parquet'
> train0_df = sqlContext.read.load(path)
> train_df.take(1)
>
> Thanks in advance.
>
> Samir
>
--
---
Takeshi Yamamuro