I am not sure what you are trying to achieve here. Have you thought about
using Flume? Or maybe something like rsync?
On Sat, Sep 12, 2015 at 0:02, Varadhan, Jawahar wrote:
> Hi all,
> I have coded a custom receiver which receives Kafka messages. These
> Kafka messages have FTP
Inspired by this post:
http://eugenezhulenev.com/blog/2015/07/15/interactive-audience-analytics-with-spark-and-hyperloglog/,
I've started putting together something based on the Spark 1.5 UDAF
interface: https://gist.github.com/MLnick/eca566604f2e4e3c6141
Some questions -
1. How do I get the UDAF
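For context, a minimal skeleton of the Spark 1.5 UserDefinedAggregateFunction interface being discussed; this is an illustrative long-sum rather than the code from the gist, and the class and field names are made up:

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Minimal UDAF sketch: a long sum, only to show the required overrides.
class LongSum extends UserDefinedAggregateFunction {
  // Schema of the input column(s) the UDAF is applied to.
  def inputSchema: StructType = StructType(StructField("value", LongType) :: Nil)
  // Schema of the intermediate aggregation buffer.
  def bufferSchema: StructType = StructType(StructField("sum", LongType) :: Nil)
  // Type of the value returned by evaluate().
  def dataType: DataType = LongType
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
  }

  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) buffer(0) = buffer.getLong(0) + input.getLong(0)
  }

  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
  }

  def evaluate(buffer: Row): Any = buffer.getLong(0)
}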
Hello Nick,
I have been working on a (UDT-less) implementation of HLL++. You can find
the PR here: https://github.com/apache/spark/pull/8362. This currently
implements the dense version of HLL++, which is a further development of
HLL. It returns a Long, but it shouldn't be too hard to return a Row
co
Can I ask why you've done this as a custom implementation rather than using
StreamLib, which is already implemented and widely used? It seems more
portable to me to use a library; for example, I'd like to export the
grouped data with raw HLLs to, say, Elasticsearch, and then do further
on-demand agg
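For reference, a rough sketch of the kind of stream-lib usage being suggested, working with HyperLogLogPlus directly; the keys and the precision value below are made up for illustration:

import com.clearspring.analytics.stream.cardinality.HyperLogLogPlus

// Two sketches built independently (e.g. on different partitions or groups).
val hll1 = new HyperLogLogPlus(12)   // precision p = 12
val hll2 = new HyperLogLogPlus(12)
(1 to 1000).foreach(i => hll1.offer(s"user-$i"))
(500 to 1500).foreach(i => hll2.offer(s"user-$i"))

// Merge the two sketches and estimate the combined distinct count.
hll1.addAll(hll2)
println(hll1.cardinality())          // approximately 1500

// Serialize the raw sketch, e.g. to store next to the grouped keys and
// re-aggregate on demand later.
val bytes: Array[Byte] = hll1.getBytes
val restored = HyperLogLogPlus.Builder.build(bytes)

Storing the serialized registers alongside each group is what would make the result portable to an external store such as Elasticsearch.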
Hi,
I am using the Spark 1.4.1 DataFrame API to read JSON data and then save it as
ORC. The code is very simple:
DataFrame json = sqlContext.read().json(input);
json.write().format("orc").save(output);
The job failed. What is wrong, given this exception? Thanks.
Exception in thread "main" org.apache.spark.sq
I should add that surely the idea behind UDTs is exactly that they can (a) fit
automatically into DataFrames and Tungsten, and (b) be used efficiently
when writing one's own UDTs and UDAFs?
On Sat, Sep 12, 2015 at 11:05 AM, Nick Pentreath
wrote:
> Can I ask why you've done this as a custom implem
There are actually 33 Java files in src/main/scala -- I
opened https://issues.apache.org/jira/browse/SPARK-10576 to track a
discussion and decision.
On Fri, Sep 11, 2015 at 3:10 PM, lonikar wrote:
> It does not cause any problem when building with Maven. But when doing
> eclipse:ec
Is it possible that Canonical_URL occurs more than once in your JSON?
Can you check your JSON input?
Thanks
On Sat, Sep 12, 2015 at 2:05 AM, Fengdong Yu
wrote:
> Hi,
>
> I am using the Spark 1.4.1 DataFrame API to read JSON data and then save it as
> ORC. The code is very simple:
>
> DataFrame json = sql
Hi Ted,
I checked the JSON; there are no duplicated keys in it.
Azuryy Yu
Sr. Infrastructure Engineer
cell: 158-0164-9103
WeChat: azuryy
On Sat, Sep 12, 2015 at 5:52 PM, Ted Yu wrote:
> Is it possible that Canonical_URL occurs more than once in your JSON?
>
> Can you check your json input
Thanks. Yes, that's exactly what I would like to do: copy large amounts of
data to GPU RAM, perform computation, and get bulk rows back for map/filter
or reduce results. It is true that non-trivial operations benefit more. Even
streaming data to GPU RAM and interleaving computation with data transfer
w
Thanks for pointing to the YARN JIRA. For now, it is good material for my talk,
since it shows that the Hadoop and big data community is already aware of
GPUs and is making an effort to exploit them.
Good luck with your talk. That fear is lurking in my mind too :)
On 10-Sep-2015 2:08 pm, "Steve Loughran"
Can you take a look at SPARK-5278, where ambiguity is shown between field
names which differ only by case? (See the small sketch below this message.)
Cheers
On Sat, Sep 12, 2015 at 3:40 AM, Fengdong Yu
wrote:
> Hi Ted,
> I checked the JSON; there are no duplicated keys in it.
>
>
> Azuryy Yu
> Sr. Infrastructure Engineer
>
> cel: 158-0
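As a small, hypothetical illustration of the kind of case-only difference SPARK-5278 describes (whether this resembles the actual failing input is a guess):

// JSON with two keys that differ only by case.
val rdd = sc.parallelize(Seq("""{"Canonical_URL": "http://a", "canonical_url": "http://b"}"""))
val df = sqlContext.read.json(rdd)
df.printSchema()                    // both fields show up in the inferred schema
df.select("canonical_url").show()   // resolution can be ambiguous depending on case sensitivity settings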
Good Day!
I think there are some problems between ORC and AWS EMRFS.
When I was trying to read ORC files over 150M from S3, an
ArrayIndexOutOfBoundsException occurred.
I'm sure that it's an AWS-side issue because there was no exception when reading
from HDFS or S3NativeFileSystem.
Parquet runs ordinari
I am typically all for code re-use. The reason for writing this is to
avoid the indirection of a UDT and work directly against memory. A UDT
will work fine at the moment because we still use
GenericMutableRow/SpecificMutableRow as aggregation buffers. However, if you
would use an UnsafeRow as an A
Hello All,
When I push messages into Kafka and read them in a streaming application, I see
the following exception.
I am running the application on YARN and am not broadcasting the message
anywhere within the application. I am simply reading the message, parsing it,
populating fields in a class and then prin
Ok, that makes sense. So this is (a) more efficient, since as far as I can
see it is updating the HLL registers directly in the buffer for each value,
and (b) would be "Tungsten-compatible" as it can work against UnsafeRow? Is
it currently possible to specify an UnsafeRow as a buffer in a UDAF?
So
Hi Nick,
The buffer exposed by the UDAF interface is just a view of an underlying buffer
(this underlying buffer is shared by different aggregate functions, and
every function takes one or multiple slots). If you need a UDAF, extending
UserDefinedAggregateFunction is the preferred
approach. AggregateFun
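A short usage sketch, assuming a SQLContext named sqlContext and a UDAF class like the LongSum skeleton shown earlier in the thread (the table and column names are made up):

import org.apache.spark.sql.functions.col

val df = sqlContext.createDataFrame(Seq(("a", 1L), ("a", 2L), ("b", 5L))).toDF("group_id", "value")

// Apply the UDAF directly in a DataFrame aggregation.
val longSum = new LongSum
df.groupBy("group_id").agg(longSum(col("value")).as("total")).show()

// The same instance can be registered for use from SQL.
sqlContext.udf.register("long_sum", longSum)
df.registerTempTable("events")
sqlContext.sql("SELECT group_id, long_sum(value) AS total FROM events GROUP BY group_id").show()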
Most of these files are just package-info.java, there to provide a good package
index for the JavaDoc. If we move them, we will need to create a folder under the
java tree for each package that exposes any documentation. And it is very
likely we will forget to update package-info.java when we update
package.sc
Thanks Yin.
So how does one ensure a UDAF works with Tungsten and UnsafeRow buffers? Or is
this something that will be included in the UDAF interface in the future?
Is there a performance difference between extending UDAF vs Aggregate2?
It's also not clear to me how to handle inputs of dif