Re: Read JSON file as input

2016-04-26 Thread Punit Naik
Hi So I managed to do the map part. I stuc with the "import scala.util.parsing.json._" library for parsing. First I read my JSON: val data=env.readTextFile("file:///home/punit/vik-in") Then I transformed it so that it can be parsed to a map: val j=data.map { x => ("\"\"\"").+(x).+("\"\"\"") }

Re: Eclipse Problems

2016-04-26 Thread Matthias J. Sax
Even if the fix works, I still have two issues in my Eclipse build... In flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala Eclipse cannot infer the integer type. It could be fixed if you make the type explicit (as this is only a test, it might be nice t

Updated Gelly Roadmap

2016-04-26 Thread Vasiliki Kalavri
Hi all, as promised, I have updated the Gelly roadmap [1]. Below, I am describing and reasoning about the changes I made. Please, let me know whether you agree and if you have any other ideas for further improvements and feature additions. *1. Operators for highly skewed graphs*: I have removed t

[jira] [Created] (FLINK-3825) Update CEP documentation to include Scala API

2016-04-26 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3825: Summary: Update CEP documentation to include Scala API Key: FLINK-3825 URL: https://issues.apache.org/jira/browse/FLINK-3825 Project: Flink Issue Type: Impro

[jira] [Created] (FLINK-3824) ResourceManager may repeatedly connect to outdated JobManager in HA mode

2016-04-26 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3824: - Summary: ResourceManager may repeatedly connect to outdated JobManager in HA mode Key: FLINK-3824 URL: https://issues.apache.org/jira/browse/FLINK-3824 Proj

Re: Data locality and scheduler

2016-04-26 Thread CPC
Hi But isnt this behaviour can cause a lot of network activity? Is there any roadmap or plan to change this behaviour? On Apr 26, 2016 7:06 PM, "Fabian Hueske" wrote: > Hi, > > Flink starts four tasks and then lazily assigns input splits to these tasks > with locality preference. So each task ma

[jira] [Created] (FLINK-3823) Kafka08ITCase.testOffsetInZookeeper failed on Travis

2016-04-26 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3823: Summary: Kafka08ITCase.testOffsetInZookeeper failed on Travis Key: FLINK-3823 URL: https://issues.apache.org/jira/browse/FLINK-3823 Project: Flink Issue Type

Re: Data locality and scheduler

2016-04-26 Thread Fabian Hueske
Hi, Flink starts four tasks and then lazily assigns input splits to these tasks with locality preference. So each task may consume more than one split. This is different from Hadoop MapReduce or Spark which schedule a new task for each input split. In your case, the four tasks would be scheduled t

Re: Question on Testing

2016-04-26 Thread Nick Dimiduk
Hi Chenguang, I've been using the class StreamingMultipleProgramsTestBase, found in flink-streaming-java test jar as the basis for my integration tests. These tests spin up a flink cluster (and kafka, and hbase, &c) in a single JVM. It's not a perfect integration environment, but it's as close as

Question on Testing

2016-04-26 Thread Chenguang He
Hello guys, I have played Flink for a while, i find out that, the testing code and data are splitted in different part, for example, some code is in different model with their own test data, but there is a 'flink-tests' model which includes a lot of test program. I am just wondering that, is ther

Data locality and scheduler

2016-04-26 Thread CPC
Hi, I look at some scheduler documentations but could not find answer to my question. My question is: suppose that i have a big file on 40 node hadoop cluster and since it is a big file every node has at least one chunk of the file. If i write a flink job and want to filter file and if job has par

Re: [DISCUSS] Allowed Lateness in Flink

2016-04-26 Thread Aljoscha Krettek
Hi Max, thanks for the Feedback and suggestions! I'll try and address each paragraph separately. I'm afraid deciding based on the "StreamTimeCharacteristic is not possible since a user can use processing-time windows in their job even though the set the characteristic to event-time. Enabling event

Re: [DISCUSS] Allowed Lateness in Flink

2016-04-26 Thread Maximilian Michels
Hi Aljoscha, Thank you for the detailed design document. Wouldn't it be ok to allow these new concepts regardless of the time semantics? For Event Time and Ingestion Time "Lateness" and "Accumulating/Discarding" make sense. If the user chooses Processing time then these can be ignored during tran

Re: Partition problem

2016-04-26 Thread Till Rohrmann
If you don’t know the size of your matrix, then you cannot partition it into continuous chunks of rows. The problem is that partitionByRange samples the data set to generate a distribution and, thus, two matrices will rarely be partitioned identically. Also if you want to provide a data distributio

[jira] [Created] (FLINK-3822) Document checksum functions

2016-04-26 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3822: - Summary: Document checksum functions Key: FLINK-3822 URL: https://issues.apache.org/jira/browse/FLINK-3822 Project: Flink Issue Type: Improvement Compone

[jira] [Created] (FLINK-3821) Reduce Guava usage in flink-java

2016-04-26 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-3821: --- Summary: Reduce Guava usage in flink-java Key: FLINK-3821 URL: https://issues.apache.org/jira/browse/FLINK-3821 Project: Flink Issue Type: Improvement

Re: [DISCUSS] Graph algorithms for vertex and edge degree

2016-04-26 Thread Vasiliki Kalavri
Hey Greg, when we made the initial design for Gelly, we discussed having separate directed and undirected graph types. We decided to represent undirected graphs by simply adding the opposite-direction edges, like Apache Giraph does. I'm not sure whether that was a good idea, but, back then, it see

Re: RichMapPartitionFunction - problems with collect

2016-04-26 Thread Till Rohrmann
Hi Sergio, sorry for the late reply. I figured out your problem. The reason why you see apparently inconsistent results is that you execute your job multiple times. Each collect call triggers an eager execution of your Flink job. Since the intermediate results are not stored the whole topology has

Re: [DISCUSS] Methods for translating Graphs

2016-04-26 Thread Vasiliki Kalavri
Thanks for the input Fabian! I also think this is a valuable and lightweight addition. I will add specific comments on the PR :) -Vasia. On 25 April 2016 at 14:30, Fabian Hueske wrote: > Hi Greg, > > sorry for the late reply. > I am not super familiar with Gelly, but the use cases you describe

[jira] [Created] (FLINK-3820) ScalaShellITCase.testPreventRecreationBatch deadlocks on Travis

2016-04-26 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3820: Summary: ScalaShellITCase.testPreventRecreationBatch deadlocks on Travis Key: FLINK-3820 URL: https://issues.apache.org/jira/browse/FLINK-3820 Project: Flink

[jira] [Created] (FLINK-3819) Replace Guava Preconditions usage in flink-gelly-scala

2016-04-26 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-3819: --- Summary: Replace Guava Preconditions usage in flink-gelly-scala Key: FLINK-3819 URL: https://issues.apache.org/jira/browse/FLINK-3819 Project: Flink Is

[jira] [Created] (FLINK-3818) Remove Guava dependency from flink-gelly-examples

2016-04-26 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-3818: --- Summary: Remove Guava dependency from flink-gelly-examples Key: FLINK-3818 URL: https://issues.apache.org/jira/browse/FLINK-3818 Project: Flink Issue T

[jira] [Created] (FLINK-3817) Remove unused Guava dependency from RocksDB StateBackend

2016-04-26 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-3817: --- Summary: Remove unused Guava dependency from RocksDB StateBackend Key: FLINK-3817 URL: https://issues.apache.org/jira/browse/FLINK-3817 Project: Flink

[jira] [Created] (FLINK-3816) Replace Guava Preconditions usage in flink-clients

2016-04-26 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-3816: --- Summary: Replace Guava Preconditions usage in flink-clients Key: FLINK-3816 URL: https://issues.apache.org/jira/browse/FLINK-3816 Project: Flink Issue

[jira] [Created] (FLINK-3815) Replace Guava Preconditions usage in flink-gelly

2016-04-26 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-3815: --- Summary: Replace Guava Preconditions usage in flink-gelly Key: FLINK-3815 URL: https://issues.apache.org/jira/browse/FLINK-3815 Project: Flink Issue Ty

[jira] [Created] (FLINK-3814) Update code style guide regarding Preconditions

2016-04-26 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-3814: --- Summary: Update code style guide regarding Preconditions Key: FLINK-3814 URL: https://issues.apache.org/jira/browse/FLINK-3814 Project: Flink Issue Typ

Re: Read JSON file as input

2016-04-26 Thread Fabian Hueske
Hi, you need to implement the MapFunction interface [1]. Inside the MapFunction you can use any JSON parser library such as Jackson to parse the String. The exact logic depends on your use case. However, you should be careful to not initialize a new parser in each map() call, because that would b

Re: flink-git - an experiment in exactly-once semantics

2016-04-26 Thread Maximilian Michels
Hi Eron, Very interesting idea to support exactly once semantics for sinks via Git! I would be curious about the performance of such a sink. Since this currently works on local file systems only (throws an Exception otherwise), I wonder how does it work on failures when the "git-${subtaskIndex}"

Re: Kryo StackOverflowError

2016-04-26 Thread Maximilian Michels
Thanks for the PR, Andrew! This has been fixed in 1.0.2. On Thu, Apr 14, 2016 at 7:04 PM, Andrew Palumbo wrote: > Do you want me to open a jira/pr for this? > > Original message > From: Stephan Ewen > Date: 04/13/2016 5:16 AM (GMT-05:00) > To: dev@flink.apache.org > Subject: Re

[jira] [Created] (FLINK-3813) YARNSessionFIFOITCase.testDetachedMode failed on Travis

2016-04-26 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3813: Summary: YARNSessionFIFOITCase.testDetachedMode failed on Travis Key: FLINK-3813 URL: https://issues.apache.org/jira/browse/FLINK-3813 Project: Flink Issue T

Re: Read JSON file as input

2016-04-26 Thread Punit Naik
Hi Fabian Thanks for the reply. Yes my json is separated by new lines. It would have been great if you had explained the function that goes inside the map. I tried to use the 'scala.util.parsing.json._' library but got no luck. On Tue, Apr 26, 2016 at 1:11 PM, Fabian Hueske wrote: > Hi Punit, >

[jira] [Created] (FLINK-3812) Kafka09ITCase testAllDeletes fails on Travis

2016-04-26 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-3812: -- Summary: Kafka09ITCase testAllDeletes fails on Travis Key: FLINK-3812 URL: https://issues.apache.org/jira/browse/FLINK-3812 Project: Flink Issue Type: Bug

Re: Read JSON file as input

2016-04-26 Thread Fabian Hueske
Hi Punit, JSON can be hard to parse in parallel due to its nested structure. It depends on the schema and (textual) representation of the JSON whether and how it can be done. The problem is that a parallel input format needs to be able to identify record boundaries without context information. Thi