Hi,
So I managed to do the map part. I stuck with the "import
scala.util.parsing.json._" library for parsing.
First I read my JSON:
val data = env.readTextFile("file:///home/punit/vik-in")
Then I transformed it so that it can be parsed to a map:
val j = data.map { x => "\"\"\"" + x + "\"\"\"" }
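The parsing step itself could then look roughly like this (a minimal
sketch, assuming each line of the input holds one complete JSON object;
JSON.parseFull returns None for anything it cannot parse):

import scala.util.parsing.json.JSON

// Parse each line into a Map[String, Any]; invalid lines become empty maps.
val parsed = data.map { line =>
  JSON.parseFull(line) match {
    case Some(m: Map[String, Any] @unchecked) => m
    case _ => Map.empty[String, Any]
  }
}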
Even if the fix works, I still have two issues in my Eclipse build...
In
flink-scala/src/test/scala/org/apache/flink/api/scala/extensions/base/AcceptPFTestBase.scala
Eclipse cannot infer the integer type. It could be fixed if you make the
type explicit (as this is only a test, it might be nice t
Hi all,
as promised, I have updated the Gelly roadmap [1].
Below, I am describing and reasoning about the changes I made. Please let
me know whether you agree and if you have any other ideas for further
improvements and feature additions.
*1. Operators for highly skewed graphs*:
I have removed t
Till Rohrmann created FLINK-3825:
Summary: Update CEP documentation to include Scala API
Key: FLINK-3825
URL: https://issues.apache.org/jira/browse/FLINK-3825
Project: Flink
Issue Type: Improvement
Maximilian Michels created FLINK-3824:
Summary: ResourceManager may repeatedly connect to outdated
JobManager in HA mode
Key: FLINK-3824
URL: https://issues.apache.org/jira/browse/FLINK-3824
Project: Flink
Hi,
But can't this behaviour cause a lot of network activity? Is there any
roadmap or plan to change this behaviour?
On Apr 26, 2016 7:06 PM, "Fabian Hueske" wrote:
> Hi,
>
> Flink starts four tasks and then lazily assigns input splits to these tasks
> with locality preference. So each task ma
Till Rohrmann created FLINK-3823:
Summary: Kafka08ITCase.testOffsetInZookeeper failed on Travis
Key: FLINK-3823
URL: https://issues.apache.org/jira/browse/FLINK-3823
Project: Flink
Issue Type: Bug
Hi,
Flink starts four tasks and then lazily assigns input splits to these tasks
with locality preference. So each task may consume more than one split.
This is different from Hadoop MapReduce or Spark which schedule a new task
for each input split.
In your case, the four tasks would be scheduled t
Hi Chenguang,
I've been using the class StreamingMultipleProgramsTestBase, found in the
flink-streaming-java test jar, as the basis for my integration tests. These
tests spin up a flink cluster (and kafka, and hbase, &c) in a single JVM.
It's not a perfect integration environment, but it's as close as
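For reference, a minimal sketch of such a test (class and job names are
hypothetical; it assumes the flink-streaming-java test jar is on the test
classpath):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase
import org.junit.Test

// Runs a small pipeline on the mini cluster that
// StreamingMultipleProgramsTestBase starts for the whole test class.
class WordFilterITCase extends StreamingMultipleProgramsTestBase {

  @Test
  def testFilter(): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.fromElements("a", "bb", "ccc")
      .filter(_.length > 1)
      .print()
    env.execute("word-filter-test")
  }
}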
Hello guys,
I have played with Flink for a while, and I've found that the testing code
and data are split into different parts. For example, some code lives in
separate modules with its own test data, but there is also a 'flink-tests'
module which includes a lot of test programs.
I am just wondering: is ther
Hi,
I looked at the scheduler documentation but could not find an answer to my
question. My question is: suppose I have a big file on a 40-node Hadoop
cluster, and since it is a big file, every node has at least one chunk of
the file. If I write a Flink job that filters the file, and if the job has
par
Hi Max,
thanks for the feedback and suggestions! I'll try and address each
paragraph separately.
I'm afraid deciding based on the StreamTimeCharacteristic is not possible,
since a user can use processing-time windows in their job even though they
set the characteristic to event-time. Enabling event
Hi Aljoscha,
Thank you for the detailed design document.
Wouldn't it be ok to allow these new concepts regardless of the time
semantics? For Event Time and Ingestion Time "Lateness" and
"Accumulating/Discarding" make sense. If the user chooses Processing
time then these can be ignored during tran
If you don’t know the size of your matrix, then you cannot partition it
into contiguous chunks of rows. The problem is that partitionByRange
samples the data set to generate a distribution and, thus, two matrices
will rarely be partitioned identically. Also if you want to provide a data
distributio
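To make the sampling issue concrete, a minimal sketch (data and key field
are hypothetical): each partitionByRange call samples its own input, so
two matrices range-partitioned on the same key can end up with different
split points:

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// Two hypothetical (rowIndex, value) matrices.
val matrixA = env.fromElements((0, 1.0), (1, 2.0), (2, 3.0))
val matrixB = env.fromElements((0, 4.0), (1, 5.0), (2, 6.0))

// Each call samples its input independently to build a distribution,
// so the resulting ranges for A and B are generally not identical.
val partitionedA = matrixA.partitionByRange(0)
val partitionedB = matrixB.partitionByRange(0)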
Greg Hogan created FLINK-3822:
Summary: Document checksum functions
Key: FLINK-3822
URL: https://issues.apache.org/jira/browse/FLINK-3822
Project: Flink
Issue Type: Improvement
Compone
Chesnay Schepler created FLINK-3821:
Summary: Reduce Guava usage in flink-java
Key: FLINK-3821
URL: https://issues.apache.org/jira/browse/FLINK-3821
Project: Flink
Issue Type: Improvement
Hey Greg,
when we made the initial design for Gelly, we discussed having separate
directed and undirected graph types.
We decided to represent undirected graphs by simply adding the
opposite-direction edges, like Apache Giraph does. I'm not sure whether
that was a good idea, but, back then, it see
Hi Sergio,
sorry for the late reply. I figured out your problem. The reason why you
see apparently inconsistent results is that you execute your job multiple
times. Each collect call triggers an eager execution of your Flink job.
Since the intermediate results are not stored, the whole topology has
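A minimal sketch of that behaviour (names are hypothetical): every
collect() call triggers its own, independent execution of the topology:

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// 'intermediate' is not cached; it is rebuilt for every collect() below.
val intermediate = env.fromElements(1, 2, 3).map(_ * 2)

val first = intermediate.collect()   // runs the whole job once
val second = intermediate.collect()  // runs the whole job a second time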
Thanks for the input Fabian!
I also think this is a valuable and lightweight addition. I will add
specific comments on the PR :)
-Vasia.
On 25 April 2016 at 14:30, Fabian Hueske wrote:
> Hi Greg,
>
> sorry for the late reply.
> I am not super familiar with Gelly, but the use cases you describe
Till Rohrmann created FLINK-3820:
Summary: ScalaShellITCase.testPreventRecreationBatch deadlocks on
Travis
Key: FLINK-3820
URL: https://issues.apache.org/jira/browse/FLINK-3820
Project: Flink
Chesnay Schepler created FLINK-3819:
Summary: Replace Guava Preconditions usage in flink-gelly-scala
Key: FLINK-3819
URL: https://issues.apache.org/jira/browse/FLINK-3819
Project: Flink
Is
Chesnay Schepler created FLINK-3818:
Summary: Remove Guava dependency from flink-gelly-examples
Key: FLINK-3818
URL: https://issues.apache.org/jira/browse/FLINK-3818
Project: Flink
Issue T
Chesnay Schepler created FLINK-3817:
Summary: Remove unused Guava dependency from RocksDB StateBackend
Key: FLINK-3817
URL: https://issues.apache.org/jira/browse/FLINK-3817
Project: Flink
Chesnay Schepler created FLINK-3816:
Summary: Replace Guava Preconditions usage in flink-clients
Key: FLINK-3816
URL: https://issues.apache.org/jira/browse/FLINK-3816
Project: Flink
Issue
Chesnay Schepler created FLINK-3815:
Summary: Replace Guava Preconditions usage in flink-gelly
Key: FLINK-3815
URL: https://issues.apache.org/jira/browse/FLINK-3815
Project: Flink
Issue Ty
Chesnay Schepler created FLINK-3814:
Summary: Update code style guide regarding Preconditions
Key: FLINK-3814
URL: https://issues.apache.org/jira/browse/FLINK-3814
Project: Flink
Issue Typ
Hi,
you need to implement the MapFunction interface [1].
Inside the MapFunction you can use any JSON parser library such as Jackson
to parse the String.
The exact logic depends on your use case.
However, you should be careful to not initialize a new parser in each map()
call, because that would b
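A minimal sketch of this pattern (class and field names are hypothetical),
using a RichMapFunction so the Jackson ObjectMapper is created once per
parallel task in open() rather than once per record:

import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

// Parses one JSON record per input line.
class JsonParseMap extends RichMapFunction[String, JsonNode] {
  @transient private var mapper: ObjectMapper = _

  // Called once per parallel task, before any map() call.
  override def open(parameters: Configuration): Unit = {
    mapper = new ObjectMapper()
  }

  override def map(line: String): JsonNode = mapper.readTree(line)
}

It would then be applied as data.map(new JsonParseMap).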
Hi Eron,
Very interesting idea to support exactly-once semantics for sinks via
Git! I would be curious about the performance of such a sink.
Since this currently works on local file systems only (throws an
Exception otherwise), I wonder how it works on failures when the
"git-${subtaskIndex}"
Thanks for the PR, Andrew! This has been fixed in 1.0.2.
On Thu, Apr 14, 2016 at 7:04 PM, Andrew Palumbo wrote:
> Do you want me to open a jira/pr for this?
>
> Original message
> From: Stephan Ewen
> Date: 04/13/2016 5:16 AM (GMT-05:00)
> To: dev@flink.apache.org
> Subject: Re
Till Rohrmann created FLINK-3813:
Summary: YARNSessionFIFOITCase.testDetachedMode failed on Travis
Key: FLINK-3813
URL: https://issues.apache.org/jira/browse/FLINK-3813
Project: Flink
Issue T
Hi Fabian
Thanks for the reply. Yes, my JSON is separated by newlines. It would have
been great if you had explained the function that goes inside the map. I
tried to use the 'scala.util.parsing.json._' library but had no luck.
On Tue, Apr 26, 2016 at 1:11 PM, Fabian Hueske wrote:
> Hi Punit,
>
Ufuk Celebi created FLINK-3812:
Summary: Kafka09ITCase testAllDeletes fails on Travis
Key: FLINK-3812
URL: https://issues.apache.org/jira/browse/FLINK-3812
Project: Flink
Issue Type: Bug
Hi Punit,
JSON can be hard to parse in parallel due to its nested structure. It
depends on the schema and (textual) representation of the JSON whether and
how it can be done. The problem is that a parallel input format needs to be
able to identify record boundaries without context information. Thi