Thank you Fabian, I think that solves it. I'll need to rig up some tests to
verify, but it looks good.
I used a RichMapFunction to assign ids incrementally to windows (mapping
STREAM_OBJECT to a Tuple2 using a private long field in the mapper that is
incremented on every map call). It works, but by any
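A rough sketch of that mapper (untested; class and field names are illustrative,
not the actual code, and the input type is left generic in place of STREAM_OBJECT):

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;

// assigns an incrementing id to each incoming (local) window result
public class WindowIdAssigner<T> extends RichMapFunction<T, Tuple2<Long, T>> {

    // incremented once per map() call, i.e. once per local window result
    private long windowId = 0L;

    @Override
    public Tuple2<Long, T> map(T value) throws Exception {
        return new Tuple2<>(windowId++, value);
    }
}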
Hi Fabian,
I am running in an IDE and the code runs all OK, just no output from the
datastream.
Thanks
Hi Fabian, The program runs with no exceptions in an IDE but I can't see the
datastream print messages. Thanks
Maybe this can be done by assigning the same window id to each of the N
local windows and then doing a

.keyBy(windowId)
.countWindow(N)

This should create a new global window for each window id and collect all N
windows.
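An untested sketch of what I mean, assuming the local window results have already
been tagged with a window id as a Tuple2 (MyWindowResult and the merge function
are just placeholders):

DataStream<Tuple2<Long, MyWindowResult>> taggedWindows = ...;
int n = env.getParallelism();   // N = number of parallel local windows

taggedWindows
    .keyBy(0)            // key by the window id (field 0 of the Tuple2)
    .countWindow(n)      // one global window per window id, holding N local results
    .apply(new MergeLocalWindowsFunction());   // placeholder: merges the N local windows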
Best, Fabian
2016-10-06 22:39 GMT+02:00 AJ Heller :
> The goal is:
> * to split
Hi Greg, The program runs with no exceptions in an IDE but I can't see the
datastream print messages. Thanks
Hi Greg,
print is only eagerly executed for DataSet programs.
In the DataStream API, print() just appends a print sink and execute() is
required to trigger an execution.
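For example (minimal, untested):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.fromElements(1, 2, 3)
   .print();                     // only appends a print sink, nothing runs yet

env.execute("print example");    // this call actually triggers the job and produces output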
2016-10-06 22:40 GMT+02:00 Greg Hogan :
> The program executes when you call print (same for collect), which is why
> you are
The program executes when you call print (same for collect), which is why
you are seeing an error when calling execute (since there is no new job to
execute). As Fabian noted, you'll need to look in the TaskManager log files
for the printed output if running on a cluster.
On Thu, Oct 6, 2016 at 4:
The goal is:
* to split data, random-uniformly, across N nodes,
* window the data identically on each node,
* transform the windows locally on each node, and
* merge the N parallel windows into a global window stream, such that one
window from each parallel process is merged into a "global window".
Hi,
how are you executing your code? From an IDE or on a running Flink instance?
If you execute it on a running Flink instance, you have to look into the
.out files of the task managers (located in ./log/).
Best, Fabian
2016-10-06 22:08 GMT+02:00 drystan mazur :
> Hello I am reading a csv file
Hello, I am reading a csv file with Flink 1.1.2. The file loads and the job runs,
but printing shows nothing? The code runs OK, I just wanted to view the
datastream. What am I doing wrong? Thanks
Hello, I am reading a csv file with Flink 1.1.2. The file loads and the job runs,
but printing shows nothing?

TupleCsvInputFormat oilDataIn;
TupleTypeInfo<...> oildataTypes;
BasicTypeInfo[] types = {BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO,
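In rough outline the job does something like this (simplified and reconstructed,
untested; the Tuple3 of Strings, file path, and job name are just examples, the real
code differs; the key point is the env.execute() call at the end):

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.io.TupleCsvInputFormat;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CsvStreamJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        TypeInformation<?>[] types = {
            BasicTypeInfo.STRING_TYPE_INFO,
            BasicTypeInfo.STRING_TYPE_INFO,
            BasicTypeInfo.STRING_TYPE_INFO
        };
        TupleTypeInfo<Tuple3<String, String, String>> oilDataTypes = new TupleTypeInfo<>(types);

        TupleCsvInputFormat<Tuple3<String, String, String>> oilDataIn =
            new TupleCsvInputFormat<>(new Path("file:///path/to/oil_data.csv"), oilDataTypes);

        DataStream<Tuple3<String, String, String>> oilStream = env.createInput(oilDataIn, oilDataTypes);

        oilStream.print();          // appends a print sink ...
        env.execute("csv stream");  // ... but only execute() actually runs the job and prints
    }
}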
Hi Alberto,
if you want to read a single column you have to wrap it in a Tuple1:
val text4 = env.readCsvFile[Tuple1[String]]("file:data.csv", includedFields = Array(1))
Best, Fabian
2016-10-06 20:59 GMT+02:00 Alberto Ramón :
> I'm learning readCsvFile
> (I discover if the file ends on "/n", yo
I'm learning readCsvFile.
(I discovered that if the file ends with "\n", it throws a null exception.)
*If I try to read only 1 column:*
val text4 = env.readCsvFile[String]("file:data.csv", includedFields = Array(1))
The error is: The type String has to be a tuple or pojo type. [null]
*If I put >
Hi!
There was an issue in the Kafka 0.9 consumer in Flink concerning
checkpoints. It was relevant mostly for lower-throughput topics /
partitions.
It is fixed in the 1.1.3 release. Can you try out the release candidate and
see if that solves your problem?
See here for details on the release candidate
Hello,
With Flink CEP, is there a way to actively listen to pattern matches that
time out? I am under the impression that this is not possible.
In my case I partition a stream containing user web navigation by "userId"
to look for sequences of Event A, followed by B within 4 seconds for each
user
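The pattern looks roughly like this (simplified and untested, against the 1.1-era CEP
API; the Event class, its fields, and navigationStream are placeholders for my actual code):

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.streaming.api.windowing.time.Time;

Pattern<Event, ?> pattern = Pattern.<Event>begin("A")
    .where(evt -> evt.getType().equals("A"))
    .followedBy("B")
    .where(evt -> evt.getType().equals("B"))
    .within(Time.seconds(4));           // B must follow A within 4 seconds

PatternStream<Event> matches =
    CEP.pattern(navigationStream.keyBy(Event::getUserId), pattern);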
I guess that this is caused by a bug in the checksum calculation. Let
me check that.
On Thu, Oct 6, 2016 at 1:24 PM, Flavio Pompermaier wrote:
> I've ran the job once more (always using the checksum branch) and this time
> I got:
>
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1953786112
I've run the job once more (always using the checksum branch) and this time
I got:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1953786112
at
org.apache.flink.api.common.typeutils.base.EnumSerializer.deserialize(EnumSerializer.java:83)
at
org.apache.flink.api.common.typeutils.base.EnumSeri
Flink has built-in CSV support; please check the documentation. There
is no Redshift connector in Flink, so you will have to use JDBC or
create your own connector.
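If you go the JDBC route, a minimal sink could look roughly like this (untested
sketch using plain java.sql; the JDBC URL, table, columns, and the Tuple2 record
type are just examples):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class JdbcSink extends RichSinkFunction<Tuple2<String, Double>> {

    private transient Connection connection;
    private transient PreparedStatement statement;

    @Override
    public void open(Configuration parameters) throws Exception {
        // example connection details, replace with your own
        connection = DriverManager.getConnection("jdbc:postgresql://host:5432/db", "user", "password");
        statement = connection.prepareStatement("INSERT INTO measurements (name, value) VALUES (?, ?)");
    }

    @Override
    public void invoke(Tuple2<String, Double> record) throws Exception {
        statement.setString(1, record.f0);
        statement.setDouble(2, record.f1);
        statement.executeUpdate();
    }

    @Override
    public void close() throws Exception {
        if (statement != null) statement.close();
        if (connection != null) connection.close();
    }
}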
-Max
On Thu, Oct 6, 2016 at 11:14 AM, sunny patel wrote:
> I have check the forum; I couldn’t find the stuffs that i am looking for
This is a common problem in event time, which is referred to as late
data. You can either 1) change the watermark generation code or 2) allow
elements to be late and re-trigger a window execution.
For 2) see
https://ci.apache.org/projects/flink/flink-docs-master/dev/windows.html#dealing-with-late-data
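With 2), the window definition would roughly become (untested; field names and the
lateness bound are just examples, see the linked page for the details):

stream
    .keyBy("sensorId")
    .timeWindow(Time.minutes(30))
    .allowedLateness(Time.minutes(10))   // late elements within 10 minutes re-trigger the window
    .sum("value");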
-Max
It is safe to run multiple jobs concurrently in a Flink cluster. If you
care about resource isolation, you are probably better off with Flink
on YARN.
That said, we're moving away from this model in FLIP-6 towards a
cluster-per-job model:
https://cwiki.apache.org/confluence/pages/viewpage.action?pa
Hi Konstantin,
This looks fine. Generally it is fine to delete Blobs in /tmp once the
Job is running or has finished. When the job is running, the Flink
classloader has already opened these files. Thus, the file system will
still have these available through the file descriptor and defer
deletion
I have checked the forum; I couldn't find what I have been looking for
for a long time: how to write data from CSV to a database in Flink Scala.
I have tried the Rich sink function but it does not appear to work.
We are really hoping for example code showing how to sink the data into a
database, or any support
Just to add to your solution, you can actually use the ClusterClient
(StandaloneClusterClient or YarnClusterClient) to achieve that. For
example,
ClusterClient client = new StandaloneClusterClient(config);
JobID jobId = JobID.fromHexString(jobIdHexString);
client.cancelJob(jobId);
Cheers,
Max
On Mon, Oct
Yes, if that's the case you should go with option (2) and run with the
checksums I think.
On Thu, Oct 6, 2016 at 10:32 AM, Flavio Pompermaier
wrote:
> The problem is that data is very large and usually cannot run on a single
> machine :(
>
> On Thu, Oct 6, 2016 at 10:11 AM, Ufuk Celebi wrote:
>>
Hi all,
We have some sensors that send data into Kafka. Each Kafka partition has a
set of different sensors writing data into it. We consume the data from Flink.
We want to SUM up the values in half-hour intervals in event time (extracted
from the data).
The result is a keyed stream by sensor_id with ti
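In code, what we are aiming for is roughly this (untested sketch; SensorReading, its
fields, and the one-minute out-of-orderness bound are placeholders):

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<SensorReading> readings = ...;   // from the Kafka consumer

readings
    .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<SensorReading>(Time.minutes(1)) {
            @Override
            public long extractTimestamp(SensorReading r) {
                return r.getEventTimestamp();   // event time taken from the data itself
            }
        })
    .keyBy("sensorId")
    .timeWindow(Time.minutes(30))   // half-hour tumbling event-time windows
    .sum("value");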
Like Fabian said, there is no built-in solution. If you want to use
encryption over a socket, you will have to implement your own socket
source using an encryption library.
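A bare-bones example of such a source using a TLS socket via the standard
javax.net.ssl classes could look like this (untested; host, port, and the
line-based framing are assumptions):

import java.io.BufferedReader;
import java.io.InputStreamReader;

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class TlsSocketSource implements SourceFunction<String> {

    private final String host;
    private final int port;
    private volatile boolean running = true;

    public TlsSocketSource(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port);
             BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
            String line;
            while (running && (line = reader.readLine()) != null) {
                ctx.collect(line);   // emit one record per received line
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

You would then use it with env.addSource(new TlsSocketSource("host", 9999)) in the job.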
On Wed, Oct 5, 2016 at 10:03 AM, Fabian Hueske wrote:
> Hi,
>
> the TextSocketSink is rather meant for demo purposes than to
I think there have been multiple threads like this one.
Please see the thread "Flink scala or Java - Dataload from CSV to database"
On Wed, Oct 5, 2016 at 8:15 AM, sunny patel wrote:
> Hi Guys,
>
> We are in the process of creating Proof of concept.,
> I am looking for the sample project - Flink
> Will the above ensure that dsSide is always available before ds3 elements
> arrive on the connected stream. Am I correct in assuming that ds2 changes
> will continue to be broadcast to ds3 (with no ordering guarantees between ds3
> and dsSide, ofcourse).
Broadcasting is just a partitioning strategy
Hi,
I wonder if creating and deleting lots of jobs in a cluster can be a problem?
From my point of view the impact will be limited, because the TaskManagers
won't be impacted during the process; it is only a matter of a task slot being
available or not.
But is it really "free" or a good practice to do that? Especia
The problem is that the data is very large and the job usually cannot run on a
single machine :(
On Thu, Oct 6, 2016 at 10:11 AM, Ufuk Celebi wrote:
> On Wed, Oct 5, 2016 at 7:08 PM, Tarandeep Singh
> wrote:
> > @Stephan my flink cluster setup- 5 nodes, each running 1 TaskManager.
> Slots
> > per task mana
On Wed, Oct 5, 2016 at 7:08 PM, Tarandeep Singh wrote:
> @Stephan my flink cluster setup- 5 nodes, each running 1 TaskManager. Slots
> per task manager: 2-4 (I tried varying this to see if this has any impact).
> Network buffers: 5k - 20k (tried different values for it).
Could you run the job fir