Problems with using ZipWithIndex

2015-12-12 Thread Filip Łęczycki
Hi all,

I tried to use the ZipWithIndex functionality according to the Scala
examples posted here:
https://ci.apache.org/projects/flink/flink-docs-master/apis/zip_elements_guide.html

However, I am not able to call the mentioned function because it cannot be
resolved. I checked the Flink code for org.apache.flink.api.scala.DataSet
and there is no such function. I am using the latest version, 0.10.1. Was
it removed or moved to a different module? Is there any way to use it?

When I try to use the function from the Java DataSetUtils module:

// data is of type DataSet[AlignmentRecord]
val indexed = DataSetUtils.zipWithIndex[AlignmentRecord](data)

I receive the following error:

Type mismatch: expected: DataSet[AlignmentRecord], actual: DataSet[AlignmentRecord]

Could you please guide me how to use this function?

Best regards,
Filip Łęczycki


Re: Problems with using ZipWithIndex

2015-12-12 Thread Márton Balassi
Hey Filip,

As you are using the Scala API, it is easier to use the Scala DataSet
utils, which are accessible after the following import:

import org.apache.flink.api.scala.utils._

Then you can do the following:

val indexed = data.zipWithIndex
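
(The confusing "type mismatch" between two identical-looking types comes
from mixing APIs: the Java DataSetUtils expects an
org.apache.flink.api.java.DataSet, while your data is an
org.apache.flink.api.scala.DataSet; the two classes only share a simple
name.)

For reference, a minimal, self-contained sketch of the Scala route
(assuming the 0.10.x Scala API; the String elements below just stand in
for your AlignmentRecord data):

import org.apache.flink.api.scala._
import org.apache.flink.api.scala.utils._

object ZipWithIndexExample {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val data: DataSet[String] = env.fromElements("a", "b", "c")

    // zipWithIndex assigns a unique, consecutive Long index to each
    // element, yielding a DataSet[(Long, String)].
    val indexed: DataSet[(Long, String)] = data.zipWithIndex

    indexed.print()
  }
}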

Best,

Marton



Re: Serialisation problem

2015-12-12 Thread Robert Metzger
Hi,

Can you check the log output in your IDE or the log files of the Flink
client (./bin/flink)? The TypeExtractor logs why a POJO is not
recognized as a POJO.

The log statements look like this:

20:42:43,465 INFO  org.apache.flink.api.java.typeutils.TypeExtractor - class com.dataartisans.debug.MyPojo must have a default constructor to be used as a POJO.
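
For reference, a minimal sketch of a Scala class that should satisfy the
POJO rules the TypeExtractor checks (the class and field names are made
up for illustration): public, with a public no-argument constructor and
fields reachable through public accessors:

// Hypothetical POJO-compliant class: the var fields give the
// TypeExtractor usable getters/setters.
class MyPojo(var name: String, var count: Int) {
  // The default constructor the log statement above is asking for.
  def this() = this(null, 0)
}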



On Thu, Dec 10, 2015 at 11:24 PM, Abdulrahman kaitoua <
abdulrahman.kait...@outlook.com> wrote:

>
>
> Hello,
>
> I would like some directions to make my code work again (thanks in
> advance). My code used to work on versions 0.9.1 and earlier, but when I
> moved to 0.10 or 0.10.1 I got the following exception:
>
> This type (ObjectArrayTypeInfo>) cannot be used as key
>
> I understood that it is related to the serialisation of objects. I tried
> to follow the POJO-building directions in
> https://cwiki.apache.org/confluence/display/FLINK/Type+System,+Type+Extraction,+Serialization
> but had no luck making it work.
>
> My dataset contains a set of tuples as key and an array of GValues; this
> is a snapshot of the GValue class:
>
>
> sealed trait GValue extends Serializable with Ordered[GValue] {
>   def compare(o: GValue): Int = {
>     o match {
>       case GDouble(v) => this.asInstanceOf[GDouble].v compare v
>       case GString(v) => this.asInstanceOf[GString].v compare v
>       case GInt(v)    => this.asInstanceOf[GInt].v compare v
>       case GNull()    => 0
>     }
>   }
>
>   // Structural equality against another GValue; each branch compares the
>   // other side's payload with this instance's payload.
>   def equal(o: GValue): Boolean = {
>     o match {
>       case GInt(value)    => try { value.equals(this.asInstanceOf[GInt].v) } catch { case e: Throwable => false }
>       case GDouble(value) => try { value.equals(this.asInstanceOf[GDouble].v) } catch { case e: Throwable => false }
>       case GString(value) => try { value.equals(this.asInstanceOf[GString].v) } catch { case e: Throwable => false }
>       case GNull()        => this.isInstanceOf[GNull]
>       case _              => false
>     }
>   }
> }
>
> /**
>  * Represents a @GValue that contains an integer
>  * @deprecated
>  * @param v
>  */
> case class GInt(v: Int) extends GValue{
>   def this() = this(0)
>   override def toString() : String = {
> v.toString
>   }
>   override def equals(other : Any) : Boolean = {
> other match {
>   case GInt(value) => value.equals(v)
>   case _ => false
> }
>   }
> }
>
> /**
>  * Represents a @GValue that contains a number as a @Double
>  * @param v number
>  */
> case class GDouble(v: Double) extends GValue {
>
>   def this() = this(0.0)
>
>   override def equals(other : Any) : Boolean = {
> other match {
>   case GDouble(value) => value.equals(v)
>   case _ => false
> }
>   }
> }
>
> /**
>  * Represents a @GValue that contains a @String
>  * @param v string
>  */
> case class GString(v: String) extends GValue{
>   def this() = this(".")
>   override def toString() : String = {
> v.toString
>   }
>   override def equals(other : Any) : Boolean = {
> other match {
>   case GString(value) => value.equals(v)
>   case _ => false
> }
>   }
> }
>
>
> Regards,
>
>
>
>
> - Abdulrahman Kaitoua, Ph.D. candidate at Politecnico di Milano
>
>


Re: Problems with using ZipWithIndex

2015-12-12 Thread Filip Łęczycki
Hi Marton,

Thank you for your answer. I wasn't able to use zipWithIndex the way you
stated, as I got a "cannot resolve" error. However, it worked when I used
it like this:

val utils = new DataSetUtils[AlignmentRecord](data)
val index = utils.zipWithIndex
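
(For reference, the explicit construction works because, as of the 0.10.x
Scala API, the utilities appear to be packaged as an implicit wrapper
class that the import from Márton's mail applies automatically. A sketch
of the two equivalent call styles, with String standing in for
AlignmentRecord:)

import org.apache.flink.api.scala._
import org.apache.flink.api.scala.utils.DataSetUtils

object WrapperEquivalence {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val data: DataSet[String] = env.fromElements("a", "b")

    // Constructing the wrapper explicitly, as in the workaround above ...
    val viaWrapper = new DataSetUtils[String](data).zipWithIndex

    // ... or letting the imported implicit conversion apply it.
    val viaImplicit = data.zipWithIndex

    viaWrapper.print()
  }
}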

Regards,
Filip Łęczycki




Re: S3 Input/Output with temporary credentials (IAM Roles)

2015-12-12 Thread Robert Metzger
Hi Vladimir,

Flink uses Hadoop's S3 file system implementation, and it seems that this
feature is not supported there:
https://issues.apache.org/jira/browse/HADOOP-9680
This issue contains some more information:
https://issues.apache.org/jira/browse/HADOOP-9384
It seems that the s3a implementation is the one where the feature would
need to be added.

And realistically, I don't see Hadoop fixing this in the foreseeable future :)

I see the following options:
- We could try adding the feature to Hadoop's s3a implementation. It'll
probably be a few months until the fix is reviewed, merged, and released
(but you could probably extract the relevant code into your own project
and run it from there).
- You could implement an S3 file system for Flink with the required
features yourself (that is not as hard as it sounds); see the sketch of
the credential hand-off below.
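
To make the limitation concrete, here is a hedged sketch (AWS SDK for
Java called from Scala; the role ARN, session name, and environment
variable names are placeholders) of obtaining temporary credentials via
STS. Note the session token: it is the third credential component, and
the fs.s3n.* keys from your question have no slot for it:

import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient
import com.amazonaws.services.securitytoken.model.AssumeRoleRequest

object AssumeRoleSketch {
  def main(args: Array[String]): Unit = {
    // Long-lived credentials, used only to assume the IAM role.
    val sts = new AWSSecurityTokenServiceClient(
      new BasicAWSCredentials(sys.env("AWS_ACCESS_KEY"), sys.env("AWS_SECRET_KEY")))

    val creds = sts.assumeRole(
        new AssumeRoleRequest()
          .withRoleArn("arn:aws:iam::123456789012:role/flink-role") // placeholder
          .withRoleSessionName("flink-session"))
      .getCredentials

    // Temporary credentials have three parts plus an expiry; the s3n
    // configuration only has properties for the first two (HADOOP-9680).
    println(creds.getAccessKeyId)     // would map to fs.s3n.awsAccessKeyId
    println(creds.getSecretAccessKey) // would map to fs.s3n.awsSecretAccessKey
    println(creds.getSessionToken)    // no s3n property exists for this
    println(creds.getExpiration)      // must be renewed before this time
  }
}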

Sorry that I cannot give you a better solution for this.

Regards,
Robert


On Fri, Dec 11, 2015 at 3:42 PM, Vladimir Stoyak  wrote:

> Our setup involves AWS IAM roles: with a permanent access_key and
> access_secret we need to assume a specific role (i.e. obtain temporary
> credentials to use AWS resources).
>
> I was wondering what would be the best way of handling this, i.e. how to
> set fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey programmatically,
> and also how to handle expired sessions.
>
> Thanks,
> Vladimir
>