Problems with using ZipWithIndex
Hi all,

I tried to use the ZipWithIndex functionality according to the Scala examples posted here:

https://ci.apache.org/projects/flink/flink-docs-master/apis/zip_elements_guide.html

However, I am not able to call the mentioned function because it cannot be resolved. I checked the Flink code for org.apache.flink.api.scala.DataSet and there is no such function. I am using the latest version, 0.10.1. Was it removed or moved to a different module? Is there any way to use it?

When I try to use the function from the Java DataSetUtils module (data is of type DataSet[AlignmentRecord]):

    val indexed = DataSetUtils.zipWithIndex[AlignmentRecord](data)

I receive the following error:

    Type mismatch: expected: DataSet[AlignmentRecord], actual: DataSet[AlignmentRecord]

Could you please guide me on how to use this function?

Regards,
Filip Łęczycki
Re: Problems with using ZipWithIndex
Hey Filip,

As you are using the Scala API, it is easier to use the Scala DataSet utils, which are accessible after the following import:

    import org.apache.flink.api.scala.utils._

Then you can do the following:

    val indexed = data.zipWithIndex

Best,

Marton
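For completeness, here is a minimal, self-contained sketch of the approach Marton describes (the object name and sample data are illustrative, not from the thread):

    import org.apache.flink.api.scala._
    import org.apache.flink.api.scala.utils._ // brings zipWithIndex into scope on DataSet

    object ZipWithIndexSketch {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        val data: DataSet[String] = env.fromElements("a", "b", "c")
        // zipWithIndex assigns a unique, consecutive Long index to each element.
        val indexed: DataSet[(Long, String)] = data.zipWithIndex
        indexed.print()
      }
    }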
Re: Serialisation problem
Hi,

Can you check the log output in your IDE or the log files of the Flink client (./bin/flink)? The TypeExtractor is logging why a POJO is not recognized as a POJO. The log statements look like this:

    20:42:43,465 INFO org.apache.flink.api.java.typeutils.TypeExtractor - class com.dataartisans.debug.MyPojo must have a default constructor to be used as a POJO.

On Thu, Dec 10, 2015 at 11:24 PM, Abdulrahman kaitoua <abdulrahman.kait...@outlook.com> wrote:

> Hello,
>
> I would like some directions to make my code work again (thanks in
> advance). My code used to work on versions up to 0.9.1, but when I moved
> to 0.10 or 0.10.1 I got the following exception:
>
>     This type (ObjectArrayTypeInfo<...>) cannot be used as key
>
> I understood that it is related to the serialisation of objects. I tried
> to follow the POJO building directions in
> https://cwiki.apache.org/confluence/display/FLINK/Type+System,+Type+Extraction,+Serialization
> with no luck making it work.
>
> My dataset contains a set of tuples as key and one array of GValues. This
> is a snapshot of the GValue class:
>
>     sealed trait GValue extends Serializable with Ordered[GValue] {
>       def compare(o: GValue): Int = {
>         o match {
>           case GDouble(v) => this.asInstanceOf[GDouble].v compare v
>           case GString(v) => this.asInstanceOf[GString].v compare v
>           case GInt(v)    => this.asInstanceOf[GInt].v compare v
>           case GNull()    => 0
>         }
>       }
>       def equal(o: GValue): Boolean = {
>         o match {
>           case GInt(value)    => try { value.equals(o.asInstanceOf[GInt].v) } catch { case e: Throwable => false }
>           case GDouble(value) => try { value.equals(o.asInstanceOf[GDouble].v) } catch { case e: Throwable => false }
>           case GString(value) => try { value.equals(o.asInstanceOf[GString].v) } catch { case e: Throwable => false }
>           case GNull()        => o.isInstanceOf[GNull]
>           case _              => false
>         }
>       }
>     }
>
>     /**
>      * Represents a @GValue that contains an integer
>      * @deprecated
>      * @param v
>      */
>     case class GInt(v: Int) extends GValue {
>       def this() = this(0)
>       override def toString(): String = v.toString
>       override def equals(other: Any): Boolean = {
>         other match {
>           case GInt(value) => value.equals(v)
>           case _           => false
>         }
>       }
>     }
>
>     /**
>      * Represents a @GValue that contains a number as a @Double
>      * @param v number
>      */
>     case class GDouble(v: Double) extends GValue { // with Ordered[GDouble]
>       def this() = this(0.0)
>       override def equals(other: Any): Boolean = {
>         other match {
>           case GDouble(value) => value.equals(v)
>           case _              => false
>         }
>       }
>     }
>
>     /**
>      * Represents a @GValue that contains a @String
>      * @param v string
>      */
>     case class GString(v: String) extends GValue {
>       def this() = this(".")
>       override def toString(): String = v.toString
>       override def equals(other: Any): Boolean = {
>         other match {
>           case GString(value) => value.equals(v)
>           case _              => false
>         }
>       }
>     }
>
> Regards,
>
> Abdulrahman Kaitoua
> Ph.D. Candidate at Politecnico di Milano
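As an illustration of the log message above: the TypeExtractor accepts a class as a POJO only if it is public, has a public no-argument constructor, and all fields are either public or reachable through getters and setters. A minimal Scala sketch (the class is hypothetical, not taken from the code in this thread):

    // Hypothetical class satisfying Flink's POJO rules: public, mutable
    // fields and a public no-argument constructor.
    class MyPojo(var name: String, var count: Int) {
      // This is the "default constructor" the TypeExtractor log message asks for.
      def this() = this(null, 0)
    }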
Re: Problems with using ZipWithIndex
Hi Marton,

Thank you for your answer. I wasn't able to use zipWithIndex in the way you stated, as I got a "cannot resolve" error. However, it worked when I used it like this:

    val utils = new DataSetUtils[AlignmentRecord](data)
    val index = utils.zipWithIndex

Regards,
Filip Łęczycki
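Both spellings should be equivalent; the implicit extension method only resolves when the package-object import is in scope, which may explain the "cannot resolve" error. A sketch, assuming the Flink 0.10 Scala utils and a DataSet named data:

    import org.apache.flink.api.scala.utils._ // without this, data.zipWithIndex cannot resolve

    val viaImplicit = data.zipWithIndex                   // implicit form
    val viaWrapper  = new DataSetUtils(data).zipWithIndex // explicit form, as above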
Re: S3 Input/Output with temporary credentials (IAM Roles)
Hi Vladimir,

Flink is using Hadoop's S3 file system implementation. It seems that this feature is not supported by their implementation:

https://issues.apache.org/jira/browse/HADOOP-9680

This issue contains some more information:

https://issues.apache.org/jira/browse/HADOOP-9384

It seems that the s3a implementation is the one where the feature would need to be added. And realistically, I don't see Hadoop fixing this in the foreseeable future :)

I see the following options:

- We could try adding the feature to Hadoop's s3a implementation. It'll probably be a few months until the fix is reviewed, merged and released (but you could probably extract the relevant code into your own project and run it from there).
- You implement an S3 file system implementation for Flink with the required features (that is not as hard as it sounds).

Sorry that I cannot give you a better solution for this.

Regards,
Robert

On Fri, Dec 11, 2015 at 3:42 PM, Vladimir Stoyak wrote:

> Our setup involves AWS IAM roles: with a permanent access_key and
> access_secret we need to assume a specific role (i.e. obtain temporary
> credentials to use AWS resources).
>
> I was wondering what would be the best way to handle this, i.e. how to set
> fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey programmatically, and
> also how to handle expired sessions.
>
> Thanks,
> Vladimir
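For illustration, here is a minimal sketch of the first half of the problem: obtaining temporary credentials by assuming a role with the AWS Java SDK (v1; the SDK usage and the role ARN are assumptions, not from the thread). The limitation discussed above shows up at the end: s3n only exposes properties for the key id and secret, so there is no place to pass the session token that temporary credentials require.

    // Sketch only: assumes com.amazonaws:aws-java-sdk-sts on the classpath.
    import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient
    import com.amazonaws.services.securitytoken.model.AssumeRoleRequest
    import org.apache.hadoop.conf.Configuration

    object AssumeRoleSketch {
      def main(args: Array[String]): Unit = {
        // The STS client picks up the permanent access_key/access_secret
        // from the default credential chain.
        val sts = new AWSSecurityTokenServiceClient()
        val result = sts.assumeRole(new AssumeRoleRequest()
          .withRoleArn("arn:aws:iam::123456789012:role/my-flink-role") // hypothetical ARN
          .withRoleSessionName("flink-job"))
        val creds = result.getCredentials

        // s3n only understands these two properties ...
        val hadoopConf = new Configuration()
        hadoopConf.set("fs.s3n.awsAccessKeyId", creds.getAccessKeyId)
        hadoopConf.set("fs.s3n.awsSecretAccessKey", creds.getSecretAccessKey)
        // ... and has no property for creds.getSessionToken, which is exactly
        // why extending s3a or writing a custom file system is suggested above.
      }
    }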