Still the same problem when trying to compile. Maven hangs on:

Downloading: https://oss.sonatype.org/content/repositories/snapshots/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml
Downloading: https://repository.cloudera.com/artifactory/cloudera-repos/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml

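To tell a network or mirror problem apart from a Maven problem, the two metadata URLs can be probed outside Maven with an explicit timeout. The following is only a minimal sketch: the helper name `probe`, the 10-second default timeout, and the injectable `opener` parameter are assumptions made for illustration, not part of Maven or CDK.

```python
import socket
import urllib.error
import urllib.request

# URLs copied from the Maven output in this thread.
METADATA_URLS = [
    "https://oss.sonatype.org/content/repositories/snapshots/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml",
    "https://repository.cloudera.com/artifactory/cloudera-repos/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml",
]


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return a short status string for one URL instead of blocking indefinitely.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 404 or 409).
        return f"HTTP {exc.code}"
    except (urllib.error.URLError, socket.timeout, OSError) as exc:
        # No usable answer: DNS failure, refused connection, or timeout.
        return f"unreachable ({exc})"
```

Calling `probe(url)` for each entry in `METADATA_URLS` and printing the result shows whether each repository answers at all. An "unreachable" result would point at the network, proxy, or mirror rather than at the build itself.
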
On Wed, Jul 24, 2013 at 12:20 AM, Flavio Pompermaier <[email protected]> wrote:

> Unfortunately now I'm not at work..I'll try as soon as possible!
>
> On Tue, Jul 23, 2013 at 7:48 PM, Wolfgang Hoschek <[email protected]> wrote:
>
>> Seems like a transient mvn repo problem. Can you try again?
>>
>> Wolfgang.
>>
>> On Jul 23, 2013, at 1:36 AM, Flavio Pompermaier wrote:
>>
>> > Still problems when building CDK Data Core Module 0.4.2-SNAPSHOT. Maven hangs at:
>> >
>> > Downloading: https://repository.cloudera.com/artifactory/cloudera-repos/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml
>> > Downloading: https://oss.sonatype.org/content/repositories/snapshots/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml
>> > lug 23, 2013 10:35:41 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
>> > INFO: I/O exception (java.net.ConnectException) caught when processing request: Connessione scaduta
>> > lug 23, 2013 10:35:41 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
>> > INFO: I/O exception (java.net.ConnectException) caught when processing request: Connessione scaduta
>> > lug 23, 2013 10:35:41 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
>> > INFO: Retrying request
>> > lug 23, 2013 10:35:41 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
>> > INFO: Retrying request
>> >
>> > On Tue, Jul 23, 2013 at 10:33 AM, Flavio Pompermaier <[email protected]> wrote:
>> > Sorry, this is caused by our mirror..I'll remove it and retry..
>> >
>> > On Tue, Jul 23, 2013 at 10:31 AM, Flavio Pompermaier <[email protected]> wrote:
>> >
>> > I still get this error:
>> >
>> > Failed to read artifact descriptor for commons-daemon:commons-daemon:jar:1.0.3: Could not transfer artifact commons-daemon:commons-daemon:pom:1.0.3 from/to repo (http://dev.okkam.it/artifactory/repo): Failed to transfer file: http://dev.okkam.it/artifactory/repo/commons-daemon/commons-daemon/1.0.3/commons-daemon-1.0.3.pom. Return code is: 409 -> [Help 1]
>> >
>> > On Tue, Jul 23, 2013 at 10:22 AM, Wolfgang Hoschek <[email protected]> wrote:
>> > Tests pass on Java 6 but fail on Java 7. Correspondingly, I have filed https://issues.cloudera.org/browse/CDK-80. We'll fix it. Meanwhile, please try Java 6.
>> >
>> > Wolfgang.
>> >
>> > On Jul 23, 2013, at 12:51 AM, Flavio Pompermaier wrote:
>> >
>> > > I tried to download the current trunk but it doesn't compile..for example it hangs on
>> > > https://repository.cloudera.com/artifactory/cloudera-repos/com/twitter/parquet-avro/1.0.0-SNAPSHOT/maven-metadata.xml
>> > > which doesn't exist anymore..
>> > >
>> > > On Mon, Jul 22, 2013 at 11:14 PM, Flavio Pompermaier <[email protected]> wrote:
>> > > You couldn't be more precise ;)
>> > >
>> > > Thanks,
>> > > Flavio
>> > >
>> > > On Mon, Jul 22, 2013 at 11:02 PM, Wolfgang Hoschek <[email protected]> wrote:
>> > > Docs for the xquery and xslt morphline commands are here (look for "xquery"): https://github.com/cloudera/cdk/blob/master/cdk-morphlines/src/site/confluence/morphlinesReferenceGuide.confluence
>> > >
>> > > Example morphlines for the new xquery and xslt commands are here: https://github.com/cloudera/cdk/tree/master/cdk-morphlines/cdk-morphlines-saxon/src/test/resources/test-morphlines
>> > >
>> > > Sample input data is here: https://github.com/cloudera/cdk/tree/master/cdk-morphlines/cdk-morphlines-saxon/src/test/resources/test-documents
>> > >
>> > > Unit tests are here: https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-saxon/src/test/java/com/cloudera/cdk/morphline/saxon/SaxonMorphlineTest.java
>> > >
>> > > Wolfgang.
>> > >
>> > > On Jul 22, 2013, at 1:41 PM, Flavio Pompermaier wrote:
>> > >
>> > > > Ok, I'll try to follow the code! Just one last thing: for morphine-neon I managed to find the tests (in the cdk repository) but for the new xslt and xquery I'm not able to find the test code..could you give me a pointer?
>> > > >
>> > > > On Mon, Jul 22, 2013 at 9:21 PM, Wolfgang Hoschek <[email protected]> wrote:
>> > > > There are many tests for this in the morphlines repo.
>> > > >
>> > > > Wolfgang.
>> > > >
>> > > > On Jul 22, 2013, at 11:43 AM, Flavio Pompermaier wrote:
>> > > >
>> > > > > Thank you for the great support Wolfgang!
>> > > > > Flume + Morphlines is undoubtedly an exciting road but it's taking me too much time :(
>> > > > > Do you think you could add some more tests including readJson and the new xquery and xslt in trunk?
>> > > > >
>> > > > > > Best,
>> > > > > > Flavio
>> > > > > On Mon, Jul 22, 2013 at 8:12 PM, Wolfgang Hoschek <[email protected]> wrote:
>> > > > > Looks like the DcXMLParser spits out a metadata field called "title" and another title as part of the Tika XML stream. That metadata field is then added to the solr document by solrcell. If you add "title" to the captures, the title from the XML stream gets added as well by solrcell.
>> > > > >
>> > > > > JSON support has been released in morphlines-0.4.1 (which flume trunk is now depending on): http://cloudera.github.io/cdk/docs/0.4.1/cdk-morphlines/morphlinesReferenceGuide.html#readJson
>> > > > >
>> > > > > Note that Tika XML doesn't really support/capture XPath extraction with SolrCell. We have added proper support for reading, extracting and transforming XML and HTML with XPath, XQuery and XSLT on the current morphlines trunk (not yet released), similar to the way we already support JSON and Avro. This should make XML handling a lot more straightforward, and make the very limited XML SolrCell approach obsolete. Look for the new "xquery" and "xslt" commands in https://github.com/cloudera/cdk/blob/master/cdk-morphlines/src/site/confluence/morphlinesReferenceGuide.confluence
>> > > > >
>> > > > > Meanwhile, consider using these new commands, or use JSON or Avro, or write your own custom morphline commands that extract whatever you want from your XML data.
>> > > > >
>> > > > > Wolfgang.
>> > > > >
>> > > > > On Jul 22, 2013, at 9:18 AM, Flavio Pompermaier wrote:
>> > > > >
>> > > > > > Hi to all,
>> > > > > > I'm trying to understand how to "master" Morphline configuration files in order to put some data into Solr but I'm facing some problems with TestMorphlineSolrSink. This is what I did:
>> > > > > >
>> > > > > > 1) Since I want to index the title of testXML.xml (i.e. "Tika test document"), I commented out all the parsers except org.apache.tika.parser.xml.DcXMLParser (which parses Dublin Core metadata)
>> > > > > > 2) In schema.xml I added the following field:
>> > > > > > <field name="title" type="text_en" indexed="true" stored="true" multiValued="false" />
>> > > > > >
>> > > > > > But:
>> > > > > > - If I don't add anything to fmap or capture everything works fine but I don't understand why (who fills that field?). If instead I add title to capture and/or title: title (or dc_title:title) to fmap, Solr complains that 2 values are retrieved for 'title' (debugging the values I see the title and one empty value in the 'title' metadata array...).
>> > > > > > Thus, the problem is that everything works magically if the field is named title, but if I change its name to something like doc_title there's no way to make it non-multivalued. Am I right? How can I fix this problem?
>> > > > > > - I'd like to manage JSON files..How can I map JSON fields to Solr fields? Could someone give a simple example?
>> > > > > >
>> > > > > > Best,
>> > > > > > Flavio
