Hi Gordon, hi Till,

Thanks for your feedback. I am happy to contribute by explaining in more detail
how the bug occurred, in case it helps.


First, to describe in a bit more detail what my Flink job does: part of its
execution plan is a ProcessFunction that stores the events as Lucene documents
in an in-memory Lucene index. When the number of documents reaches a threshold,
the process function fires Lucene queries to filter the documents (and thus the
events) according to user models.
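To illustrate the pattern only, here is a minimal plain-Java sketch of this
threshold-triggered micro-batching. The class and method names are
hypothetical, and no Flink or Lucene API is used: the list stands in for the
in-memory Lucene index, and the Predicate stands in for the Lucene queries
built from user models.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical plain-Java stand-in for the micro-batching ProcessFunction.
 * The buffer plays the role of the in-memory Lucene index and the Predicate
 * plays the role of the Lucene queries derived from user models.
 */
public class ThresholdBatchFilter<E> {
    private final int threshold;
    private final Predicate<E> userModelQuery;
    private final List<E> buffer = new ArrayList<>();

    public ThresholdBatchFilter(int threshold, Predicate<E> userModelQuery) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
        this.userModelQuery = userModelQuery;
    }

    /** Buffers one event; once the threshold is reached, returns the matching events of the batch. */
    public List<E> add(E event) {
        buffer.add(event);
        if (buffer.size() < threshold) {
            return List.of(); // batch not full yet: nothing emitted
        }
        List<E> matches = new ArrayList<>();
        for (E e : buffer) {
            if (userModelQuery.test(e)) {
                matches.add(e);
            }
        }
        buffer.clear(); // start a new micro-batch (analogous to resetting the index)
        return matches;
    }

    public static void main(String[] args) {
        ThresholdBatchFilter<Integer> filter = new ThresholdBatchFilter<>(3, x -> x % 2 == 0);
        System.out.println(filter.add(1)); // []
        System.out.println(filter.add(2)); // []
        System.out.println(filter.add(4)); // [2, 4]
    }
}
```

In the real job, the Predicate would presumably be replaced by queries executed
against the Lucene index, which is what pulls the Lucene modules below onto the
classpath.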


Therefore this process function depends on the Lucene modules lucene-core,
lucene-queryparser and lucene-analyzers-common, version 6.3.0 (as a precaution,
we chose the same version as the one used by elasticsearch:5.1.2).


Further downstream, the event stream is written to an Elasticsearch index via
the flink-connector-elasticsearch5 module.


I upgraded the Flink dependencies from version 1.3.2 to 1.4.2. When the job was
deployed on a YARN cluster, it raised the following error:

java.util.ServiceConfigurationError: An SPI class of type 
org.apache.lucene.codecs.PostingsFormat with classname 
org.apache.lucene.search.suggest.document.Completion50PostingsFormat does not 
exist, please fix the file 
'META-INF/services/org.apache.lucene.codecs.PostingsFormat' in your classpath.


So I checked the META-INF/services/org.apache.lucene.codecs.PostingsFormat file
in my job's fat jar. It listed several PostingsFormat implementations to be
loaded:

org.apache.lucene.search.suggest.document.Completion50PostingsFormat
org.elasticsearch.search.suggest.completion2x.Completion090PostingsFormat
org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat
org.apache.lucene.codecs.idversion.IDVersionPostingsFormat

I don't know exactly how the maven-shade-plugin operates, but it seems to me
that it merges identically named configuration files from different modules
into a single file.
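The following rough sketch shows the kind of merge that the listings below
suggest. It is an assumption about the observed result, not the
maven-shade-plugin's actual code: the contents of service files sharing the
same name are appended one after the other (license comments included), so
every provider listed by any dependency ends up in the merged file. The class
name ServiceFileMerge is hypothetical; the comment and blank-line handling
follows the java.util.ServiceLoader provider-file format.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of merging same-named META-INF/services files into a
 * fat jar by plain concatenation (assumed behavior, not the plugin's code).
 */
public class ServiceFileMerge {

    /** Appends the lines of all service files that share one resource name. */
    public static List<String> merge(List<List<String>> filesWithSameName) {
        List<String> merged = new ArrayList<>();
        for (List<String> file : filesWithSameName) {
            merged.addAll(file);
        }
        return merged;
    }

    /** Extracts provider class names, ignoring blank lines and '#' comments. */
    public static List<String> providers(List<String> serviceFile) {
        List<String> names = new ArrayList<>();
        for (String line : serviceFile) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Entries taken from the listings in this mail.
        List<String> fromElasticsearch = List.of(
                "org.apache.lucene.search.suggest.document.Completion50PostingsFormat",
                "org.elasticsearch.search.suggest.completion2x.Completion090PostingsFormat");
        List<String> fromAnotherDependency = List.of(
                "#  Licensed to the Apache Software Foundation (ASF) under one or more",
                "",
                "org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat");
        List<String> merged = merge(List.of(fromElasticsearch, fromAnotherDependency));
        // Prints all three provider names, whichever jar they came from.
        System.out.println(providers(merged));
    }
}
```

The point is that nothing in such a merge checks whether the listed classes
will actually be on the runtime classpath.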

For example, in elasticsearch-5.1.2.jar, the file
org.apache.lucene.codecs.PostingsFormat contains:

org.apache.lucene.search.suggest.document.Completion50PostingsFormat
org.elasticsearch.search.suggest.completion2x.Completion090PostingsFormat

In flink-connector-elasticsearch5_2.11-1.4.2.jar, the file
org.apache.lucene.codecs.PostingsFormat contains:

org.apache.lucene.search.suggest.document.Completion50PostingsFormat
org.elasticsearch.search.suggest.completion2x.Completion090PostingsFormat
#  Licensed to the Apache Software Foundation (ASF) under one or more
#  contributor license agreements.  See the NOTICE file distributed with
#  this work for additional information regarding copyright ownership.
#  The ASF licenses this file to You under the Apache License, Version 2.0
#  (the "License"); you may not use this file except in compliance with
#  the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.

org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat
#  Licensed to the Apache Software Foundation (ASF) under one or more
#  contributor license agreements.  See the NOTICE file distributed with
#  this work for additional information regarding copyright ownership.
#  The ASF licenses this file to You under the Apache License, Version 2.0
#  (the "License"); you may not use this file except in compliance with
#  the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.

org.apache.lucene.codecs.idversion.IDVersionPostingsFormat
#
#  Licensed to the Apache Software Foundation (ASF) under one or more
#  contributor license agreements.  See the NOTICE file distributed with
#  this work for additional information regarding copyright ownership.
#  The ASF licenses this file to You under the Apache License, Version 2.0
#  (the "License"); you may not use this file except in compliance with
#  the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.

org.apache.lucene.search.suggest.document.Completion50PostingsFormat

Since my job's fat jar inherits the META-INF/services configuration files of
its dependencies, I guess this is why, at runtime, the Lucene API tries to load
classes that are not on the classpath. This intuition was confirmed when I
excluded the META-INF/services/org.apache.lucene.codecs.* files from
flink-connector-elasticsearch5: the resulting
org.apache.lucene.codecs.PostingsFormat file in my jar no longer caused a
runtime exception:

#  Licensed to the Apache Software Foundation (ASF) under one or more
#  contributor license agreements.  See the NOTICE file distributed with
#  this work for additional information regarding copyright ownership.
#  The ASF licenses this file to You under the Apache License, Version 2.0
#  (the "License"); you may not use this file except in compliance with
#  the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.

org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat
#  Licensed to the Apache Software Foundation (ASF) under one or more
#  contributor license agreements.  See the NOTICE file distributed with
#  this work for additional information regarding copyright ownership.
#  The ASF licenses this file to You under the Apache License, Version 2.0
#  (the "License"); you may not use this file except in compliance with
#  the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.

org.apache.lucene.codecs.idversion.IDVersionPostingsFormat
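To make the failure mode concrete, here is a simplified sketch of SPI class
loading. It is not Lucene's own loader, only an approximation that reuses the
error message quoted above: every META-INF/services/<type> resource visible to
the class loader is read, and each listed class must exist. A single stale
entry in a merged file is therefore enough to break the whole lookup. The class
and method names (SpiCheck, resolve, loadProviders) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.ServiceConfigurationError;

/** Hypothetical approximation of SPI loading; Lucene's real loader may differ in details. */
public class SpiCheck {

    /** Resolves the providers of one service file; fails on the first class that cannot be found. */
    public static List<Class<?>> resolve(String serviceType, List<String> lines, ClassLoader loader) {
        List<Class<?>> classes = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (name.isEmpty()) {
                continue; // blank line or comment, e.g. the ASF license header
            }
            try {
                // 'false': only check that the class exists, without running static initializers
                classes.add(Class.forName(name, false, loader));
            } catch (ClassNotFoundException e) {
                throw new ServiceConfigurationError("An SPI class of type " + serviceType
                        + " with classname " + name + " does not exist, please fix the file"
                        + " 'META-INF/services/" + serviceType + "' in your classpath.", e);
            }
        }
        return classes;
    }

    /** Reads every META-INF/services/<serviceType> resource visible to the class loader. */
    public static List<Class<?>> loadProviders(String serviceType, ClassLoader loader) throws IOException {
        List<Class<?>> classes = new ArrayList<>();
        Enumeration<URL> files = loader.getResources("META-INF/services/" + serviceType);
        while (files.hasMoreElements()) {
            URL file = files.nextElement();
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(file.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            classes.addAll(resolve(serviceType, lines, loader));
        }
        return classes;
    }

    public static void main(String[] args) throws IOException {
        String type = args.length > 0 ? args[0] : "org.apache.lucene.codecs.PostingsFormat";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (Class<?> c : loadProviders(type, loader)) {
            System.out.println(c.getName());
        }
    }
}
```

Run against the fat jar's classpath, a check like this would either print each
provider or stop at the first entry whose class is missing, which would be a
quick way to spot stale entries such as Completion50PostingsFormat.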

I hope my explanation is clear enough. Don't hesitate to ask for more
information if needed. I would also be glad if you could point out any
misunderstanding on my part, or even misuses of the Flink framework (maybe the
fact that we use a Lucene index as a micro-batch inside a Flink transformation).

Cheers,
Manuel



________________________________
From: Tzu-Li (Gordon) Tai <tzuli...@apache.org>
Sent: Friday, 23 March 2018 10:40:52
To: Till Rohrmann; Haddadi Manuel
Cc: flink-u...@apache.org
Subject: Re: Lucene SPI class loading fails with shaded
flink-connector-elasticsearch

Hi Manuel,

Thanks a lot for reporting this!

Yes, this issue is most likely related to the recent changes to shading the 
Elasticsearch connector dependencies, though it is a bit curious why I didn’t 
bump into it before while testing it.

> The Flink job runs Lucene queries on a data stream which ends up in an
> Elasticsearch index.

Could you explain a bit more where the Lucene queries are executed? Were there 
other dependencies required for this?

> I would highly appreciate any opinion on this workaround. Could it have side
> effects?

I think your workaround wouldn’t be harmful. Could you explain how you came to 
the solution? That would help me in getting to the bottom of the problem (and 
maybe other potential similar issues).

Cheers,
Gordon


On 23 March 2018 at 12:43:31 AM, Till Rohrmann
(till.rohrm...@gmail.com) wrote:

Hi Manuel,

thanks for reporting this issue. It sounds to me like a bug we should fix. I've 
pulled Gordon into the conversation since he will most likely know more about 
the Elasticsearch connector shading.

Cheers,
Till

On Thu, Mar 22, 2018 at 5:09 PM, Haddadi Manuel
<manuel.hadd...@gfi.fr> wrote:

Hello,

When upgrading from flink-1.3.2 to flink-1.4.2, I ran into this error at
runtime in a Flink job:

java.util.ServiceConfigurationError: An SPI class of type 
org.apache.lucene.codecs.PostingsFormat with classname 
org.apache.lucene.search.suggest.document.Completion50PostingsFormat does not 
exist, please fix the file 
'META-INF/services/org.apache.lucene.codecs.PostingsFormat' in your classpath.

I added the lucene-suggest dependency and then encountered this:
java.lang.ClassCastException: class 
org.elasticsearch.search.suggest.completion2x.Completion090PostingsFormat

The Flink job runs Lucene queries on a data stream which ends up in an 
Elasticsearch index.

It seems to me that this exception is a side effect of shading the
flink-connector-elasticsearch-5 dependencies. Currently, the only solution I
have found is to rebuild the flink-connector-elasticsearch-5 jar, excluding
META-INF/services/org.apache.lucene.codecs.*

I would highly appreciate any opinion on this workaround. Could it have side
effects?

Thanks. And by the way, congrats to all Flink contributors, this is a pretty
good piece of technology!

Regards,

Manuel Haddadi

