[
https://issues.apache.org/jira/browse/METRON-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484274#comment-16484274
]
ASF GitHub Bot commented on METRON-1569:
----------------------------------------
GitHub user nickwallen opened a pull request:
https://github.com/apache/metron/pull/1022
METRON-1569 Allow user to change field name conversion when indexing …
The `ElasticsearchWriter` has a mechanism to transform the field names of a
message before it is written to Elasticsearch. Right now this mechanism is
hard-coded to replace all '.' dots with ':' colons.
This mechanism was needed for Elasticsearch 2.x which did not allow dots in
field names. Now that Metron supports Elasticsearch 5.x this is no longer a
problem. A user should be able to configure the field name transformation when
writing to Elasticsearch, as needed.
While it might have been simpler to just remove the de-dotting mechanism,
this would break backwards compatibility. Taking this approach provides users
with an upgrade path.
## Changes
This change allows the user to configure the field name converter as part
of the index writer configuration.
Acceptable values include the following.
* `DEDOT`: Replaces all '.' with ':'. This is the default, backwards
compatible behavior.
* `NOOP`: No field name change.
If no `fieldNameConverter` is defined, the writer defaults to `DEDOT`, which
maintains backwards compatibility.
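To make the two options concrete, here is a minimal sketch of how `DEDOT` and `NOOP` could behave, including the fallback to `DEDOT`. The type names come from this PR, but the `convert` signature, the `fromConfig` helper, and the enum-with-delegate layout are assumptions made for illustration and may not match the code in `metron-common`.

```java
/** Hypothetical single-method contract; the real Metron interface may differ. */
interface FieldNameConverter {
  String convert(String originalFieldName);
}

/** Sketch of the two documented options, DEDOT and NOOP. */
enum FieldNameConverters implements FieldNameConverter {
  // Replaces every '.' with ':' (the backwards compatible default).
  DEDOT(name -> name.replace('.', ':')),
  // Leaves the field name unchanged.
  NOOP(name -> name);

  private final FieldNameConverter delegate;

  FieldNameConverters(FieldNameConverter delegate) {
    this.delegate = delegate;
  }

  @Override
  public String convert(String originalFieldName) {
    return delegate.convert(originalFieldName);
  }

  /** Hypothetical helper: resolves the configured name, using DEDOT when none is set. */
  static FieldNameConverters fromConfig(String configuredName) {
    if (configuredName == null || configuredName.trim().isEmpty()) {
      return DEDOT;
    }
    // Assumption: an unknown name is rejected with IllegalArgumentException.
    return valueOf(configuredName.trim());
  }
}

class FieldNameConverterSketch {
  public static void main(String[] args) {
    // No converter configured, so the default DEDOT applies: prints source:type
    System.out.println(FieldNameConverters.fromConfig(null).convert("source.type"));
    // NOOP leaves the name alone: prints source.type
    System.out.println(FieldNameConverters.fromConfig("NOOP").convert("source.type"));
  }
}
```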
A cache of `FieldNameConverter`s is maintained since the index writer
configuration can be changed at run-time and each sensor has its own index
writer configuration.
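The caching behavior can be sketched as a per-sensor map whose entries expire after a fixed time-to-live. This is only an illustration of the idea: the `ExpiringConverterCache` class, its injected clock, and the use of plain strings as cached values are all hypothetical, and the actual `CachedFieldNameConverterFactory` may be built quite differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical cache keyed by sensor type; entries expire after a fixed time-to-live. */
class ExpiringConverterCache<V> {

  private static final class Entry<T> {
    final T value;
    final long createdAtMillis;

    Entry(T value, long createdAtMillis) {
      this.value = value;
      this.createdAtMillis = createdAtMillis;
    }
  }

  private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
  private final long ttlMillis;
  private final LongSupplier clock;

  ExpiringConverterCache(long ttlMillis, LongSupplier clock) {
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  /** Returns the cached value for a sensor, reloading it once the entry has expired. */
  V get(String sensorType, Function<String, V> loader) {
    long now = clock.getAsLong();
    Entry<V> entry = entries.compute(sensorType, (key, existing) -> {
      if (existing == null || now - existing.createdAtMillis >= ttlMillis) {
        return new Entry<V>(loader.apply(key), now);
      }
      return existing;
    });
    return entry.value;
  }
}

class ExpiringConverterCacheSketch {
  public static void main(String[] args) {
    long[] now = {0L};                          // fake clock, in milliseconds
    Map<String, String> config = new HashMap<>();
    config.put("bro", "DEDOT");                 // stands in for the index writer configuration

    ExpiringConverterCache<String> cache =
        new ExpiringConverterCache<>(5 * 60 * 1000L, () -> now[0]);

    System.out.println(cache.get("bro", config::get)); // DEDOT
    config.put("bro", "NOOP");                  // configuration changed at run-time
    now[0] = 60 * 1000L;                        // one minute later
    System.out.println(cache.get("bro", config::get)); // DEDOT (entry not yet expired)
    now[0] = 5 * 60 * 1000L;                    // five minutes later
    System.out.println(cache.get("bro", config::get)); // NOOP
  }
}
```

With this kind of design, a configuration change only becomes visible once the old entry expires, and each sensor type keeps its own entry.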
An example configuration looks like the following.
```
{
"hdfs" : {
"enabled" : false
},
"elasticsearch" : {
"index" : "bro",
"batchSize" : 5,
"enabled" : true,
"fieldNameConverter": "NOOP"
},
"solr" : {
"enabled" : false
}
}
```
## Code Changes
* Added the `fieldNameConverter` parameter to the index writer
configuration.
* Moved the `FieldNameConverter` implementations to a dedicated package in
`metron-common`.
* Renamed `ElasticsearchFieldNameConverter` to `DeDotFieldNameConverter`.
* Implemented the `NoopFieldNameConverter` which does not modify the field
name.
* Created `FieldNameConverters` class that allows a user to specify either
`DEDOT` or `NOOP` to choose the appropriate implementation.
* Implemented a `CachedFieldNameConverterFactory` that encapsulates all the
logic for choosing and instantiating the appropriate `FieldNameConverter`.
* Updated `ElasticsearchWriter` to use the
`CachedFieldNameConverterFactory`.
* Updated the README to document the new configuration parameter.
## Manual Testing
1. Launch a development environment and log in.
```
vagrant ssh
sudo su -
source /etc/default/metron
```
1. Validate the environment by ensuring alerts are visible in the Alerts UI
and that the Ambari Service Check completes successfully. This ensures that
the change is backwards compatible.
1. Log in to the Storm UI and enable DEBUG logging for
`org.apache.metron.common` and `org.apache.metron.elasticsearch`.
1. The Storm worker logs in
`/var/log/storm/worker-artifacts/random_access_indexing*/worker.log` should
contain the following log statements, if you have enabled DEBUG logging
correctly. This shows that the default `DEDOT` converter is in use.
```
2018-05-22 14:38:... [DEBUG] Renamed dotted field;
original=source.type, new=source:type
2018-05-22 14:38:... [DEBUG] Renamed dotted field;
original=adapter.geoadapter.end.ts, new=adapter:geoadapter:end:ts
2018-05-22 14:38:... [DEBUG] Renamed dotted field;
original=threatintelsplitterbolt.splitter.end.ts,
new=threatintelsplitterbolt:splitter:end:ts
2018-05-22 14:38:... [DEBUG] Renamed dotted field;
original=adapter.threatinteladapter.begin.ts,
new=adapter:threatinteladapter:begin:ts
2018-05-22 14:38:... [DEBUG] Renamed dotted field;
original=enrichments.geo.ip_dst_addr.location_point,
new=enrichments:geo:ip_dst_addr:location_point
2018-05-22 14:38:... [DEBUG] Renamed dotted field;
original=adapter.threatinteladapter.end.ts,
new=adapter:threatinteladapter:end:ts
2018-05-22 14:38:... [DEBUG] Renamed dotted field;
original=enrichmentsplitterbolt.splitter.end.ts,
new=enrichmentsplitterbolt:splitter:end:ts
```
1. Launch the REPL.
```
./bin/stellar -z $ZOOKEEPER
```
1. Change the field name converter to NOOP.
```
[Stellar]>>> conf := SHELL_EDIT()
{
"hdfs" : {
"enabled" : false
},
"elasticsearch" : {
"index" : "bro",
"batchSize" : 5,
"enabled" : true,
"fieldNameConverter": "NOOP"
},
"solr" : {
"enabled" : false
}
}
[Stellar]>>> CONFIG_PUT("INDEXING", conf, "bro")
```
1. It can take up to 5 minutes for the topology to pick up this change.
The old `FieldNameConverter` needs to expire from the cache first.
1. Go back to the Storm worker logs. When the change takes effect, we
should see a log like the following indicating that the
`NoopFieldNameConverter` was created.
```
2018-05-22 16:... [DEBUG] Created field name converter; sensorType=bro,
configuredName=NOOP, class=NoopFieldNameConverter
```
1. In the same logs, we will start to see tuples fail to be indexed.
Elasticsearch complains because the templates have been created to expect
`source:type`, but that field no longer exists because the `FieldNameConverter`
was changed.
```
2018-05-22 16:0...[ERROR] Failing 1 tuples
org.elasticsearch.index.mapper.MapperParsingException: Could not
dynamically add mapping for field [source.type]. Existing mapping for [source]
must be of type object but found [keyword].
at
org.elasticsearch.index.mapper.DocumentParser.getDynamicParentMapper(DocumentParser.java:876)
~[stormjar.jar:?]
at
org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:596)
~[stormjar.jar:?]
at
org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:396)
~[stormjar.jar:?]
at
org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:373)
~[stormjar.jar:?]
at
org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:93)
~[stormjar.jar:?]
at
org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:66)
~[stormjar.jar:?]
```
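One way to read the exception above: Elasticsearch 5.x treats a dotted field name as a path through nested objects, so `source.type` needs `source` to be an object, while the error reports that `source` is already mapped as `keyword`. The sketch below only simulates that check over a plain map. The `findConflict` helper and the map of mapped types are hypothetical and are not Elasticsearch code.

```java
import java.util.HashMap;
import java.util.Map;

class DottedFieldSketch {

  /**
   * Returns the first parent path that is already mapped as a non-object type,
   * or null when the dotted name can be expanded without a conflict.
   */
  static String findConflict(Map<String, String> mappedTypes, String fieldName) {
    String[] parts = fieldName.split("\\.");
    StringBuilder prefix = new StringBuilder();
    // Every segment except the last must be an object for the path to be valid.
    for (int i = 0; i < parts.length - 1; i++) {
      if (i > 0) {
        prefix.append('.');
      }
      prefix.append(parts[i]);
      String type = mappedTypes.get(prefix.toString());
      if (type != null && !type.equals("object")) {
        return prefix.toString();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, String> mappedTypes = new HashMap<>();
    mappedTypes.put("source", "keyword");       // taken from the error message above

    // The NOOP name collides with the existing mapping: prints source
    System.out.println(findConflict(mappedTypes, "source.type"));
    // The DEDOT name has no parent path to check: prints null
    System.out.println(findConflict(mappedTypes, "source:type"));
  }
}
```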
## Pull Request Checklist
- [ ] Is there a JIRA ticket associated with this PR? If not one needs to
be created at [Metron
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA
number you are trying to resolve? Pay particular attention to the hyphen "-"
character.
- [ ] Has your PR been rebased against the latest commit within the target
branch (typically master)?
- [ ] Have you included steps to reproduce the behavior or problem that is
being changed or addressed?
- [ ] Have you included steps or a guide to how the change may be verified
and tested manually?
- [ ] Have you ensured that the full suite of tests and checks have been
executed in the root metron folder via:
- [ ] Have you written or updated unit tests and or integration tests to
verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] Have you verified the basic functionality of the build by building
and running locally with Vagrant full-dev environment or the equivalent?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nickwallen/metron METRON-1569
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/metron/pull/1022.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1022
----
commit c478de1f099d62c0f5271cd9b8ad5124089ad735
Author: Nick Allen <nick@...>
Date: 2018-05-21T20:50:25Z
METRON-1569 Allow user to change field name conversion when indexing to
Elasticsearch
----
> Allow user to change field name conversion when indexing to Elasticsearch
> -------------------------------------------------------------------------
>
> Key: METRON-1569
> URL: https://issues.apache.org/jira/browse/METRON-1569
> Project: Metron
> Issue Type: Improvement
> Reporter: Nick Allen
> Assignee: Nick Allen
> Priority: Major
>
> The `ElasticsearchWriter` has a mechanism to transform the field names of a
> message before it is written to Elasticsearch. Right now this mechanism is
> hard-coded to replace all '.' dots with ':' colons.
> This mechanism was needed for Elasticsearch 2.x which did not allow dots in
> field names. Now that Metron supports Elasticsearch 5.x this is no longer a
> problem.
> A user should be able to configure the field name transformation when writing
> to Elasticsearch, as needed.
> While it might have been simpler to just remove the de-dotting mechanism,
> this would break backwards compatibility. Providing users with a means to
> configure this mechanism provides them with an upgrade path.
>
>