This is an automated email from the ASF dual-hosted git repository.
mwalch pushed a commit to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/fluo-website.git
The following commit(s) were added to refs/heads/gh-pages by this push:
new 02ba26b Move Fluo Recipes documentation to website (#92)
02ba26b is described below
commit 02ba26be221d3ff897f1d90468afe54bf70a5e28
Author: Mike Walch <[email protected]>
AuthorDate: Mon Oct 2 15:41:01 2017 -0400
Move Fluo Recipes documentation to website (#92)
* Created template and navbar
---
_config.yml | 19 +-
_layouts/recipes-1.2.html | 55 ++++++
_recipes-1-2/getting-started/overview.md | 69 +++++++
_recipes-1-2/index.md | 4 +
_recipes-1-2/recipes/accumulo-export.md | 103 +++++++++++
_recipes-1-2/recipes/combine-queue.md | 210 +++++++++++++++++++++
_recipes-1-2/recipes/export-queue.md | 305 +++++++++++++++++++++++++++++++
_recipes-1-2/recipes/recording-tx.md | 76 ++++++++
_recipes-1-2/recipes/row-hasher.md | 121 ++++++++++++
_recipes-1-2/tools/serialization.md | 76 ++++++++
_recipes-1-2/tools/spark.md | 19 ++
_recipes-1-2/tools/table-optimization.md | 66 +++++++
_recipes-1-2/tools/testing.md | 15 ++
_recipes-1-2/tools/transient.md | 85 +++++++++
14 files changed, 1222 insertions(+), 1 deletion(-)
diff --git a/_config.yml b/_config.yml
index 657cb54..05ef091 100644
--- a/_config.yml
+++ b/_config.yml
@@ -10,6 +10,9 @@ collections:
fluo-docs-1-2:
output: true
permalink: "/docs/fluo/1.2/:path"
+ recipes-1-2:
+ output: true
+ permalink: "/docs/fluo-recipes/1.2/:path"
defaults:
-
@@ -52,6 +55,20 @@ defaults:
docs_base: "/docs/fluo/1.2"
javadoc_base: "https://static.javadoc.io/org.apache.fluo/fluo-api/1.1.0-incubating"
github_base: "https://github.com/apache/fluo/blob/master"
+ -
+ scope:
+ path: "_recipes-1-2"
+ type: "recipes-1-2"
+ values:
+ layout: "recipes-1.2"
+ title_prefix: "Fluo Recipes Documentation - "
+ version: "1.2.0"
+ minor_release: "1.2"
+ docs_base: "/docs/fluo-recipes/1.2"
+ github_base: "https://github.com/apache/fluo-recipes/blob/master"
+      javadoc_fluo: "https://static.javadoc.io/org.apache.fluo/fluo-api/1.1.0-incubating"
+      javadoc_core: "https://static.javadoc.io/org.apache.fluo/fluo-recipes-core/1.1.0-incubating"
+      javadoc_accumulo: "https://static.javadoc.io/org.apache.fluo/fluo-recipes-accumulo/1.1.0-incubating"
# Number of posts displayed on the home page.
num_home_posts: 5
@@ -65,7 +82,7 @@ latest_recipes_release: "1.1.0-incubating"
# Sets links to external API
api_base: "https://javadoc.io/doc/org.apache.fluo"
-api_static: "https://javadoc.io/page/org.apache.fluo"
+api_static: "https://static.javadoc.io/org.apache.fluo"
fluo_api_base: "https://javadoc.io/doc/org.apache.fluo/fluo-api"
fluo_api_static: "https://javadoc.io/page/org.apache.fluo/fluo-api"
fluo_recipes_core_static: "https://javadoc.io/page/org.apache.fluo/fluo-recipes-core"
diff --git a/_layouts/recipes-1.2.html b/_layouts/recipes-1.2.html
new file mode 100644
index 0000000..ad51854
--- /dev/null
+++ b/_layouts/recipes-1.2.html
@@ -0,0 +1,55 @@
+---
+layout: default
+---
+
+<div class="row">
+ <div class="col-md-2">
+    <div class="panel-group" id="accordion" role="tablist" aria-multiselectable="true" data-spy="affix">
+ <div class="panel panel-default">
+ {% assign mydocs = site.recipes-1-2 | group_by: 'category' %}
+ {% assign categories = "getting-started,recipes,tools" | split: "," %}
+ {% for pcat in categories %}
+ {% for dcat in mydocs %}
+ {% if pcat == dcat.name %}
+ <div class="panel-heading" role="tab" id="headingOne">
+ <h4 class="panel-title">
+            <a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapse{{ pcat }}" aria-expanded="{% if pcat == page.category %}true{% else %}false{% endif %}" aria-controls="collapse{{ pcat }}">
+ {{ pcat | capitalize | replace: "-", " " }}
+ </a>
+ </h4>
+ </div>
+        <div id="collapse{{pcat}}" class="panel-collapse collapse{% if pcat == page.category %} in{% endif %}" role="tabpanel" aria-labelledby="headingOne">
+ <div class="panel-body">
+ {% assign items = dcat.items | sort: 'order' %}
+ {% for item in items %}
+            <div class="row doc-sidebar-link"><a href="{{ item.url }}">{{ item.title }}</a></div>
+ {% endfor %}
+ </div>
+ </div>
+ {% endif %}
+ {% endfor %}
+ {% endfor %}
+ </div>
+ </div>
+ </div>
+ <div class="col-md-10">
+ {% if page.category %}
+    <p>Fluo Recipes {{ page.version }} documentation >> {{ page.category | capitalize | replace: "-", " " }} >> {{ page.title }}</p>
+ {% endif %}
+
+    <div class="alert alert-danger" style="margin-bottom: 0px;" role="alert">This documentation is for a future release of Fluo! <a href="{{ site.baseurl }}/docs/fluo/{{ site.latest_fluo_release }}/">View documentation for the latest release</a>.</div>
+
+ {% unless page.nodoctitle %}
+ <div class="row">
+ <div class="col-md-10"><h1>{{ page.title }}</h1></div>
+      <div class="col-md-2"><a class="pull-right" style="margin-top: 25px;" href="https://github.com/apache/fluo-website/edit/master/{{ page.path }}" role="button"><i class="glyphicon glyphicon-pencil"></i> <small>Edit this page</small></a></div>
+ </div>
+ {% endunless %}
+ {{ content }}
+
+ <div class="row" style="margin-top: 20px;">
+      <div class="col-md-10"><strong>Find documentation for all Fluo releases in the <a href="{{ site.baseurl }}/docs/">archive</a></strong></div>
+      <div class="col-md-2"><a class="pull-right" href="https://github.com/apache/fluo-website/edit/master/{{ page.path }}" role="button"><i class="glyphicon glyphicon-pencil"></i> <small>Edit this page</small></a></div>
+ </div>
+ </div>
+</div>
diff --git a/_recipes-1-2/getting-started/overview.md b/_recipes-1-2/getting-started/overview.md
new file mode 100644
index 0000000..459ae2d
--- /dev/null
+++ b/_recipes-1-2/getting-started/overview.md
@@ -0,0 +1,69 @@
+---
+title: Overview
+category: getting-started
+order: 1
+---
+
+Fluo Recipes are common code for Apache Fluo application developers. They build on the
+[Fluo API][fluo-api] to offer additional functionality to developers. They are published
+separately from Fluo on their own release schedule. This allows Fluo Recipes to iterate
+and innovate faster than Fluo (which will maintain a more minimal API on a slower release
+cycle). Fluo Recipes offers code to implement common patterns on top of Fluo's API. It
+also offers glue code to external libraries like Spark and Kryo.
+
+### Usage
+
+The Fluo Recipes project publishes multiple jars to Maven Central for each release.
+The `fluo-recipes-core` jar is the primary jar. It is where most recipes live and where
+they are placed by default if they have minimal dependencies beyond the Fluo API.
+
+Recipes with dependencies that bring in many transitive dependencies are published in
+their own jar. For example, recipes that depend on Apache Spark are published in the
+`fluo-recipes-spark` jar. If you don't plan on using code in the `fluo-recipes-spark`
+jar, you should avoid including it in your pom.xml to avoid a transitive dependency on
+Spark.
+
+Below is a sample Maven POM containing all possible Fluo Recipes dependencies:
+
+```xml
+<properties>
+ <fluo-recipes.version>{{ page.version }}</fluo-recipes.version>
+</properties>
+
+<dependencies>
+  <!-- Required. Contains recipes that only depend on the Fluo API -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-core</artifactId>
+ <version>${fluo-recipes.version}</version>
+ </dependency>
+ <!-- Optional. Serialization code that depends on Kryo -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-kryo</artifactId>
+ <version>${fluo-recipes.version}</version>
+ </dependency>
+ <!-- Optional. Common code for using Fluo with Accumulo -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-accumulo</artifactId>
+ <version>${fluo-recipes.version}</version>
+ </dependency>
+ <!-- Optional. Common code for using Fluo with Spark -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-spark</artifactId>
+ <version>${fluo-recipes.version}</version>
+ </dependency>
+ <!-- Optional. Common code for writing Fluo integration tests -->
+ <dependency>
+ <groupId>org.apache.fluo</groupId>
+ <artifactId>fluo-recipes-test</artifactId>
+ <version>${fluo-recipes.version}</version>
+ <scope>test</scope>
+ </dependency>
+</dependencies>
+```
+
+[fluo-api]: https://fluo.apache.org/apidocs/fluo/
diff --git a/_recipes-1-2/index.md b/_recipes-1-2/index.md
new file mode 100644
index 0000000..f59c6ff
--- /dev/null
+++ b/_recipes-1-2/index.md
@@ -0,0 +1,4 @@
+---
+title: Apache Fluo Recipes documentation
+redirect_to: getting-started/overview
+---
diff --git a/_recipes-1-2/recipes/accumulo-export.md b/_recipes-1-2/recipes/accumulo-export.md
new file mode 100644
index 0000000..1daac62
--- /dev/null
+++ b/_recipes-1-2/recipes/accumulo-export.md
@@ -0,0 +1,103 @@
+---
+title: Accumulo Export
+category: recipes
+order: 3
+---
+
+## Background
+
+The [Export Queue Recipe][1] provides a generic foundation for building an export mechanism to
+any external data store. The [AccumuloExporter] provides an [Exporter] for writing to
+Accumulo. [AccumuloExporter] is located in the `fluo-recipes-accumulo` module and provides the
+following functionality:
+
+ * Safely batches writes to Accumulo made by multiple transactions exporting data.
+ * Stores Accumulo connection information in Fluo configuration, making it accessible by Export
+   Observers running on other nodes.
+ * Provides utility code that makes it easier and shorter to code common Accumulo export patterns.
+
+## Example Use
+
+Exporting to Accumulo is easy. Follow the steps below:
+
+1. First, implement [AccumuloTranslator]. Your implementation translates exported
+   objects to Accumulo Mutations. For example, the `SimpleTranslator` class below translates
+   String key/values into mutations for Accumulo. This step is optional; a lambda could
+   be used in step 3 instead of creating a class.
+
+   ```java
+   public class SimpleTranslator implements AccumuloTranslator<String,String> {
+
+     @Override
+     public void translate(SequencedExport<String, String> export, Consumer<Mutation> consumer) {
+       Mutation m = new Mutation(export.getKey());
+       m.put("cf", "cq", export.getSequence(), export.getValue());
+       consumer.accept(m);
+     }
+   }
+   ```
+
+2. Configure an `ExportQueue` and the export table prior to initializing Fluo.
+
+   ```java
+   FluoConfiguration fluoConfig = ...;
+
+   String instance =    // Name of Accumulo instance exporting to
+   String zookeepers =  // Zookeepers used by Accumulo instance exporting to
+   String user =        // Accumulo username, user that can write to exportTable
+   String password =    // Accumulo user password
+   String exportTable = // Name of table to export to
+
+   // Set properties for table to export to in Fluo app configuration.
+   AccumuloExporter.configure(EXPORT_QID).instance(instance, zookeepers)
+       .credentials(user, password).table(exportTable).save(fluoConfig);
+
+   // Set properties for export queue in Fluo app configuration
+   ExportQueue.configure(EXPORT_QID).keyType(String.class).valueType(String.class)
+       .buckets(119).save(fluoConfig);
+
+   // Initialize Fluo using fluoConfig
+   ```
+
+3. In the application's `ObserverProvider`, register an observer that will process exports and
+   write them to Accumulo using [AccumuloExporter]. Also, register observers that add to the
+   export queue.
+
+   ```java
+   public class MyObserverProvider implements ObserverProvider {
+
+     @Override
+     public void provide(Registry obsRegistry, Context ctx) {
+       SimpleConfiguration appCfg = ctx.getAppConfiguration();
+
+       ExportQueue<String, String> expQ = ExportQueue.getInstance(EXPORT_QID, appCfg);
+
+       // Register observer that will process entries on the export queue and write them to
+       // the Accumulo table configured earlier. SimpleTranslator from step 1 is passed here;
+       // a lambda could have been used instead.
+       expQ.registerObserver(obsRegistry,
+           new AccumuloExporter<>(EXPORT_QID, appCfg, new SimpleTranslator()));
+
+       // An example observer created using a lambda that adds to the export queue.
+       obsRegistry.forColumn(OBS_COL, WEAK).useObserver((tx,row,col) -> {
+         // Read some data and do some work
+
+         // Add results to export queue
+         String key =   // key that identifies export
+         String value = // object to export
+         expQ.add(tx, key, value);
+       });
+     }
+   }
+   ```
+
+## Other use cases
+
+The `getTranslator()` method in [AccumuloReplicator] creates a specialized [AccumuloTranslator]
+for replicating a Fluo table to Accumulo.
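+
+As a hedged sketch of that use case (it assumes `getTranslator()` is static and that the export
+queue's value type is `TxLog`, as the Recording Transaction recipe suggests), the translator can
+be plugged into an [AccumuloExporter] like a hand-written one; `txLogQ` is a hypothetical
+`ExportQueue<Bytes, TxLog>`:
+
+```java
+AccumuloTranslator<Bytes, TxLog> translator = AccumuloReplicator.getTranslator();
+txLogQ.registerObserver(obsRegistry,
+    new AccumuloExporter<>(EXPORT_QID, appCfg, translator));
+```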
+
+[1]: {{ page.docs_base }}/recipes/export-queue/
+[Exporter]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/export/function/Exporter.html
+[AccumuloExporter]: {{ page.javadoc_accumulo }}/org/apache/fluo/recipes/accumulo/export/function/AccumuloExporter.html
+[AccumuloTranslator]: {{ page.javadoc_accumulo }}/org/apache/fluo/recipes/accumulo/export/function/AccumuloTranslator.html
+[AccumuloReplicator]: {{ page.javadoc_accumulo }}/org/apache/fluo/recipes/accumulo/export/AccumuloReplicator.html
+
diff --git a/_recipes-1-2/recipes/combine-queue.md b/_recipes-1-2/recipes/combine-queue.md
new file mode 100644
index 0000000..56cf51c
--- /dev/null
+++ b/_recipes-1-2/recipes/combine-queue.md
@@ -0,0 +1,210 @@
+---
+title: Combine Queue
+category: recipes
+order: 1
+---
+
+## Background
+
+When many transactions try to modify the same keys, collisions will occur. Too many collisions
+cause transactions to fail and throughput to nose dive. For example, consider [phrasecount],
+which has many transactions processing documents. Each transaction counts the phrases in a
+document and then updates global phrase counts. Since each transaction attempts to update many
+phrases, the probability of collisions is high.
+
+## Solution
+
+The [combine queue recipe][CombineQueue] provides a reusable solution for updating many keys
+while avoiding collisions. The recipe also organizes updates into batches in order to improve
+throughput.
+
+This recipe queues updates to keys for other transactions to process. In the phrase count
+example, transactions processing documents queue updates, but do not actually update the counts.
+Below is an example of computing phrase counts using this recipe.
+
+ * TX1 queues `+1` update for phrase `we want lambdas now`
+ * TX2 queues `+1` update for phrase `we want lambdas now`
+ * TX3 reads the updates and current value for the phrase `we want lambdas now`. There is no
+   current value and the updates sum to 2, so a new value of 2 is written.
+ * TX4 queues `+2` update for phrase `we want lambdas now`
+ * TX5 queues `-1` update for phrase `we want lambdas now`
+ * TX6 reads the updates and current value for the phrase `we want lambdas now`. The current
+   value is 2 and the updates sum to 1, so a new value of 3 is written.
+
+Transactions processing updates have the ability to make additional updates.
+For example, in addition to updating the current value for a phrase, the new
+value could also be placed on an export queue to update an external database.
+
+### Buckets
+
+A simple implementation of this recipe would have an update queue for each key. However, the
+implementation is slightly more complex. Each update queue is in a bucket and transactions
+process all of the updates in a bucket. This allows more efficient processing of updates for the
+following reasons:
+
+ * When updates are queued, notifications are made per bucket (instead of per key).
+ * The transaction doing the update can scan the entire bucket reading updates, which avoids a
+   seek for each key being updated.
+ * Also, the transaction can request a batch lookup to get the current value of all the keys
+   being updated.
+ * Any additional actions taken on update (like adding something to an export queue) can also be
+   batched.
+ * Data is organized to make reading existing values for keys in a bucket more efficient.
+
+Which bucket a key goes to is decided using hash and modulus so that multiple updates for a key
+go to the same bucket.
+
+The initial number of tablets to create when applying table optimizations can be controlled by
+setting the buckets per tablet option when configuring a combine queue. For example, if you
+have 20 tablet servers and 1000 buckets and want 2 tablets per tserver initially, then set
+buckets per tablet to 1000/(2*20)=25. Both points are sketched below.
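+
+The snippet below illustrates both points. The `bucketFor` helper is purely illustrative (the
+recipe assigns buckets internally), and the configuration line applies the 20-tserver arithmetic
+above, assuming the combine queue configurator exposes a `bucketsPerTablet` option like the
+export queue's.
+
+```java
+// Illustrative only: the recipe computes bucket assignment internally.
+static int bucketFor(String key, int numBuckets) {
+  // floorMod keeps the result non-negative, so the same key always maps to the same bucket
+  return Math.floorMod(key.hashCode(), numBuckets);
+}
+
+// 20 tablet servers, 1000 buckets, 2 tablets per tserver initially:
+// 1000 / (2 * 20) = 25 buckets per tablet
+CombineQueue.configure(WcObserverProvider.ID).keyType(String.class).valueType(Long.class)
+    .buckets(1000).bucketsPerTablet(25).save(fluoConfig);
+```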
+
+## Example Use
+
+The following code snippets show how to use this recipe for word count. The first step is to
+configure it before initializing Fluo. When configuring, an ID is needed. This ID is used in two
+ways. First, the ID is used as a row prefix in the table. Therefore nothing else should use that
+row range in the table. Second, the ID is used in generating configuration keys.
+
+The following snippet shows how to configure a combine queue.
+
+```java
+FluoConfiguration fluoConfig = ...;
+
+// Set application properties for the combine queue. These properties are read later by
+// the observers running on each worker.
+CombineQueue.configure(WcObserverProvider.ID)
+ .keyType(String.class).valueType(Long.class).buckets(119).save(fluoConfig);
+
+fluoConfig.setObserverProvider(WcObserverProvider.class);
+
+// initialize Fluo using fluoConfig
+```
+
+Assume the following observer is triggered when a document is updated. It examines new
+and old document content and determines changes in word counts. These changes are pushed to a
+combine queue.
+
+```java
+public class DocumentObserver implements StringObserver {
+ // word count combine queue
+ private CombineQueue<String, Long> wccq;
+
+ public static final Column NEW_COL = new Column("content", "new");
+ public static final Column CUR_COL = new Column("content", "current");
+
+ public DocumentObserver(CombineQueue<String, Long> wccq) {
+ this.wccq = wccq;
+ }
+
+ @Override
+ public void process(TransactionBase tx, String row, Column col) {
+
+ Preconditions.checkArgument(col.equals(NEW_COL));
+
+ String newContent = tx.gets(row, NEW_COL);
+ String currentContent = tx.gets(row, CUR_COL, "");
+
+ Map<String, Long> newWordCounts = getWordCounts(newContent);
+ Map<String, Long> currentWordCounts = getWordCounts(currentContent);
+
+ // determine changes in word counts between old and new document content
+    Map<String, Long> changes = calculateChanges(newWordCounts, currentWordCounts);
+
+ // queue updates to word counts for processing by other transactions
+ wccq.add(tx, changes);
+
+ // update the current content and delete the new content
+ tx.set(row, CUR_COL, newContent);
+ tx.delete(row, NEW_COL);
+ }
+
+ private static Map<String, Long> getWordCounts(String doc) {
+ // TODO extract words from doc
+ }
+
+  private static Map<String, Long> calculateChanges(Map<String, Long> newCounts,
+ Map<String, Long> currCounts) {
+ Map<String, Long> changes = new HashMap<>();
+
+ // guava Maps class
+ MapDifference<String, Long> diffs = Maps.difference(currCounts, newCounts);
+
+ // compute the diffs for words that changed
+ changes.putAll(Maps.transformValues(diffs.entriesDiffering(),
+ vDiff -> vDiff.rightValue() - vDiff.leftValue()));
+
+ // add all new words
+ changes.putAll(diffs.entriesOnlyOnRight());
+
+ // subtract all words no longer present
+    changes.putAll(Maps.transformValues(diffs.entriesOnlyOnLeft(), l -> l * -1));
+
+ return changes;
+ }
+}
+```
+
+Each combine queue has two extension points, a [combiner][Combiner] and a
+[change observer][ChangeObserver]. The combine queue configures a Fluo observer to process
+queued updates. When processing updates, the two extension points are called. The code below
+shows how to use these extension points.
+
+A change observer can do additional processing when a batch of key values is updated. Below,
+updates are queued for export to an external database. The export is given the new and old
+value, allowing it to delete the old value if needed.
+
+```java
+public class WcObserverProvider implements ObserverProvider {
+
+ public static final String ID = "wc";
+
+ @Override
+ public void provide(Registry obsRegistry, Context ctx) {
+
+ ExportQueue<String, MyDatabaseExport> exportQ = createExportQueue(ctx);
+
+ // Create a combine queue for computing word counts.
+    CombineQueue<String, Long> wcMap = CombineQueue.getInstance(ID, ctx.getAppConfiguration());
+
+ // Register observer that updates the Combine Queue
+    obsRegistry.forColumn(DocumentObserver.NEW_COL, STRONG)
+        .useObserver(new DocumentObserver(wcMap));
+
+    // Used to join new and existing values for a key. The lambda sums all values and returns
+    // Optional.empty() when the sum is zero. Returning Optional.empty() causes the key/value
+    // to be deleted. Could have used the built-in SummingCombiner.
+    Combiner<String, Long> combiner =
+        input -> input.stream().reduce(Long::sum).filter(l -> l != 0);
+
+    // Called when the value of a key changes. The lambda exports these changes to an external
+ // database. Make sure to read ChangeObserver's javadoc.
+ ChangeObserver<String, Long> changeObs = (tx, changes) -> {
+ for (Change<String, Long> update : changes) {
+ String word = update.getKey();
+ Optional<Long> oldVal = update.getOldValue();
+ Optional<Long> newVal = update.getNewValue();
+
+        // Queue an export to let an external database know the word count has changed.
+ exportQ.add(tx, word, new MyDatabaseExport(oldVal, newVal));
+ }
+ };
+
+    // Register observer that handles updates to the CombineQueue. This observer will use the
+    // combiner and change observer.
+ wcMap.registerObserver(obsRegistry, combiner, changeObs);
+ }
+}
+```
+
+## Guarantees
+
+This recipe makes two important guarantees about updates for a key when it
+calls `process()` on a [ChangeObserver].
+
+ * The new value reported for an update will be derived from combining all
+   updates that were committed before the transaction that's processing updates
+   started. The implementation may have to make multiple passes over queued
+   updates to achieve this. In the situation where TX1 queues a `+1` and later
+   TX2 queues a `-1` for the same key, there is no need to worry about only seeing
+   the `-1` processed. A transaction that started processing updates after TX2
+   committed would process both.
+ * The old value will always be what was reported as the new value in the
+ previous transaction that called `ChangeObserver.process()`.
+
+[phrasecount]: https://github.com/astralway/phrasecount
+[CombineQueue]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/combine/CombineQueue.html
+[ChangeObserver]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/combine/ChangeObserver.html
+[Combiner]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/combine/Combiner.html
diff --git a/_recipes-1-2/recipes/export-queue.md b/_recipes-1-2/recipes/export-queue.md
new file mode 100644
index 0000000..7b42a9b
--- /dev/null
+++ b/_recipes-1-2/recipes/export-queue.md
@@ -0,0 +1,305 @@
+---
+title: Export Queue
+category: recipes
+order: 2
+---
+
+## Background
+
+Fluo is not suited for servicing low latency queries for two reasons. First, the implementation
+of transactions is designed for throughput. To get throughput, transactions recover lazily from
+failures and may wait on other transactions. Both of these design decisions can
+lead to delays of individual transactions, but do not negatively impact throughput. The second
+reason is that Fluo observers executing transactions will likely cause a large number of random
+accesses. This could lead to high response time variability for an individual random access.
+This variability would not impede throughput but would impede the goal of low latency.
+
+One way to make data transformed by Fluo available for low latency queries is
+to export that data to another system. For example Fluo could run on
+cluster A, continually transforming a large data set, and exporting data to
+Accumulo tables on cluster B. The tables on cluster B would service user
+queries. Fluo Recipes has built in support for [exporting to Accumulo][aeq],
+however this recipe can be used to export to systems other than Accumulo, like
+Redis, Elasticsearch, MySQL, etc.
+
+Exporting data from Fluo is easy to get wrong, which is why this recipe exists.
+To understand what can go wrong consider the following example observer
+transaction.
+
+```java
+public class MyObserver implements StringObserver {
+
+ static final Column UPDATE_COL = new Column("meta", "numUpdates");
+ static final Column COUNTER_COL = new Column("meta", "counter1");
+
+  //represents a query system external to Fluo that is updated by Fluo
+ QuerySystem querySystem;
+
+ @Override
+ public void process(TransactionBase tx, String row, Column col) {
+
+ int oldCount = Integer.parseInt(tx.gets(row, COUNTER_COL, "0"));
+ int numUpdates = Integer.parseInt(tx.gets(row, UPDATE_COL, "0"));
+ int newCount = oldCount + numUpdates;
+
+ tx.set(row, COUNTER_COL, "" + newCount);
+ tx.delete(row, UPDATE_COL);
+
+ //Build an inverted index in the query system, based on count from the
+ //meta:counter1 column in fluo. Do this by creating rows for the
+ //external query system based on the count.
+ String oldCountRow = String.format("%06d", oldCount);
+ String newCountRow = String.format("%06d", newCount);
+
+ //add a new entry to the inverted index
+ querySystem.insertRow(newCountRow, row);
+ //remove the old entry from the inverted index
+ querySystem.deleteRow(oldCountRow, row);
+ }
+}
+```
+
+The above example would keep the external index up to date beautifully as long
+as the following conditions are met.
+
+ * Threads executing transactions always complete successfully.
+ * Only a single thread ever responds to a notification.
+
+However, these conditions are not guaranteed by Fluo. Multiple threads may
+attempt to process a notification concurrently (only one may succeed). Also, at
+any point in time a transaction may fail (for example, the computer executing it
+may reboot). Both of these problems will occur and will lead to corruption of
+the external index in the example. The inverted index and Fluo will become
+inconsistent. The inverted index will end up with multiple entries (that are
+never cleaned up) for a single entity even though the intent is to only have one.
+
+The root of the problem in the example above is that it's exporting uncommitted
+data. There is no guarantee that setting the column `<row>:meta:counter1` to
+`newCount` will succeed until the transaction is successfully committed.
+However, `newCountRow` is derived from `newCount` and written to the external query
+system before the transaction is committed (Note: for observers, the
+transaction is committed by the framework after `process(...)` is called). So
+if the transaction fails, the next time it runs it could compute a completely
+different value for `newCountRow` (and it would not delete what was written by the
+failed transaction).
+
+## Solution
+
+The simple solution to the problem of exporting uncommitted data is to only
+export committed data. There are multiple ways to accomplish this. This
+recipe offers a reusable implementation of one method. This recipe has the
+following elements:
+
+ * An export queue that transactions can add key/values to. Only if the transaction commits
+   successfully will the key/value end up in the queue. A Fluo application can have multiple
+   export queues; each one must have a unique id.
+ * When a key/value is added to the export queue, it's given a sequence number. This sequence
+   number is based on the transaction's start timestamp.
+ * Each export queue is configured with an observer that processes key/values that were
+   successfully committed to the queue.
+ * When key/values in an export queue are processed, they are deleted, so the export queue does
+   not keep any long term data.
+ * Key/values in an export queue are placed in buckets. This is done so that all of the updates
+   in a bucket can be processed in a single transaction. This allows an efficient implementation
+   of this recipe in Fluo. It can also lead to efficiency in a system being exported to, if the
+   system can benefit from batching updates. The number of buckets in an export queue is
+   configurable.
+
+There are three requirements for using this recipe:
+
+ * Must configure export queues before initializing a Fluo application.
+ * Transactions adding to an export queue must get an instance of the queue using its unique QID.
+ * Must create a class or lambda that implements [Exporter] in order to process exports.
+
+## Example Use
+
+This example shows how to incrementally build an inverted index in an external query system
+using an export queue. The class below is a simple POJO used as the value for the export queue.
+
+```java
+class CountUpdate {
+ public int oldCount;
+ public int newCount;
+
+ public CountUpdate(int oc, int nc) {
+ this.oldCount = oc;
+ this.newCount = nc;
+ }
+}
+```
+
+The following code shows how to configure an export queue. This code
+modifies the FluoConfiguration object with options needed for the export queue.
+This FluoConfiguration object should be used to initialize the Fluo
+application.
+
+```java
+public class FluoApp {
+
+ // export queue id "ici" means inverted count index
+ public static final String EQ_ID = "ici";
+
+ static final Column UPDATE_COL = new Column("meta", "numUpdates");
+ static final Column COUNTER_COL = new Column("meta", "counter1");
+
+ public static class AppObserverProvider implements ObserverProvider {
+ @Override
+ public void provide(Registry obsRegistry, Context ctx) {
+ ExportQueue<String, CountUpdate> expQ =
+ ExportQueue.getInstance(EQ_ID, ctx.getAppConfiguration());
+
+ // register observer that will queue data to export
+      obsRegistry.forColumn(UPDATE_COL, STRONG).useObserver(new MyObserver(expQ));
+
+ // register observer that will export queued data
+ expQ.registerObserver(obsRegistry, new CountExporter());
+ }
+ }
+
+ /**
+ * Call this method before initializing Fluo.
+ *
+   * @param fluoConfig the configuration object that will be used to initialize Fluo
+ */
+ public static void preInit(FluoConfiguration fluoConfig) {
+
+ // Set properties for export queue in Fluo app configuration
+    ExportQueue.configure(EQ_ID)
+ .keyType(String.class)
+ .valueType(CountUpdate.class)
+ .buckets(1009)
+ .bucketsPerTablet(10)
+        .save(fluoConfig);
+
+ fluoConfig.setObserverProvider(AppObserverProvider.class);
+ }
+}
+```
+
+Below is an updated version of the observer from above that now uses an export
+queue.
+
+```java
+public class MyObserver implements StringObserver {
+
+ private ExportQueue<String, CountUpdate> exportQueue;
+
+ public MyObserver(ExportQueue<String, CountUpdate> exportQueue) {
+ this.exportQueue = exportQueue;
+ }
+
+ @Override
+ public void process(TransactionBase tx, String row, Column col) {
+
+ int oldCount = Integer.parseInt(tx.gets(row, FluoApp.COUNTER_COL, "0"));
+ int numUpdates = Integer.parseInt(tx.gets(row, FluoApp.UPDATE_COL, "0"));
+ int newCount = oldCount + numUpdates;
+
+ tx.set(row, FluoApp.COUNTER_COL, "" + newCount);
+ tx.delete(row, FluoApp.UPDATE_COL);
+
+ // Because the update to the export queue is part of the transaction,
+ // either the update to meta:counter1 is made and an entry is added to
+ // the export queue or neither happens.
+ exportQueue.add(tx, row, new CountUpdate(oldCount, newCount));
+ }
+}
+```
+
+The export queue will call the `export()` method on the class below to process entries queued
+for export. It is possible the call to `export()` can fail part way through and/or be called
+multiple times. In the case of failures, the export consumer will be called again with the same
+data. It's possible for the same export entry to be processed on multiple computers at different
+times. This can cause exports to arrive out of order. The purpose of the sequence number is to
+help systems receiving out of order and redundant data.
+
+```java
+public class CountExporter implements Exporter<String, CountUpdate> {
+ // represents the external query system we want to update from Fluo
+ QuerySystem querySystem;
+
+ @Override
+ public void export(Iterator<SequencedExport<String, CountUpdate>> exports) {
+ BatchUpdater batchUpdater = querySystem.getBatchUpdater();
+
+ while (exports.hasNext()) {
+ SequencedExport<String, CountUpdate> export = exports.next();
+ String row = export.getKey();
+ CountUpdate uc = export.getValue();
+ long seqNum = export.getSequence();
+
+ String oldCountRow = String.format("%06d", uc.oldCount);
+ String newCountRow = String.format("%06d", uc.newCount);
+
+ // add a new entry to the inverted index
+ batchUpdater.insertRow(newCountRow, row, seqNum);
+ // remove the old entry from the inverted index
+ batchUpdater.deleteRow(oldCountRow, row, seqNum);
+ }
+
+ // flush all of the updates to the external query system
+ batchUpdater.close();
+ }
+}
+```
+
+## Schema
+
+Each export queue stores its data in the Fluo table in a contiguous row range.
+This row range is defined by using the export queue id as a row prefix for all
+data in the export queue. So the row range defined by the export queue id
+should not be used by anything else.
+
+All data stored in an export queue is [transient]. When an export
+queue is configured, it will recommend split points using the [table
+optimization process][table-opt]. The number of splits generated
+by this process can be controlled by setting the number of buckets per tablet
+when configuring an export queue.
+
+## Concurrency
+
+Additions to the export queue will never collide. If two transactions add the
+same key at around the same time and successfully commit, then two entries with
+different sequence numbers will always be added to the queue. The sequence
+number is based on the start timestamp of the transactions.
+
+If the key used to add items to the export queue is deterministically derived
+from something the transaction is writing to, then that will cause a collision.
+For example, consider the following interleaving of two transactions adding to
+the same export queue in a manner that will collide. Note, TH1 is shorthand for
+thread 1, ek() is a function that creates the export key, and ev() is a function
+that creates the export value.
+
+ 1. TH1 : key1 = ek(`row1`,`fam1:qual1`)
+ 1. TH1 : val1 = ev(tx1.get(`row1`,`fam1:qual1`), tx1.get(`rowA`,`fam1:qual2`))
+ 1. TH1 : exportQueueA.add(tx1, key1, val1)
+ 1. TH2 : key2 = ek(`row1`,`fam1:qual1`)
+ 1. TH2 : val2 = ev(tx2.get(`row1`,`fam1:qual1`), tx2.get(`rowB`,`fam1:qual2`))
+ 1. TH2 : exportQueueA.add(tx2, key2, val2)
+ 1. TH1 : tx1.set(`row1`,`fam1:qual1`, val1)
+ 1. TH2 : tx2.set(`row1`,`fam1:qual1`, val2)
+
+In the example above only one transaction will succeed because both are setting
+`row1 fam1:qual1`. Since adding to the export queue is part of the
+transaction, only the transaction that succeeds will add something to the
+queue. If the function ek() in the example is deterministic, then both
+transactions would have been trying to add the same key to the export queue.
+
+With the above method, we know that transactions adding entries to the queue for
+the same key must have executed [serially][serial]. Knowing that transactions which
+added the same key did not overlap in time makes reasoning about those export
+entries very simple.
+
+The example below is a slight modification of the example above. In this
+example both transactions will successfully add entries to the queue using the
+same key. Both transactions succeed because they are writing to different
+cells (`rowB fam1:qual2` and `rowA fam1:qual2`). This approach makes it more
+difficult to reason about export entries with the same key, because the
+transactions adding those entries could have overlapped in time. This is an
+example of write skew mentioned in the Percolator paper.
+
+ 1. TH1 : key1 = ek(`row1`,`fam1:qual1`)
+ 1. TH1 : val1 = ev(tx1.get(`row1`,`fam1:qual1`), tx1.get(`rowA`,`fam1:qual2`))
+ 1. TH1 : exportQueueA.add(tx1, key1, val1)
+ 1. TH2 : key2 = ek(`row1`,`fam1:qual1`)
+ 1. TH2 : val2 = ev(tx2.get(`row1`,`fam1:qual1`), tx2.get(`rowB`,`fam1:qual2`))
+ 1. TH2 : exportQueueA.add(tx2, key2, val2)
+ 1. TH1 : tx1.set(`rowA`,`fam1:qual2`, val1)
+ 1. TH2 : tx2.set(`rowB`,`fam1:qual2`, val2)
+
+[Exporter]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/export/function/Exporter.html
+[serial]: https://en.wikipedia.org/wiki/Serializability
+[aeq]: {{ page.docs_base }}/recipes/accumulo-export-queue/
+[transient]: {{ page.docs_base }}/tools/transient/
+[table-opt]: {{ page.docs_base }}/tools/table-optimization/
diff --git a/_recipes-1-2/recipes/recording-tx.md b/_recipes-1-2/recipes/recording-tx.md
new file mode 100644
index 0000000..703fdda
--- /dev/null
+++ b/_recipes-1-2/recipes/recording-tx.md
@@ -0,0 +1,76 @@
+---
+title: Recording Transaction
+category: recipes
+order: 5
+---
+
+A [RecordingTransaction] is an implementation of [Transaction] that logs all transaction
+operations (i.e., GET, SET, or DELETE) to a `TxLog` object for later uses such as exporting
+data. The code below shows how a RecordingTransaction is created by wrapping a Transaction
+object:
+
+```java
+RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx);
+```
+
+A predicate function can be passed to the wrap method to select which log entries to record.
+The code below only records log entries whose column family is `meta`:
+
+```java
+RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx,
+    le -> le.getColumn().getFamily().toString().equals("meta"));
+```
+
+After creating a [RecordingTransaction], users can use it as they would use a
+Transaction object.
+
+```java
+Bytes value = rtx.get(Bytes.of("r1"), new Column("cf1", "cq1"));
+```
+
+While SET or DELETE operations are always recorded to the log, GET operations are only recorded
+if a value was found at the requested row/column. Also, if a GET method returns an iterator,
+only the GET operations that are retrieved from the iterator are logged. GET operations are
+logged as they are necessary if you want to determine the changes made by the transaction.
+
+When you are done operating on the transaction, you can retrieve the TxLog using the following
+code:
+
+```java
+TxLog myTxLog = rtx.getTxLog();
+```
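+
+To give a feel for what the log contains, below is a hedged sketch of walking the returned
+TxLog. It assumes TxLog exposes its entries as a list of LogEntry objects with operation, row,
+column, and value accessors; check the [RecordingTransaction] javadoc for the exact API.
+
+```java
+// Assumed accessors: getLogEntries(), getOp(), getRow(), getColumn(), getValue()
+for (LogEntry entry : myTxLog.getLogEntries()) {
+  // The operation is GET, SET, or DELETE
+  System.out.println(entry.getOp() + " " + entry.getRow() + " "
+      + entry.getColumn() + " " + entry.getValue());
+}
+```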
+
+Below is example code showing how a [RecordingTransaction] can be used in an observer to record
+all operations performed by the transaction in a TxLog. In this example, a GET (if data exists)
+and a SET operation will be logged. This TxLog can be added to an export queue and later used to
+export updates from Fluo.
+
+```java
+public class MyObserver extends AbstractObserver {
+
+  private static final TypeLayer TYPEL = new TypeLayer(new StringEncoder());
+
+ private ExportQueue<Bytes, TxLog> exportQueue;
+
+ @Override
+ public void process(TransactionBase tx, Bytes row, Column col) {
+
+ // create recording transaction (rtx)
+ RecordingTransactionBase rtx = RecordingTransactionBase.wrap(tx);
+
+ // use rtx to create a typed transaction & perform operations
+ TypedTransactionBase ttx = TYPEL.wrap(rtx);
+    int count = ttx.get().row(row).fam("meta").qual("counter1").toInteger(0);
+ ttx.mutate().row(row).fam("meta").qual("counter1").set(count+1);
+
+ // when finished performing operations, retrieve transaction log
+    TxLog txLog = rtx.getTxLog();
+
+ // add txLog to exportQueue if not empty
+ if (!txLog.isEmpty()) {
+ //do not pass rtx to exportQueue.add()
+      exportQueue.add(tx, row, txLog);
+ }
+ }
+}
+```
+
+[RecordingTransaction]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/transaction/RecordingTransaction.html
+[Transaction]: {{ page.javadoc_fluo }}/org/apache/fluo/api/client/Transaction.html
diff --git a/_recipes-1-2/recipes/row-hasher.md b/_recipes-1-2/recipes/row-hasher.md
new file mode 100644
index 0000000..4b0f712
--- /dev/null
+++ b/_recipes-1-2/recipes/row-hasher.md
@@ -0,0 +1,121 @@
+---
+title: Row Hash Prefix
+category: recipes
+order: 4
+---
+
+## Background
+
+Transactions are implemented in Fluo using conditional mutations. Conditional
+mutations require server side processing on tservers. If data is not spread
+evenly, it can cause some tservers to execute more conditional mutations than
+others. These tservers doing more work can become a bottleneck. Most real
+world data is not uniform and can cause this problem.
+
+Before the Fluo [Webindex example][1] started using this recipe it suffered
+from this problem. The example was using reverse DNS encoded URLs for row keys
+like `p:com.cnn/story1.html`. This made certain portions of the table more
+popular, which in turn made some tservers do much more work. This uneven
+distribution of work led to lower throughput and uneven performance. Using
+this recipe made those problems go away.
+
+## Solution
+
+This recipe provides code to help add a hash of the row as a prefix of the row.
+Using this recipe rows are structured like the following.
+
+```
+<prefix>:<fixed len row hash>:<user row>
+```
+
+The recipe also provides code to help generate split points and configure
+balancing of the prefix.
+
+## Example Use
+
+```java
+import org.apache.fluo.api.config.FluoConfiguration;
+import org.apache.fluo.api.data.Bytes;
+import org.apache.fluo.recipes.core.data.RowHasher;
+
+public class RowHasherExample {
+
+ private static final RowHasher PAGE_ROW_HASHER = new RowHasher("p");
+
+ // Provide one place to obtain row hasher.
+ public static RowHasher getPageRowHasher() {
+ return PAGE_ROW_HASHER;
+ }
+
+ public static void main(String[] args) {
+ RowHasher pageRowHasher = getPageRowHasher();
+
+ String revUrl = "org.wikipedia/accumulo";
+
+ // Add a hash prefix to the row. Use this hashedRow in your transaction
+ Bytes hashedRow = pageRowHasher.addHash(revUrl);
+ System.out.println("hashedRow : " + hashedRow);
+
+    // Remove the prefix. This can be used by transactions dealing with the hashed row.
+ Bytes orig = pageRowHasher.removeHash(hashedRow);
+ System.out.println("orig : " + orig);
+
+    // Generate table optimizations for the recipe. This can be called when setting up an
+ // application that uses a hashed row.
+ int numTablets = 20;
+
+    // The following code would normally be called before initializing Fluo. This code
+ // registers table optimizations for your prefix+hash.
+ FluoConfiguration conf = new FluoConfiguration();
+ RowHasher.configure(conf, PAGE_ROW_HASHER.getPrefix(), numTablets);
+
+    // Normally you would not call the following code; it would be called automatically for
+    // you by TableOperations.optimizeTable(). It is called here to show what table
+    // optimization will be generated.
+ TableOptimizations tableOptimizations = new RowHasher.Optimizer()
+        .getTableOptimizations(PAGE_ROW_HASHER.getPrefix(), conf.getAppConfiguration());
+    System.out.println("Balance config : " + tableOptimizations.getTabletGroupingRegex());
+ System.out.println("Splits : ");
+ tableOptimizations.getSplits().forEach(System.out::println);
+ System.out.println();
+ }
+}
+```
+
+The example program above prints the following.
+
+```
+hashedRow : p:1yl0:org.wikipedia/accumulo
+orig : org.wikipedia/accumulo
+Balance config : (\Qp:\E).*
+Splits :
+p:1sst
+p:3llm
+p:5eef
+p:7778
+p:9001
+p:assu
+p:clln
+p:eeeg
+p:g779
+p:i002
+p:jssv
+p:lllo
+p:neeh
+p:p77a
+p:r003
+p:sssw
+p:ullp
+p:weei
+p:y77b
+p:~
+```
+
+The split points are used to create tablets in the Accumulo table used by Fluo.
+Data and computation will spread very evenly across these tablets. The
+Balancing config will spread the tablets evenly across the tablet servers,
+which will spread the computation evenly. See the [table optimizations][2]
+documentation for information on how to apply the optimizations.
+
+[1]: https://github.com/astralway/webindex
+[2]: {{ page.docs_base }}/tools/table-optimization/
diff --git a/_recipes-1-2/tools/serialization.md b/_recipes-1-2/tools/serialization.md
new file mode 100644
index 0000000..df4b73e
--- /dev/null
+++ b/_recipes-1-2/tools/serialization.md
@@ -0,0 +1,76 @@
+---
+title: Serializing Data
+category: tools
+order: 1
+---
+
+Various Fluo Recipes deal with POJOs and need to serialize them. The
+serialization mechanism is configurable and defaults to using [Kryo].
+
+## Custom Serialization
+
+In order to use a custom serialization method, two steps need to be taken. The
+first step is to implement [SimpleSerializer]. The second step is to
+configure Fluo Recipes to use the custom implementation. This needs to be done
+before initializing Fluo. Below is an example of how to do this.
+
+```java
+FluoConfiguration fluoConfig = ...;
+//assume MySerializer implements SimpleSerializer
+SimpleSerializer.setSerializer(fluoConfig, MySerializer.class);
+//initialize Fluo using fluoConfig
+```
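+
+For reference, a minimal `MySerializer` might look like the hedged sketch below. It assumes
+[SimpleSerializer] exposes `init`, `serialize`, and `deserialize` methods as in the 1.x recipes
+API, and it uses plain Java serialization, which is simpler but slower and larger than Kryo.
+
+```java
+import java.io.*;
+
+import org.apache.fluo.api.config.SimpleConfiguration;
+import org.apache.fluo.recipes.core.serialization.SimpleSerializer;
+
+public class MySerializer implements SimpleSerializer {
+
+  @Override
+  public void init(SimpleConfiguration appConfig) {
+    // plain Java serialization needs no configuration
+  }
+
+  @Override
+  public <T> byte[] serialize(T obj) {
+    try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        ObjectOutputStream oos = new ObjectOutputStream(baos)) {
+      oos.writeObject(obj);
+      oos.flush();
+      return baos.toByteArray();
+    } catch (IOException e) {
+      throw new UncheckedIOException(e);
+    }
+  }
+
+  @Override
+  public <T> T deserialize(byte[] serObj, Class<T> clazz) {
+    try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(serObj))) {
+      return clazz.cast(ois.readObject());
+    } catch (IOException e) {
+      throw new UncheckedIOException(e);
+    } catch (ClassNotFoundException e) {
+      throw new IllegalStateException(e);
+    }
+  }
+}
+```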
+
+## Kryo Factory
+
+If using the default Kryo serializer implementation, then creating a
+KryoFactory implementation can lead to smaller serialization size. When Kryo
+serializes an object graph, it will by default include the fully qualified
+names of the classes in the serialized data. This can be avoided by
+[registering classes][register] that will be serialized. Registration is done by
+creating a KryoFactory and then configuring Fluo Recipes to use it. The
+example below shows how to do this.
+
+For example assume the POJOs named `Node` and `Edge` will be serialized and
+need to be registered with Kryo. This could be done by creating a KryoFactory
+like the following.
+
+```java
+package com.foo;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.pool.KryoFactory;
+
+import com.foo.data.Edge;
+import com.foo.data.Node;
+
+public class MyKryoFactory implements KryoFactory {
+ @Override
+ public Kryo create() {
+ Kryo kryo = new Kryo();
+
+    //Explicitly assign each class a unique id here to ensure it's stable over
+ //time and in different environments with different dependencies.
+ kryo.register(Node.class, 9);
+ kryo.register(Edge.class, 10);
+
+ //instruct kryo that these are the only classes we expect to be serialized
+ kryo.setRegistrationRequired(true);
+
+ return kryo;
+ }
+}
+```
+
+Fluo Recipes must be configured to use this factory. The following code shows
+how to do this.
+
+```java
+FluoConfiguration fluoConfig = ...;
+KryoSimplerSerializer.setKryoFactory(fluoConfig, MyKryoFactory.class);
+//initialize Fluo using fluoConfig
+```
+
+[Kryo]: https://github.com/EsotericSoftware/kryo
+[SimpleSerializer]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/serialization/SimpleSerializer.html
+[register]: https://github.com/EsotericSoftware/kryo#registration
diff --git a/_recipes-1-2/tools/spark.md b/_recipes-1-2/tools/spark.md
new file mode 100644
index 0000000..561879d
--- /dev/null
+++ b/_recipes-1-2/tools/spark.md
@@ -0,0 +1,19 @@
+---
+title: Spark Helper
+category: tools
+order: 4
+---
+
+Fluo Recipes has some helper code for [Apache Spark][spark]. Most of the helper code is
+currently related to bulk importing data into Accumulo. This is useful for initializing a new
+Fluo table with historical data via Spark. The Spark helper code is found in the
+[fluo-recipes-spark module][frs].
+
+For information on using Spark to load data into Fluo, check out this [blog post][blog].
+
+If you know of other Spark+Fluo integration code that would be useful, then please consider
+[opening an issue](https://github.com/apache/fluo-recipes/issues/new).
+
+[spark]: https://spark.apache.org
+[frs]: {{ site.api_base }}/fluo-recipes-spark/{{ page.version }}
+[blog]: https://fluo.apache.org/blog/2016/12/22/spark-load/
+
diff --git a/_recipes-1-2/tools/table-optimization.md b/_recipes-1-2/tools/table-optimization.md
new file mode 100644
index 0000000..e2202bd
--- /dev/null
+++ b/_recipes-1-2/tools/table-optimization.md
@@ -0,0 +1,66 @@
+---
+title: Table Optimization
+category: tools
+order: 3
+---
+
+## Background
+
+Recipes may need to make Accumulo specific table modifications for optimal
+performance. Configuring the [Accumulo tablet balancer][3] and adding splits are
+two optimizations that are currently done. Offering a standard way to do these
+optimizations makes it easier to use recipes correctly. These optimizations
+are optional. You could skip them for integration testing, but would probably
+want to use them in production.
+
+## Java Example
+
+```java
+FluoConfiguration fluoConf = ...
+
+//export queue configure method will return table optimizations it would like made
+ExportQueue.configure(fluoConf, ...);
+
+//CollisionFreeMap.configure() will return table optimizations it would like made
+CollisionFreeMap.configure(fluoConf, ...);
+
+//configure optimizations for a prefixed hash range of a table
+RowHasher.configure(fluoConf, ...);
+
+//initialize Fluo
+FluoFactory.newAdmin(fluoConf).initialize(...)
+
+//Automatically optimize the Fluo table for all configured recipes
+TableOperations.optimizeTable(fluoConf);
+```
+
+[TableOperations][2] is provided in the Accumulo module of Fluo Recipes.
+
+## Command Example
+
+Fluo Recipes provides an easy way to optimize a Fluo table for configured
+recipes from the command line. This should be done after configuring recipes
+and initializing Fluo. Below are example commands for initializing in this way.
+
+```bash
+
+#create application
+fluo new app1
+
+#configure application
+
+#initialize Fluo
+fluo init app1
+
+#optimize table for all configured recipes
+fluo exec app1 org.apache.fluo.recipes.accumulo.cmds.OptimizeTable
+```
+
+## Table optimization registry
+
+Recipes register themselves by calling [TableOptimizations.registerOptimization()][1]. Anyone
+can use this mechanism; it's not limited to use by existing recipes. A hedged sketch follows.
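+
+The sketch below is an assumption-laden illustration, not verified API: it guesses that a
+factory implements the `getTableOptimizations(String, SimpleConfiguration)` hook seen in the
+RowHasher example, and that `registerOptimization()` takes the app configuration, an id, and a
+factory class. Check the [TableOptimizations javadoc][1] for the real signatures.
+
+```java
+// Hypothetical custom optimization factory for a recipe with id "myrecipe"
+public class MyOptimizationFactory implements TableOptimizations.TableOptimizationsFactory {
+  @Override
+  public TableOptimizations getTableOptimizations(String id, SimpleConfiguration appConfig) {
+    TableOptimizations optimizations = new TableOptimizations();
+    optimizations.setSplits(Arrays.asList(Bytes.of("m"))); // recommend a split point
+    return optimizations;
+  }
+}
+
+// Before initializing Fluo, register the factory under the recipe's id
+TableOptimizations.registerOptimization(fluoConfig.getAppConfiguration(), "myrecipe",
+    MyOptimizationFactory.class);
+```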
+
+[1]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/common/TableOptimizations.html
+[2]: {{ page.javadoc_accumulo }}/org/apache/fluo/recipes/accumulo/ops/TableOperations.html
+[3]: http://accumulo.apache.org/blog/2015/03/20/balancing-groups-of-tablets.html
diff --git a/_recipes-1-2/tools/testing.md b/_recipes-1-2/tools/testing.md
new file mode 100644
index 0000000..cb0eb09
--- /dev/null
+++ b/_recipes-1-2/tools/testing.md
@@ -0,0 +1,15 @@
+---
+title: Testing
+category: tools
+order: 5
+---
+
+Fluo includes MiniFluo, which makes it possible to write an integration test that
+runs against a real Fluo instance. Fluo Recipes provides the following utility
+code for writing an integration test.
+
+ * [FluoITHelper][1] A class with utility methods for comparing expected data with what's in
+   Fluo.
+ * [AccumuloExportITBase][2] A base class for writing an integration test that exports data
+   from Fluo to an external Accumulo table.
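+
+For context, a bare-bones MiniFluo test loop looks like the hedged sketch below. The MiniFluo
+calls are core Fluo API; the final `FluoITHelper` line is an assumption about its utility
+methods, so check its javadoc for the exact comparison helpers.
+
+```java
+FluoConfiguration conf = new FluoConfiguration();
+conf.setApplicationName("it-test");
+conf.setMiniStartAccumulo(true); // let MiniFluo start its own mini Accumulo
+
+try (MiniFluo mini = FluoFactory.newMiniFluo(conf);
+    FluoClient client = FluoFactory.newClient(mini.getClientConfiguration())) {
+
+  try (Transaction tx = client.newTransaction()) {
+    tx.set("row1", new Column("cf", "cq"), "v1");
+    tx.commit();
+  }
+
+  // wait for any configured observers to finish processing notifications
+  mini.waitForObservers();
+
+  // Assumption: FluoITHelper offers helpers like this for inspecting table state
+  FluoITHelper.printFluoTable(client);
+}
+```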
+
+[1]: {{ site.api_static }}/fluo-recipes-test/{{ page.version }}/org/apache/fluo/recipes/test/FluoITHelper.html
+[2]: {{ site.api_static }}/fluo-recipes-test/{{ page.version }}/org/apache/fluo/recipes/test/AccumuloExportITBase.html
diff --git a/_recipes-1-2/tools/transient.md b/_recipes-1-2/tools/transient.md
new file mode 100644
index 0000000..2e3b5ec
--- /dev/null
+++ b/_recipes-1-2/tools/transient.md
@@ -0,0 +1,85 @@
+---
+title: Transient Data
+category: tools
+order: 2
+---
+
+## Background
+
+Some recipes store transient data in a portion of the Fluo table. Transient
+data is data that's continually being added and deleted. Also, these transient
+data ranges contain no long term data. The way Fluo works, when data is
+deleted a delete marker is inserted but the data is actually still there. Over
+time these transient ranges of the table will have a lot more delete markers
+than actual data if nothing is done. If nothing is done, then processing
+transient data will get increasingly slower over time.
+
+These delete markers can be cleaned up by forcing Accumulo to compact the
+Fluo table, which will run Fluo's garbage collection iterator. However,
+compacting the entire table to clean up these ranges within a table is
+overkill. Alternatively, Accumulo supports compacting ranges of a table. So
+a good solution to the delete marker problem is to periodically compact just
+the transient ranges.
+
+Fluo Recipes provides helper code to deal with transient data ranges in a
+standard way.
+
+## Registering Transient Ranges
+
+Recipes like [Export Queue][export-queue] will automatically register
+transient ranges when configured. If you would like to register your own
+transient ranges, use [TransientRegistry]. Below is a simple example of
+using this.
+
+```java
+FluoConfiguration fluoConfig = ...;
+TransientRegistry transientRegistry = new TransientRegistry(fluoConfig.getAppConfiguration());
+transientRegistry.addTransientRange(new RowRange(startRow, endRow));
+
+//Initialize Fluo using fluoConfig. This will store the registered ranges in
+//zookeeper making them available on any node later.
+```
+
+## Compacting Transient Ranges
+
+Although you may never need to register transient ranges directly, you will
+need to periodically compact transient ranges if using a recipe that registers
+them. Using [TableOperations] this can be done with one line of Java code
+like the following.
+
+```java
+FluoConfiguration fluoConfig = ...;
+TableOperations.compactTransient(fluoConfig);
+```
+
+Fluo Recipes provides an easy way to compact transient ranges from the command line using the
+`fluo exec` command as follows:
+
+```
+fluo exec <app name> org.apache.fluo.recipes.accumulo.cmds.CompactTransient [<interval> [<multiplier>]]
+```
+
+If no arguments are specified, the command will call `compactTransient()` once.
+If `<interval>` is specified, the command will run forever, compacting transient
+ranges and sleeping `<interval>` seconds between compacting each transient range.
+
+In the case where Fluo is backed up in processing data, a transient range could
+have a lot of data queued, and compacting it too frequently would be
+counterproductive. To avoid this, the `CompactTransient` command will consider
+the time it took to compact a range when deciding when to compact that range
+next. This is where the `<multiplier>` argument comes in: the time to sleep
+between compactions of a range is determined as follows. If not specified, the
+multiplier defaults to 3.
+
+```java
+sleepTime = Math.max(compactTime * multiplier, interval);
+```
+
+For example, assume a Fluo application has two transient ranges. Also assume
+CompactTransient is run with an interval of 600 and a multiplier of 10. If the
+first range takes 20 seconds to compact, then it will be compacted again in 600
+seconds. If the second range takes 80 seconds to compact, then it will be
+compacted again in 800 seconds.
+
+[TransientRegistry]: {{ page.javadoc_core }}/org/apache/fluo/recipes/core/common/TransientRegistry.html
+[TableOperations]: {{ page.javadoc_accumulo }}/org/apache/fluo/recipes/accumulo/ops/TableOperations.html
+[export-queue]: {{ page.docs_base }}/recipes/export-queue/
--
To stop receiving notification emails like this one, please contact
['"[email protected]" <[email protected]>'].