[
https://issues.apache.org/jira/browse/SOLR-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888670#action_12888670
]
Chris A. Mattmann commented on SOLR-1925:
-----------------------------------------
Hi Yonik:
Thanks. Replies below:
{quote}
* loses info by removing newlines
{quote}
It only does this when {noformat}&excel=true{noformat} is set, and it actually
adds functionality by doing so (without it, you can't load the data into
Excel; see my comments above and in the code).
{quote}
* always encapsulates with quotes - not as readable
{quote}
See the CSV spec, linked via Wikipedia in the code. Always quoting reduces
ambiguity and clearly delineates where each value starts and where it stops.
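To illustrate the point about delineation, here is a minimal sketch (not code
from the patch; the class and method names are hypothetical) that wraps every
value in quotes, so a separator inside a value cannot be mistaken for a field
boundary. It deliberately does nothing about quote characters inside values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from the SOLR-1925 patch.
class QuoteAllSketch {
    // Joins values with commas, wrapping each one in double quotes.
    // Embedded quote characters are NOT handled in this sketch.
    static String row(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(values.get(i)).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The comma inside the first value stays inside its quotes.
        System.out.println(row(Arrays.asList("Smith, John", "42")));
    }
}
```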
{quote}
* doesn't escape encapsulator in values
{quote}
Is there a need to do this? I don't think so...
{quote}
* doesn't escape separator in multi-valued fields
{quote}
Same as above: no need, really.
{quote}
* isn't really nested CSV, so it's not compatible with the CSVLoader
{quote}
What do you mean by "not compatible with the CSVLoader"?
{quote}
* uses System.getProperty("line.separator")... we should avoid different
behavior on different platforms
{quote}
Hmm, I've never been dinged before for writing platform-independent code.
That's why the property is there: line.separator means the same thing,
programming-construct-wise, across platforms. So I don't really get your ding
here.
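For reference, the two options under discussion look roughly like this. This
is only a sketch with hypothetical names, not code from the patch: one method
ends a row with the line.separator property, the other with a fixed "\n".

```java
// Hypothetical illustration only; not taken from the SOLR-1925 patch.
class LineTerminatorSketch {
    // Uses the JVM's line.separator property, whose value depends on the
    // platform the server runs on ("\r\n" on Windows, "\n" on Linux/macOS).
    static String platformRow(String row) {
        return row + System.getProperty("line.separator");
    }

    // Uses a fixed terminator, so the output is identical on every platform.
    static String fixedRow(String row) {
        return row + "\n";
    }

    public static void main(String[] args) {
        // The first length varies by platform; the second is always 4.
        System.out.println(platformRow("a,b").length());
        System.out.println(fixedRow("a,b").length());
    }
}
```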
{quote}
* doesn't stream documents (dumping your entire index will be one use case)
{quote}
I actually implemented both the streaming method (#writeDoc) and the aggregate
method (#writeAllDocs). I set #isStreaming to false because that makes for
clean CSV header writing, rather than hacky code in #writeDoc to take care of
the (potential) non-uniformity. Additionally, I'm using this in production
right now, on the solr-1.5 branch with an index of over 1M documents, and the
write performance is quite fast.
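As a rough sketch of why the aggregate path keeps the header simple (this is
not the patch code; the class name and the Map-based document model are
assumptions, as is reading "non-uniformity" as documents carrying different
field sets): with all docs in hand, the header is the union of their field
names, and each row fills in blanks for fields it lacks. Quoting is omitted
for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not taken from the SOLR-1925 patch.
class AggregateCsvSketch {
    static String write(List<Map<String, String>> docs) {
        // First pass: collect every field name, in first-seen order.
        Set<String> header = new LinkedHashSet<>();
        for (Map<String, String> doc : docs) {
            header.addAll(doc.keySet());
        }
        StringBuilder out = new StringBuilder();
        out.append(String.join(",", header)).append('\n');
        // Second pass: one row per doc, empty cell for a missing field.
        for (Map<String, String> doc : docs) {
            List<String> cells = new ArrayList<>();
            for (String field : header) {
                cells.add(doc.getOrDefault(field, ""));
            }
            out.append(String.join(",", cells)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("id", "1");
        a.put("title", "first");
        Map<String, String> b = new LinkedHashMap<>();
        b.put("id", "2");
        b.put("author", "someone");
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(a);
        docs.add(b);
        System.out.print(write(docs));
    }
}
```

A streaming writer sees one doc at a time, so it would have to either know
the full field list up front or emit the header before seeing every field.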
{quote}
* performance: patterns shouldn't be compiled per-doc
{quote}
This only matters when {noformat}excel=true{noformat}, and I don't think the
performance hit is really an issue. If you feel strongly about it, though, we
could always compile the pattern above the loop and reuse it...
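A minimal sketch of that change, assuming the pattern in question is the one
that strips newlines for Excel (the class name, the regex, and the choice of
a single space as the replacement are all hypothetical, not taken from the
patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the SOLR-1925 patch.
class ExcelNewlineStripper {
    // Compiled once and shared; Pattern instances are immutable and
    // safe to reuse across documents and threads.
    private static final Pattern NEWLINES = Pattern.compile("\\r\\n|\\r|\\n");

    // Replaces each line break in a value with a single space (assumed choice).
    static String strip(String value) {
        return NEWLINES.matcher(value).replaceAll(" ");
    }

    // The per-doc loop only creates a Matcher; nothing is recompiled.
    static List<String> stripAll(List<String> values) {
        List<String> out = new ArrayList<>(values.size());
        for (String value : values) {
            out.add(strip(value));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(strip("line one\r\nline two"));
    }
}
```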
> CSV Response Writer
> -------------------
>
> Key: SOLR-1925
> URL: https://issues.apache.org/jira/browse/SOLR-1925
> Project: Solr
> Issue Type: New Feature
> Components: Response Writers
> Environment: indep. of env.
> Reporter: Chris A. Mattmann
> Assignee: Erik Hatcher
> Fix For: Next
>
> Attachments: SOLR-1925.Chheng.071410.patch.txt,
> SOLR-1925.Mattmann.053010.patch.2.txt, SOLR-1925.Mattmann.053010.patch.3.txt,
> SOLR-1925.Mattmann.053010.patch.txt, SOLR-1925.Mattmann.061110.patch.txt
>
>
> As part of some work I'm doing, I put together a CSV Response Writer. It
> currently takes all the docs resultant from a query and then outputs their
> metadata in simple CSV format. The use of a delimiter is configurable (by
> default if there are multiple values for a particular field they are
> separated with a | symbol).