Hi everyone,

So here is what happened.

My initial call was to add the new variable to the data series (or similar
central module), which ensures that any future chart types (post‑2016) can
use this variable if they require “calculated” versus “source” data.

Now my initial approach was not in the right direction and later Quikee
pointed me in the right direction -

   - Pay close attention to the "You Ain't Gonna Need It" (YAGNI)
   principle. Don't over-engineer for future chart types. Focus on solving the
   current problem (histogram support) in a clean and maintainable way.
   - No IDL Changes (Unless Necessary): Avoid modifying the UNO IDL files
   unless you have a very strong reason. IDL changes can have far-reaching
   consequences.
   - Persistence: Quikee said that persistence doesn't need to be accounted
   for at this point and can be calculated again on the fly as we have the
   original data.



Regina pointed out a deeper problem in the approach taken so far. She
mentioned that the calculated bin frequencies and bin ranges were set in
place of the original data.
This approach allowed for a convenient way to render the histogram as a bar
chart, but it introduced bugs (tdf#164109 and tdf#162240) and made it
impossible to export the
<chart:series> element in the same way as other chart types. She referred
to a specific part of the code in SchXMLExport.cxx where the export of data
series for the current chart type is handled.
Currently, in the patch being discussed, the <chart:series> element isn’t
being exported at all. When it is exported, the new
<loext:histogram-configuration> element needs to be a child element
after all existing child elements, ideally just before the comment '//
close series'.

Tomaž has made a patch (PR) https://gerrit.libreoffice.org/c/core/+/181051
to address this,
and I’ve copied his work so far without making any changes yet
https://gerrit.libreoffice.org/c/core/+/182101 .

After his PR, Regina further commented on it -
https://gerrit.libreoffice.org/c/core/+/181051/comments/ca521ccf_9e2ab59c?tab=comments
*UI/UX Issues *

   - *Problem *:
      - values-y and calculated-y share the same UI label (STR_DATA_ROLE_Y).
      - Switching chart types adds duplicate data series.
   - *Solution *:
      - Assign unique labels (e.g., "Original Y Values" vs. "Calculated
      Frequencies").
      - Clear calculated-y/calculated-x when switching away from histograms.



So what does he propose?

The goal is to ensure that the original data is preserved while also
accommodating the calculated values needed for the histogram.

First, let's clarify the problem with the *y-values*. In a histogram, the
y-values represent frequencies, which are calculated from the original
data.
However, the existing code assumes that the "values-y" role always contains
the original, raw data. This creates a conflict because for histograms,
we need to use calculated frequencies for rendering, but we also need to
keep the original data intact for other functionalities or potential
chart-type changes.


To address this, Tomaž introduced a *new role *called *"calculated-y"* in
his patch. This role is meant to store the calculated frequencies for the
histogram,
*while keeping the original data in the "values-y"* role. This way, the
original data remains untouched, and the calculated values are available
for rendering the histogram.

So, one of the tasks is to* introduce* *m_aCalculated_Y* in *VDataSeries*
and adjust the code to use m_aCalculated_Y when it's set (i.e., for
histograms), and *m_aValues_Y* *otherwise*.
This requires auditing all places where m_aValues_Y is used and deciding
whether to use m_aValues_Y or m_aCalculated_Y based on the context.

Additionally, there's a mention of clearing the *"calculated-y"* role when
changing the chart type.
This makes sense because the calculated values are specific to the
histogram and shouldn't persist if the user switches to a different chart
type.

Now, regarding the x-axis, there seems to be some confusion. Initially, I
thought the issue was with *x-values*, but Tomaž clarified that the primary
issue is with y-values.
However, in the context of histograms, the x-values represent the bins,
which are also calculated from the original data.
So, similar to the y-values, there might be a need to handle calculated
x-values separately to preserve the original data.





On Sat, 1 Feb 2025 at 01:17, Devansh Varshney <varshney.devansh...@gmail.com>
wrote:

> Hi community,
>
> To follow up on the previous mail, I'd like to give you a more detailed
> update on how we're addressing the histogram data handling issue in the
> *chart2* module, which Tomaž and Regina pointed out.
>
>    - The core problem lies in how *histograms* handle *X-axis* values.
>    The existing *chart2* architecture *assumes a direct mapping between
>    source data and chart X-axis values*.
>    - However, histograms calculate X-axis values (bin-width) based on the
>    input data, leading to a situation where these calculated values overwrite
>    the original X-values and create issues in many places in the code.
>    - We can get the render of the chart while creating it but when we
>    tried to save(for example) the original values were gone by this time hence
>    the problem.
>
>
> Tomaž also said that he would like to introduce a variable
> *XValuesCalculated* which will resolve most of the problem.
>
> Further Analysis -
>
>    -     *Original X-Values:* The original X-values are correctly stored
>    in XLabeledDataSequence.
>    -     *Histogram Calculations:* The histogram calculations occur
>    within the *HistogramDataInterpreter* and then the calculated values
>    overwrite the XLabeledDataSequence x values.
>    -     *Need for Separation:* We need to store and display the
>    histogram-calculated X-values while preserving the original data as many
>    parts of the code depend on original values.
>
>
>
>
> Analysis of *HistogramChartType.cxx*
>
> *Purpose:* This file defines the HistogramChartType class, that is
> responsible for setting up a histogram chart type. It manages properties
> specific to histograms,
> like bin width and range. It also implements the logic to create the
> data-series and do the calculations.
>
> *createCalculatedDataSeries()* Method:
>
>    - This method retrieves the raw data through the *XLabeledDataSequence*
>    from *m_aDataSeries*. This is where the logic modifies the source data
>    (the sequence returned from getValues()).
>    - It then uses *HistogramCalculator* to perform calculations and
>    generate the data.
>    - Finally, it creates a new *HistogramDataSequence* to store
>    calculated values and then adds that to m_aDataSeries.
>
>
> *Interaction with **XLabeledDataSequence*
> *:*
>
>    -         It gets the original data using the *getDataSequences2()*
>    method of *DataSeries*.
>    -         It gets the original x values by using *getValues*() method
>    of the *XLabeledDataSequence* which returns an XDataSequence and then
>    uses the *getData() *method of the XDataSequence to retrieve the x
>    values in a sequence of uno::Any.
>
>
>
> Analysis of *HistogramCalculator.cxx*
>
> *Purpose*: This class performs the core histogram calculations using the
> raw data that is provided to *computeBinFrequencyHistogram()* method.
>
> *Bin Calculation Logic:*
>
>    -         The class calculates the number of bins, bin width, and bin
>    ranges based on the input data.
>    -         It computes bin ranges and their frequencies.
>    -         The calculated values are stored internally in *maBinRanges*
>    and *maBinFrequencies*.
>
>
>
>
> Understanding* HistogramDataSequence.cxx*
>
> *Data Storage:*
>
>    -         The primary data storage is the *mxValues* member, which is
>    a *uno::Sequence<double>*. This sequence stores the calculated bin
>    values for the histogram.
>    -         There is also *mxLabels* which stores labels but we are not
>    concerned with it.
>
>
> *Interface Implementations:*
>
>    -         *HistogramDataSequence* implements several UNO interfaces
>    including *XNumericalDataSequence, XTextualDataSequence, XDataSequence*
>    .
>    -         *getNumericalData()*: Returns the *mxValues* as a
>    *uno::Sequence<double>*.
>    -         *getTextualData()*: Returns an empty
>    *uno::Sequence<OUString>*.
>    -        * getData()*: Returns the *mxValues* as a
>    *uno::Sequence<uno::Any>*.
>
>
>
>
> Proposed Solution:
> To address these issues, this is how I am thinking of making the changes:
>
>    * *Add **XValuesCalculated Property(As said by Tomaž)* :
>
>    -         Add an *[optional, attribute] sequence<double> *
>    *XValuesCalculated**;* to the *DataSeries* service definition (
>    *offapi/com/sun/star/chart2/DataSeries.idl*).
>    -         Implement the corresponding storage, getter, and setter
>    methods in *DataSeries.hxx* and *DataSeries.cxx*. (Add storage for
>    *m_aXValuesCalculated*)
>    -         Add the corresponding property support in *OPropertySet.hxx*
>    and *OPropertySet.cxx*?
>
>        *Why add it to DataSeries? *-> If my understanding is correct,
> *XValuesCalculated* is a property associated with a DataSeries and is not
> a new data structure by itself, so we do not need to create a new IDL file
> for it.
> *      No Boolean Flag Needed:* -> we do not need an explicit boolean
> flag.
>
>        I am using *.hasElements() *and *w**hy Length Check is Sufficient*:
>
>    -     *Presence = Calculated Values*: The presence of elements within
>    the *XValuesCalculated* sequence is sufficient to determine if
>    calculated X-values exist.
>    -     *Empty Sequence = No Calculated Values*: An empty sequence,
>    which tells us that no values have been calculated. This is very convenient
>    because UNO Sequences can be empty.
>    -     *No Redundancy:* Using a separate boolean flag would add
>    redundancy.
>
>
>
>    *
> * Modify Histogram Logic:*
>
>    -         Update HistogramDataInterpreter.cxx to perform calculations
>    and store calculated bin values in the *XValuesCalculated* property of
>    the corresponding DataSeries object. This will not touch the original
>    values in the *XLabeledDataSequence*.
>    -         And the Histogram View utilises the BarChart view for the
>    final rendering.
>
>
>    *
> * Modify Rendering Logic:*
>
>    -         Modify chart renderers (such as VSeriesPlotter.cxx ?) to
>    prioritize the *XValuesCalculated* property when available for
>    plotting. When the *XValuesCalculated* is not available then it will
>    use the original values of *XLabeledDataSequence*.
>
>
>    *
> *Adjust Axis Scaling:*
>
>    -         Modify the code to use the *XValuesCalculated* property,
>    when available, for axis scaling range calculation.
>
>
>    * *Preserve Original Data in UI:*
>
>    -         Ensure that UI components always display original data
>    fetched from *XLabeledDataSequence* with the role of *"values-x".*
>    -         Since users expect the UI tables to reflect the underlying
>    data they have provided. If we show calculated bin values in tables/cells,
>    it will be confusing and wrong.
>
>
>    *
> *Comprehensive Testing:*
>
>    -         Create tests to ensure that the changes work and do not
>    cause a regression to ensure proper functionality and maintain backward
>    compatibility.
>    -         We will also make sure the undo and redo are working
>    properly.
>
>
>    *
> * Documentation:*
>
>    -         Document all the changes to reflect the new property and its
>    usage.
>
>
>
> Key Benefits:
>
>    1.    * Correct Histogram Display:* Histograms will correctly use
>    calculated bin values for the X-axis.
>    2.     *Preserved Original Data:* Original source data will be
>    maintained for other charts and user interactions.
>    3.     *Minimal Impact:* This change does not affect other chart
>    types, it ensures backward compatibility, and it only impacts histogram and
>    renderer-specific logic.
>    4.     *Clear Separation:* Separates the concern of raw data and
>    calculated data.
>    5.    * Improved Architecture: *Addresses the underlying issue of
>    assumptions embedded in the chart(*chart2*) module.
>
>
>
>
>
> *Is the below file related?*
> chart2/source/controller/dialogs/DataBrowser.cxx
> <https://github.com/LibreOffice/core/blob/b0a4afc58e1c9434e56ddb96c41f4ebe5985ed0a/chart2/source/controller/dialogs/DataBrowser.cxx#L4>
>
>
>
> Do we also have to modify *getValues()* in *XLabeledDataSequence* to
> return calculated values or original values based on the presence of
> *XValuesCalculated* property and chart type?
>
> If other places need changes or I if have still missed files (please let
> me know)
> and the changes in the *offapi/com/sun/star/chart2/DataSeries.idl* are
> correct?
>
> On Thu, 23 Jan 2025 at 18:59, Regina Henschel (via Code Review) <
> ger...@gerrit.libreoffice.org> wrote:
>
>> Attention is currently required from: Devansh Varshney, Tomaž Vajngerl.
>>
>> Regina Henschel has posted comments on this change by Devansh Varshney. (
>> https://gerrit.libreoffice.org/c/core/+/177364?usp=email )
>>
>> Change subject: xmloff: Histogram Chart ODF import and export support
>> ......................................................................
>>
>>
>> Patch Set 26:
>>
>> (2 comments)
>>
>> Patchset:
>>
>> PS26:
>> There is a deeper problem. You have set the calculated binFrequencies and
>> binRanges in place of the original data. That has give you a convenient way
>> for getting a rendering as bar chart. Not only does this cause bugs
>> tdf#164109 and tdf#162240, but now you can't export the <chart:series>
>> element the same way as it is done for other chart types. I'm referring to
>> the way that can be seen after comment '// export dataseries for current
>> chart-type' in SchXMLExport.cxx.
>>
>> Currently in your patch, the <chart:series> element is not exported at
>> all. When it will be exported, the new <loext:histogram-configuration>
>> element has to become a child element after all the existing child
>> elements. If the export looks similar to the existing chart types, it would
>> be just before the comment '// close series'.
>>
>> I suggest to discuss the underlying problem with Tomaž Vajngerl. I don't
>> see an easy solution and would have to try a lot. Until this is resolved,
>> it doesn't make sense to make progress on the export.
>>
>>
>> File schema/libreoffice/OpenDocument-v1.4+libreoffice-schema.rng:
>>
>>
>> https://gerrit.libreoffice.org/c/core/+/177364/comment/2e8b89d8_b939b3f4?usp=email
>> :
>> PS26, Line 2149:         <rng:optional>
>>                :           <rng:ref name="loext-histogram-configuration"/>
>>                :         </rng:optional>
>> > I mean the `loext:histogram-configuration` is placed under
>> `chart:series` is this the correct way? […]
>> The position is correct.
>>
>>
>>
>> --
>> To view, visit https://gerrit.libreoffice.org/c/core/+/177364?usp=email
>> To unsubscribe, or for help writing mail filters, visit
>> https://gerrit.libreoffice.org/settings?usp=email
>>
>> Gerrit-MessageType: comment
>> Gerrit-Project: core
>> Gerrit-Branch: master
>> Gerrit-Change-Id: Ibda198c1a3e4a7b53e3cf590c33cc9adb558be74
>> Gerrit-Change-Number: 177364
>> Gerrit-PatchSet: 26
>> Gerrit-Owner: Devansh Varshney <varshney.devansh...@gmail.com>
>> Gerrit-Reviewer: Jenkins
>> Gerrit-Reviewer: Tomaž Vajngerl <qui...@gmail.com>
>> Gerrit-CC: Regina Henschel <rb.hensc...@t-online.de>
>> Gerrit-Attention: Devansh Varshney <varshney.devansh...@gmail.com>
>> Gerrit-Attention: Tomaž Vajngerl <qui...@gmail.com>
>> Gerrit-Comment-Date: Thu, 23 Jan 2025 13:29:36 +0000
>> Gerrit-HasComments: Yes
>> Gerrit-Has-Labels: No
>> Comment-In-Reply-To: Devansh Varshney <varshney.devansh...@gmail.com>
>>
>
>
> --
> *Regards,*
> *Devansh*
>


-- 
*Regards,*
*Devansh*

Reply via email to