I have a few thousand transactions in a bank statement CSV file, each tagged with a transfer account. Why can't I skip the import matching process?
Firstly, assume this test data, taken from the Concepts Guide checking account with the addition of a typical bank statement description and a transfer account. It is loaded into gcashdata_3emptyAccts, created in the guide's exercises.

Date,Number,Memo,Amount,Balance,Account,Entity
05/03/06,101,GG25j1546 Groceries wtf 15:57 061124,-45.21,413.05,Expenses:Groceries,Big Food
06/03/06,,Transfer to J&J Doe Savings Acc 5765-8397 589654259587,100,513.05,Savings,[Savings]
14/03/06,,Direct Credit Salary from Employers R Us,670,1183.05,Income:Salary,Employers R Us
28/03/06,,Mmvoin515b Internet Company bg??,-20,1163.05,Expenses:Internet,FastFibre
28/03/06,102,Light Company Big City Branch 9g8k863,-78,185.05,Expenses:Electricity,Light Company
28/03/06,103,Phone Company Name Autodebit 595642583,-45,140.05,Expenses:Phone,Phone Company Name
28/03/06,104,April Rent 5 Short Road,-350,690.05,Expenses:Rent,HighTower

Adding the following record to the test data might help explain this issue:

28/04/06,104,May Rent 5 Short Road,-350,340.05,Expenses:Rent,HighTower

1. Both accounts are given: the Checking Account and the Transfer Account. It is OK to present the account matching screen, but the user should be able to just select the Next option without making any changes (to just accept the import). If there are new accounts I'd expect them to be created as part of the import process unless they are invalid.

2. I understand the issue with duplicate transactions that need to be avoided. Importing the checking account statement records a transfer to a credit card account as a transaction in both accounts; importing the credit card statement then records the same transfer from the checking account as a transaction in both accounts again. The result is two transactions in each account which actually represent a single transaction in each account. For a large import where there are no transactions in any other accounts this can't happen, and the user should be able to go to the Next step.
If there were other bank accounts with data in them, but dated prior to the period being imported, then the same should apply. There are still reconciliation steps after an import. If there are existing records in another bank account representing the same transactions being imported, the user should have the option to go to the Next step, have the program flag the duplicates, and consolidate them into a single record.

Let's assume this scenario. The checking account has transfers with a cash management account and a credit card. If all the cash management transfers were to/from the checking account you wouldn't load the cash management statement electronically and would just enter monthly interest payments manually. If there were a few checking account transfers to/from the credit card each month it would be preferable to just load the statement and fix the duplicates of the same transaction at reconciliation.

This issue has been raised and responded to a couple of times already, but I don't consider that the responses explain the NEED for matching to occur in THIS scenario.

-------------------------------------------------

https://github.com/Gnucash/gnucash-docs/pull/132#issuecomment-619119386

>> If data is tagged with an account why does the account need to be matched with identical GnuCash accounts [exported from GnuCash]?
>
> Sorry, do you mean if the CSV already has the "other" account why does it need to run the matcher? I don't think that it does.
>
>> Why is transaction matching required?
>
> Because many, perhaps even most, imports don't have the "other" account identified, just a description. That's the case for your bank account example. The matcher, once trained, provides automatic assignment of the "other" account based on the description.
>
>> Furthermore, if one transaction is matched in a list. Why isn't the transaction list updated to match other identical transactions matched?
>
> Because matching a transaction list takes significant time and re-running it every time a user matched a transaction would be annoyingly slow. So it works the other way around as of IIRC 3.7 or so but that's not yet documented: You can select several transactions at once in the matcher and right-click for a context menu with the single entry for picking a matching account for all of the selected transactions. Once the matcher is trained it will match all of the transactions.

https://lists.gnucash.org/pipermail/gnucash-user/2020-April/090627.html

>> I have a bigger issue. I have a few thousand transactions in a bank statement csv. How do I get through the transaction matching process?
>
> The transaction matching has two components. The first is the avoidance of duplicate transactions. The second is the assignment of the second account for the transaction, which is not normally specified explicitly in a bank statement record. The Bayesian approach in GnuCash works well on this second problem. However you need to understand how it works to optimize it. One of the reasons I started working on the documentation of the importing was that the documentation was pretty poor and I didn't understand what was happening myself.
>
> You cannot import your file with a thousand transactions in one hit and expect GnuCash to correctly assign accounts. The account assignment is done by tokenizing information in the description, date and amount fields of the transaction and constructing a table of the frequency of occurrence of the tokens for a particular account that has been assigned as the second account. When a transaction is imported its tokenized information is compared with the frequency table, a score of the matches of the tokens with each possible account assignment is calculated, and the account with the highest score is selected and presented as the assigned account. You can manually override that automatic assignment in the matcher window.
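
The frequency-table idea described in the quoted reply can be illustrated with a small Python sketch. This is not GnuCash's code: the names `tokenize` and `AccountGuesser` are hypothetical, only the description field is tokenized (the reply also mentions date and amount), and a plain sum of token counts stands in for the Bayesian score.

```python
import re
from collections import defaultdict


def tokenize(description):
    """Split a description into lowercase alphanumeric tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", description.lower())


class AccountGuesser:
    """Hypothetical token-frequency table: token -> account -> count."""

    def __init__(self):
        self.freq = defaultdict(lambda: defaultdict(int))

    def train(self, description, account):
        # Corresponds to accepting an import with a confirmed account.
        for token in tokenize(description):
            self.freq[token][account] += 1

    def guess(self, description):
        # Score each account by summing the counts of the matching tokens.
        # This is a simple stand-in, not the Bayesian calculation itself.
        scores = defaultdict(int)
        for token in tokenize(description):
            for account, count in self.freq.get(token, {}).items():
                scores[account] += count
        if not scores:
            return None  # empty table or no known tokens: nothing to suggest
        return max(scores, key=scores.get)


guesser = AccountGuesser()
print(guesser.guess("May Rent 5 Short Road"))  # None (nothing trained yet)

guesser.train("April Rent 5 Short Road", "Expenses:Rent")
guesser.train("Light Company Big City Branch 9g8k863", "Expenses:Electricity")
guesser.train("Phone Company Name Autodebit 595642583", "Expenses:Phone")
print(guesser.guess("May Rent 5 Short Road"))  # Expenses:Rent
```

The first call returns nothing because the table starts empty, which is the situation the reply describes for a file that has never been through the import matcher.
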
> When all of the transactions displayed have had the correct accounts assigned to them and you click the OK button on the matcher window, the token data is updated into the frequency table in the data file at that point only. If you have never imported data, that table is totally empty. Note: The frequency table contains no information derived from transactions which may have already been recorded manually in GnuCash without using the import matcher.
>
> The best strategy is to import data in small batches at first, making sure that you manually assign the correct accounts in each case before hitting OK to actually import the data. It is only after OK is clicked that the frequency table in the data file is updated. If you import data with incorrect account assignments or leave transactions which are assigned to the Imbalance accounts in the import, you are training the system to assign the wrong accounts. You should notice that after a few successful imports GnuCash's guesses at the account improve and most transactions will generally be assigned to the accounts you want. At this point you can start increasing the size of the batches of data you import. Splitting a csv file up is fairly easy in a text editor.
>
> If you have not been completing the imports as described, or have been correcting the account assignments in GnuCash after importing, your data file is going to contain frequency table information which will misdirect the account assignment. Tools->Import->Map Editor allows editing of the stored tokens. Any associations with Imbalance accounts should be deleted. This is a relatively new feature and is on my list of future documentation projects. Use with caution. I improved the matching performance considerably by editing out data for files which were being assigned incorrectly fairly frequently.
>
> The matcher is never going to work perfectly unless the imported data explicitly specifies the second account for the transaction.
> In this case GnuCash also constructs a map of accounts specified in a Transfer account field to specific accounts in the GnuCash internal account hierarchy.
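
For the "small batches" advice in the quoted reply, splitting a statement can also be scripted rather than done in a text editor. The following is a minimal sketch only: the function name `split_csv_batches` and the batch size are assumptions, and the sample uses a cut-down version of the test data columns.

```python
import csv
import io


def split_csv_batches(text, batch_size):
    """Return a list of CSV strings, each holding the header row
    plus at most batch_size data rows from the original text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    batches = []
    for start in range(0, len(rows), batch_size):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + batch_size])
        batches.append(out.getvalue())
    return batches


sample = (
    "Date,Memo,Amount,Account\n"
    "05/03/06,GG25j1546 Groceries wtf 15:57 061124,-45.21,Expenses:Groceries\n"
    "14/03/06,Direct Credit Salary from Employers R Us,670,Income:Salary\n"
    "28/03/06,April Rent 5 Short Road,-350,Expenses:Rent\n"
)

for number, batch in enumerate(split_csv_batches(sample, 2), start=1):
    print(f"--- batch {number} ---")
    print(batch, end="")
```

Each batch repeats the header row so it can be imported on its own, starting with small batches and increasing the size once the account guesses improve.
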