Thursday, July 29, 2010

Large List in SP2010

This is a long post on a problem with SharePoint 2010 that I worked on over the last week. Centered around understand and working with Large Lists in custom forms. You can skip to the bottom to see the summary of the solution I took.

The Problem:
The Original Error: This view cannot be displayed because it exceeds the list view threshold (5000 items) enforced by the administrator.

This single error led me on quite the educational trip. My first reaction crap my web application is broken. Then I mailed the server administrator asking that the limit be increased, complaining how my solution worked in MOSS. :) Luckily I was barely over the 5K limit so deleting some records let me drop below the 5K limit but I had problem with the application and I needed a way to test it while allowing my customers access to their data.

This led to my first workaround:
- Set the selectcommand to “<View></View>”,
- Edit the asp:Parameter MaximumRows and set it to 5K.
- Then apply xslt filtering.
Wince like I did, but it allowed me to work on the site w/o further disrupting my customers activities. There a lot of reason this is a no-no, performance being the biggest one.

Now I could work, experiment, test on the site / list. The server administrators got back to me after I had found this work around and asked me to continue pursuing a solution as MS would one day come back and tell us to set the threshold setting back to default thus putting us right back to square one. I’ll get back to why the administration team did the right thing here. Even though it seems like the simple answer to increase the threshold, DO NOT DO IT! Many thanks to the administration team for keeping me focused on delivering the right solution.

Before I get to that let me explain to those that may be yelling at the screen why didn’t you have a test site and unfortunately my site / application is very dependent on work flows and I have yet to find a good way to migrate those from site to site.

So back to the education, well we were early adopters and I missed the information about large lists and list throttling. Saw the information that the list can contain about 50 mil records, awesome!!...Missed the point about only being able to work with 5K at a time, though I never intended to even work with that large of a number of records.

Root causing the Issue:
Now I had to start scoping the problem I knew two things once a list crosses 5K records it now becomes a large list. What worked on a regular list no longer seems to work on a large list and I can only work with 5K records at a time and my exiting filters were not working before the records were being pulled.

So I needed to understand what a large list was:
- Which led me to this page: http://msdn.microsoft.com/en-us/library/ff798465.aspx
-- The best resource I found was a paper MS wrote about large lists titled “Designing Large Lists Maximizing List Performance” and can be downloaded here: http://www.microsoft.com/downloads/details.aspx?FamilyID=fd1eac86-ad47-4865-9378-80040d08ac55&displaylang=en
--- I was a bit shocked when the doc called out that most users when first faced with this problem are tempted to increase the threshold and it explained why not to take that approach in technical details. (it’s not every day that you can point at a doc and say I behaved exactly that way :) )
-- The second best resource was the help doc (I believe this is a replication of their online help documentation)- http://sharepoint.microsoft.com/blogs/GetThePoint/Lists/Posts/Post.aspx?ID=303

Unfortunately these papers didn’t really deal with my problem exactly. Though it begun to point me in the right direction and to be aware of some of the side effects of the list being a large list and explain what I was experiencing.

Some side effects or symptoms that a list has passed the large list threshold and become a large list or at least what I have experienced:
• Filters that worked previously no longer work
• Sort and filter functionality no longer works
• Some views may not work
• Count on list views will not work

Possible Permanent Solutions:
I first tried a query filtered with an indexed column, according to the documentation this is supposed to be the Holy Grail fix. Unfortunately this did not work as expected and still got the threshold error. So I abandoned this approach, I’ve head numerous issues with DFWP since I migrated to 2010 I figured this was one more thing that was neglected by MS and I’d have to file another bug against 2010.

I next compiled a list of potential workaround with a help of my site / application architect.

- Content iterator
-- http://msdn.microsoft.com/en-us/library/ff798376.aspx
-- Code solution
- Partitioned View
-- http://msdn.microsoft.com/en-us/library/ff798344.aspx
-- Seems to only apply to SQL data sources
- Managed Metadata
-- Will require adjustment to the list
-- Should have been applied at design phase
-- Will require a lot of pre-work in planning before implementing
- Content Query Web Part
-- To enable it
--- http://vspug.com/ssa/2010/02/06/sp2010-tip-content-query-web-part-missing-in-sharepoint-2010/
--- http://social.msdn.microsoft.com/Forums/en/sharepoint2010general/thread/4cfcdb20-0e1e-4651-bfda-008fe554696e
-- To Use it
--- Dated write up - http://blogs.msdn.com/b/ecm/archive/2006/10/25/configuring-and-customizing-the-content-query-web-part.aspx
--- Newer info - http://blogs.msdn.com/b/ecm/archive/2010/05/14/what-s-new-with-the-content-query-web-part.aspx
-- Customize it
--- http://msdn.microsoft.com/en-us/library/aa981241.aspx
--- http://msdn.microsoft.com/en-us/library/ms497457.aspx
- Access Services
-- Dead end
- Re-architect the site using folders and custom content types
-- This would enable easier filtering that would not hit the threshold
-- Similar implementation to Managed Metadata
-Bring in MS SharePoint consultant to solve our problems
-- This was added for completeness… :)

The Final Solution:
While we are eventually going to go to a coded solution hosted in our sites Sandbox we are not there yet. So I focused on the lowest code solutions. I started working with the CWWP (Content Query Web Part). It was the first time I have used it I can see how it can be very useful for someone who didn’t want to edit the XSLT directly. Though it does require some knowledge in how the list is structured. Then I read that the CQWP is an extended DFWP and it was working so it led me back to my CAML Query. I remembered a line in one of the documents that it said when filtering a query for a large list always use an AND operator as an OR will rarely limit the results below the threshold. I knew my CAML Query didn’t have any operators so I just gave it a try using a static true statement with my filter on an indexed column so that I would get an AND operator in the select command.

The static statement is also using an Indexed column (ID) and it is simply comparing to see if it is greater than zero (which is always True).

The CAML Query that works:

selectcommand=
"<View>
 <Query>
  <Where>
   <And>
    <Eq>
     <FieldRef Name="PN_ID"/>
     <Value Type="Text">
      {PN_ID}
     </Value>
    </Eq>
    <Gt>
     <FieldRef Name="ID"/>
     <Value Type="Counter">
      0
     </Value>
    </Gt>
   </And>
  </Where>
 </Query>
</View>"

The CAML Query that did not work (Even though it is using an Indexed Column):
selectcommand=
"<View>
 <Query>
  <Where>
   <Eq>
    <FieldRef Name="PN_ID"/>
    <Value Type="Text">
     {PN_ID}
    </Value>
   </Eq>
  </Where>
 </Query>
</View>"


Summary:
For large lists, do your best to plan ahead.
Try to determine which potential solution is the right one for your audience.
See the “Designing Large Lists Maximizing List Performance” document
If the right solution is to use the DFWP then make sure that you:

  • Have at least 1 indexed column you can filter on effectively
  • Index Before your list is over the threshold preferably
  • Use two filter criteria’s on Indexed Columns to ensure that you have an AND operator in your CAML Query.

1 comment:

N03L said...

Thanks for the write up.
I've been working on a similar issue for a few days now.
The trouble is that I've got a CQWP that is aggregating all Events from all Calendar lists across our entire portal which has now exceeded the 5000 LVT.
I've increased the LVT temporarily as part of the troubleshooting process but this only works for me as an administrator not for regular user accounts.
The fact that there isn't a single 'large list' here make indexing an unsuitable fix as its the query in the web part that's causing this to fail.
I can't trim the query at all because the CQWP is audience targetted to present only those events that the current logged in user should see.
Any ideas?